China's Large AI Models Likely to Lead the World, but Most 10,000-GPU Clusters Are Inefficient, Says Academician

TMTPOST -- China has significant advantages in developing large model solutions tailored to different industries, and could potentially lead the world, said Zheng Weimin, a member of the Chinese Academy of Engineering and a professor at Tsinghua University's Department of Computer Science and Technology.

Zheng made the remarks on Wednesday at a conference co-organized by Global Times, the Center for New Technology Development of the China Association for Science and Technology (CAST), and the Technology Innovation Research Center of Tsinghua University.

In 2024, China's large AI model industry was characterized by two main trends: the transition from foundational large models to multimodal models, and the integration of large models with industry applications, he noted.

Zheng explained the five key stages in the lifecycle of large models and identified the challenges at each step. The first stage is data acquisition. Large model training requires massive amounts of data, often in the billions of files. The difficulty lies in the frequent reading and processing of these files, which can be time-consuming and resource-intensive. 

The second stage is data preprocessing. Data often requires cleaning and transformation before it can be used for training. Zheng cited GPT-4 as an example, explaining that the model required 10,000 GPUs over the course of 11 months, with nearly half of that time spent on data preprocessing. This phase remains highly inefficient by current standards.

The most widely used software in the industry for this process is the open-source Spark platform. While Spark boasts an excellent ecosystem and strong scalability, its drawbacks include slower processing speeds and high memory demands. For instance, processing one terabyte of data could require as much as 20 terabytes of memory. Tsinghua University researchers are working on improvements by writing modules in C++ and employing various methods to reduce memory usage, potentially cutting preprocessing time by half.
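The memory problem Zheng describes comes from materializing a whole dataset at once. A minimal sketch (plain Python, not Spark, and with illustrative cleaning rules) of how the same kind of preprocessing can be written as a streaming pipeline so memory stays roughly flat regardless of corpus size:

```python
import re

def clean_records(lines):
    """Yield normalized, deduplicated, non-empty text records one at a time."""
    seen = set()  # hashes of records already emitted (dedup state)
    for line in lines:
        text = re.sub(r"\s+", " ", line).strip()  # collapse whitespace
        if not text:
            continue  # drop empty records
        key = hash(text)
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        yield text

corpus = ["Hello   world\n", "hello\n", "Hello   world\n", "   \n", "data cleaning\n"]
cleaned = list(clean_records(corpus))
print(cleaned)  # → ['Hello world', 'hello', 'data cleaning']
```

Because `clean_records` is a generator, only one record (plus the dedup index) is held in memory at a time; real systems replace the in-memory `seen` set with disk-backed or approximate structures to keep that bounded too.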

The third stage in the lifecycle is model training. This step demands substantial computational power and storage. Zheng emphasized the importance of system reliability during training. For example, in a system with 100,000 GPUs, if errors occur every hour, it can drastically reduce training efficiency. Although the industry has adopted a "pause and resume" method, where the system is paused every 40 minutes to record its state before continuing, this approach is still limited in its effectiveness.
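The "pause and resume" approach can be sketched as periodic checkpointing: save the training state every N steps, so a failure costs only the work since the last checkpoint rather than the whole run. The interval and state shape below are illustrative, not any real cluster's configuration:

```python
import json, os, tempfile

CKPT = os.path.join(tempfile.gettempdir(), "toy_ckpt.json")
CHECKPOINT_EVERY = 40  # save state every 40 steps (echoing the 40-minute interval)

def save_ckpt(step, loss):
    with open(CKPT, "w") as f:
        json.dump({"step": step, "loss": loss}, f)

def load_ckpt():
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)
    return {"step": 0, "loss": 100.0}  # fresh run: no checkpoint yet

def train(total_steps, crash_at=None):
    state = load_ckpt()  # resume from the last checkpoint if one exists
    step, loss = state["step"], state["loss"]
    while step < total_steps:
        step += 1
        loss *= 0.99  # stand-in for one real training step
        if step % CHECKPOINT_EVERY == 0:
            save_ckpt(step, loss)
        if crash_at is not None and step == crash_at:
            raise RuntimeError("simulated hardware fault")
    return step, loss

if os.path.exists(CKPT):
    os.remove(CKPT)
try:
    train(200, crash_at=95)   # fails at step 95 ...
except RuntimeError:
    pass
step, loss = train(200)       # ... then resumes from the step-80 checkpoint
print(step)  # → 200
```

The limitation Zheng points to is visible even in this toy: saving state costs time on every interval, and a failure still discards everything after the last checkpoint, which is why checkpointing alone scales poorly as error rates rise.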

The fourth stage is model fine-tuning, where a base large model is trained further for specific industries or applications. For example, a healthcare large model may be trained on hospital data to produce a specialized version for the medical field. Further fine-tuning can create models for more specific tasks, such as ultrasound analysis.
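The idea of fine-tuning can be shown with a toy numerical sketch: the "base" weights stay frozen, and only a small task-specific weight is adjusted on domain data. Real fine-tuning updates some or all parameters of a neural network; the one-dimensional model here is purely illustrative:

```python
def fine_tune(base_w, data, lr=0.1, epochs=200):
    head_w = 0.0  # the only trainable parameter; base_w is frozen
    for _ in range(epochs):
        for x, y in data:
            pred = base_w * x + head_w * x   # base knowledge + task-specific adjustment
            grad = 2 * (pred - y) * x        # d(squared error)/d(head_w)
            head_w -= lr * grad              # update the head only
    return head_w

# "Domain" data follows y = 3x, while the frozen base model encodes y = 2x,
# so the trained head should learn roughly +1 to close the gap.
head = fine_tune(base_w=2.0, data=[(1.0, 3.0), (2.0, 6.0), (0.5, 1.5)])
print(round(head, 3))  # → 1.0
```

The same pattern repeats down the hierarchy Zheng describes: a hospital fine-tunes the general model on medical data, and an ultrasound team fine-tunes that result again on its narrower dataset.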

AI chips play a critical role in the large model industry, and Zheng highlighted the need for greater domestic chip development. While China has made substantial progress in AI chips in recent years, challenges remain in ecosystem compatibility. For example, it may take years to port software written for Nvidia hardware to systems developed by Chinese companies. The industry's current strategy is to focus on improving software ecosystems to enable better linear scaling and support for multi-chip training.

Zheng further pointed out that building a domestic "10,000 GPU" system, although challenging, is essential. Such a system would need to be both functionally viable and supported by a strong software ecosystem. Additionally, heterogeneous chip-based training systems should be prioritized for their potential to accelerate AI development.

China's computing power has entered a new phase of rapid growth, largely driven by large model training and by projects such as the "Eastern Data, Western Computing" initiative, which is building a national computing network linking China's east and west. High-end AI chips are in heavy demand for large model training, while mid- to low-end chips remain underutilized, with utilization rates currently hovering around 30%. With proper development of China's software ecosystem, this rate could potentially rise to 60%.

At the event, Jiang Tao, co-founder and senior vice president of iFLYTEK, introduced "Feixing-1", China's first large-scale AI model computing platform. iFLYTEK's large models have already reached performance levels comparable to GPT-4 Turbo, surpassing GPT-4 in areas like mathematical reasoning and code generation, according to Jiang.

You Peng, president of Huawei Cloud AI and Big Data, shared his views on the future of the AI industry. He predicted that foundational models would likely be concentrated in the hands of three to five key players. However, the need for industry-specific models would continue to grow, creating opportunities for other companies to build specialized applications on top of these foundational models.

You summarized three key points from Huawei's AI-to-Business (AI To B) practices. First, not all companies need to build massive AI computing infrastructures, especially since many can leverage cloud-based solutions for efficient training, reinforcement learning and inference.

Second, companies may find it more cost-effective to apply mainstream foundational models to their specific use cases rather than training their own models.

Lastly, not all application companies need to pursue large models: smaller, specialized models remain valuable tools in specific domains, with large models serving as coordination systems.
