TMTPOST -- RockAI has recently launched the newly upgraded Yan 1.3 large model in Shanghai, which adopts a non-Transformer architecture. This model can efficiently process multimodal information such as images, text, and voice, and is applicable to various terminal devices such as drones, robots, PCs, and mobile phones.
RockAI CEO Liu Fanping stated that this is the world's first end-to-end multimodal large model for human-computer interaction, as well as the first multimodal large model to span such a wide range of devices, and a domestically developed "collective intelligence unit" large model. The model has already been adapted to diverse hardware platforms including NVIDIA, Qualcomm, MediaTek, Intel, and Rockchip, further accelerating commercial deployment toward the goal of giving every device in the world its own intelligence.
After the event, Liu noted that the Yan 1.3 model will be deployed on devices ranging from low-end to high-end, covering a wider range of users. Speaking about the industry's future, he emphasized: "In the future there will only be two or three successful general-purpose large models based on the Transformer architecture, and the scenarios such 'god-like' models can address will remain limited. By comparison, large models built as collective intelligence units are more meaningful."
“Under the ultimate proposition of AGI (Artificial General Intelligence), whether it is cloud-based or edge-based, they are merely carriers to achieve the universalization and leap of intelligence. We always believe that only by truly enhancing the self-learning ability of each device can we stimulate the emergence of higher-level intelligent forms, namely collective intelligence,” said Liu.
RockAI, founded in June 2023, is an AIGC subsidiary of the A-share listed company Yanshan Technology (002195.SZ), focused on building a one-stop AIGC digital intelligence service platform. The RockAI team was formed as early as 2022.
Compared to other large model enterprises, RockAI focuses on the development of AI large models based on a self-developed non-Transformer architecture. In January this year, RockAI released the Yan1.0 model for the first time and announced industry and scenario solutions for the To B vertical field based on AI large models, releasing products including the RockAI model brain, knowledge base Q&A, business assistant, and intelligent customer service.
Previously, Liu told TMTPost that innovative AI algorithms are urgently needed, as many existing algorithms cannot meet the actual application needs of customers and also impose certain pressures on the cost of large models. He stated that currently, computing power accounts for nearly half of the customer delivery costs, while RockAI uses a self-developed non-Transformer memory logic model architecture to reduce delivery costs to about 30%-40%.
The so-called non-Transformer architecture replaces the Transformer's Attention mechanism with an underlying neural network architecture called MCSD. Attention is the core of the Transformer, and its pairwise weighting causes computational load to grow quadratically with sequence length, demanding ever more compute. By replacing Attention with the MCSD mechanism, which passes on only the most effective information and features, RockAI reduces computational complexity to linear in sequence length, improving training and inference efficiency.
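The article does not describe MCSD's internals, so the following is only an illustrative sketch of the scaling difference it cites: full self-attention materializes an n-by-n score matrix, while a per-token mechanism carries a fixed-size running state. The decayed-state mixer below is a hypothetical stand-in, not RockAI's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 4                     # sequence length, feature dimension
x = rng.standard_normal((n, d))

# Self-attention: the score matrix is (n, n), so memory and compute
# grow quadratically with sequence length.
scores = x @ x.T                                        # (n, n)
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
attn_out = weights @ x                                  # (n, d)

# A linear-time alternative (toy assumption): each token mixes its
# features into a decayed running state of fixed size (d,), so the
# per-token cost does not depend on how long the sequence is.
decay = 0.9
state = np.zeros(d)
linear_out = np.empty_like(x)
for t in range(n):
    state = decay * state + x[t]
    linear_out[t] = state

print("attention footprint:", scores.shape, "| recurrent state:", state.shape)
```

Doubling n quadruples the score matrix but leaves the running state untouched, which is the linear-versus-quadratic contrast the paragraph describes.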
Today, RockAI has updated the model again, releasing Yan 1.3.
Compared to the Yan1.0 released in January, Yan1.3 possesses powerful multi-modal capabilities, efficiently handling multi-modal information such as images, text, and voice. It also achieves lossless offline deployment across a wider range of devices, running smoothly even on ordinary computer CPUs.
"On the journey of technology, RockAI is writing a new chapter as an innovator and trailblazer. We are proud to announce that in terms of performance, our self-developed Yan 1.3 architecture has surpassed Llama3, marking a milestone for large models with independent architectures in China. Today, the uncompressed and untrimmed Yan1.3 large model demonstrates seamless adaptation and extraordinary capabilities on an increasing number of devices, proving the rationality of RockAI's technical route and its potential to fully unleash collective intelligence," said Liu Fanping in his speech.
On the commercial front, the "Panghu" intelligent robot based on the Yan1.3 model can recognize complex environments and accurately understand user intentions in an offline state, thereby controlling its mechanical body to efficiently complete various complex tasks.
At the same time, the Yan series models have also been deployed in scenarios such as drones, mobile phones, and PCs. For instance, the Feilong drone based on Yan1.3 can fully perceive complex environments and process them in real-time on the device. It supports intelligent inspections in various environments, adapting not only to urban governance and industrial scenarios like power inspections, security monitoring, and environmental monitoring but also being widely used in daily life scenarios for individual users.
Liu stated that RockAI currently empowers devices with the "Yan inside" model. While it is glad to work with companies that ship terminals in large volumes, it will not overlook smaller manufacturers, such as those in niche but promising areas like embodied intelligence, or makers of toys and companion robots, thereby accelerating the broad adoption of large models.
"Yan1.3 can already be deployed and applied on a wide range of devices, opening up a diversified hardware ecosystem. Therefore, we believe that in the second and third stages, it will achieve autonomous learning." Liu believes that the realization of RockAI's collective intelligence includes four stages: innovative infrastructure, diversified hardware ecosystem, adaptive intelligent evolution, and collaborative collective intelligence. The company is currently still in the second stage of implementation.
In addition, RockAI unveiled its "training-inference synchronization" technology at the conference. Liu Fanping called it the best way to achieve autonomous learning for collective intelligence unit large models. After the conference, he further explained that training-inference synchronization works mainly at the algorithm level: the goal is not only to run models on the device but also to train them there, which would be a significant advance.
Talking about future development, Liu emphasized that the Transformer is one possible path to AGI (Artificial General Intelligence), but practice has not shown it to be the optimal path. Large models should work by simulating how the human brain works, rather than producing output in a single undifferentiated pass as Transformer-based models do. "As some of the very early researchers of the Transformer, our journey from understanding it, to digging deep into it, to finally abandoning it was a reluctant but forced process of innovation. That process took a very long time."
"In the current competitive era, it seems that domestic large model development has entered a 'patterned' dilemma. Innovative ideas are becoming scarce, and everyone is following the footsteps of foreign countries, as if it were an endless technical relay race. During the Spring Festival, OpenAI released the video model Sora, and domestically everyone followed it. Therefore, the Yan architecture is like a breath of fresh air, proving the infinite possibilities of Chinese wisdom with a non-Transformer architecture. Imagine, how many of China's large models are truly self-developed? We can't 'create shells' because our underlying architecture is entirely our own; we can't use others' things, so we can only start from scratch to surpass Llama 3. Many companies using the Transformer architecture haven't even surpassed Llama 3 yet. This approach determines our path of innovation, continuously proving that it is an increasingly correct thing. Unlike other manufacturers, we choose a more difficult path, facing challenges head-on because we know that RockAI, as a true innovator, cannot rely on 'shelling' to survive. We will definitely be better in the future than we are now." Liu said in his speech.
The following is a Q&A between RockAI CEO Liu Fanping and TMTPost:
Q: What is the difference between the Yan model's training-inference synchronization and the reinforcement learning introduced with OpenAI's o1?
Liu: Strictly speaking, OpenAI's o1 is built on reinforcement learning, but Yan 1.3's training-inference synchronization is not on that route; or rather, reinforcement learning is not its focus. Reinforcement learning requires large amounts of data for tuning, so its application cost is very high, which is also why o1 has limitations in practical use.
The goal of training-inference synchronization is to both run and train on the device side, which will be a significant advance. We want training to become cheaper, not more expensive. Training-inference synchronization can be understood as the model training and inferring at the same time, just as a person in conversation produces spoken output (inference) while simultaneously learning in real time from what they hear and see.
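The idea of producing an output and learning from the same step can be sketched as a simple online-learning loop. This is a generic illustration under my own assumptions (a tiny linear model trained with a least-mean-squares update), not RockAI's method:

```python
import numpy as np

rng = np.random.default_rng(2)
w = np.zeros(3)          # tiny linear "model" with 3 weights
lr = 0.05                # learning rate

# Online loop: every step produces an output (inference) and then
# immediately updates the weights from the observed signal (training),
# so inference and training happen within the same pass.
true_w = np.array([1.0, -2.0, 0.5])      # hidden relationship to learn
for _ in range(500):
    x = rng.standard_normal(3)
    target = x @ true_w                  # signal observed at this step
    pred = w @ x                         # inference
    w += lr * (target - pred) * x        # learning from the same step

print(np.round(w, 2))                    # converges toward true_w
```

Each step costs a constant amount of compute, which is why this kind of loop is plausible on a device; the contrast is with batch training, where learning happens offline on accumulated data.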
Q: How is training conducted on local devices with training-inference synchronization?
Liu: We just mentioned the brain-like activation mechanism. When humans process an event, not all neurons in the brain participate. On the device side we likewise select only a portion of the neurons to participate: with, say, 10 billion parameters, only a very small fraction is involved for a given input. This keeps the compute requirement very low, allows training and inference to proceed fully in sync, and lets parameters be adjusted even while producing output.
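Selecting only a small fraction of neurons per input resembles sparse (top-k) activation. As a hedged sketch of that general idea, with all sizes and the gating rule chosen by me for illustration rather than taken from RockAI:

```python
import numpy as np

rng = np.random.default_rng(1)
hidden, d = 10_000, 64     # total "neurons" in a layer, input feature dim
k = 200                    # only ~2% of neurons participate per input

x = rng.standard_normal(d)                   # one input vector
W = rng.standard_normal((hidden, d)) * 0.01  # hypothetical layer weights

# Pick the k most responsive neurons for this input; only those
# neurons fire on the forward pass, and only their weights are
# touched by the local update, so per-step compute stays small.
scores = W @ x
active = np.argpartition(scores, -k)[-k:]    # indices of active neurons

out = np.tanh(scores[active])                # forward pass: k rows only
W[active] += 0.01 * np.outer(out, x)         # update: k rows only

print(f"{k}/{hidden} neurons fired and were updated")
```

The update touches 200 of 10,000 rows, matching the claim that only "a very small portion" of parameters participates, which is what keeps on-device training cheap.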
Q: In which field do you think large models might see a major breakthrough in the future?
Liu: I think there could be breakthroughs in consumer terminal devices. Mobile phones and PCs already have a large installed base, but I believe more types of consumer terminals will soon break this open. Many current terminal devices still run fixed programs; soon AI should change that, including the interactive experience. For example, a drone equipped with Yan 1.3 can hold a human-machine dialogue: we can instruct it to shoot from different angles, such as capturing the scenery behind me with me centered in the frame.
Q: People generally think that large models are a form of information compression and usually generate content at this level, but you emphasize learning. What is the reason for this?
Liu: Learning is no longer about predicting the next token. For example, manufacturers working on text-to-video generation faced a major pain point: the generated content was hard to edit, and the prompt had to be rewritten from scratch. We instead want to teach the large model the skill of making videos, so that once it has learned the skill it can produce videos itself. We prefer models to learn processes rather than outcomes; process-oriented learning also makes the model more interpretable.
Q: There is a viewpoint that there are too many general-purpose large models now, and in the future only two or three might be left. What is your opinion on this?
Liu: In the future, there may only be two to three successful general large models based on the Transformer architecture. However, the scenarios where such "god-making" large models can solve problems are still limited. In comparison, large models aimed at collective intelligence units are more meaningful.