MiniMax Intelligence CEO Li Dahai said at the 2026 Beijing Zhiyuan Conference that agent technology calls for a measured view despite its rapid progress. Speaking to Pengpai News and other media, Li said public expectations of zero-error agents run ahead of what the current technical development curve can deliver, and that the technology still needs time to mature. He called 2025 the first year of agents and expects explosive growth that will profoundly affect human society, but stressed the need to assess the current capabilities of AI agents calmly.

Li Dahai Describes Agent Technology Limitations and Rapid Evolution

Li Dahai acknowledged that large models and agent technology are converging quickly, with some scenarios already in practical use. Asked about agent limitations, he answered bluntly: "Problems everywhere." He added that "the evolution of model and Agent technology is very fast," explaining that "perhaps today some work has a 10% error rate, and next month the error rate drops to 1%; rapid evolution has become a core trend."
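Li gave the 10% and 1% figures as a loose illustration rather than tying them to a specific task. One way to see why such a drop matters for agents is that an agent usually chains many steps. The sketch below is a hypothetical model that assumes each step fails independently with the same probability; real agent errors are rarely independent, so the numbers are for intuition only. The function name and the 20-step task length are assumptions.

```python
def chain_success_rate(per_step_error: float, steps: int) -> float:
    """Probability that all `steps` steps succeed.

    Hypothetical model: every step fails independently with the same
    probability `per_step_error`, so success compounds multiplicatively.
    """
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return (1.0 - per_step_error) ** steps


if __name__ == "__main__":
    # Compare the two error rates from Li's example over a 20-step task
    # (the step count is an arbitrary assumption).
    for error in (0.10, 0.01):
        success = chain_success_rate(error, 20)
        print(f"per-step error {error:.0%}: 20-step success {success:.1%}")
```

Under these assumptions, a 20-step run succeeds only about 12% of the time at a 10% per-step error rate, but about 82% of the time at 1%.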

MiniMax CEO Refutes Small Model Distillation Misconception

Li Dahai directly challenged the widespread industry belief that "good small models must be distilled from ultra-large foundation models," calling it a "cognitive misconception." He explained: "Distillation rests on a very specific premise: the model being distilled must itself be a good model. In practice, distillation works like this: companies that cannot develop foundation models themselves but want to build applications take an existing small foundation model and fine-tune it to gain capabilities for specific scenarios. Along the way, they may indeed use other large models to synthesize data so that the small model acquires the corresponding capabilities." Li said this is the paradigm for all large model training, not something limited to small models.
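Li's point that the model being distilled "must itself be a good model" is easy to see in the classic soft-label form of distillation popularized by Hinton et al. (2015), in which a student is trained to match a teacher's output distribution. Li described a data-synthesis variant, so the sketch below is a generic illustration rather than MiniMax's training code; the logits, temperature, and function names are arbitrary assumptions. Because the loss only measures agreement with the teacher, a student that perfectly copies a wrong teacher still gets zero loss.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    # Suppose the correct answer is class 0, but the teacher favors class 2.
    bad_teacher = [0.0, 1.0, 4.0]
    copies_teacher = [0.0, 1.0, 4.0]
    gets_it_right = [4.0, 1.0, 0.0]
    print(distillation_loss(copies_teacher, bad_teacher))  # zero: copying the error is rewarded
    print(distillation_loss(gets_it_right, bad_teacher))   # positive: the correct student is penalized
```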

MiniMax Transitions Training Workloads to Domestic Chips

Li Dahai disclosed: "Since this year, as the industry as a whole has shifted inference to domestic chips, we are also gradually transferring training work to domestic chips and domestic clusters." He described two parallel paths for improving the domestic computing ecosystem. The first is bottom-up: large model companies gradually improve the ecosystem through their own training practice, "like wetting a stone slab bit by bit, which takes time." The second is top-down planning, in which large model companies and chip companies cooperate closely and advance according to a plan; MiniMax's deep cooperation with Zhiyuan Research Institute on the FlagOS software ecosystem is one example.

Li Yuxuan, head of AI Infra at MiniMax Intelligence, noted that inference actually requires higher precision than training. He said MiniMax's model scaling technology, which predicts the behavior of large models using very small ones, proved to be the key breakthrough: it enabled in-depth evaluation on domestic chips, alignment of experimental details with overseas vendors, and confirmation that training precision on these chips is usable.

MiniMax also disclosed that it has run extremely low-bit quantization-aware training on Huawei's platform at 95% of the efficiency of ordinary training. Li Dahai explained that the remaining 5% is overhead from the quantizer itself, which deep cooperation with Huawei has reduced to a minimum.
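MiniMax did not disclose the bit-width, quantizer design, or scaling scheme behind its quantization-aware training (QAT). As a hedged illustration of the general technique, the sketch below trains a single weight with fake quantization (quantize, then immediately dequantize, in the forward pass) and a straight-through estimator (STE) in the backward pass, a common QAT pattern. The 4-bit width, fixed scale of 0.125, learning rate, toy data, and function names are all assumptions chosen for readability. The extra `fake_quant` call on every forward pass is the kind of per-step quantizer cost behind overhead figures like the 5% Li described, though real overhead depends on kernels and hardware.

```python
def fake_quant(x: float, scale: float, num_bits: int) -> float:
    """Round x onto a signed num_bits integer grid, then map it back to float."""
    qmin = -(2 ** (num_bits - 1))   # -8 for 4 bits
    qmax = 2 ** (num_bits - 1) - 1  # 7 for 4 bits
    q = min(max(round(x / scale), qmin), qmax)  # Python round() is round-half-to-even
    return q * scale


def ste_grad(x: float, scale: float, num_bits: int, upstream: float) -> float:
    """Straight-through estimator: treat rounding as identity inside the clip range."""
    lo = -(2 ** (num_bits - 1)) * scale
    hi = (2 ** (num_bits - 1) - 1) * scale
    return upstream if lo <= x <= hi else 0.0


def train_qat(xs, ys, num_bits=4, scale=0.125, lr=0.01, steps=200):
    """Fit y = w * x while the forward pass only ever sees the quantized weight."""
    w = 0.0  # full-precision "latent" weight that the optimizer updates
    n = len(xs)
    for _ in range(steps):
        wq = fake_quant(w, scale, num_bits)  # forward pass uses the quantized weight
        # Gradient of mean squared error with respect to the quantized weight.
        grad_wq = sum(2.0 * (wq * x - y) * x for x, y in zip(xs, ys)) / n
        # STE passes that gradient straight through to the latent weight.
        w -= lr * ste_grad(w, scale, num_bits, grad_wq)
    return w, fake_quant(w, scale, num_bits)


if __name__ == "__main__":
    latent, quantized = train_qat([1.0, 2.0, 3.0], [0.5, 1.0, 1.5])
    print(latent, quantized)  # the quantized weight settles on the grid point 0.5
```

The latent weight stays in full precision while only its rounded copy feeds the prediction, which is why QAT keeps both representations during training.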

MiniCPM5-1B Achieves Near-GPT-4o Performance on ArtificialAnalysis Benchmark

MiniMax Intelligence announced that the fifth-generation 1B version of MiniCPM (nicknamed "Small Cannon") scored 17.9 on the widely cited ArtificialAnalysis (AA) evaluation. Open-source community researchers noted that GPT-4o, released in May 2024 and estimated at around 200B parameters (OpenAI has not disclosed the figure), scored 18.3 to 18.6 on the same type of evaluation, a gap of only 0.4 to 0.7 points. Li Dahai said: "In 2024 we predicted that by the end of 2026, the intelligence level of edge models could reach GPT-4 level. From current data, this goal has been achieved ahead of schedule."
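The reported gap can also be expressed relative to GPT-4o's score. The sketch below simply reproduces that arithmetic from the figures in this article; the helper name is hypothetical, and the result inherits the caveat that the GPT-4o scores come from community comparisons on the same type of evaluation rather than an identical run.

```python
def score_gap(candidate: float, reference: float) -> tuple[float, float]:
    """Absolute and relative (percent) shortfall of candidate versus reference."""
    absolute = reference - candidate
    relative_pct = 100.0 * absolute / reference
    return round(absolute, 2), round(relative_pct, 1)


if __name__ == "__main__":
    minicpm5_1b = 17.9  # reported AA score for MiniCPM5-1B
    for gpt4o in (18.3, 18.6):  # reported range for GPT-4o
        absolute, relative_pct = score_gap(minicpm5_1b, gpt4o)
        print(f"vs {gpt4o}: {absolute} points behind ({relative_pct}%)")
```

On these figures, MiniCPM5-1B sits roughly 2.2% to 3.8% below GPT-4o.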

During the earlier "MiniMax Open Source Week," MiniMax Intelligence released two edge large models, MiniCPM5-1B and BitCPM-CANN. MiniCPM5-1B set a new high for model intelligence density: with only 1B parameters, it surpassed every model below 2B parameters on the internationally recognized AA-Index leaderboard. Compared with Qwen3.5-2B, released three months earlier, MiniCPM5-1B performs better with half the parameters.

ForgeTrain AI-Written Framework Trains 10% Faster Than NVIDIA Megatron

MiniCPM5-1B was pre-trained with ForgeTrain, MiniMax Intelligence's in-house AI training framework, which the company describes as the world's first production-grade large model pre-training framework written entirely by AI, with no human programmers involved. According to the company, ForgeTrain trains 10% faster than NVIDIA Megatron.
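The article does not say which metric "10% faster" refers to. If it means 10% higher training throughput (for example, tokens processed per second), the wall-clock saving for a fixed amount of training is a little under 10%, as the short calculation below shows. The throughput interpretation and the function name are assumptions.

```python
def time_saved_pct(throughput_gain_pct: float) -> float:
    """Percent reduction in wall-clock time for a fixed workload,
    given a percent increase in throughput."""
    if throughput_gain_pct <= -100.0:
        raise ValueError("throughput gain must be greater than -100%")
    new_time_fraction = 1.0 / (1.0 + throughput_gain_pct / 100.0)
    return 100.0 * (1.0 - new_time_fraction)


if __name__ == "__main__":
    print(f"{time_saved_pct(10.0):.1f}% less time")  # about 9.1%
```

Under this reading, a run that takes 110 days on the baseline would take 100 days.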

FAQ

What did Li Dahai say about agent technology limitations at the 2026 Beijing Zhiyuan Conference?

Li Dahai said that public expectations of zero-error agents run ahead of what the current technical development curve can deliver, and that the technology still needs time to mature. He described current agent limitations as "problems everywhere," but emphasized that error rates are falling quickly: in his example, work with a 10% error rate today might drop to 1% the following month.

How does MiniCPM5-1B compare to GPT-4o on the ArtificialAnalysis benchmark?

MiniCPM5-1B, with 1B parameters, scored 17.9 on the ArtificialAnalysis evaluation. GPT-4o, released in May 2024 and estimated at around 200B parameters, scored 18.3 to 18.6 on the same type of evaluation, a gap of only 0.4 to 0.7 points.

What is ForgeTrain and how does it compare to NVIDIA Megatron?

ForgeTrain is MiniMax Intelligence's in-house AI training framework, which the company describes as the world's first production-grade large model pre-training framework written entirely by AI, with no human programmers involved. The company says it trains 10% faster than NVIDIA Megatron.


Disclaimer: The information on this page may come from third-party sources and is for reference only. It does not represent the views or opinions of Gate and does not constitute any financial, investment, or legal advice. Virtual asset trading involves high risk. Please do not rely solely on the information on this page when making decisions. For details, see the Disclaimer.