Zhipu Releases GLM-5.1 High-Speed API at 400 Tokens/s, Setting a Global Record

According to Beating Monitoring, Zhipu has launched the GLM-5.1 High-Speed API for select enterprise customers. Model output speed reaches 400 tokens/s, setting a new global record for throughput on an official large language model API. The high-speed version is powered by a high-performance inference engine co-developed by Zhipu and the TileRT team. It retains the full capabilities of the flagship model while significantly reducing latency through GPU kernel optimization and tile-level task scheduling.