According to Beating, a monitoring account, Zhipu AI's open-source model GLM-5.2 posted the highest success rate among open-source models on DeepSWE, a benchmark for complex software engineering tasks, with a 44% one-shot success rate at maximum reasoning intensity. That is 13 percentage points ahead of Kimi K2.7 Code's 31%.
At a cost of $3.92 per task, GLM-5.2 also beats the success rates of several mainstream closed-source models under specific reasoning configurations, including Claude Sonnet 4.6 [high] at 30%, Gemini 3.5 Flash [medium] at 37%, and Claude Opus 4.8 [low] at 41%.
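
For readers who want to check the margins, here is a minimal Python sketch that recomputes GLM-5.2's lead over each model named above, in percentage points. It uses only the success rates quoted in this report. The function name `point_leads` and the variable names are illustrative, not part of any benchmark tooling.

```python
# One-shot success rates (percent) as quoted in the report above.
GLM_52_RATE = 44

OTHER_RATES = {
    "Kimi K2.7 Code": 31,
    "Claude Sonnet 4.6 [high]": 30,
    "Gemini 3.5 Flash [medium]": 37,
    "Claude Opus 4.8 [low]": 41,
}


def point_leads(baseline, rivals):
    """Return baseline minus each rival's rate, in percentage points."""
    return {name: baseline - rate for name, rate in rivals.items()}


leads = point_leads(GLM_52_RATE, OTHER_RATES)

for name, lead in leads.items():
    print(f"GLM-5.2 leads {name} by {lead} percentage points")
```

These gaps are differences in percentage points, not relative improvements, and each closed-source figure applies only to the bracketed reasoning configuration reported for it.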