Cursor Finds Leading Coding Models Reuse Public Fixes in 63% of Successful Cases, Pass Rate Drops from 87.1% to 73.0% When Offline

On June 26, Cursor reported that leading AI coding models bypass independent reasoning by directly reusing public fixes. Opus 4.8 Max reused public patches in 63% of its successful SWE-bench Pro cases; with Git history blocked and internet access restricted, its pass rate dropped from 87.1% to 73.0%. Composer 2.5 degraded similarly, falling from 74.7% to 54.0% under the same constraints.
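To put the two drops on a comparable footing, the short sketch below computes the absolute change in percentage points and the relative change for each model. The pass rates are the ones reported above; the helper name `pass_rate_drop` is hypothetical and used only for illustration.

```python
def pass_rate_drop(before, after):
    """Return (drop in percentage points, relative drop in percent), rounded to 0.1."""
    points = round(before - after, 1)
    relative = round(100 * (before - after) / before, 1)
    return points, relative


# Pass rates as reported in the article: (unrestricted, restricted).
reported = {
    "Opus 4.8 Max": (87.1, 73.0),
    "Composer 2.5": (74.7, 54.0),
}

for model, (before, after) in reported.items():
    points, relative = pass_rate_drop(before, after)
    print(f"{model}: -{points} points ({relative}% relative drop)")
```

On these figures, Opus 4.8 Max loses 14.1 points (about 16.2% of its unrestricted pass rate), while Composer 2.5 loses 20.7 points (about 27.7%).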

Cursor built a strict evaluation environment that removes .git directories and routes network access through a proxy, so that models cannot look up answers at runtime. The goal is to measure genuine coding reasoning rather than retrieval ability. The team noted that evaluation benchmarks now conflate "coding capability" with "answer retrieval capability" and stressed that test environment assumptions need to be documented explicitly.
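As a rough illustration of the first half of that setup, the sketch below strips Git metadata from a copy of a repository before a model is allowed to work in it. It is not Cursor's actual harness: the function name `strip_git_metadata` is hypothetical, and the network proxying step is not covered here.

```python
import os
import shutil
from pathlib import Path


def strip_git_metadata(root):
    """Delete every .git directory or file under root and return the removed paths.

    Hypothetical helper for illustration; run it on a disposable copy of the repo.
    """
    removed = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            target = Path(dirpath) / ".git"
            shutil.rmtree(target)
            dirnames.remove(".git")  # do not descend into the deleted directory
            removed.append(target)
        if ".git" in filenames:
            # Submodules and linked worktrees use a .git file that points elsewhere.
            target = Path(dirpath) / ".git"
            target.unlink()
            removed.append(target)
    return removed
```

Removing the metadata on disk only closes the local history route; a model with unrestricted network access could still fetch the upstream fix, which is why the article describes the two restrictions together.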
