Claude Fable 5 Debugging Score Drops 86.2 to 25.9 on July 1, But Arena.AI Shows Performance Flat

According to BridgeBench, Claude Fable 5's debugging score collapsed from 86.2 to 25.9 after its July 1 reinstatement, and its refactoring score fell from 73.6 to 38.4. However, the decline reflects Anthropic's new safety classifier routing most coding tasks to Claude Opus 4.8, not degradation of the model itself. Of the 12 debugging tasks, only three reached Fable 5; the classifier intercepted the other nine (75%), by design, to prevent jailbreak exploits.

Arena.AI's human-preference testing over the same period, based on thousands of blind votes, found Fable 5's performance mostly unchanged after reinstatement, with document scores up 34 points and expert-text scores up 25. General users working on creative writing, research, and analysis will likely notice minimal impact, while developers working on security-adjacent code face frequent fallback routing. Anthropic acknowledged that the classifiers currently cast too wide a net but provided no timeline for refinement.
