Anthropic Apologizes for Hidden Claude Fable 5 Safeguards, Will Replace Them with Visible Fallbacks to Opus 4.8 This Week

Anthropic apologized on June 11, via its official X account, for secretly degrading Claude Fable 5 responses for users it suspected of building competing AI models, admitting that the invisible safeguards were "the wrong tradeoff." Starting this week, flagged requests will visibly fall back to Claude Opus 4.8 instead of silently returning degraded output. On the API, users will now receive a stated reason when a request is refused, with server-side fallback notifications rolling out over the next few days. Anthropic acknowledged a cost to the change: visible safeguards are easier to bypass, so legitimate machine-learning work may see more false positives while the company tunes its systems.