Claude Fable 5 Breached Within 48 Hours of Release; System Prompt Leaked on GitHub

According to researcher Pliny the Liberator, Claude Fable 5, released by Anthropic on June 9, was successfully broken within 48 hours of launch. The researcher bypassed the model's safety classifier using multi-agent coordination tactics, collectively termed "pack hunt," which combined character-level obfuscation, request deconstruction, and exploitation of the model's extended context window. Additionally, the model's 120,000-character system prompt was leaked to GitHub, exposing internal safety mechanisms.

Anthropic was also confirmed to have implemented a "silent degradation" mechanism that covertly reduced model performance when it detected competitive training activity. The company apologized and announced that it would replace the covert performance reduction with visible warnings, though this change increases the false-positive interception of legitimate users.
