According to researcher Pliny the Liberator, Claude Fable 5—released by Anthropic on June 9—was successfully broken within 48 hours of launch. The researcher bypassed the model's safety classifier using multi-agent coordination tactics, collectively termed "pack hunt," which combined character-level obfuscation, request deconstruction, and exploits of the model's extended context window. Additionally, the model's 120,000-character system prompt was leaked to GitHub, exposing internal safety mechanisms.
AnthropicConfirmed to have implemented a "silent degradation" mechanism that secretly reduced model performance when detecting competitive training activity. The company apologized, announcing it would replace covert performance reduction with visible warnings, though this increases false-positive interception of legitimate users.