Claude Fable 5 isn’t nerfed; safety router is overly cautious
After July 1, the BridgeBench debugging score for Claude Fable 5 fell from 86.2 to 25.9. BridgeMind reported that a safety classifier routed many coding prompts to Opus 4.8; Arena.ai found outputs largely unchanged.
Anthropic reinstated Claude Fable 5 on July 1 and deployed a new safety classifier. BridgeMind re-ran its full coding suite against the July 1 endpoint and recorded steep declines in category scores: debugging fell from 86.2 to 25.9, refactoring dropped from 73.6 to 38.4, and hallucination resistance declined from 75.9 to 61.7. BridgeMind counted fallbacks as failures because many intercepted prompts were answered by Claude Opus 4.8 rather than Fable 5. BridgeMind posted: “FABLE 5 CAME BACK NERFED.” Of 12 TypeScript debugging tasks in the run, three reached Fable 5 and nine were rerouted and scored as zeros.
Anthropic introduced the classifier after external security researchers demonstrated a jailbreak that coaxed the model into identifying and demonstrating software vulnerabilities. The classifier was trained to block that technique, but its conservative bias also triggered on routine debugging and code-fix prompts, so many security-adjacent queries were routed away from Fable 5.
Arena.ai used thousands of blind human-preference votes across text, vision, document, code and agent tasks and ranked models with an Elo-style system. Its before-and-after comparison showed smaller changes: frontend code Elo moved from 1650 to 1623, document performance rose by 34 points, expert text rose by 25 points, creative writing rose by 9 points, while coding fell by 18 points and hard prompts fell by 3. Arena.ai reported that when Fable 5 actually answered a prompt, its outputs were comparable to the pre-reinstatement version.
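To give a sense of scale for the Elo figures above, here is a minimal sketch of the textbook Elo formulas. The 400-point scale and the K-factor of 32 are the conventional chess defaults and are assumptions here; Arena.ai's actual rating procedure is not described in the article and may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return both ratings after one head-to-head vote (k is an assumed constant)."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the rating total is conserved.
    return rating_a + delta, rating_b - delta


# Illustration only: treat the two frontend-code ratings as head-to-head opponents.
print(round(expected_score(1650, 1623), 3))  # 0.539
```

Under these assumed constants, a 27-point gap corresponds to roughly a 54% expected preference rate, which is close to a coin flip.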
The two evaluations measure different things. BridgeMind focuses on security-related debugging and treats any answer not produced by the evaluated model as a failure. Arena.ai measures the perceived quality of the response a user receives, regardless of which model produced it.
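The gap between the two approaches can be sketched with a toy calculation. The per-task scores, the model labels and the function names below are all hypothetical; only the 3-answered, 9-rerouted split comes from the reported TypeScript debugging run, and the snippet does not attempt to reproduce either benchmark's published figures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskResult:
    answered_by: str  # placeholder label for the model that produced the answer
    score: float      # hypothetical grader score, 0 to 100


def fallback_as_failure(results: List[TaskResult], target: str) -> float:
    """Mean over all tasks; any answer not produced by `target` counts as 0."""
    if not results:
        return 0.0
    total = sum(r.score if r.answered_by == target else 0.0 for r in results)
    return total / len(results)


def quality_of_response(results: List[TaskResult]) -> float:
    """Mean over all tasks, scoring whatever came back, whoever produced it."""
    if not results:
        return 0.0
    return sum(r.score for r in results) / len(results)


def quality_when_target_answers(results: List[TaskResult], target: str) -> Optional[float]:
    """Mean over only the tasks that `target` actually answered."""
    own = [r.score for r in results if r.answered_by == target]
    return sum(own) / len(own) if own else None


# Made-up run: 3 tasks answered by the target model, 9 rerouted to a fallback.
results = [TaskResult("fable-5", 90.0)] * 3 + [TaskResult("opus-4.8", 80.0)] * 9

print(fallback_as_failure(results, "fable-5"))          # 22.5
print(quality_of_response(results))                     # 82.5
print(quality_when_target_answers(results, "fable-5"))  # 90.0
```

With identical underlying answers, the first policy reports a collapse while the other two report little change. In this toy setup the headline number depends mainly on how rerouted prompts are counted.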
Observed routing patterns affected users unevenly. Prompts that included security-related terms such as “vulnerability”, “exploit” or “fix”, or tasks resembling vulnerability analysis, were frequently routed to Opus 4.8. Tasks for creative writing, document summarization, research and expert-level text more often reached Fable 5.
Anthropic acknowledged the classifier’s conservative behavior and said it plans to refine the system, but it has not provided a timeline for adjustments. The classifier was deployed after the U.S. government and researchers raised national security concerns about the demonstrated jailbreak technique.
Users posted negative reactions on social platforms describing the reinstated endpoint as “nerfed” or “broken.” Benchmarking data from BridgeMind and Arena.ai show different outcomes: one shows lower scores driven by routing to an older model, the other shows similar output quality when Fable 5 is the responding model.







