Major AI Safety Commitments Watered Down as Industry Races Ahead of Risk Mitigation
Anthropic drops its flagship safety guarantee amid growing concerns that AI development is outpacing safety measures, raising questions for EU regulators.
Anthropic Abandons Safety Guarantee, Signalling Industry-Wide Shift in Risk Approach
In a significant reversal, Anthropic has dropped the flagship 2023 commitment in its Responsible Scaling Policy (RSP): never to train or release AI models unless it could guarantee in advance that adequate safety measures were in place. The decision marks a critical moment for the AI industry’s approach to responsible development, and it carries direct implications for how the EU AI Act will be enforced.
What Happened
Anthropic announced it cannot rule out that powerful new models from 2025 could facilitate bio-terrorist attacks. More significantly, the company has acknowledged that the science of AI evaluations proved “more complicated than expected,” leading it to abandon the hard safety commitments that differentiated it from competitors.
This development comes alongside broader industry moves: OpenAI, Anthropic, Meta, Google DeepMind, and Microsoft jointly called for Congressional safeguards on synthetic RNA and DNA acquisition. Yet stress-testing by safety experts revealed that publicly available models—including ChatGPT, Claude, and Gemini—remain capable of generating detailed biological weapons information.
Why This Matters for Europe
The EU AI Act, which enters its enforcement phase in 2026-2027, rests on the assumption that AI developers can meaningfully assess and mitigate risks before deployment. Anthropic’s retreat from pre-release safety guarantees suggests this assumption may be fundamentally flawed.
Irish and European regulators relying on the Act’s risk-assessment requirements now face a clear signal: leading AI labs believe they cannot reliably predict or prevent misuse at scale. This creates a regulatory credibility gap that policymakers must address urgently.
What This Means in Practice
For AI builders and organisations deploying frontier models:
- Assume safety features remain incomplete. Don’t rely solely on model-level safeguards for sensitive applications.
- Implement application-layer controls. Rate-limiting, access controls, and monitoring become essential.
- Expect regulatory tightening. The EU and others will likely impose stricter pre-release requirements as confidence in industry self-governance erodes.
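As a rough illustration of the application-layer controls listed above, the sketch below puts an allow-list, a sliding-window rate limit, and an audit log in front of a model call. It is a minimal sketch under stated assumptions, not a reference implementation: `ModelGateway`, `RateLimitExceeded`, and the `call_model` callable are hypothetical names invented for this example and do not correspond to any vendor's API.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


class RateLimitExceeded(Exception):
    """Raised when a caller exceeds the allowed number of requests per window."""


class ModelGateway:
    """Hypothetical wrapper that applies access control, rate limiting,
    and audit logging before forwarding a prompt to a model."""

    def __init__(
        self,
        call_model: Callable[[str], str],  # assumed: any function that sends a prompt to a model
        allowed_users: Iterable[str],
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._call_model = call_model
        self._allowed = set(allowed_users)
        self._max_requests = max_requests
        self._window = window_seconds
        self._clock = clock
        self._history: Dict[str, List[float]] = {}
        # Each entry is (timestamp, user, outcome) so decisions can be reviewed later.
        self.audit_log: List[Tuple[float, str, str]] = []

    def request(self, user: str, prompt: str) -> str:
        now = self._clock()

        # Access control: only users on the allow-list may reach the model.
        if user not in self._allowed:
            self.audit_log.append((now, user, "denied"))
            raise PermissionError(f"user {user!r} is not authorised")

        # Rate limiting: keep only timestamps inside the sliding window.
        recent = [t for t in self._history.get(user, []) if now - t < self._window]
        self._history[user] = recent
        if len(recent) >= self._max_requests:
            self.audit_log.append((now, user, "rate_limited"))
            raise RateLimitExceeded(f"user {user!r} exceeded {self._max_requests} requests")

        # Monitoring: record the allowed request, then forward it.
        recent.append(now)
        self.audit_log.append((now, user, "allowed"))
        return self._call_model(prompt)
```

In use, a deployer would pass its own model client as `call_model` and review `audit_log` (or ship it to an existing monitoring system) rather than trusting model-level safeguards alone. The injectable `clock` is only there to make the limiter easy to test.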
For European tech companies, this creates both risk and opportunity. Tighter regulation may raise development costs, but it could position EU-aligned developers as trustworthy partners for sensitive sectors.
Open Questions
- How will EU regulators respond? Will the EU AI Act’s enforcement mechanisms adapt to acknowledge that pre-release assessment has limitations?
- What does “good enough” safety look like? If guarantees are impossible, what risk threshold is acceptable for deployment?
- Are multi-agent systems the next frontier risk? Emerging research shows that aligning individual AI agents doesn’t guarantee system-level safety when they interact—a challenge barely addressed in current policy.
Findings from METR’s pilot exercises (involving Anthropic, Google, Meta, and OpenAI) suggest that coordination failures between AI systems may become a key risk vector. Current EU regulations leave this largely unaddressed.
The Path Forward
Anthropic’s decision doesn’t mean the company is abandoning safety work—it means the industry is acknowledging uncertainty more honestly. That’s valuable for calibrating realistic expectations. But it also means regulators and deployers can’t assume that safety is “solved” before a model leaves the lab. The real work happens in deployment, monitoring, and continuous adaptation.
For Irish and European stakeholders, this is a moment to strengthen governance frameworks around post-release safety and to build the monitoring infrastructure the AI Act assumes exists.
Source: Anthropic News