The Safety Paradox: Why AI Companies Are Locking Away Their Most Powerful Defensive Models

In a rare show of consensus for the competitive AI space, Anthropic and OpenAI made the same decision in April 2026: release powerful cybersecurity models, then immediately restrict who can use them.

Anthropic’s Claude Mythos achieved 83.1% on the CyberGym benchmark for autonomous vulnerability discovery, a significant jump from its predecessor’s 66.6%. OpenAI followed on April 14 with its own restricted-access defensive AI model. Neither company is making these tools publicly available.

What Just Happened

This marks the emergence of a new category: “restricted-access defensive AI.” It reflects a deliberate judgment by two leading labs that these models are simultaneously too dangerous for public release and too important to keep locked away in internal research.

The reasoning is clear: a model that can autonomously discover zero-day vulnerabilities in real software is an asset for defensive security teams and, potentially, for sophisticated attackers. The capability to find software weaknesses at scale creates asymmetric risk: an attacker needs only one exploitable flaw to succeed, while defenders must find and fix them all.

Why This Matters for European Builders

The EU AI Act’s obligations for high-risk AI systems begin to apply in August 2026, bringing mandatory risk assessments and transparency requirements. Cybersecurity applications will likely fall into the high-risk category, requiring builders to document safety measures and incident response protocols.

This development raises a practical question for Irish and European security teams: if the most capable defensive AI models are unavailable, how do you evaluate and implement AI-driven vulnerability discovery for your own infrastructure? You may need to rely on API access through restricted programs rather than self-hosted or fine-tuned models.
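For teams that are admitted to such a program, integration will likely look like a standard hosted-API call rather than local inference. Here is a minimal sketch using Anthropic’s Python SDK; the model ID `claude-mythos` is hypothetical (restricted models would presumably ship under program-specific IDs), and the prompt framing is illustrative, not a documented interface.

```python
import os
from anthropic import Anthropic  # official SDK: pip install anthropic

# Reads the key granted under the restricted-access program.
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def triage_finding(code_snippet: str) -> str:
    """Ask the model to assess a code snippet for exploitable flaws.

    Defensive use only: the snippet should come from code you own or
    are authorized to test.
    """
    response = client.messages.create(
        model="claude-mythos",  # hypothetical restricted model ID
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "You are assisting a defensive security review. "
                "Assess the following code for vulnerabilities and "
                "suggest a fix:\n\n" + code_snippet
            ),
        }],
    )
    return response.content[0].text
```

The point of the sketch is the shape of the dependency: your pipeline calls out to a vendor-hosted endpoint you cannot fine-tune or self-host, so availability, rate limits, and program terms become part of your security architecture.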

The Alignment Angle

There’s a deeper signal here about AI safety priorities. By moving in lockstep, both companies are demonstrating that capability and safety aren’t opposing forces: the restriction is a product of shared safety judgment, not a competitive accident.

Simultaneously, Anthropic published research on “Automated Alignment Researchers,” exploring how LLMs can scale oversight mechanisms. OpenAI launched a Safety Fellowship program inviting external researchers to collaborate on alignment work. These moves suggest that restricted access isn’t isolation—it’s paired with increased external engagement on safety research itself.

Practical Implications for Developers

If you’re building security infrastructure:

  • Evaluate existing tools first: Before waiting for broader access to next-gen vulnerability models, understand what’s already available through restricted partnerships.
  • Plan for August 2026 compliance: Document how you’re using AI for security and ensure you can justify decisions under the EU AI Act (a minimal audit-record sketch follows this list).
  • Engage with safety research: Programs like OpenAI’s fellowship and Anthropic’s partnerships offer ways to collaborate on defensive applications.
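On the compliance point, an append-only audit trail of AI-assisted security decisions is one concrete way to prepare. The sketch below is a hypothetical record format, not language from the Act; the field names are assumptions about what an auditor might reasonably ask for, and the human-reviewer field reflects the Act’s general emphasis on human oversight.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass
class AISecurityDecisionRecord:
    """One auditable record of an AI-assisted security decision."""
    model_id: str        # which model produced the recommendation
    prompt_sha256: str   # hash of the input, so secrets never land in the log
    recommendation: str  # what the model suggested
    human_reviewer: str  # who signed off; keeps a human in the loop
    accepted: bool       # whether the recommendation was acted on
    timestamp: str       # UTC, ISO 8601

def log_decision(model_id: str, prompt: str, recommendation: str,
                 reviewer: str, accepted: bool,
                 path: str = "ai_security_audit.jsonl") -> None:
    """Append one decision record to a JSON Lines audit file."""
    record = AISecurityDecisionRecord(
        model_id=model_id,
        prompt_sha256=hashlib.sha256(prompt.encode()).hexdigest(),
        recommendation=recommendation,
        human_reviewer=reviewer,
        accepted=accepted,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```

Hashing the prompt rather than storing it keeps sensitive code out of the audit trail while still letting you prove which input produced which recommendation.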

Open Questions

Will restricted access remain the norm for high-capability defensive models, or is this a transitional phase? How will the EU AI Act’s transparency requirements interact with models that are deliberately restricted? And critically: if these models stay locked away, how does the security industry evolve to match the threat landscape?

The consensus between Anthropic and OpenAI suggests this isn’t a temporary strategy. We’re likely seeing a new equilibrium where the most powerful tools come with governance attached.


Source: Anthropic and OpenAI