Devlery
Blog/AI

After 10,000 vulnerabilities, Mythos moves the bottleneck to patching

Anthropic Project Glasswing shows that AI vulnerability discovery is no longer the slowest step. Verification, disclosure, and patch rollout are now the constraint.

After 10,000 vulnerabilities, Mythos moves the bottleneck to patching
AI 요약
  • What happened: Anthropic's first Project Glasswing update says partners found more than 10,000 high or critical vulnerabilities with Claude Mythos Preview.
    • About 50 partners are using Claude Mythos Preview for defensive work, while the model remains outside general release.
  • The bottleneck: Discovery is becoming faster than verification, coordinated disclosure, patch authoring, and deployment.
    • In Anthropic's open-source scan, 75 of 530 reported high or critical issues had been patched, with average patch time around two weeks.
  • Builder impact: Adopting AI security tools is less about adding another scanner and more about redesigning triage queues and release operations.
  • Watch: The figures come from Anthropic's initial update, and many vulnerability details remain limited under coordinated disclosure timelines.

On May 22, 2026, Anthropic published the first public update for Project Glasswing. The headline number is large: Anthropic says roughly 50 partners used Claude Mythos Preview over one month to find more than 10,000 high or critical severity vulnerabilities. Cloudflare is quoted as finding 2,000 bugs in critical-path systems, with 400 of them classified as high or critical. Mozilla had already said that Firefox 150 included fixes for 271 vulnerabilities identified with Mythos Preview.

But the real story is not simply that AI is getting better at hacking. The more important shift is where the bottleneck moves. Security work has long been constrained by the scarcity of people, tools, and time needed to find bugs. If Mythos-class models sharply reduce that discovery cost, the next constraint moves downstream: proving that a finding is real, reproducing it, recalibrating severity, reporting it to maintainers, designing a compatible patch, and waiting until users actually receive the fix. Anthropic reaches the same conclusion. The limiting factor is shifting from how quickly defenders can find vulnerabilities to how quickly they can validate, disclose, and patch the volume that AI systems surface.

Project Glasswing open-source vulnerability dashboard

What Glasswing changes

Project Glasswing is a limited defensive initiative Anthropic announced on April 7, 2026. Anthropic describes it as an effort to secure critical software in the AI era. The initial partner list included Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. More than 40 additional organizations were added later, bringing the May update to roughly 50 partners.

The key detail is that Mythos Preview is not an ordinary model release. Anthropic describes it as a frontier model that is strongest at coding and agentic tasks, but says it has not been released generally because of its cybersecurity capabilities. According to the project page, Glasswing participants can access it through the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. The listed price is $25 per million input tokens and $125 per million output tokens. Anthropic also committed $100 million in usage credits for the research preview.

That structure is different from a normal model launch. A typical launch centers on benchmark tables and API availability. Glasswing does the opposite: it gives strong capability to defenders first, under restricted access, and delays disclosure so vulnerabilities are not immediately handed to attackers. Anthropic points to coordinated vulnerability disclosure practice, where vulnerabilities are usually disclosed 90 days after discovery, or about 45 days after a patch is available if that comes first. That is why the first update contains aggregate statistics and selected examples rather than a catalog of individual bugs.

10,000+
high or critical vulnerabilities found by partners
23,019
estimated vulnerabilities in open-source scans
75
reported high or critical issues already patched

The patch queue appears before discovery slows down

The most interesting part of the update is the open-source scan. Anthropic says it used Mythos Preview over the past few months to scan more than 1,000 open-source projects that underpin the internet and Anthropic's own infrastructure. The model estimated 23,019 total vulnerabilities and marked 6,202 of them as high or critical severity.

Of course, a model saying something is a vulnerability does not make it one. Anthropic therefore evaluated 1,752 high or critical estimated issues through six independent security research firms or internal Anthropic review. The result was strong: 1,587 issues, or 90.6%, were valid true positives, and 1,094 issues, or 62.4%, were confirmed as high or critical. Applying the current post-triage true positive rate, Anthropic says Mythos Preview is likely to surface nearly 3,900 high or critical vulnerabilities in open-source code.

The next numbers matter more. Anthropic has reported 530 high or critical vulnerabilities to maintainers and says it is working to disclose an additional 827 confirmed vulnerabilities as quickly as possible. Of those 530 reported issues, 75 had been patched, and 65 had public advisories. That should not be read only as a poor patch rate. Vulnerability disclosure intentionally gives users time to update, and some fixes can land without an advisory. Still, the larger structure is clear: AI discovery is already moving faster than the maintainer ecosystem can absorb.

Anthropic says a high or critical bug takes about two weeks on average to patch. That is not slow in isolation. At thousands of findings, it is not enough. Many open-source projects are maintained by individuals or small teams working around other jobs, not by full-time security response teams. The same ecosystem is already dealing with low-quality AI-generated bug reports. If real vulnerabilities arrive at similar volume, triage becomes harder, not easier.

What Firefox 150 showed

Mozilla's case makes the Glasswing story less theoretical. On April 21, 2026, Bobby Holley of Mozilla published "The zero-days are numbered", explaining that the Firefox team had been using frontier AI models since February to identify and fix potential security vulnerabilities. Work with Opus 4.6 led to 22 security-sensitive fixes in Firefox 148. Applying an early Mythos Preview build to Firefox later resulted in 271 vulnerability fixes included in Firefox 150.

That example matters for two reasons. First, AI vulnerability discovery moved beyond a demo and into shipped patches. Second, even a security-serious team like Mozilla had to face the operational question of whether it could keep up with the volume. Holley framed the moment optimistically as a chance for defenders to win, but that optimism rests on persistent operational work: de-prioritizing other things, validating findings, landing fixes, and getting them released.

Community reaction focused on the same pressure point. In a Reddit discussion on r/cybersecurity, some readers treated the 271 Firefox fixes as meaningful evidence that the defensive shift is real. Others warned against over-reading model claims. One longer thread argued that the real security bottleneck is not just vulnerability knowledge, but time to fix, ability to replace legacy systems, organizational incentives, and human-centered attacks such as phishing. That critique is useful. Mythos finding more bugs does not automatically erase security debt. It makes hidden debt visible faster.

Disclosure policy is under pressure too

Vulnerability disclosure has always involved a balance. Defenders and maintainers need time to patch. Users need time to update. Attackers can weaponize details once they are public. Coordinated vulnerability disclosure exists to preserve that delay. Anthropic limiting individual details in the first Glasswing update fits that practice.

The problem is that Mythos-class systems strain the old balance. In the past, discovery was hard and expensive enough that delayed disclosure bought defenders meaningful time. If similar models spread more widely and attackers can search for the same bugs cheaply, the fact that a vulnerability has not yet been publicly disclosed may no longer be enough. Another model could independently find the same issue before the disclosure window closes.

Anthropic is direct on this point. It says models with cybersecurity skills similar to Mythos Preview are likely to be developed across several AI companies, and that no company currently has safeguards strong enough to prevent misuse of such models. That is the stated reason Mythos-class capabilities are not generally available yet, and also the reason Anthropic is trying to give core defenders an asymmetric advantage through Project Glasswing.

The logic is convincing and risky at the same time. Giving defenders a head start makes sense. But if access to powerful models is concentrated among large enterprises, cloud providers, security vendors, and governments, smaller open-source projects and mid-sized companies may receive protection later. The Linux Foundation framing on the Glasswing project page points at exactly that issue. Open-source maintainers support critical infrastructure, but many do not have large security teams. If Glasswing is truly a defender-first strategy, the next test is not just model access. It is how far verification labor, patching support, and maintainer capacity can be extended.

For engineering teams, this is operations, not another scanner

For builders, the question raised by Glasswing is bigger than whether to turn on a Claude security product. AI security tools create more results. More results create more decisions. To make that useful, teams need an internal queue for security reports, a severity recalibration process, false-positive handling, clear code ownership, release-blocking rules, and customer notification criteria.

Imagine an AI system finds 300 high severity candidates in a production codebase. The team cannot fix all of them immediately. It has to decide which issues are reachable from the internet, which are plausibly exploitable, which sit behind feature flags, which require internal network assumptions, which affect versions that are actually deployed, and which already have compensating controls. Without that judgment layer, an AI scanner becomes an alert flood.

Patch rollout is a second constraint. A pull request that fixes a vulnerability is not the end. For a library, there may need to be a release, advisory, and downstream update path. For a product, there may need to be staged rollout, rollback planning, customer communication, and log monitoring. In the agentic security era, the operational loop that proves the patch reduced real risk may matter more than the model that suggested the code change.

StageWhat AI can accelerateRemaining bottleneck
DiscoveryCode exploration, vulnerable-pattern inference, exploit candidate generationScan scope and threat model definition
ValidationReproduction draft, proof of concept, severity rationaleActual exploitability and blast-radius judgment
PatchFix proposals, tests, regression-risk explanationOwner approval, compatibility, release timing
DisclosureAdvisory draft and impact explanation90-day or 45-day coordination, downstream rollout, customer response

The gray zone between open and restricted models

Anthropic is also bringing some defensive capability into public products. The update says Claude Security public beta is available to Claude Enterprise customers, and that Claude Opus 4.7 was used over three weeks to patch more than 2,100 vulnerabilities. Anthropic also launched a Cyber Verification Program that gives some security professionals doing legitimate vulnerability research, penetration testing, and red-team work a path to fewer cyber-safeguard restrictions. Skills used with Mythos Preview, a codebase mapping harness, scanning subagents, and a threat model builder are also available on request to qualified customer security teams.

That points to where AI security products are going. The product is not just a model. It is a work harness: mapping a codebase, building a threat model, dispatching scanning subagents, drafting reports and patches, and routing everything back through human triage. Just as coding agents have moved from IDE autocomplete into repository-level work, security agents are moving beyond SAST result summaries into disclosure workflows.

The gray zone grows with that shift. Which request is defensive research, and which is attack preparation? The same exploit reproduction capability can serve both sides. Anthropic says current safeguards are not enough, while also operating a restricted-access model and a verified researcher program. That is a practical short-term compromise. Over time, access rights for security researchers, auditability, national regulation, and cloud distribution channels are likely to become larger disputes.

The real signal in this update

Glasswing raises the baseline for AI security. "Can AI find vulnerabilities?" is no longer the best question. At least some frontier models are already entering the phase where they can find, reproduce, and suggest patches for real bugs. The better questions come next. Who validates the findings? Who reports them to maintainers? Who ships the patches? Who writes advisories? Who checks whether users actually updated?

Engineering organizations can draw a sober lesson from this. First, code ownership and security ownership need to be explicit. If AI increases findings, ambiguous ownership becomes an unattended critical queue. Second, vulnerability response SLAs need to be revisited for a world where high severity candidates arrive in larger batches. Treating every AI-raised issue the same way will not scale. Third, attitudes toward open-source dependencies need to change. Public code can be read by defender models and attacker models alike. SBOM quality, dependency freshness, and patched-version rollout are not compliance decoration. They become operating metrics for agentic exploit risk.

Anthropic's numbers come from the first public update, and most individual vulnerability details remain undisclosed. This should not be read as proof that every company can reproduce the same performance today. The direction, however, is clear. As vulnerability discovery becomes less scarce, security value moves down into the operational layer. In a world where AI can find zero-days faster, the contest may be decided less by model benchmarks and more by the throughput of the patch queue.