Devlery
Blog/AI

After 10,000 Bugs, Mythos Moves the Bottleneck to Patching

Anthropic Project Glasswing suggests AI security models may shift the bottleneck from vulnerability discovery to verification, disclosure, and patch throughput.

After 10,000 Bugs, Mythos Moves the Bottleneck to Patching
AI 요약
  • What happened: Anthropic published the first-month update for Project Glasswing, saying Mythos Preview helped partners find more than 10,000 high- or critical-severity vulnerabilities.
    • In open source scans, Anthropic reported 6,202 high/critical candidates and a 90.6% true-positive rate in the independently reviewed sample.
  • The new bottleneck: Discovery is getting faster than verification, reporting, patching, and deployment.
    • Anthropic says a high- or critical-severity issue found by Mythos Preview takes an average of two weeks to patch.
  • Builder impact: Security automation is becoming a workflow problem, not just a model-call problem: triage queues, maintainer capacity, disclosure policy, and baseline defenses now matter more.
  • Watch: Mythos-class models are still not generally available, and Anthropic says safeguards are not yet strong enough to prevent serious misuse at scale.

Anthropic's May 22, 2026 initial update on Project Glasswing is more interesting than the familiar claim that AI can find vulnerabilities. The sharper question is what happens when discovery starts outrunning patching. For years, one of the expensive parts of software security was finding the next serious bug. If a cyber-specialized frontier model such as Mythos Preview dramatically lowers that cost, the bottleneck moves to human verification, maintainers, coordinated disclosure, release engineering, and customer rollout.

The headline numbers are aggressive. Anthropic says Project Glasswing partners found more than 10,000 critical- or high-severity vulnerabilities in the first month. Cloudflare is quoted as finding 2,000 bugs in critical-path systems, including 400 rated high or critical. Mozilla used Mythos Preview around Firefox 150 and found and fixed 271 vulnerabilities, which Anthropic says is more than 10 times the count found with Claude Opus 4.6 during Firefox 148.

But the real story is not just the "10,000" figure. It is the throughput behind it. Anthropic scanned more than 1,000 open source projects and produced 23,019 vulnerability candidates. It estimated 6,202 of those as high or critical. Then 1,752 high/critical candidates were reviewed by independent security researchers or Anthropic itself, and 1,587 were validated as true positives, a 90.6% rate in that sample. Of those, 1,094, or 62.4%, were confirmed as high or critical severity.

That sounds like defenders finally gained speed. The next numbers are more sobering. As of May 22, 2026, Anthropic's public dashboard showed 1,596 vulnerabilities reported to 281 open source projects, 97 patched, and 88 assigned a CVE or GHSA. The oss-sec mailing list also highlighted that Anthropic was using "disclosed" to mean reported to maintainers, not publicly disclosed. Findings can pile up quickly; moving them into a publicly actionable and patched state is slower.

Anthropic Project Glasswing vulnerability disclosure dashboard

Security teams already know this shape. Scanners have long produced large volumes of alerts. The hard work is reproducing the issue, re-rating severity, deduplicating reports, assigning ownership, passing regression tests, and shipping releases. Mythos Preview is notable because Anthropic's sample was not presented as a pure false-positive flood. It showed a high true-positive rate after review. Yet more true positives can still make the rest of the security operation jam faster.

10K+
high/critical vulnerabilities found by partners
6,202
open source high/critical candidates
90.6%
true positives in reviewed sample
2 weeks
average high/critical patch time

This changes how security automation should be judged. Older vulnerability management tools were often evaluated by what they could detect. The next question is how quickly a toolchain can turn a finding into a trustworthy patch. Anthropic says a high- or critical-severity vulnerability found by Mythos Preview takes an average of two weeks to patch. That metric says as much about organizational throughput as it says about model performance. For under-resourced open source maintainers, two weeks can be optimistic. For internal enterprise code with clear ownership and a hardened deployment pipeline, it may be too slow.

Project Glasswing also matters because it is explicitly framed as a defense project. Anthropic has not made Mythos-class capabilities generally available. The stated reason is direct: the company says neither Anthropic nor any other organization currently has safeguards strong enough to prevent models like this from being misused to cause serious harm. Glasswing therefore gives limited access to selected partners, points the model at critical infrastructure first, and waits for patches to propagate before releasing vulnerability details.

That design tries to avoid colliding with normal disclosure practice. Anthropic says new vulnerabilities are generally disclosed within 90 days of discovery, or roughly 45 days after a patch is available when that happens first. That also explains why the update avoids exploit-level detail. The faster a company boasts about newly discovered vulnerabilities, the more it risks handing attackers a map. This update is less a full technical report than a carefully bounded document about what the model can do and what Anthropic is willing to say in public.

Mozilla's account gives the most concrete view of this boundary. In its Mozilla Hacks post, the Firefox team describes Mythos Preview as part of a pipeline, not as a chat window sprinkled over a codebase. The 271 Firefox 150 vulnerabilities were the result of a system that combined codebase analysis, harnesses, a second verification model, human review, and the normal release process. That distinction matters. AI did not magically fix a browser by itself. A security team embedded AI in a repeatable workflow, and discovery volume rose sharply.

Cloudflare's response points in the same direction. Cloudflare said Mythos Preview could link multiple weak signals into more serious exploit paths instead of stopping at isolated low-severity findings. That is important in security research. Many vulnerabilities are not born from one obviously catastrophic line. They emerge when boundary conditions, permission assumptions, missing validation, and old code paths connect. If a model is good at seeing those connections, teams may need to re-open parts of the backlog that previously looked low priority.

This is where the story needs restraint. "AI replaces all security researchers" is not the right conclusion. The recurring nouns in Anthropic's update are partners, testers, maintainers, and security research firms. Mythos Preview's outputs were still reviewed by independent security researchers and humans. Anthropic also said some maintainers asked the company to slow the rate of disclosure. Open source maintainers are already exhausted by low-quality AI-generated bug reports. If stronger models begin producing many more real vulnerabilities, the ability to separate useful reports from junk becomes even more valuable.

Community reaction landed on the same operational point. In a Reddit r/technology thread, some commenters joked that AI is now finding bugs created by the industry's ship-and-fix culture. Others described the practical slog of validating hundreds of scan results, grouping common-code fixes, and separating them from service-specific patches. Another line of reaction was that vulnerability management may shift from pre-release assurance toward faster detection and patch loops. That is more useful than generic panic. When models find more bugs, organizations have to absorb more truth.

For developers and AI teams, the first practical shift is that a security backlog means something different. In the past, "nobody has found it yet" functioned as a quiet buffer. That buffer is shrinking as both public and restricted models can scan more code at lower cost. Popular open source projects, internet-facing libraries, browsers, cloud infrastructure, authentication systems, and cryptography-adjacent code are natural early targets. Delaying dependency updates becomes more expensive when discovery itself becomes cheaper.

The second shift is that security automation becomes an operating system, not a warning panel in an IDE or CI job. Anthropic has opened Claude Security in public beta for Claude Enterprise customers, and says Claude Opus 4.7 was used to patch more than 2,100 vulnerabilities over three weeks. It also plans to provide qualified security teams with the skills, harnesses, and threat-model builder used in Mythos work. That stack is not "one model." It is an agentic security workflow: threat model, code mapping, scanning subagent, triage report, and patch proposal.

The third shift is evaluation. Cisco's Foundry Security Spec, introduced alongside Project CodeGuard, is an attempt to make agentic AI security assessment more structured. If models are going to find vulnerabilities, their results need to be reproducible and auditable. "The model said this is dangerous" is not enough to merge a patch. A team needs the threat model, the risky code path, reproduction evidence, and tests that prevent the issue from coming back.

This connects directly to AI coding agents. Much of the recent coding-agent conversation has focused on productivity, PR generation, long-running tasks, and browser control. Glasswing is the other side of that system. If agents write code faster, security agents need to challenge that code faster too. Development pipelines are likely to contain two classes of agents: one that builds features and another that probes whether those features and dependencies are attackable. If those loops are unbalanced, productivity gains can return as vulnerability-processing debt.

So what can teams do now? First, classify high-risk dependencies and internet-exposed paths instead of throwing every finding into one queue. AI-driven discovery volume will overwhelm undifferentiated triage. Second, connect vulnerability reports to code ownership, tests, and release authority. Even a well-written model report creates only queue length if ownership is unclear. Third, treat coordinated disclosure and public advisory handling as part of product operations. Anthropic's dashboard makes the distinction visible: reported to maintainers, patched, and assigned a CVE or GHSA are different states.

Fourth, do not underestimate baseline defenses. Anthropic points to hardened network defaults, multi-factor authentication, and comprehensive logging as preparation for a world where Mythos-class capabilities spread. Those controls sound ordinary, but they become more important when serious vulnerabilities are found in bulk. If every bug cannot be patched immediately, teams need to detect whether attackers used the gap, reduce account takeover risk, and limit lateral movement.

The final mistake would be reading Glasswing as either "defenders won" or "attackers are finished." The more accurate conclusion is that the economics of vulnerability discovery are changing. Lower discovery costs help defenders, but they also help attackers. That is why Anthropic is keeping Mythos-class access restricted and routing it through defensive partners first. The company also acknowledges that models with similar capability may soon appear from multiple AI labs. That means defenders do not have infinite time to prepare.

Project Glasswing's real news value is not simply how smart Mythos Preview is. It is where the project shines a light on the software industry's bottleneck. Security teams used to struggle with insufficient discovery and too many false positives. They may now have to manage more true positives, more disclosure states, more patch requests, and more maintainer fatigue. The next competitive edge in AI security is not only finding vulnerabilities. It is converting those vulnerabilities into trustworthy patches and deployed updates fast enough for the discovery curve.