Devlery
Blog/AI

Anthropic Expands Project Glasswing to 150 Organizations as Mythos Turns Vulnerability Discovery Into a Patch Bottleneck

Anthropic expanded Project Glasswing to about 150 new organizations. The hard part is shifting from Claude Mythos findings to validated disclosures, patches, and deployments.

Anthropic Expands Project Glasswing to 150 Organizations as Mythos Turns Vulnerability Discovery Into a Patch Bottleneck
AI 요약
  • What happened: Anthropic expanded Project Glasswing on June 2, 2026, adding about 150 new organizations.
    • The new cohort spans 15+ countries and includes power, water, healthcare, communications, hardware, core software vendors, and nonprofit maintainers.
  • The numbers: Anthropic's CVD dashboard showed 23,019 findings, 1,900 candidates, and a 90.8% true-positive rate among reviewed candidates.
  • Builder impact: The bottleneck for AI security models is moving from vulnerability discovery to triage, coordinated disclosure, patching, and deployment evidence.
    • Anthropic expects other companies may have Mythos-class models within 6-12 months.
  • Watch: Mythos Preview is not a generally available product; it remains gated to partners that meet Anthropic's security requirements.

Anthropic announced on June 2, 2026, that it is expanding Project Glasswing to about 150 new organizations. The initial partner group had roughly 50 organizations. The new cohort is based across more than 15 countries and includes power, water, healthcare, communications, hardware, core software vendors, and nonprofit maintenance organizations. Each organization must pass Anthropic's security requirements before receiving access to Claude Mythos Preview.

The headline is not just the number of partners. Anthropic said many of the new partners operate codebases where a successful attack could affect more than 100 million people. The company also said early Project Glasswing partners have already used Mythos Preview to find more than 10,000 high- or critical-severity security flaws. When Anthropic introduced Project Glasswing on April 7, the framing was "give powerful cyber models to defenders first." The June 2 expansion shifts the practical question to what happens after the model finds the bug.

Project Glasswing

Project Glasswing started around Claude Mythos Preview, an unreleased frontier model. The initial announcement named Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks among the participants. Anthropic said Mythos Preview had found high-impact vulnerabilities in major operating systems and web browsers, including a 27-year-old OpenBSD vulnerability, a 16-year-old FFmpeg vulnerability, and a Linux kernel privilege-escalation chain.

This model is not open to ordinary developers. In the April announcement, Anthropic said it had no plan to make Claude Mythos Preview generally available. The longer-term goal is to provide Mythos-class capabilities at scale for defensive use once misuse safeguards are in place. The June 2 expansion keeps that boundary. Access is broader, but new organizations still need to meet security requirements, and Anthropic said future expansion will prioritize critical infrastructure providers, critical open-source maintainers, and safety testers.

Anthropic's timeline is explicit. The company expects that other AI companies may have Mythos-class models within 6 to 12 months, and that some may release them without adequate misuse safeguards. That sentence should be read less as model-race theater and more as release-policy pressure. If vulnerability discovery becomes cheaper and faster, attackers and defenders converge on the same capability window. Defenders need more than model access; they need an operating system for deduplicating findings, validating reports, coordinating with maintainers, and proving that patches reached users.

150
new Glasswing organizations
15+
countries in the new cohort
100M+
people potentially affected by major attacks

That operating view is clearer in Anthropic's coordinated vulnerability disclosure dashboard. The dashboard was marked as current through May 22, 2026 at 10:27 PT. Starting in February, Anthropic used an early snapshot of Claude Mythos Preview to search for vulnerabilities in open-source software, then worked with external security research firms on triage, validation, and maintainer reporting. The page reads less like a model benchmark and more like a production queue for vulnerability handling.

CVD stageMetricOperational meaning
Discovered23,019 findingsThe raw candidate pool produced by the model.
Candidates1,900 findingsItems organized for external security-firm review.
Reviewed1,726 findingsHumans and security firms checked reproduction, severity, and report quality.
Confirmed valid90.8% true positivesThe share of reviewed candidates judged to be real vulnerabilities.
Reported467 + 1,129 findingsReports sent after external review or directly by Anthropic at maintainer request.

For engineering teams, the more important gap is between 23,019 findings and the 467 reports after external review, not the 90.8% true-positive rate by itself. The dashboard notes that "true positive" is only a proxy for impact. Until a maintainer receives a report and decides how to fix it, the real patch outcome is still unknown. Anthropic describes patched vulnerabilities as the more reliable metric, but patches are a lagging indicator. Between "this is a real bug" and "users are protected" sit advisories, releases, downstream packages, deployment policy, and rollback planning.

Anthropic's May 22 initial update described the same bottleneck in prose. The company said it had scanned more than 1,000 open-source projects and that as discovery volume increased, triage and disclosure capacity became the constraint. The June 2 expansion makes that constraint larger. If more partners scan more codebases with Mythos Preview, more maintainers and vendors receive more reports, and more security teams need a repeatable path from candidate finding to verified fix.

The benchmark numbers from the April launch explain why this pressure is arriving now. Anthropic said Mythos Preview scored 83.1% on CyberGym vulnerability reproduction, compared with 66.6% for Opus 4.6. For agentic coding, it reported 77.8% on SWE-bench Pro, 82.0% on Terminal-Bench 2.0, and 93.9% on SWE-bench Verified. The Terminal-Bench setup used the Terminus-2 harness, adaptive thinking at maximum effort, a 1 million token budget per task, and an average across five attempts. Those details matter because a cyber agent that can spend long reasoning budgets also creates long verification and operating-cost tails.

The Red Team's Claude Mythos Preview technical article shows that the model is not a glorified lint scanner. Its Linux kernel exploit discussion walks through a dangling pointer, slab page reuse, an AF_PACKET receive ring, CONFIG_HARDENED_USERCOPY, KASLR bypass work, and a kernel stack read. That capability is useful to defenders, but the same reasoning pattern is useful to attackers. Anthropic is keeping the preview closed because one high-quality model output can become a proof-of-concept exploit chain.

The initial partner quotes point to the same access-control problem from different angles. Cisco framed AI capability as changing the urgency of critical infrastructure defense. AWS said it was applying Claude Mythos Preview to critical codebases. Microsoft cited improvements on the CTI-REALM open-source security benchmark compared with previous models. The Linux Foundation highlighted that many open-source maintainers have historically operated without dedicated security teams. The common constraint is not model availability alone. It is report quality, maintainer capacity, and customer deployment.

The new industries named on June 2 make the patch problem less abstract. Power, water, healthcare, and communications systems do not patch like consumer web apps. Hospital systems have downtime constraints. Water utilities and grid operators often mix vendor firmware with legacy systems. Hardware vendors have to coordinate silicon, firmware, drivers, and management software. A Mythos Preview finding does not instantly repair any of those systems. Fix validation, rollout planning, rollback planning, and customer communication become part of the security product.

This is also where teams should separate Claude Security from Mythos Preview. In the June 2 announcement, Anthropic pointed to Claude Security as a broader defender access path using public frontier models. Claude Security is a public beta for Claude Enterprise customers that scans codebases and suggests patches. Mythos Preview is the gated Glasswing program. Treating both as a single "Anthropic security model" would blur access rights, model capability, logging, data boundaries, and abuse controls.

The first practical lesson for developers is report format. If an AI scanner produces a vulnerability candidate, the report needs affected versions, reproduction steps, exploitability evidence, severity rationale, duplicate status, a patch candidate, and a regression test. Without that information, model output becomes noise in a maintainer queue instead of a work item. Anthropic's use of external security research firms in the CVD pipeline is a sign that report quality is part of the system, not a clerical afterthought.

The second lesson is deduplication. Anthropic's launch materials and dashboard both emphasize coordinated vulnerability disclosure. If several AI security products scan the same open-source project, maintainers may receive overlapping reports for the same root cause. The same pattern appears inside companies when code owners, AppSec teams, platform teams, and vendor security teams triage the same finding separately. Without structured deduplication keys, hashes, affected files, dataflow paths, sink/source descriptions, and exploit preconditions, 23,019 findings turn into ticket backlog before they turn into security improvement.

The third lesson is patch ownership. Anthropic said on June 2 that support is moving beyond finding vulnerabilities toward disclosing, fixing, and deploying patched software. That order matters. Even when an AI model suggests a patch, maintainers still need to evaluate compatibility, performance, ABI stability, backwards behavior, and customer support. If a security team merges a fix but downstream users do not update, the risk remains. The CVD dashboard also makes clear that "patched upstream" does not automatically mean widely installed.

The fourth lesson is model-access governance. Glasswing organizations receive access only after meeting Anthropic's security requirements. Enterprise security teams need the same internal questions before exposing cyber-capable models: who can use the model, where prompts and outputs are stored, and which repositories or ticket systems can contain exploit details. Before a report reaches an external maintainer, the team also needs a review path, an embargo policy, and closed collaboration channels for sensitive issue handling.

Community reactions show why access control is also political. Reddit discussions in r/cybersecurity and r/Anthropic framed Glasswing as useful for open-source maintainers and critical infrastructure defenders. Other threads, including r/eutech discussions around European access, raised concerns that agencies such as ENISA could depend on a U.S. AI company for access to cyber models. Even if the June 2 expansion spans more than 15 countries, access still sits inside Anthropic's partner-screening structure.

The implications extend beyond organizations directly invited into Glasswing. Any company selling software, firmware, cloud services, or AI agents into U.S. critical infrastructure customers may eventually receive Glasswing-style vulnerability reports. Customers will ask whether the finding has been reproduced, when a patch will ship, whether a CVE or GHSA will be issued, and whether downstream packages have been updated. Security vendors will need to explain triage service-level agreements, false-positive handling, and maintainer workflow integration instead of quoting only model discovery counts.

Teams building AI agent products should read Mythos as part of the wider agentic tooling story. Anthropic describes Mythos Preview's cyber capability as emerging from agentic coding and reasoning strength. The same capability family powers code generation, terminal work, browser automation, and repository search. If a product can call a shell, browser, package manager, or cloud console, saying "we are not a security tool" will not be enough. Permissions, sandboxing, audit logs, output redaction, and abuse monitoring become product requirements.

The operational conclusion is simple: in a market where AI finds vulnerabilities faster, the patch system becomes the product. Anthropic is widening access to about 150 new organizations, but its own CVD dashboard does not hide the manual work after discovery. If the path from 23,019 findings to maintainers, upstream patches, downstream releases, and installed updates remains narrow, frontier cyber models will help security teams while also expanding maintainer queues.

The next metric to watch is not only Mythos Preview's next benchmark score. Watch the throughput of disclosure and patch handling. Anthropic's future security requirements for Glasswing expansion, updates to the CVD dashboard's severity and patched-upstream counts, and Claude Security's false-positive rate in enterprise workflows will matter more to builders than another model leaderboard entry. Project Glasswing is news because it asks whether the software ecosystem can absorb the vulnerability discovery speed that AI is starting to produce.