Devlery
Blog/AI

The 90-day review stalled, and AI model launches changed

Trump’s delayed AI executive order shows frontier model launches being reshaped around speed, security evaluation, and critical infrastructure readiness.

The 90-day review stalled, and AI model launches changed
AI 요약
  • What happened: President Trump delayed signing an AI and cybersecurity executive order shortly before the scheduled White House event.
    • AP, Axios, and TechCrunch reported that the draft included a framework for government review before frontier model releases.
  • Key number: The contested pre-release model access window was reported as somewhere between 14 days and 90 days.
  • Builder impact: A powerful model launch is becoming a security release, not just an API release, with red-teaming, evals, customer notice, and infrastructure readiness in the critical path.
  • Watch: The draft was not signed, and the reported process was voluntary, so treating it as a licensing regime goes beyond the evidence.

An American AI executive order stopped at the edge of signature. On May 21, 2026, President Donald Trump was expected to sign a new order covering AI and cybersecurity, then postponed it before the White House event. AP reported that Trump was concerned the order could slow the United States' AI technology edge. TechCrunch reported that he told reporters he disliked some of the language and did not want to get in the way while the U.S. was leading.

This can look like a Washington scheduling story. For AI developers and product teams, the deeper question is more operational: before a frontier model is released, who should evaluate it, how early should they see it, and against which risks? The reported draft would have created a voluntary framework for AI companies to share powerful models with the government before release, so agencies could assess cybersecurity and national security risk. Axios described a 90-day pre-release access concept, while TechCrunch, citing CNN reporting, framed the disputed window as 14 to 90 days.

Those numbers are not final rules. The executive order was not signed, and the draft is not an official public regulation. Even so, the 14-to-90-day range shows where the release calendar for frontier AI is moving. A strong model launch is less likely to mean "publish the model card and API docs when ready." It is more likely to involve red-team work, eval harnesses, critical infrastructure playbooks, government and customer notice, and constrained previews before broad availability.

The strange number: 90 days before launch

Ninety days is a familiar number in software security. Coordinated vulnerability disclosure often uses a 90-day window: the finder gives a vendor time to patch and communicate before public disclosure. The 90 days reported in the AI executive order draft looks similar, but the direction is different. This is not about notifying a vendor before disclosing a known vulnerability. It is about evaluating the capabilities of an unreleased model before the public can use it.

That distinction matters. A frontier model is not a single bug. Reviewers need to understand which cyber tasks it can perform, how reliably it can use tools, which jailbreaks still work, how it might assist critical infrastructure workflows, and how it could be abused. The issue is not a single patch. It is a capability envelope and a deployment plan.

May 1, 2026
NIST CAISI published its DeepSeek V4 Pro evaluation, including private benchmarks and agentic cyber/software engineering tests.
May 20, 2026
Axios reported on the draft executive order's 90-day pre-release access framework.
May 21, 2026
President Trump postponed the signing event and said draft language could interfere with U.S. AI competitiveness.

The policy debate sits behind NIST CAISI's DeepSeek V4 Pro evaluation. CAISI evaluated DeepSeek V4 Pro with public benchmarks and internal assessments, including PortBench for software engineering, CTF-Archive-Diamond for cyber tasks, and the ARC-AGI-2 semi-private dataset. CAISI described DeepSeek V4 Pro as the strongest Chinese model it had evaluated, while estimating it remained about eight months behind U.S. frontier models.

The message for builders is direct. Government evaluation is not just a checklist, and it can produce a different picture than the public benchmarks a model provider chooses to highlight. A lab can say its model is safer than the prior generation, while a government or independent evaluator may use a different scaffold, token budget, hidden task set, or tool environment and reach a different risk assessment.

The narrow line between voluntary review and a gate

The easy mistake is to overstate the draft's legal force. Based on the reporting available, the proposal was a voluntary framework. Summaries of the unsigned draft shared from Politico reporting emphasized that it did not create mandatory licensing or formal preclearance. In other words, the claim "a model cannot launch without government permission" runs ahead of the facts.

Voluntary systems can still become powerful in the market. If major AI companies participate in pre-release government evaluation, and critical infrastructure customers begin asking whether a model has gone through it, the process becomes a practical launch checkpoint even without a hard legal requirement. For cloud, banking, insurance, healthcare, energy, and defense customers, "not legally mandatory, but required by our risk committee" may be the sentence that matters.

IssueReported draft directionPractical question for teams
Pre-release accessDiscussion of pre-release model access between 14 and 90 daysWhen does launch freeze start, and which eval artifacts are submitted?
Review modelA voluntary review framework, according to reportingCould voluntary review become an enterprise purchase condition?
Cyber riskONCD and related agencies would assess security riskHow do teams prove cyber evals, tool-use limits, and abuse monitoring before release?
Infrastructure readinessPotential pre-release access for critical infrastructure providersHow do banks, power operators, and telecom customers test model changes early?

As a software release pattern, this looks like a mix of public beta, security embargo, and enterprise private preview. A model provider may need to expose an unreleased model to selected evaluators and customers. Evaluators need to inspect failure modes, not only benchmark scores. Critical infrastructure operators need time to test how a new model changes phishing risk, vulnerability discovery, fraud automation, operational decision support, and incident workflows.

All of that slows releases down. So Trump's reported concern about a "blocker" was not only political language. AI model competition depends on fast launches and fast learning loops. If a model has to be frozen 90 days before release, a competitor can ship during that window. But if a model launches without review and later shows serious cyber capability, enterprise customers and government agencies may demand stronger controls. Speed and trust are now billing each other.

Washington after Mythos

The timing of the executive order debate also matters. Anthropic's Mythos line, OpenAI's cyber-specialized model work, CAISI's DeepSeek evaluation, and agreements between major AI companies and government evaluators are all part of the same arc. Powerful models are no longer interpreted only as smarter chatbots. They can assist vulnerability discovery, exploit writing, phishing automation, malware analysis, defensive automation, and incident response. That makes them dual-use by default.

Government and developer perspectives start from different questions. Government agencies ask about national security and critical infrastructure risk. Developers ask about API availability, model versioning, preview access, eval cost, customer commitments, and safety policy. Those questions eventually meet. To say a model is production-ready, a provider increasingly has to show not only performance, but also risk evaluation and response planning.

This shift lands differently for closed and open models. A closed API provider can operate access control, rate limits, abuse monitoring, and customer gating directly. That makes it easier to attach government review and private preview programs. An open-weight model is harder to recall once released. Pre-release evaluation may matter more, while post-release control is weaker.

That does not mean open-weight models are automatically more dangerous. Transparent weights, independent research, internal deployment, and regional data boundaries are real strengths of open models. The issue is release process. Any model with strong cyber or tool-use capability needs a design for pre-release evals and post-release monitoring. The executive order fight asks how much of that design national policy should require.

The release checklist changes for product teams

This story matters even if your company does not train frontier models. Product teams that consume frontier models through APIs should stop treating model releases as ordinary dependency upgrades. A new model brings better capability and new failure modes. It may call tools more aggressively, plan over longer horizons, provide more useful clues for security-related requests, or interpret existing guardrails differently.

First, eval artifacts become product artifacts. Teams need more than a benchmark scoreboard. They need records of which task sets validated which model versions, which prompts and tool schemas were used, what counted as failure, and how regressions are prevented. Whether or not a government pre-release process arrives, enterprise customers are likely to ask for this material.

Second, preview access becomes more important. If a model provider opens a new model to major customers and evaluators before public release, product teams need to run their critical workflows during that window. Payments, healthcare, security, infrastructure changes, and legal documents are not good places to switch models on public launch day without rehearsal.

Third, the model router becomes a policy engine. In the previous generation, routing often meant "this model is cheaper and faster." Now it may mean "this workflow may only use a pre-reviewed model," "this region blocks this model version," or "this action requires a human-approved model path." AI infrastructure teams should treat routing as compliance and blast-radius management, not only cost optimization.

Fourth, customer communication becomes part of the release. If customers include banks, hospitals, government agencies, or security teams, "we attached a new model" is not enough. Teams need to explain which capabilities changed, which risks were evaluated, which use cases are restricted, and how rollback works if something breaks. The model launch calendar becomes a trust calendar.

The delay is the start of negotiation, not the end

The executive order stalled, but the argument did not disappear. Reporting from AP and Axios points to a growing tension inside government and industry: the U.S. wants to preserve speed in the AI race with China, while also recognizing that powerful models can affect cyber operations and critical infrastructure. Those goals do not reconcile easily.

Policy could settle on several compromises. Instead of a fixed 90-day window, review periods could vary by capability tier. Pre-release evaluation could focus only on risk areas such as cyber, bio, autonomous replication, or critical infrastructure tool use. The government could certify evaluators instead of receiving models directly, request only summary results, or encourage protected previews for certain customer groups. If industry resistance is strong, the order could become a much softer information-sharing and voluntary best-practice framework.

For developers, the specific compromise matters less than the direction of travel. Frontier AI models are being treated as part of social infrastructure, not standalone products. When models write code, find vulnerabilities, interpret documents, support financial workflows, and assist critical infrastructure operators, pre-release validation and post-release traceability become difficult to avoid.

So the core of this story is not "Trump blocked regulation" or "AI safety lost." More precisely, it revealed a new operating reality around model launches. The 90-day pre-release review may have stalled, but the model release calendar already has new columns: performance, price, latency, pre-release eval, government access, critical infrastructure readiness, and customer trust.

AI teams should start filling those columns now. When the next frontier model arrives, the question will not stop at "how smart is it?" Who saw it first? Which evaluations did it pass? Which customers received early access? Which capabilities were delayed because of risk? And if something goes wrong after release, how quickly can the team roll back? The 2026 model race is likely to become more intense on those questions, not less.