Mistral SDK compromise shows trusted CI can ship malware
The Mini Shai-Hulud attack hit Mistral AI SDK and TanStack packages, exposing a new supply-chain risk around CI, cache poisoning, and OIDC publishing.
- What happened: The Mini Shai-Hulud supply-chain attack spread across developer packages including Mistral AI SDK, TanStack, UiPath, and Guardrails AI.
- Mistral listed mistralai==2.4.6 and several npm SDK versions as affected, and the PyPI package ran a malicious payload on Linux import.
- Why it matters: This looks less like long-lived npm token theft and more like a CI trust failure combining GitHub Actions cache poisoning with the boundary around OIDC trusted publishing.
- The number: TanStack said 84 malicious versions across 42 packages were published between 19:20 and 19:26 UTC on May 11, 2026.
- Watch: Teams that installed affected versions need more than a rollback; they should rotate reachable secrets and inspect audit logs from developer machines and CI runners.
On May 12, 2026, Mistral published a quiet but serious security advisory titled "TanStack supply chain attack affecting Mistral AI SDK packages." Mistral said a supply-chain attack related to the TanStack incident caused compromised versions of its npm and PyPI SDK packages to be published. The company also said it had no indication that Mistral infrastructure itself was compromised, and that the incident appeared to involve an affected developer device.
At first glance, this looks like another malicious npm and PyPI package story. It is more than that. Mistral AI SDK is one of the dependencies AI teams use when LLM applications and agent runtimes connect to a model provider. TanStack is core infrastructure for large parts of the React and full-stack web ecosystem. UiPath and Guardrails AI appearing in the same wave makes the surface even broader. The target was not only "web packages" or only "AI packages." It was the dependency chain that AI product teams install every day, run in CI, and increasingly let coding agents modify automatically.
The attack path is the more important signal. TanStack's official postmortem does not describe a simple maintainer phishing incident or stolen npm token. The attacker combined a pull_request_target workflow, GitHub Actions cache poisoning, and OIDC token extraction from runner process memory. TanStack said there was no evidence that an npm token was stolen, and no evidence that the npm publish workflow itself was directly compromised. Yet a malicious release still appeared through what looked like a normal CI path.
That distinction is the point. For the last several years, the supply-chain security playbook has been relatively clear: remove long-lived tokens, require 2FA, preserve provenance, and move to OIDC trusted publishing. TanStack was already moving in that direction. But this attack did not simply bypass those defenses. It rode through a workflow shape those defenses trusted. The question is no longer just "who stole the npm token?" It is "why did a trusted release workflow believe a poisoned cache?"
What Mistral's advisory says
Mistral tracks the incident as MAI-2026-002, with status listed as "Under investigation." The advisory was published on May 12, 2026. The affected npm packages fall into three groups: @mistralai/mistralai versions 2.2.2, 2.2.3, and 2.2.4; @mistralai/mistralai-azure versions 1.7.1, 1.7.2, and 1.7.3; and @mistralai/mistralai-gcp versions 1.7.1, 1.7.2, and 1.7.3. On PyPI, the affected version was mistralai==2.4.6.
Mistral said the compromised npm packages were removed from the registry and were available only from May 11 22:45 UTC to May 12 01:53 UTC. The compromised PyPI release, mistralai==2.4.6, was uploaded around May 12 00:05 UTC and is now quarantined on PyPI. That exposure window sounds short, but short windows do not make supply-chain attacks small. CI runners can pick up new versions immediately, and package caches, lockfiles, container images, and private mirrors can preserve copies long after a registry removes them.
The PyPI behavior is especially direct. Mistral said the compromised package executes a malicious script on import. On Linux, injected code in src/mistralai/client/__init__.py downloads https://83.142.209.194/transformers.pyz to /tmp/transformers.pyz and runs it as a detached background process. The name transformers.pyz is also telling: in a machine-learning environment, a name that resembles Hugging Face Transformers may reduce suspicion.
Mistral provided concrete checks. Python users should confirm installed versions with pip show mistralai | grep -i ^version and search dependency files such as requirements*.txt, pyproject.toml, uv.lock, poetry.lock, Pipfile, and Pipfile.lock. On Linux hosts, Mistral lists /tmp/transformers.pyz, a python /tmp/transformers.pyz process, the MISTRAL_INIT=1 environment variable, and outbound traffic to 83.142.209.194 as indicators.
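For teams that want to run those checks in one pass, here is a minimal sketch that consolidates them. The version string, file path, environment variable, and IP address come from Mistral's advisory; the script itself is illustrative, not an official tool.

```sh
# Consolidated host check assembled from the advisory's indicators.
# Sketch only: adapt paths and tooling to your environment.

# 1. Is the compromised PyPI version installed?
pip show mistralai 2>/dev/null | grep -i '^version' | grep -q '2\.4\.6' \
  && echo "WARNING: mistralai 2.4.6 is installed"

# 2. Is it pinned in dependency files?
grep -n 'mistralai==2\.4\.6' requirements*.txt pyproject.toml uv.lock \
  poetry.lock Pipfile Pipfile.lock 2>/dev/null

# 3. Host-level indicators Mistral lists for Linux
[ -f /tmp/transformers.pyz ] && echo "WARNING: /tmp/transformers.pyz present"
pgrep -af 'transformers\.pyz' && echo "WARNING: payload process is running"
[ "$MISTRAL_INIT" = "1" ] && echo "WARNING: MISTRAL_INIT=1 marker is set"

# 4. Outbound traffic to the listed address
ss -tn 2>/dev/null | grep -F '83.142.209.194' && echo "WARNING: live C2 connection"
```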
This matters operationally for AI teams. Many organizations treat an LLM SDK as an application dependency. In practice, the same SDK may also be used by an agent runtime, evaluation harness, batch inference job, RAG ingestion pipeline, notebook, CI test job, or demo server. If a team does not know where the affected version was installed, "production is fine" is not enough. A developer laptop or CI runner may have had cloud keys, GitHub tokens, private registry tokens, vector database credentials, or model API keys in reach.
What happened inside TanStack
TanStack's postmortem provides the clearest technical spine of the incident. According to TanStack, between 19:20 and 19:26 UTC on May 11, the attacker published 84 malicious versions across 42 @tanstack/* npm packages. The releases appeared as two versions per package over roughly six minutes. TanStack also said the @tanstack/query*, table*, form*, virtual*, store, and start meta-package families were confirmed clean.
The chain had been prepared days earlier. The attacker forked TanStack Router, renamed the fork to "configuration" to make it less visible in fork discovery, then opened a pull request titled "WIP: simplify history build". The key detail is pull_request_target. That GitHub Actions event runs in the base repository context rather than the fork context. One TanStack workflow checked out the fork PR merge ref and ran a build. During that process, attacker-controlled code executed and stored a poisoned pnpm store in the GitHub Actions cache.
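The shape of that trap is easier to see in workflow form. What follows is a hedged reconstruction of the risky pattern the postmortem describes, not TanStack's actual file; the workflow name, cache key, and build commands are illustrative.

```yaml
# Hypothetical sketch of the "Pwn Request" shape: pull_request_target grants
# base-repository context while the checkout pulls fork-controlled code.
name: pr-build
on:
  pull_request_target:              # base-repo permissions, fork-supplied code
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # The fork PR merge ref: whatever the attacker pushed gets built.
          ref: refs/pull/${{ github.event.pull_request.number }}/merge
      - uses: actions/cache@v4
        with:
          path: ~/.pnpm-store       # later restored by other workflows
          key: pnpm-store-${{ hashFiles('**/pnpm-lock.yaml') }}
      - run: pnpm install && pnpm build   # attacker code executes here
```

Because pull_request_target runs on the base ref, any cache this job saves lands in the default branch's cache scope, which is exactly where a release workflow will later look.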
The attacker then force-pushed the PR back to a clean state and closed it. Visibly, the PR became a zero-file no-op. The poisoned cache remained. Later, when an unrelated legitimate PR was merged to main, the release workflow ran and restored the poisoned cache. Malware then executed inside the release workflow, minted an OIDC token through the workflow's id-token: write permission, and POSTed directly to the npm registry. TanStack said this publish did not come from the release workflow's defined "Publish Packages" step. It was performed separately by malware during a run whose tests failed.
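The minting step deserves demystifying. When a job has id-token: write, GitHub exposes the OIDC endpoint to every process in that job through two environment variables; the defined publish step has no monopoly on it. A minimal sketch of the documented mechanism:

```sh
# Any process in a job granted `id-token: write` can request the OIDC token;
# nothing scopes it to the intended publish step. The audience value depends
# on the consumer and is shown as a placeholder.
curl -sS -H "Authorization: Bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
  "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=<registry-audience>"
# The JSON response carries a short-lived token that can then be exchanged
# for publish authority, which is what the malware did here.
```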
The uncomfortable part is that "good security features" and a bad workflow shape existed at the same time. TanStack did not rely on a long-lived npm publish token. npm publishing was bound to GitHub's OIDC trusted-publisher integration. The publish credential is created at release time and expires quickly. That is strong against classic npm token theft. But if the release workflow restores a poisoned cache, and malware running inside that workflow can immediately mint and use the short-lived publish token, the defense changes shape.
TanStack's follow-up post acknowledged this directly. npm provenance, SLSA, OIDC, and 2FA worked as designed, but the workflow shape had the hole. The pattern of using pull_request_target while checking out and executing fork code is an old known-bad pattern that GitHub Security Lab has warned about as "Pwn Request." This incident shows that the old warning still matters in a modern trusted-publishing environment.
Why this is an AI ecosystem incident
Reading the incident only as separate TanStack and Mistral security advisories misses half of the story. Mistral AI SDK is a connection point between AI applications and a model provider. Guardrails AI sits in the validation and policy-enforcement layer for LLM output. UiPath is tied to enterprise automation and agentic orchestration. This wave did not attack an AI model directly. It attacked the developer supply chain that AI software depends on.
AI development environments are especially sensitive to this category of attack. First, they have high secret density. A normal web development environment may already hold GitHub tokens and cloud credentials. AI environments add model provider API keys, vector database tokens, observability keys, dataset storage credentials, and private model endpoint credentials. Second, installs happen frequently. Agent frameworks, evaluation tools, SDKs, model connectors, and notebooks change quickly, and CI often tests a new dependency as soon as it appears. Third, coding agents are increasingly performing package installs on behalf of humans. That automation improves throughput, but it also expands the attack surface when people no longer inspect every install script or lockfile change.
The Mistral PyPI payload running at import time is also more dangerous in AI environments. Many Python jobs import an SDK just to validate initialization. "We never made a real API call" may not matter if from mistralai import Mistral was enough to execute the injected import path. If a unit test in CI imported the affected version, the runner's environment variables and metadata credentials should be treated as in scope.
On the TanStack side, the payload used the install lifecycle. TanStack said npm, pnpm, or yarn installing an affected version would process a malicious optionalDependencies entry that fetched an orphan payload commit and ran a prepare lifecycle script. The resulting obfuscated router_init.js performed credential harvesting, exfiltration, and self-propagation. That structure targets both developer workstations and CI runners. For an AI company building React dashboards, admin consoles, or agent control planes, TanStack dependencies are a realistic route into the organization.
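Given that structure, a blunt sweep of installed packages can at least surface the pattern. A hedged sketch, assuming a standard node_modules layout; the heuristics flag the described shape (git-sourced optionalDependencies plus prepare hooks) and the output needs manual triage.

```sh
# Flag manifests that combine optionalDependencies pointing at git sources
# with a prepare lifecycle script. Illustrative heuristics, not a scanner.
find node_modules -maxdepth 3 -name package.json 2>/dev/null | while read -r f; do
  if grep -q '"optionalDependencies"' "$f" && grep -qE '"(git\+|github:)' "$f"; then
    echo "git-sourced optionalDependencies: $f"
  fi
  grep -q '"prepare"' "$f" && echo "prepare script: $f"
done
```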
The meaning of "trusted source" is shifting
Provenance remains important in supply-chain security. This incident makes clearer what it does and does not guarantee. Provenance can tell you which workflow produced an artifact. It cannot prove that every input restored and executed inside that workflow was trustworthy. If the workflow restored a poisoned cache and malware inside the workflow minted a token, provenance may make the malicious artifact look more legitimate.
That does not mean provenance is useless. TanStack said OIDC-based publishing helped identify which workflow run produced the publish event. Removing long-lived tokens also limited the attacker's ability to reuse credentials later. Trusted publishing reduced damage and accelerated investigation. But it did not make everything executed inside the release workflow trustworthy.
Security teams need to redraw the boundary. What permissions does a fork PR get when it runs in the base repository context? Which branch scopes can save and restore caches? Should a release workflow trust a build or test cache at all? When are install scripts allowed to run in CI? Which jobs can mint OIDC tokens, and only after which steps? These are not package maintainer questions alone. They are dependency operations questions for every AI product team.
TanStack's immediate hardening steps are useful signals. It disabled the pnpm cache in the release pipeline, removed GitHub Actions caches for affected workflows, pinned all organization actions to commit SHAs, enforced non-SMS 2FA on npm and GitHub, removed pull_request_target, and said it would use GitHub's recommended pattern of sandboxed pull_request jobs feeding artifacts into workflow_run when needed. It also moved to pnpm 11 to use ecosystem install-cooldown behavior. These steps do not magically eliminate supply-chain risk, but they reduce the blast radius of similar attacks.
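In workflow terms, the recommended split looks roughly like the sketch below: a hedged illustration of the sandboxed pull_request plus workflow_run pattern, shown as two separate workflow files, with hypothetical names and placeholders where actions should be pinned to reviewed commit SHAs.

```yaml
# Untrusted side: fork code runs with no secrets and a read-only token.
name: pr-ci
on: pull_request
permissions:
  contents: read
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@<full-commit-sha>       # pin to a reviewed SHA
      - run: pnpm install --ignore-scripts && pnpm build
      - uses: actions/upload-artifact@<full-commit-sha>
        with:
          name: pr-build
          path: dist/
```

```yaml
# Trusted side: consumes artifacts only; never checks out or runs PR code.
name: pr-ci-followup
on:
  workflow_run:
    workflows: ["pr-ci"]
    types: [completed]
jobs:
  process:
    runs-on: ubuntu-latest
    permissions:
      id-token: write        # minted only here, in a job that restores no caches
    steps:
      - uses: actions/download-artifact@<full-commit-sha>
        with:
          name: pr-build
          run-id: ${{ github.event.workflow_run.id }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
```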
What development teams should check
Teams that may have been exposed should start with affected versions. For Mistral, search Python lockfiles and package caches for mistralai==2.4.6, and npm lockfiles for affected versions of @mistralai/mistralai, @mistralai/mistralai-azure, and @mistralai/mistralai-gcp. For TanStack, use the official GitHub Security Advisory and TanStack tracking issue as the source of truth for affected versions, then inspect lockfiles, build caches, container images, and private registry mirrors.
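A hedged starting point for that sweep, assuming a repository checkout; the version patterns come from the advisories, and everything else is illustrative:

```sh
# Python side: the single affected PyPI version.
grep -rn 'mistralai==2\.4\.6' . --include='requirements*.txt' \
  --include='*.lock' --include='pyproject.toml' --include='Pipfile*'

# npm side: affected @mistralai SDK releases (2.2.2-2.2.4 and 1.7.1-1.7.3).
grep -n -E '@mistralai/mistralai(-azure|-gcp)?' \
  package-lock.json pnpm-lock.yaml yarn.lock 2>/dev/null \
  | grep -E '2\.2\.[2-4]|1\.7\.[1-3]'

# TanStack side: pull every @tanstack/* pin and compare it against the
# advisory's affected-version list by hand.
grep -n '@tanstack/' package-lock.json pnpm-lock.yaml yarn.lock 2>/dev/null
```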
The next step is not simply deletion. Mistral's guidance is clear: stop using the affected package version, clean any system where it was installed, rotate all secrets reachable from that system, and inspect cloud audit logs. Listed command-and-control indicators include api[.]masscan[.]cloud, filev2[.]getsession[.]org, git-tanstack[.]com, seed1[.]getsession[.]org, and 83[.]142[.]209[.]194. TanStack likewise advises treating hosts that installed affected versions as potentially compromised.
AI development organizations should add a few extra checks. Model API keys should be treated with the same seriousness as backend production secrets. In an environment where an LLM SDK was compromised, OpenAI, Anthropic, Google, Mistral, Cohere, vector database, and tracing vendor keys may all be exposed. If a coding agent or evaluation runner can install packages, that runner's secret scope should be minimal. Giving test environments the same environment variables as production is a large blast-radius decision in a supply-chain incident.
Package age policies are also becoming practical defenses. TanStack's follow-up mentioned pnpm 11 install-cooldown behavior. Waiting before installing newly published versions will not stop every malicious package, but it buys time in incidents like this one, where outside researchers detected the problem quickly. If a dependency update bot automatically merges and deploys critical packages, a minimum age gate and an extra verification step are reasonable controls.
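As a concrete example of such a gate, pnpm exposes a minimum-age setting; a minimal sketch, assuming a recent pnpm release where the key lives in pnpm-workspace.yaml, with an illustrative threshold rather than a vendor recommendation:

```yaml
# pnpm-workspace.yaml: refuse to install versions published too recently.
minimumReleaseAge: 4320   # minutes (3 days); illustrative policy value
```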
A supply-chain baseline for coding agents
The longer-term significance connects directly to coding-agent operations. AI agents install packages faster than humans, search for alternative dependencies to resolve errors, and modify lockfiles while fixing failing tests. That ability is useful. But without supply-chain policy, an agent can also pull in a malicious version at attacker speed.
The answer is not just a smarter model. Agentic development needs package firewalls, install-script limits, lockfile diff policy, allowed registries, scoped secrets, network egress control, cache isolation, provenance verification, and static analysis for workflows. GitHub Actions workflows now deserve review with the same seriousness as application code. pull_request_target, unpinned actions, shared caches, broad id-token: write, and install lifecycle scripts in release jobs should all be treated as high-risk surfaces.
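Some of those controls are a single configuration line away. Two of the low-tech ones, using standard npm settings, look like this (illustrative defaults, not a complete policy):

```sh
# Block install/prepare lifecycle scripts by default; allowlist exceptions.
npm config set ignore-scripts true
# Pin the registry so a lockfile edit cannot quietly point installs elsewhere.
npm config set registry https://registry.npmjs.org/
```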
The Mistral and TanStack incident points to where supply-chain security is going. Removing long-lived tokens is still correct. Keeping provenance is still correct. Enforcing 2FA is still correct. But those are starting points. Teams also need to examine what trusted CI executes, which caches it restores, when it receives publish authority, and whether a failed workflow can still mint a token.
Mini Shai-Hulud is not only a story about an infected AI SDK. It is evidence that the AI development ecosystem is merging frameworks, SDKs, CI, agents, and cloud credentials into one automated work surface. On that surface, one dependency install is not a small implementation detail. It can become an execution path into model keys, GitHub permissions, cloud accounts, and deployment pipelines.
The lesson is blunt. As AI development speeds up, dependency trust has to become slower and more conservative. As automation expands, secrets have to become narrower. And as teams adopt trusted publishing, they need to distrust the trusted workflow itself more carefully. Supply-chain attackers are moving from stealing tokens to steering the systems that create tokens. That is the real shift AI teams need to track.