CLAUDE.md Became an Attack Surface, TrapDoor's Invisible Instructions

TrapDoor combines malicious npm, PyPI, and Crates.io packages with poisoned AI coding instruction files.

AI 요약

What happened: Socket disclosed the TrapDoor campaign, covering 34+ malicious packages and 384+ related versions and artifacts.
- Credential stealers spread across npm, PyPI, and Crates.io. Socket's earliest observed package was eth-security-auditor, uploaded on May 22, 2026 at 20:20:18 UTC.
New target: The attackers hid Unicode instructions inside .cursorrules and CLAUDE.md, aiming at AI coding tools.
Why it matters: Agent instruction files now need to be reviewed like privileged execution policy, not casual README material.
- AGENTS.md, MCP configs, hooks, and skill files belong in the same trust boundary.
Watch: Socket does not claim the AI injection works consistently across every tool, but the campaign shows attackers are already operationalizing this surface.

Supply chain attacks usually start with a package name. A dependency looks close to a popular library, runs a postinstall hook, fetches a remote payload on import, or hides inside a build script. TrapDoor, disclosed in late May 2026, fits that familiar pattern at first glance. The Socket Research Team analyzed a malicious package campaign spanning npm, PyPI, and Crates.io, and said the attackers targeted crypto, DeFi, Solana, AI, and security developers.

But the important part of TrapDoor is not only the package count. The campaign did not stop at stealing developer secrets. It also targeted the instruction files read by AI coding tools. Cursor's .cursorrules and Claude Code's CLAUDE.md are files teams use to tell agents how a project works. They look like documentation in a repository, but in practice they behave more like policy files that can alter an agent's behavior. TrapDoor went after that gap. The attackers used hard-to-see zero-width Unicode characters to hide instructions and tried to steer assistants into running a "security scan" style secret discovery workflow.

The numbers are already large enough. Socket said the campaign involved more than 34 malicious packages and more than 384 related versions and artifacts. The earliest Socket-observed package was PyPI's eth-security-auditor@0.1.0, uploaded on May 22, 2026 at 20:20:18 UTC. Packages then appeared across npm, PyPI, and Crates.io with names such as wallet-security-checker, llm-context-compressor, sui-sdk-build-utils, and prompt-engineering-toolkit, all plausible names for a developer browsing registries in a hurry.

This is uncomfortable news for AI builders because coding agents are no longer "suggestion panes." They read and write files, run shells, invoke package managers, manipulate Git, call MCP tools, and automate browsers. If an attacker can poison the files those agents treat as project instructions, the later stage of a supply chain attack can borrow the agent's execution privileges rather than relying only on a one-time install script.

Socket's TrapDoor attacker playbook screenshot

The second target matters more than 34 packages

Socket's primary analysis classifies TrapDoor as an active crypto stealer supply chain attack. The malicious packages were designed to collect SSH keys, Sui/Solana/Aptos wallet data, AWS credentials, GitHub tokens, browser profiles, browser login databases, crypto wallet extension data, environment variables, API keys, and local development configuration files. Seen from that angle alone, this is a broad developer workstation credential theft campaign.

The execution path varied by ecosystem. npm packages used postinstall hooks to run a shared payload called trap-core.js. Socket described it as a 1,149-line credential harvester and propagation tool. It validates AWS and GitHub credentials, attempts lateral movement with SSH keys, and plants persistence through Git hooks, shell hooks, systemd, cron, and SSH.

The PyPI packages looked like Python packages, but on import they fetched JavaScript from attacker-controlled GitHub Pages and executed it with node -e. That design lets the attacker alter the remote payload without publishing a new PyPI release. On the Rust side, Crates.io packages used build.rs. Rust build scripts run automatically during compilation, so code execution can happen before the user calls any library API. Socket said Sui and Move-themed packages looked for local keystores, encrypted data with the hardcoded XOR key cargo-build-helper-2026, and exfiltrated it through GitHub Gists.

Up to this point, TrapDoor is a multi-registry credential stealer. The AI-specific part starts in the next layer. Some npm packages used .cursorrules and CLAUDE.md as persistence vectors. The attackers inserted zero-width Unicode characters into these files so the hidden instructions would be difficult for a human reviewer to spot in an editor. The intent was to make an AI assistant read the file as project policy and perform a workflow disguised as security scanning or dependency verification. In practice, that workflow could become secret discovery and exfiltration.

That difference is significant. In a traditional supply chain incident, package installation is close to the end of the execution path. The install script runs, steals credentials, and the immediate payload finishes. TrapDoor points toward something that survives the install: agent context. The package can execute a one-time payload and also alter a file future AI coding sessions will read. The attacker is not only reaching for the developer's shell. It is trying to whisper into the next agent session that receives a request like "fix this project."

Malicious package install, import, or build

↓

Credential scan, token validation, wallet/key exfiltration

↓

.cursorrules, CLAUDE.md, Git hook, shell hook persistence

↓

Risk that the next AI coding session reads hidden instructions as project policy

Source-backed reconstruction from Socket's TrapDoor analysis of npm, PyPI, Crates.io execution paths and AI injection behavior

It is not a README, it is execution policy

Instruction files for AI coding tools sit in an awkward category. On GitHub, they look like documentation. Reviewers often skim CLAUDE.md the way they skim README or CONTRIBUTING: test commands, component locations, lint rules, commit practices. But from the agent's perspective, the file is loaded as context before work begins. It can shape tool usage, code style, verification commands, and prohibited behavior.

TrapDoor exploits that semantic mismatch. To a person, the file is prose. To a model, it is an instruction stream. If zero-width Unicode is involved, the problem becomes worse: the content may be invisible to a human reviewer but still present to tokenization and model input. A file that appears empty or harmless in a normal diff can carry a different instruction to the assistant. Socket noted that GitHub showed hidden or bidirectional Unicode warnings on some PRs, which is a sign this attack surface is already colliding with code review UX.

Socket also reported that the attacker account ddjidd564 moved beyond registry publication and opened PRs against open source projects. Targets included browser-use/browser-use, langchain-ai/langchain, langflow-ai/langflow, run-llama/llama_index, FoundationAgents/MetaGPT, and OpenHands/OpenHands. PR titles looked ordinary, such as "docs: add .cursorrules with dev standards and build verification." Some included the campaign marker P-2024-001 and pointed to attacker-controlled configuration URLs.

That is the evolutionary signal. The attacker did not simply upload malicious packages and wait for installs. It also tested whether project files for AI development tools could enter normal contribution workflows. If such a file is merged, developers and agents opening the repository later may naturally accept the poisoned instruction as local policy. The attack surface expands beyond registries, package managers, and CI into code review and agent context loading.

The Hacker News made the same point in its follow-up coverage: TrapDoor appears to test whether AI-related project files can be introduced through ordinary open source PR workflows, not only through package publication. There was not yet a large independent Hacker News or Reddit debate at the time of the Korean article's research note, but security community sharing pointed in one direction. .cursorrules, CLAUDE.md, and AGENTS.md need to be treated like executable configuration, not passive docs.

Detection was fast, but the window was open

Socket disclosed detection timing as well. Across 381 package-version records with complete timestamps, the average time to detection was 5 minutes and 56 seconds, with a median of 5 minutes and 27 seconds. The fastest detection happened 58 seconds after publication. On paper, that is impressive. For package lifecycle scripts, however, minutes are enough. If a developer laptop or CI runner installs the package and the payload executes immediately, a registry takedown may arrive after the host has already been touched.

That is the hard part of supply chain defense. "The malicious package was removed" and "the machine that installed it is safe" are different claims. TrapDoor's target list makes the distinction obvious. Stolen SSH keys can enable lateral movement. Stolen GitHub tokens can expose repositories, packages, and CI settings. Stolen AWS credentials can put cloud resources and secret stores at risk. Browser login databases and wallet keystores can affect personal accounts and crypto assets.

In AI coding environments, the blast radius can grow further. Many developers let agents open local shells. Agents install dependencies, run tests, format code, create branches, and update files. Because the agent is "handling it," the human may watch fewer intermediate commands. At that point, a malicious dependency does not need to trick the model. The package manager lifecycle executes with the host's permissions.

The second window is more subtle. Even if immediate credential theft is detected, poisoned instruction files can remain. CLAUDE.md and .cursorrules often live at the repository root and may be committed or left in a local checkout. If the developer removes the bad dependency but does not inspect these files, the next agent session may still read the hidden instruction. Incident response that stops at dependency removal is therefore incomplete.

AGENTS.md is in scope too

The reports focus mainly on Cursor and Claude Code files, but the lesson is broader. Codex-style AGENTS.md, Gemini CLI configuration, MCP server configuration, tool permission files, skill or plugin manifests, and editor extension settings all belong in the same category when they are read by an agent and change behavior. If a model loads a file automatically and treats it as trusted context, attackers can target that file.

Technically, not every file carries the same risk. Some are style guides. Some connect shell hooks or MCP commands. Some tools read instruction files only as context, while others integrate hooks, skills, plugins, or command allowlists. Operationally, however, they are all "trusted input to AI." In traditional security, trusted configuration files are part of the attack surface. In agentic development, trusted natural-language configuration becomes part of the attack surface too.

The recent arXiv paper How Agentic AI Coding Assistants Become the Attacker's Shell frames the same issue from a research perspective. Agentic coding assistants rely on external artifacts, and hidden instructions can hijack the assistant into executing unauthorized commands. TrapDoor is not the same as a controlled lab demonstration, but it shows what that threat model can look like inside a real supply chain campaign.

There is also a boundary to keep clear. Socket does not claim TrapDoor's AI injection technique works reliably across every tool and model. It may be inconsistent. But for defenders, the bar is not "the technique always succeeds." The important signal is that attackers are already trying to productize this surface. They combined zero-width Unicode, benign-looking PR titles, a security scan disguise, and attacker-controlled configuration URLs. That is an early operational signal, not a theoretical curiosity.

What code review should look for

First, AI instruction files need CODEOWNERS. .cursorrules, CLAUDE.md, AGENTS.md, .claude/settings.json, MCP configs, tool permission files, and custom command directories should not be reviewed like ordinary docs. Changes to those files should require security or platform-owner review. External contributor PRs that add or modify them deserve a higher risk tier than README edits.

Second, Unicode anomaly scanning should become standard. Zero-width characters, bidirectional control characters, and unusual invisible separators are dangerous in source code, but they are even more dangerous in agent instructions. GitHub UI warnings may help, yet organizations need policy of their own. Pre-commit hooks, CI, and repository scanners can block unapproved U+200B, U+200C, U+200D, U+FEFF, and bidi control characters.

Third, remote URLs inside instruction files deserve special handling. TrapDoor's PR examples referenced attacker-controlled GitHub Pages URLs as if they were configuration sources. If an agent instruction file asks the assistant to read an external URL, fetch remote scripts or configs, or run a "security scan" command, that is an execution path. It should be blocked or escalated, especially when combined with curl | sh, node -e, python -c, package installation, token validation, or environment scanning.

Fourth, the files an agent reads must be separated from the commands it can execute. A good project instruction can say "run tests with pnpm test." The harness should still apply approval and risk policy before executing commands. pnpm test and env | curl are not equivalent. git status and git push are not equivalent. Natural-language instructions can be persuasive, but the tool layer should enforce separate policy.

Fifth, incident response needs agent-context cleanup. On a host suspected of installing a malicious package, responders should inspect dependency caches, shell hooks, Git hooks, cron, and systemd entries. They should also inspect .cursorrules, CLAUDE.md, AGENTS.md, .claude/, .cursor/, MCP settings, and local skill or plugin directories. Credential rotation is necessary, but it does not remove a persistent instruction that a future agent session may read.

Supply chain security and agent security are merging

TrapDoor makes the competitive landscape more interesting. Socket, Snyk, Aikido, JFrog, and GitHub Advanced Security are strengthening dependency intelligence and install-time blocking. OpenAI, Anthropic, Cursor, GitHub, and Google are making coding agents more capable. Until recently those markets looked partly separate: package scanners on one side, developer productivity tools on the other.

They now face the same problem. When an agent installs dependencies, the package scanner becomes part of the agent runtime. When an agent reads CLAUDE.md or .cursorrules, repository policy becomes model-context security. When a developer adds an MCP server, package provenance, tool permission, network egress, and credential scope all connect. The more work AI development tools automate, the more supply chain boundaries they cross on behalf of the user.

The defense therefore has to be layered. At the registry layer, teams need malicious package detection and, in some environments, delayed or reviewed installs. At the CI layer, they need controls around untrusted code execution, cache boundaries, and OIDC token minting paths. At the workstation layer, they need package lifecycle script restrictions, secret isolation, and network egress controls. At the agent layer, they need instruction file review, hidden Unicode detection, tool approval, and audit logs. No single layer covers the whole path.

Individual developers have a practical takeaway too. When opening a new project, inspect AI instruction files before running an agent. Cloning an unfamiliar repository and immediately giving a coding agent broad permission is risky. Before package install, review the lockfile and lifecycle scripts. Before broad agent access, narrow the working directory and use a shell without production credentials. The larger the job you delegate to an agent, the more important it is to ask what the tool can read and execute.

Invisible instructions are not a security boundary

TrapDoor leaves one uncomfortable lesson. The convenience features that make AI coding tools useful are useful to attackers too. Project-specific instruction files help agents work better. They also let attackers hide instructions where agents are likely to listen. Automating repeated commands makes testing and building faster. It also makes malicious package lifecycle execution faster.

That is not an argument to stop using agents. It is the opposite. Because agents are now inside real development workflows, the files that configure them and the permissions that empower them need stricter treatment. CLAUDE.md can be a team's memory. .cursorrules can be a productive runbook. AGENTS.md can be a working contract between humans and agents. If so, these files should not be treated as lighter than README.

If this incident is framed only as "34 malicious packages," the durable lesson gets lost. The number 34 may change. The 384 versions and artifacts may shrink as registries and security vendors act. The lasting shift is that attackers are beginning to treat the instruction layer of AI coding assistants as part of the supply chain. Future malicious dependencies may steal credentials and prepare the agent's next move at the same time.

The questions for AI development teams are now concrete. Which files does the model read automatically in our repositories? Who approves those files? Do we detect invisible Unicode? Do we review external URLs and command suggestions? After the agent reads instructions, which tools can it execute without approval? If those questions do not have clear answers, one invisible line of instruction may outlive the malicious package that planted it.