codexui-android hit 29,000 downloads, then stole Codex tokens

The codexui-android npm package stole Codex authentication tokens, pushing AI coding-agent security into install, artifact, and egress controls.

AI summary

What happened: The codexui-android npm package reportedly read Codex authentication files and sent tokens to an external server.
- Aikido said the malicious code appeared in the npm artifact but not in the GitHub source. TechRadar reported more than 29,000 weekly downloads.
Why it matters: AI coding-agent security now reaches beyond prompt injection into npm install, published artifact review, credential scope, and egress monitoring.
Watch: Removing the package is not enough. Teams need Codex/OpenAI token rotation, session review, install-history checks, and network-log review.

Aikido Community Japan reported on May 28, 2026 that the npm package codexui-android contained code designed to steal OpenAI Codex authentication tokens. The package presented itself as a remote web UI for OpenAI Codex and was connected to a GitHub repository. Aikido's claim focused on the published npm build artifact rather than the repository source: the npm package included code that was not visible in GitHub and, when executed, read Codex authentication files and sent them to an external server.

TechRadar followed on June 1, 2026 with a broader report, saying codexui-android had more than 29,000 weekly downloads. The article quoted Aikido researcher Charlie Eriksen explaining that a stolen refresh token could lead to long-lived account access, Codex session access, API credit abuse, and impersonation across OpenAI services. The target was not a model prompt. It was the developer's local credential file.

Treating this as only another malicious npm package misses the AI-specific risk. Codex, Claude Code, Cursor, Copilot CLI, and similar tools connect package installation, sandbox execution, browser checks, Git operations, local files, and cloud accounts. When a package can reach a developer's refresh token, the damage is not limited to that package's declared function. It can extend into the accounts, repositories, API budgets, and work history available to the agent environment.

| Checkpoint | Observed in this incident | Question for development teams |
| --- | --- | --- |
| Source vs. artifact | Aikido flagged code present in the npm package but absent from GitHub. | Do you inspect the installed tarball, not just the repository? |
| Install timing | The report says an Android app ran `pnpm add codexui-android@latest`. | Do you block dynamic `@latest` installs in agent runtimes? |
| Credential file | `~/.codex/auth.json` or `$CODEX_HOME/auth.json` is the file path to check. | Can the agent runtime read long-lived authentication files directly? |
| Outbound traffic | Aikido mentioned traffic to `sentry.anyclaw.store`. | Do you retain egress logs for developer machines and sandboxes? |

The npm registry metadata also explains why the package could look plausible. codexui-android was created on April 10, 2026, and the registry JSON shows a modification record on May 27, 2026. Version metadata described it as "A lightweight web interface for Codex that runs on top of the Codex app-server" and used keywords such as codex, openai, web-ui, remote, and cli. The name, description, and repository link all matched a pattern a Codex user might expect from an unofficial convenience tool.

Aikido's most useful warning is that a GitHub review alone would not have found the issue. Many teams evaluate a new dependency by looking at the README, stars, issues, visible source files, and package.json. That review fails when the attacker keeps GitHub clean and inserts the payload only into the registry artifact. The practical controls are less glamorous: npm pack --dry-run, lockfile integrity checks, registry tarball diffs, file-list inspection after install, and comparison between Git tags and published package contents.
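
A minimal sketch of the file-list comparison, assuming the published tarball has already been unpacked into one directory and the matching Git tag checked out into another. The function name `artifact_only_files` and both directory arguments are hypothetical, not something from Aikido's report.

```shell
# List files that exist in an unpacked npm tarball but not in the source checkout.
# Paths are compared relative to each root directory.
artifact_only_files() {
  pkg_dir=$1   # assumed: the "package/" directory from an extracted tarball
  src_dir=$2   # assumed: a checkout of the Git tag that claims to match
  pkg_list=$(mktemp)
  src_list=$(mktemp)
  (cd "$pkg_dir" && find . -type f | LC_ALL=C sort) > "$pkg_list"
  (cd "$src_dir" && find . -type f ! -path './.git/*' | LC_ALL=C sort) > "$src_list"
  # comm -23 keeps only the lines unique to the first (artifact) list.
  LC_ALL=C comm -23 "$pkg_list" "$src_list"
  rm -f "$pkg_list" "$src_list"
}
```

One way to get the inputs is `npm pack <name>@<version>` followed by `tar -xzf` on the resulting file, which conventionally unpacks into `package/`. Files that appear only in the artifact are not automatically malicious, since build output often lives there, but they are the files a repository review never saw. The sketch also does not catch files that exist in both places with different contents; a recursive `diff` would be needed for that.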

The installation path also matters. Aikido said an Android app from the same author ran pnpm add codexui-android@latest when it started. That design pulls whatever version is current at execution time. Without a pinned version and a reviewed artifact, the interval between malicious publication and security-database detection can be enough for token theft. Package quarantine and delayed promotion in an internal registry are boring controls, but this incident is exactly the kind of event they are built for.
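
As a rough illustration, startup scripts and CI files can be scanned for installs that resolve a moving tag at run time. The function name `find_latest_installs` and the regular expression are assumptions for this sketch; the pattern covers only the common npm, pnpm, and yarn command forms.

```shell
# Print file:line:content for package-manager installs that request @latest.
find_latest_installs() {
  grep -rnE '(npm (install|i|add)|pnpm (add|install|i)|yarn add)[^;&|]*@latest' "$1"
}
```

Calling `find_latest_installs ./scripts` would flag a line such as `pnpm add codexui-android@latest`. It would not flag a bare `pnpm add <name>`, which also resolves to the current version, so a scan like this narrows the search rather than proving a codebase is clean.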

AI coding agents increase the risk because installation decisions are moving away from explicit human commands. A developer who types pnpm add might at least pause on the package name, maintainer, dependency tree, and README. An agent asked to "add a remote UI," "automate browser verification," or "make this work on my phone" may search, install, test, and present only the final diff or running screen. If the team does not inspect the actual installed tarball and network behavior, a human review gap appears outside the code diff.

The attack path in this case is not a complicated LLM vulnerability. The package executes, reads a Codex credential file, and sends the token out. A security program that spends all its time on prompt injection but ignores package installation and developer-workstation egress can still lose the account through a much simpler route.

Developer or app installs codexui-android

↓

Code from the npm artifact loads and accesses Codex authentication files

↓

Tokens from auth.json are sent to an external server

↓

The attacker may abuse Codex sessions, API credits, and account permissions

OpenAI's own security messaging sits close to this incident. On April 30, 2026, OpenAI announced Advanced Account Security and said ChatGPT accounts can contain sensitive information from Codex and connected tools. The setting applies to ChatGPT and Codex login, requires a passkey or physical security key, disables email and SMS recovery, shortens session lifetime, and provides active session management. OpenAI also said Trusted Access for Cyber participants would need Advanced Account Security starting June 1, 2026.

Those account controls help, but they do not eliminate the runtime problem. Passkeys reduce login phishing. Active session management helps remove suspicious sessions. If code has already executed locally and read auth.json, incident response has to look beyond the account page: install history, shell history, package-manager caches, DNS and HTTP logs, EDR alerts, Codex credential rotation, and the exact machines or sandboxes where the package ran.

OpenAI also said on June 1, 2026, in its AWS announcement that Codex on Amazon Bedrock is a software engineering agent used by more than five million people every week. That scale explains why attackers would target the surrounding ecosystem. As Codex moves from experimental CLI usage into enterprise procurement, AWS security controls, and regular development workflows, unofficial UIs, helper apps, mobile bridges, wrappers, and remote-control tools become attractive distribution points. The attacker does not need to compromise the official product if users can be nudged toward a convenient adjacent tool.

TechRadar also mentioned two Android apps from the same author account. One was OpenClaw Codex Claude AI Agent, which TechRadar said had more than 50,000 downloads and ran the npm package inside a PRoot sandbox. Another was an app named Codex, which the report said had more than 10,000 downloads. Those numbers are reported figures, not independently verified counts, but they support the operating pattern: a tool that looks like a convenient AI coding app can pull users into an unofficial runtime and then execute package-manager behavior.

Community discussion around the specific incident was still limited when the original Korean article was researched: no large Hacker News or GeekNews thread had appeared. Reddit had security-sharing posts with titles such as "OpenAI Codex Authentication Tokens Stolen in codexui-android npm Supply Chain Attack," and another post noted that searching for an OpenAI Codex app surfaced suspicious or fake-looking results. The sample is small, but it points to the user-experience problem: official apps, unofficial wrappers, ads, npm packages, and remote UIs can look similar to a developer trying to move quickly.

The first response is installation discovery. Aikido pointed to ~/.codex/auth.json, $CODEX_HOME/auth.json, sentry.anyclaw.store, codexui-android, and the related Android apps as checks. In npm projects, search pnpm-lock.yaml, package-lock.json, yarn.lock, global install lists, shell history, CI caches, and sandbox images. If an Android app was used, uninstalling the app is only the first step; the Codex/OpenAI account used from that environment needs session and token review.

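# rg skips hidden and gitignored paths (often including node_modules) by default,
# so include them explicitly when hunting for indicators.
rg -i --hidden --no-ignore "codexui-android|anyclaw" .
# Check globally installed packages for both package managers.
npm ls -g --depth=0 | rg -i "codexui"
pnpm list -g --depth 0 | rg -i "codexui"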
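
Where ripgrep is not installed, or when only lockfiles matter, a sketch along these lines could walk a directory tree with standard tools. The function name `lockfiles_mentioning` is hypothetical, and the match is a plain substring search, so it will also report unrelated packages whose names contain the search string.

```shell
# Print every lockfile under a root directory that mentions the given string.
lockfiles_mentioning() {
  root=$1
  name=$2
  # find descends into hidden directories and node_modules as well.
  find "$root" -type f \
    \( -name pnpm-lock.yaml -o -name package-lock.json -o -name yarn.lock \) \
    -exec grep -lF -- "$name" {} +
}
```

For example, `lockfiles_mentioning ~/src codexui-android` would list the projects that resolved the package at some point, which gives responders a set of machines and repositories to follow up on.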

The second response is credential rotation. Removing the package does not revoke a stolen refresh token. Review active OpenAI sessions, log out anything suspicious, rotate API keys where API keys were present, and verify usage history. Teams should also document where Codex login credentials are stored in their environment. When every developer has a slightly different local setup, incident response slows down at the exact moment the organization needs a deterministic checklist.
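
To make that inventory repeatable, a team could script the lookup. The sketch below assumes the two locations named in the report, `$CODEX_HOME/auth.json` and `~/.codex/auth.json`; the function names and the three status labels are invented for illustration.

```shell
# Resolve the credential path: $CODEX_HOME/auth.json when CODEX_HOME is set,
# otherwise ~/.codex/auth.json.
codex_auth_path() {
  printf '%s\n' "${CODEX_HOME:-$HOME/.codex}/auth.json"
}

# Report whether the file exists and whether group/other have any access bits.
codex_auth_status() {
  path=$(codex_auth_path)
  if [ ! -f "$path" ]; then
    echo "absent $path"
    return 0
  fi
  # Characters 5-10 of the `ls -l` mode string are the group and other bits.
  bits=$(ls -ld "$path" | cut -c5-10)
  if [ "$bits" = "------" ]; then
    echo "present owner-only $path"
  else
    echo "present shared $path"
  fi
}
```

Owner-only permissions do not protect against the scenario in this incident, because the package runs as the same user who owns the file. The output is useful mainly as an inventory of which machines hold a credential that may need rotation.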

The third response is privilege separation for agent runtimes. Running Codex or Claude Code from the same home directory that stores long-lived credentials is convenient, but it enlarges the blast radius. A stronger setup uses a dedicated agent account, short-lived tokens, read-only defaults, task-scoped sandboxes, per-run egress allowlists, and install approval. In particular, @latest installs and lifecycle scripts should pass through a human or policy gate before an agent can execute them.
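
A policy gate for installs can start very small. The following sketch accepts only exact `name@MAJOR.MINOR.PATCH` specs; the function name `is_pinned_spec` and the accepted pattern are assumptions, and a real policy would probably also need to handle prerelease versions and an approved-package list.

```shell
# Return 0 only for specs of the form name@MAJOR.MINOR.PATCH (optionally scoped).
# Bare names, dist-tags such as @latest, and ranges such as ^1.2.0 are rejected.
is_pinned_spec() {
  printf '%s\n' "$1" |
    grep -Eq '^(@[a-z0-9._~-]+/)?[a-z0-9._~-]+@[0-9]+\.[0-9]+\.[0-9]+$'
}
```

A wrapper around the agent's package-manager call could run `is_pinned_spec "$spec" || exit 1` before anything is fetched. An exact version is only a precondition: it makes the install reproducible, but it says nothing about whether that version's artifact was reviewed.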

The fourth response is artifact verification. Repository-based review is weak against this class of attack. Teams need to compare the actual npm tarball against the expected source, inspect file lists and checksums, and verify that a registry publish matches the corresponding Git tag. Internal package proxies can quarantine new versions before promotion. In workflows where AI agents install tools, "unreviewed latest version" should be treated as an exception, not the default.
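
For the checksum step, npm lockfiles record an `integrity` value in the form `sha512-<base64 digest>`. The sketch below recomputes that string for a local tarball so it can be compared with the recorded value. It assumes the `openssl` command-line tool is available, and the function names are hypothetical.

```shell
# Compute an npm-style integrity string (sha512-<base64>) for a file.
sri_sha512() {
  printf 'sha512-%s\n' "$(openssl dgst -sha512 -binary "$1" | openssl base64 -A)"
}

# Succeed only when the file's digest equals the expected integrity string.
matches_integrity() {
  [ "$(sri_sha512 "$1")" = "$2" ]
}
```

A match shows that the tarball on disk is the same one the lockfile or registry recorded. It does not show that the tarball was built from the Git source, which is why the file-level comparison against the tag is still needed.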

The fifth response is egress logging. Agents need network access to test web apps, install dependencies, and call APIs, so blocking everything is not workable. The goal is to detect when credential-like files or unusual traffic leave the developer machine, CI runner, or sandbox VM. A domain such as sentry.anyclaw.store can look like telemetry at a glance; telemetry-looking names should not bypass review. DNS and HTTP logs are often the difference between "we removed the package" and "we know whether tokens left the machine."
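
One hedged way to act on those logs is an exact-match allowlist. The sketch assumes hostnames have already been extracted from DNS or proxy logs, one per line, and that the team maintains a plain-text allowlist file; the function name `unexpected_hosts` and that file format are illustrative.

```shell
# Read observed hostnames on stdin (one per line) and print, de-duplicated,
# those that do not appear as an exact line in the allowlist file.
unexpected_hosts() {
  allowlist=$1
  # -x whole-line match, -F fixed strings, -i ignore case, -v invert.
  LC_ALL=C sort -u | grep -vxiFf "$allowlist"
}
```

Because the match is exact rather than pattern-based, a host such as `sentry.anyclaw.store` is reported unless someone deliberately adds it, regardless of how much it resembles a telemetry endpoint.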

This incident is separate from the usual model-quality debate. Whether a coding model writes better code, reads a longer context, or uses cheaper tokens does not change the operating fact that agents install and execute code near developer identity. Reviewing only the AI-generated diff is no longer enough. The review boundary now includes installed packages, executed scripts, opened network connections, and credential files readable from the runtime.

As Codex and competing tools move deeper into company workflows, this pattern will probably repeat around unofficial UIs, MCP servers, browser-automation adapters, mobile bridges, remote-control helpers, and local wrappers. Some of those tools will be useful. Some will be malicious. Package reputation, artifact attestation, runtime egress controls, and credential scoping are becoming part of the developer-experience layer, not a separate security afterthought.

For teams using AI coding agents today, the practical checklist is small and concrete: search for codexui-android, rotate Codex/OpenAI credentials, review sessions, block dynamic @latest installs in agent environments, reduce what the agent runtime can read from the home directory, and turn on egress logs for sandboxes. Those controls address this package, but they also form a baseline for the next AI coding-agent supply chain incident.