Sysdig Traces a 113-Second LLM-Agent Intrusion Into Postgres

Sysdig says an LLM-driven attacker chained a marimo RCE into AWS secrets, SSH bastions, and an internal PostgreSQL dump.

AI summary

What happened: Sysdig published a case where an LLM agent appears to have shaped post-exploitation steps after CVE-2026-39987.
- The May 10, 2026 chain moved from a marimo notebook into AWS secrets, an SSH bastion, and an internal PostgreSQL dump.
Key number: the bastion-stage schema discovery and table dump took 113 seconds, while the full pivot finished in under an hour.
Why it matters: defenders need to track intent across boundaries, such as credential reads followed by secret retrieval, SSH, and database access.
- marimo should be upgraded to 0.23.0 or later, and exposed instances should trigger rotation of AWS keys, API keys, DB passwords, and SSH keys.

Sysdig Threat Research published a report on May 26, 2026 that is narrower than the usual "AI hacker" headline. The evidence is specific: on May 10, 2026, an attacker compromised an internet-exposed marimo notebook through CVE-2026-39987, then Sysdig observed post-exploitation steps that it attributes to an LLM agent. The chain moved through AWS credentials, Secrets Manager, an SSH bastion, and finally an internal PostgreSQL database.

Reading the incident as "AI replaced the attacker" makes the analysis less useful. The initial entry point was a known pre-auth marimo RCE. The GitHub advisory says the /terminal/ws WebSocket endpoint failed to enforce authentication, letting an unauthenticated attacker obtain a full PTY shell and run arbitrary commands. Affected versions were <= 0.20.4, and the patched version is 0.23.0. Sysdig's new signal is not the exploit payload. It is the shape of the shell activity after compromise, which looked less like a fixed playbook and more like a tool loop adapting to the target's output.

- 18:23:44 UTC: The first WebSocket connection reaches /terminal/ws on the vulnerable marimo instance.
- 18:24:14 UTC: Credential harvesting starts against /app/.env*, /etc/environment, /proc/*/environ, and ~/.aws/credentials.
- 19:26:31-19:26:52 UTC: The stolen access key is used for AWS API calls, and an SSH-key secret is retrieved from Secrets Manager.
- 19:30:30-19:32:23 UTC: Eight parallel SSH sessions from six Cloudflare Workers IPs dump internal PostgreSQL schemas and tables.

Sysdig's numbers are the anchor. The full chain from marimo compromise to internal PostgreSQL dump took less than one hour. The bastion stage extracted database schema and table contents in less than two minutes. On the AWS side, 12 cloud API calls were spread across 11 distinct IP addresses over 22 seconds. Cloudflare Workers provided the egress pool. That weakens detections built around repeated calls from a single source IP, and it shifts attention toward the behavior of the credential itself.

The LLM-agent attribution does not rest on speed alone. Fast intrusions existed long before LLMs. Sysdig points to four signals appearing together during the 113-second bastion window. First, the attacker inferred table names that looked like an AI workflow database even before confirming the application identity. Second, Chinese planning comments appeared in the command stream. Third, commands were shaped for machine consumption rather than human readability. Fourth, values from earlier output, such as passwords, secret IDs, and SSH key paths, were lifted into later commands.

Each signal has a conventional explanation when viewed alone. A skilled shell operator can use separators, head, and 2>/dev/null, and ordinary automation can chain AWS CLI output into later calls. What stands out is the combination. echo '---' makes flat output easier to split. head -N limits table or credential output to a context-friendly size. A HEREDOC batches several SQL statements into one psql invocation. 2>/dev/null discards failed probes. Sysdig reads that command shape as optimized for a program consuming the output as observations, not for a person manually scanning a terminal.
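To make the "combination, not any single marker" idea concrete, here is a minimal Python sketch that scores one shell command for the co-occurring shapes listed above. The marker regexes, the function names `machine_shape_markers` and `looks_machine_shaped`, and the threshold of three are illustrative assumptions, not Sysdig's detection logic. The pager marker assumes psql's `-P pager=off` or `\pset pager off` forms.

```python
import re

# Illustrative markers based on the command shapes Sysdig describes.
# These regexes are assumptions for this sketch, not Sysdig's detection rules.
MARKERS = {
    "separator": re.compile(r"echo\s+['\"]-{3,}['\"]"),
    "head_limit": re.compile(r"\bhead\s+-(?:n\s*)?\d+"),
    "heredoc": re.compile(r"<<-?\s*['\"]?\w+['\"]?"),
    "stderr_discard": re.compile(r"2>\s*/dev/null"),
    "pager_off": re.compile(r"pager\s*=?\s*off", re.IGNORECASE),
}


def machine_shape_markers(command: str) -> set[str]:
    """Return the names of all markers present in one shell command."""
    return {name for name, pattern in MARKERS.items() if pattern.search(command)}


def looks_machine_shaped(command: str, threshold: int = 3) -> bool:
    """Flag a command only when several markers co-occur.

    Any single marker is common in ordinary shell use; the hypothetical
    threshold encodes the idea that the cluster is what matters.
    """
    return len(machine_shape_markers(command)) >= threshold


if __name__ == "__main__":
    cmd = "echo '---'; psql -h db -U app -P pager=off <<'SQL' 2>/dev/null | head -50"
    print(sorted(machine_shape_markers(cmd)), looks_machine_shaped(cmd))
```

A real deployment would score whole sessions rather than single commands, and it should expect false positives from human-written scripts that are also designed for machine parsing.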

| Sysdig signal | Attacker behavior | Defender signal |
| --- | --- | --- |
| Output handoff | Values from `.pgpass` and `ListSecrets` output feed later calls | Credential-file reads followed by cloud secret retrieval or DB access |
| Machine-shaped commands | `echo '---'`, `head -N`, `HEREDOC`, and `pager=off` appear together | Parsing-friendly batch commands in an interactive shell |
| Egress fan-out | 12 AWS API calls come from 11 Cloudflare Workers IPs | Secret APIs called from many edge IPs in a short window |
| Goal-seeking dump | Unconfirmed schemas are probed for `api_key`, `credential`, `user`, and `flow` | Intent toward secrets and user tables, not only known TTP sequence matches |
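The output-handoff row can be approximated by tracking values that first appear in command output and later show up inside a command. The sketch below assumes you already capture each shell step's command and output together (the hypothetical `ShellStep` record). The tokenization rule and the eight-character minimum are arbitrary choices made to keep noise down.

```python
import re
from dataclasses import dataclass


@dataclass
class ShellStep:
    command: str
    output: str


# Split output into candidate values on whitespace and common field separators
# (colons cover .pgpass lines; quotes and braces cover JSON from the AWS CLI).
SEPARATORS = re.compile(r"[\s:=,\"'{}\[\]]+")


def output_handoffs(steps: list[ShellStep], min_len: int = 8) -> list[tuple[int, int, str]]:
    """Return (source_step, consumer_step, value) tuples for values that were first
    revealed in one step's output and then reused verbatim in a later command."""
    revealed: dict[str, int] = {}
    handoffs: list[tuple[int, int, str]] = []
    for i, step in enumerate(steps):
        # Did this command reuse something an earlier output revealed?
        for value, source in revealed.items():
            if value in step.command:
                handoffs.append((source, i, value))
        # Harvest new candidate values, skipping anything the operator had already
        # typed: those were known in advance, not learned from the target.
        typed_so_far = " ".join(s.command for s in steps[: i + 1])
        for token in SEPARATORS.split(step.output):
            if len(token) >= min_len and token not in typed_so_far:
                revealed.setdefault(token, i)
    return handoffs
```

Given a step that runs `cat ~/.pgpass` and a later step that passes the password from that output to psql, the function reports a handoff from step 0 to step 1 for the password (and for the host and user fields it reused).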

The first operational check is marimo versioning. The advisory's patched version is 0.23.0. If an upgrade is not immediate, /terminal/ws should be blocked at the network layer or the terminal feature should be disabled. This applies even when marimo is used as a research notebook rather than a production application. Once a notebook server is exposed to the public internet and its process can see cloud keys, database passwords, or API tokens, the relevant attack surface is not "a notebook." It is a credential-bearing runtime.
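A minimal way to check an installed marimo against the advisory's patched release is to read the distribution version with `importlib.metadata` and compare it numerically. This sketch assumes plain release strings and flags anything below 0.23.0 for upgrade. It does not decide whether an intermediate version is actually affected, and it says nothing about whether /terminal/ws is reachable from the internet.

```python
import re
from importlib.metadata import PackageNotFoundError, version

PATCHED = (0, 23, 0)  # patched release named in the GitHub advisory


def parse_release(text: str) -> tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '0.20.4' -> (0, 20, 4).

    Simplification: suffixes such as 'rc1' or '.dev0' are ignored, so
    '0.23.0rc1' is treated like '0.23.0'. Use packaging.version.Version
    if you need strict PEP 440 ordering.
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * (3 - len(parts))  # pad '0.23' to (0, 23, 0)


def is_patched(text: str) -> bool:
    return parse_release(text) >= PATCHED


if __name__ == "__main__":
    try:
        installed = version("marimo")
    except PackageNotFoundError:
        print("marimo is not installed in this environment")
    else:
        status = "at or above 0.23.0" if is_patched(installed) else "below 0.23.0: upgrade"
        print(f"marimo {installed}: {status}")
```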

The second check is secret rotation. Sysdig's recommendation covers environment variables, .env files, AWS credentials, API keys, database passwords, and SSH keys reachable from a public marimo instance. That list sounds broad because the observed chain was broad. A value read from the initial WebSocket shell led to AWS Secrets Manager. The retrieved SSH key led to a bastion. The bastion's .pgpass led to PostgreSQL. Rotating only one secret closes one hop, not the full pivot path.
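The rotation argument is a reachability question: starting from the exposed notebook, which credentials can an attacker eventually collect? The sketch below models the chain Sysdig describes as a small directed graph with hypothetical node names, computes the rotation set with a breadth-first search, and checks what a partial rotation leaves open. It deliberately ignores network placement, so it treats the .pgpass password as usable without the bastion.

```python
from collections import deque

# Hypothetical pivot graph modelled on the chain in Sysdig's report.
# An edge X -> Y reads "holding X lets the attacker obtain or reach Y".
PIVOTS = {
    "marimo_shell": ["aws_access_key"],  # env vars / ~/.aws/credentials
    "aws_access_key": ["ssh_key"],       # Secrets Manager GetSecretValue
    "ssh_key": ["bastion"],              # SSH login
    "bastion": ["db_password"],          # read from .pgpass
    "db_password": ["postgres"],         # psql session (network placement ignored)
}
CREDENTIALS = {"aws_access_key", "ssh_key", "db_password"}


def reachable(starts, graph=PIVOTS):
    """Breadth-first search returning every node reachable from `starts`."""
    seen = set(starts)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def rotation_set(foothold, graph=PIVOTS):
    """Credentials an attacker could collect starting from `foothold`."""
    return reachable([foothold], graph) & CREDENTIALS


def residual_reach(stolen, rotated, graph=PIVOTS):
    """What the attacker can still reach if only `rotated` credentials change."""
    return reachable([c for c in stolen if c not in rotated], graph)
```

Rotating only the AWS key leaves the stolen SSH key and database password valid, so `postgres` stays in the residual reach; rotating the whole rotation set empties it.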

The third check is telemetry across internal boundaries. In Sysdig's timeline, the visible edge asset was marimo, but the data loss happened inside PostgreSQL. Between those points sat a Docker host, AWS credentials, Secrets Manager, SSH, .pgpass, and a database. An agentic attacker can read failed commands or unfamiliar schemas and adjust the next probe. If defenders monitor only internet-facing assets and treat internal runtime logs as a separate world, visibility may disappear just as the intrusion becomes damaging.

This incident matters to AI builders because marimo belongs to the same operational category as notebook servers, RAG pipeline builders, agent orchestration UIs, local coding-agent servers, and AI workflow dashboards. Tools such as Langflow, PraisonAI, LiteLLM, ComfyUI, and Ray Dashboard often move from experiment to always-on infrastructure faster than their permission model matures. Developers see them as productivity surfaces. Attackers see runtimes that aggregate cloud credentials, model API keys, database access, source files, and shell capability.

Detection has to move from exact command fingerprints toward intent. Prebuilt playbooks tend to leave stable User-Agents, command ordering, typos, fallback logic, and timing. If an LLM agent reads target output and composes the next command, those fingerprints change from host to host. More durable signals are sequences such as credential-file reads followed by Secrets Manager calls, secret retrieval followed by SSH, or a bastion session followed by schema enumeration and table dumps.
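Sequence-based detection can be sketched as an ordered-subsequence match over normalized events: the steps must appear in order, unrelated events may sit between them, and the whole chain must fit inside a time window. The event labels, `PIVOT_SEQUENCE`, and the window length are hypothetical. Mapping raw logs to labels such as `secret_retrieval` is the part that takes real engineering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float   # seconds, any consistent clock
    kind: str   # normalized label assigned by your own log pipeline


# Hypothetical intent sequence; the labels are illustrative, not a vendor schema.
PIVOT_SEQUENCE = ["cred_file_read", "secret_retrieval", "ssh_login", "schema_enum"]


def match_sequence(events, sequence, max_span):
    """Find `sequence` as an ordered subsequence of `events` (unrelated events may
    sit in between) whose first and last matches are at most `max_span` seconds
    apart. Returns the matched events, or None."""
    ordered = sorted(events, key=lambda e: e.ts)
    for i, first in enumerate(ordered):
        if first.kind != sequence[0]:
            continue
        matched = [first]
        # Greedily take the earliest next step; this yields the earliest possible
        # completion for this starting event, so no in-window match is missed.
        for ev in ordered[i + 1:]:
            if len(matched) == len(sequence):
                break
            if ev.ts - first.ts > max_span:
                break
            if ev.kind == sequence[len(matched)]:
                matched.append(ev)
        if len(matched) == len(sequence):
            return matched
    return None
```

Because the matcher keys on the order of intents rather than exact command strings, an agent that phrases each step differently on each host still produces the same label sequence, provided the normalization step labels it correctly.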

That is harder than adding one rule. Intent-based detection requires logs that can be joined: runtime events, cloud API trails, identity mappings, SSH sessions, and database access logs. The /terminal/ws shell that read ~/.aws/credentials, the access key that later called sts:GetCallerIdentity, and the bastion session that opened four minutes after GetSecretValue need to become one incident timeline. If those events stay isolated, each may look like a low-severity artifact rather than a live pivot.
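One way to turn isolated events into one incident timeline is to link them through the identifiers they share. In the sketch below, each event lists the entities it touches (hosts, access key IDs, secret names), a disjoint-set structure merges events that share any entity directly or transitively, and each resulting group is sorted by time. The identity mappings that make this work, such as tying an SSH key fingerprint back to the secret it was retrieved from, are assumed to exist already, and the entity strings are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float
    source: str                # e.g. "runtime", "cloudtrail", "ssh", "postgres"
    summary: str
    entities: tuple[str, ...]  # identifiers this event touches


class DisjointSet:
    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


def build_incidents(events: list[Event]) -> list[list[Event]]:
    """Group events that share any entity, directly or transitively, into
    time-ordered incident timelines. Events with no entities are skipped."""
    ds = DisjointSet()
    for ev in events:
        for other in ev.entities[1:]:
            ds.union(ev.entities[0], other)
    incidents: dict[str, list[Event]] = defaultdict(list)
    for ev in events:
        if ev.entities:
            incidents[ds.find(ev.entities[0])].append(ev)
    return [sorted(group, key=lambda e: e.ts) for group in incidents.values()]
```

Fed a notebook read of `~/.aws/credentials` that names an access key, a CloudTrail `GetSecretValue` call by that key, an SSH login with the retrieved key, and a PostgreSQL query on the bastion, it returns those four events as one time-ordered group, separate from unrelated activity.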

Cloudflare Workers fan-out adds a policy wrinkle. The incident does not mean Workers is malicious infrastructure by default. The defensive problem is that per-request egress pools weaken simple source-IP correlation. In 22 seconds, Sysdig saw 12 AWS API calls from 11 IP addresses. A burst rule tied to one IP may miss that. A behavior rule tied to the same access key calling ListSecrets and GetSecretValue repeatedly is more likely to survive the egress pattern. For AI-era intrusions, credential behavior can matter more than network origin.
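The difference between an IP-keyed burst rule and a credential-keyed rule shows up even on toy data. The sketch below uses synthetic calls shaped like the counts in the report (12 calls, 11 source IPs from the 198.51.100.0/24 documentation range, the AWS documentation example access key, about 22 seconds) and applies the same sliding-window counter grouped two ways. The threshold of five calls in 30 seconds is an assumption for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCall:
    ts: float
    access_key: str
    source_ip: str
    action: str


SECRET_ACTIONS = {"ListSecrets", "GetSecretValue"}


def burst_by(calls, key, window, threshold, actions=None):
    """Return the group keys that have at least `threshold` calls inside some
    `window`-second span. `key` picks the grouping (source IP or access key)."""
    groups = defaultdict(list)
    for call in calls:
        if actions is None or call.action in actions:
            groups[key(call)].append(call.ts)
    hits = set()
    for group, times in groups.items():
        times.sort()
        start = 0
        for end in range(len(times)):  # sliding window over sorted timestamps
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                hits.add(group)
                break
    return hits


# Synthetic data shaped like the counts in the report: 12 calls, 11 distinct
# IPs from a documentation range, one access key, spread over 22 seconds.
SAMPLE = [
    ApiCall(
        ts=2.0 * i,
        access_key="AKIAIOSFODNN7EXAMPLE",
        source_ip=f"198.51.100.{i % 11}",
        action="GetCallerIdentity" if i == 0 else ("ListSecrets" if i % 2 else "GetSecretValue"),
    )
    for i in range(12)
]
```

Grouped by source IP, no address crosses the threshold. Grouped by access key and restricted to `ListSecrets` and `GetSecretValue`, the single key does.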

The 113-second number is not mainly a claim that AI is faster than humans. It marks the interval in which schema enumeration, credential-table probing, and multi-table dumping happened through the bastion. The useful detail is adaptation. The chain inferred table names in an unknown internal database, lifted a password from .pgpass, and packed multiple SQL queries into a single call. The operator did not need a hand-written playbook for this exact environment. That changes the economics of post-exploitation.
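The goal-seeking dump pattern (catalog enumeration followed by reads of tables whose names suggest secrets or users) can be roughed out over captured SQL, including HEREDOC batches. This Python sketch uses regular expressions and a naive split on semicolons rather than a SQL parser, so it will miss unusual syntax. The keyword list reuses the names from the table above and, like the function name `assess_session`, is illustrative.

```python
import re

# Catalog objects an enumerating client typically queries.
CATALOG = re.compile(r"\b(?:information_schema|pg_catalog|pg_tables|pg_class)\b", re.IGNORECASE)
# Keywords from the probing Sysdig reports; substring matching is deliberately broad.
SENSITIVE = re.compile(r"api_key|credential|user|flow", re.IGNORECASE)
# Crude capture of the relation after FROM in each SELECT.
SELECT_FROM = re.compile(r"\bselect\b.+?\bfrom\s+([\w.\"]+)", re.IGNORECASE | re.DOTALL)


def assess_session(batches: list[str]) -> dict:
    """Summarize catalog probing and sensitive-looking table reads in one session.

    Each batch may hold several statements (as a psql HEREDOC would); splitting
    on ';' is naive and breaks on semicolons inside string literals.
    """
    statements = [p.strip() for b in batches for p in b.split(";") if p.strip()]
    catalog_probes = sum(1 for s in statements if CATALOG.search(s))
    tables_read = {
        match.group(1).replace('"', "").lower()
        for s in statements
        if not CATALOG.search(s)
        for match in SELECT_FROM.finditer(s)
    }
    sensitive = {t for t in tables_read if SENSITIVE.search(t)}
    return {
        "catalog_probes": catalog_probes,
        "tables_read": tables_read,
        "sensitive_tables": sensitive,
        # Enumeration plus reads of sensitive-looking tables is the pattern of interest.
        "suspicious": catalog_probes > 0 and bool(sensitive),
    }
```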

Sysdig frames the shift as a cost problem more than a capability breakthrough. A skilled attacker can always build target-specific commands by hand, but each new target costs engineering time. An LLM agent changes that cost curve by bringing general priors about application classes and spending inference on the target's actual output. The result does not have to be perfect to matter. If the attacker can cheaply generate tailored follow-up commands for many environments, the scale of opportunistic post-exploitation changes.

There is still a line to keep clear. This report does not prove that all post-exploitation is now agent-driven. Sysdig's conclusion depends on command-stream evidence, planning leakage, output handoff, and egress behavior that external readers cannot fully reproduce from raw telemetry. The practical conclusion is narrower: defenders should expect some intrusions to lose the stable fingerprints of scripted playbooks while keeping the same goals of credential access, privilege movement, and data theft.

For development teams, the immediate checklist is short. Run marimo 0.23.0 or later. Put public notebooks and agent UIs behind authentication, VPNs, allowlists, or reverse proxies. Reduce .env, cloud key, model API key, and database password exposure at the task level rather than the host level. Confirm that shell commands, cloud APIs, Secrets Manager, SSH, and database access can be joined into one incident view.
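Scoping secrets at the task level can start with not letting child processes inherit the whole host environment. The sketch below launches a task with an explicit allowlist plus only the variables that task needs. The allowlist and `run_task` are hypothetical. This only narrows environment inheritance: files such as `~/.aws/credentials` remain readable to anything running as the same user, so separate users or containers are still needed.

```python
import os
import subprocess
import sys

# Hypothetical allowlist: the minimum most tasks need to start.
BASE_ALLOW = ("PATH", "LANG", "HOME")


def run_task(argv, extra_env=None, allow=BASE_ALLOW):
    """Run one task with an explicit environment instead of inheriting os.environ.

    Only allowlisted host variables plus the task's own `extra_env` are passed,
    so unrelated secrets in the parent environment never reach the child.
    """
    env = {name: os.environ[name] for name in allow if name in os.environ}
    env.update(extra_env or {})
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    probe = "import os; print(sorted(os.environ))"
    print(run_task([sys.executable, "-c", probe]).stdout)
```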

The security lesson also moves attention from prompt injection to runtime authority. Prompt injection still matters, but the chain Sysdig described was driven by shell access, environment variables, AWS credentials, SSH keys, and PostgreSQL permissions. The better an agent becomes at choosing commands, the more important operating-system permissions, cloud IAM, and secret boundaries become. Model safety does not prevent a /terminal/ws shell from reading an .env file that the process can already access.

Sysdig's report gives AI platform teams a concrete question to answer: what authority does this AI runtime carry? The more useful inventory is not only whether a server calls a model. It is whether the server can read credentials, open shells, reach cloud APIs, access databases, or cross trust boundaries through SSH. Security teams have a matching question: is this session reading secrets and moving into another boundary, even if the command strings do not match a known bad sequence?
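Part of that inventory can be automated from inside the runtime itself. The sketch below lists environment variable names that look like secrets and checks which common credential files the process can read. It reports names and paths only, never values. The name pattern and file list are assumptions, and it does not cover IAM permissions, SSH reachability, or database grants, which need cloud and network context.

```python
import os
import re
from pathlib import Path

# Illustrative patterns; adjust to your own secret naming conventions.
SECRET_NAME = re.compile(r"KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL", re.IGNORECASE)
CREDENTIAL_FILES = ["~/.aws/credentials", "~/.pgpass", "~/.ssh/id_rsa", "~/.ssh/id_ed25519", ".env"]


def inventory(environ=None, files=CREDENTIAL_FILES):
    """Report secret-looking variable names and readable credential files.

    Values are never included in the result. '.env' is resolved relative to
    the current working directory.
    """
    environ = os.environ if environ is None else environ
    secret_vars = sorted(name for name in environ if SECRET_NAME.search(name))
    readable = [f for f in files if os.access(Path(f).expanduser(), os.R_OK)]
    return {"secret_env_vars": secret_vars, "readable_credential_files": readable}


if __name__ == "__main__":
    print(inventory())
```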

The first lesson from this LLM-agent intrusion is not a dramatic forecast. It is an operational risk that already exists. One exposed AI development runtime can become a bridge from a notebook shell to an internal database in under an hour. Teams that run notebooks, agent frameworks, and AI workflow servers need to treat them as privileged infrastructure, not as harmless developer conveniences.