Devlery

Codex Keeps Working On A Locked Mac

OpenAI Codex adds Appshots, Goal mode, browser annotations, and locked computer use, pushing coding agents toward longer-running local workflows.

AI Summary
  • What happened: OpenAI bundled Appshots, Goal mode, browser annotations, and locked computer use into a May 21 Codex update.
    • The official release note frames the update around richer context, long-running objectives, browser improvements, and remote use after a Mac locks.
  • Why it matters: The center of coding-agent competition is moving from model answers toward a working runtime that joins screens, goals, browsers, and local permissions.
  • Watch: Screen capture and locked use are convenience features, but they also widen the boundary around sensitive app state and user approval.
    • OpenAI's own Computer Use guidance tells users to keep tasks narrow and personally review permission prompts and sensitive flows.

OpenAI published a cluster of Codex updates in the ChatGPT release notes on May 21, 2026. On the surface, the list reads like a product changelog: Appshots, Goal mode, in-app browser annotations, locked computer use, and browser use improvements. But treating this as a set of small convenience features misses the larger movement. Codex is shifting from "a model that helps fix code" toward "a local agent runtime that keeps receiving the user's working state, holds a goal across a long task, and uses browsers and desktop apps as verification surfaces."

The interesting part is what the announcement does not lead with. There is no new model name at the center and no benchmark table. OpenAI's own release-note framing is "richer context, goal mode, browser improvements, and remote locked use." More context. Longer objectives. Better browser handling. Limited remote work after the machine locks. As AI coding tools move deeper into the developer's day, the bottleneck is no longer only whether the model predicts the next line well. It is how the user hands over the error they are looking at, how the agent decides when to stop, how it validates a rendered screen, and what authority it has after the user steps away.

This article is not a usage guide for the new Codex features. The news is the direction of travel. The coding-agent market is moving from model calls toward working surfaces. OpenAI is stretching that surface across ChatGPT, the Codex app, the CLI, IDE extensions, mobile remote connections, browser tooling, and Computer Use. The May 21 update shows that these are no longer experimental extras. They are becoming core axes of competition for Codex.

What Changed

In the official release note, OpenAI says Codex now has more ways to understand work context and keep moving through longer tasks. The first new piece is Appshots. In the macOS Codex app, users can send the foreground app window into a Codex thread. The attachment can include not only a screenshot, but also accessible text exposed by the app. The default shortcut is pressing both Command keys, and users can set their own. Rather than describing the error dialog, design tool, document, or settings screen they were just looking at, the user can pass it along as a single attached context object.

The second piece is the general availability of Goal mode, which works in the Codex app, the IDE extension, and the CLI through /goal. In Goal mode, the goal text is both the starting prompt and the completion criterion. Codex uses that objective to choose next actions and judge whether the task is finished. OpenAI's examples include migrating a JavaScript codebase to TypeScript strict mode or reducing homepage time to interactive below one second. The point is not a one-shot prompt. It is a long-running task with an explicit definition of done.

The third piece is the in-app browser and browser use. Codex's in-app browser lets the user and Codex look at the same rendered page, leave visual comments, and turn those comments into code changes. OpenAI describes it as a fit for local development servers, file-backed previews, and public pages that do not require login. Browser use goes one step further: Codex can click, type, take screenshots, extract assets, and perform read-only JavaScript inspection to check the page state.

The fourth piece is locked computer use. The name sounds sensitive, and it should be read carefully. OpenAI does not present it as a general remote-unlock mechanism. The documentation says it must be explicitly enabled, applies only during an active trusted Computer Use turn, and is limited to a short scoped authorization window. Codex can continue a Computer Use task behind a locked Mac, but it covers the display, relocks when local input is detected, and cannot approve administrator authentication or security permission prompts. In other words, this is not "Codex can freely operate your sleeping Mac." It is a permission mechanism that lets a remotely supervised Codex task continue under narrow conditions when the machine reaches the lock screen.

Official Codex in-app browser documentation image

Context Delivery, Not Just Prompting

Appshots matters because the hard part is not only attaching screenshots faster. In daily AI coding work, many failures do not come from the model having no knowledge of the codebase. They happen because the state the user is seeing never made it into the prompt. "The button looks wrong" is weaker than the actual rendered button at the failing viewport, with the nearby elements, spacing, color, and overflow visible. Terminal errors have a similar problem. Copy one log line and the surrounding state disappears. Paste the whole log and the prompt becomes noisy.
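For logs, the familiar middle ground between one line and the whole file is to keep each matching line plus a few neighbors and mark what was dropped. The Python sketch below only illustrates that tradeoff; it is not how Codex or Appshots selects context, and `excerpt_log`, its default pattern, and the context width are hypothetical choices.

```python
import re
from typing import List


def excerpt_log(lines: List[str], pattern: str = r"error|exception|traceback",
                context: int = 2) -> List[str]:
    """Keep lines matching `pattern` plus `context` neighbors on each side.

    Overlapping windows are merged, and each gap between kept windows is
    marked with "..." so the reader can tell that lines were omitted.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    keep = set()
    for i, line in enumerate(lines):
        if rx.search(line):
            # Keep the match and its neighbors, clamped to the log bounds.
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))

    out: List[str] = []
    prev = None
    for i in sorted(keep):
        if prev is not None and i != prev + 1:
            out.append("...")
        out.append(lines[i])
        prev = i
    return out
```

With `context=1`, an eleven-line log holding one `ERROR` line and one `Exception` line shrinks to two three-line windows separated by `...`, which keeps the surrounding state without pasting everything.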

Appshots is an attempt to reduce that gap. OpenAI's documentation says the foreground window image and available text can be captured together, and that an appshot behaves like a Codex attachment. The examples include sharing an API reference page so Codex can write a script, sharing an email or calendar view so Codex can draft next steps, or showing an image editor and preview window so Codex can modify related assets or code.

This matches a broader direction in AI developer tools. Figma MCP tries to turn the design surface into an agent tool. Browser automation for agents lets the runtime observe page state instead of only reading source files. GitHub Copilot, Cursor, and other coding agents keep pulling more editor state, pull-request context, terminal logs, and background-agent status into the product. Codex Appshots is one of the more direct attempts to make the local Mac app itself a describable working context.

More context also means a wider exposure boundary. OpenAI's documentation is explicit that an appshot shares a captured image and accessible text with Codex. It also notes that for some apps and websites, including Google Docs, Gmail, Google Sheets, and Google Slides, only the visible screenshot may be included rather than the full document text. If a development team standardizes this feature, "which app windows may be captured" becomes an operational rule. A screen shown to a coding agent is no longer just a screen. It is prompt input and shared data.
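One way to make that rule explicit is a default-deny check that a team applies, or writes into its review checklist, before anyone shares a window. The sketch assumes a hypothetical `CapturePolicy` with an app allowlist and blocked window-title terms. Codex does not expose such an object, so treat it as an internal convention rather than a product setting.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class CapturePolicy:
    """Hypothetical team rule for which app windows may be shown to a coding agent."""
    # Default deny: any app not listed here is refused.
    allowed_apps: FrozenSet[str]
    # Title fragments that block capture even inside an allowed app.
    blocked_title_terms: Tuple[str, ...] = ()

    def may_capture(self, app: str, window_title: str) -> bool:
        if app not in self.allowed_apps:
            return False
        title = window_title.lower()
        return not any(term.lower() in title for term in self.blocked_title_terms)
```

For example, `CapturePolicy(frozenset({"Simulator", "Terminal"}), ("prod", "customer"))` allows a Terminal window running a local dev server but refuses one whose title mentions a production host, and refuses Mail entirely.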

/goal Puts Completion Criteria Into The Product

Goal mode addresses the long-running-agent problem more directly. A normal coding prompt starts as a single request: fix this bug, add tests, make this component responsive. Real work immediately splits into steps. The agent reads files, forms a plan, edits code, runs tests, debugs failures, and revises. The user may ask for an explanation or redirect the work halfway through. When the agent does not have a durable definition of done, long tasks become fragile.

OpenAI describes Goal mode as a persistent objective. The objective stays active across multiple steps, and Codex uses it to judge progress. In the Codex app, a progress indicator and pause, resume, edit, and clear controls appear above the composer. The upside is obvious when the target is verifiable. If the goal is "compile in TypeScript strict mode without explicit any," Codex can keep iterating toward that check instead of stopping after a superficial edit.
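The mechanism described here, one goal text that both seeds the work and defines when to stop, can be reduced to a small loop. This is a hedged sketch, not Codex's implementation: `step` stands in for whatever the agent does in one iteration, `is_done` for a verifiable check such as a strict-mode compile, and `max_steps` is a hypothetical budget. Pause, resume, edit, and clear are not modeled.

```python
from typing import Callable, List


def run_goal(goal: str,
             step: Callable[[str, int], str],
             is_done: Callable[[], bool],
             max_steps: int = 20) -> List[str]:
    """Take steps toward `goal` until `is_done()` succeeds or the budget runs out.

    The same goal text is passed to every step (the starting prompt), while
    `is_done` plays the role of the completion criterion.
    """
    log: List[str] = []
    for i in range(max_steps):
        if is_done():
            log.append(f"done after {i} steps")
            return log
        log.append(step(goal, i))
    # Budget exhausted: report whether the final step happened to finish the goal.
    log.append(f"done after {max_steps} steps" if is_done() else "stopped: step budget exhausted")
    return log
```

The budget matters as much as the check: without it, a goal that can never be met keeps the loop running indefinitely.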

The risk sits on the other side of the same mechanism. A poorly written goal can make the agent faithfully pursue the wrong thing. "Improve performance" does not specify which metric matters. "Make the tests pass" can push user experience, security, or maintainability into the background. Goal mode becoming generally available means users can delegate longer tasks more naturally, but it also means goal writing becomes part of the product skill.
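A goal is easier to review when it names one measurable target and, separately, the things it must not break. The `GoalSpec` structure below is a hypothetical illustration of that split, using the article's time-to-interactive example. The field names and guardrail labels are assumptions, and the measurements would come from whatever tooling the team already runs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GoalSpec:
    """Hypothetical goal that names its metric and the guardrails it must not break."""
    metric: str            # e.g. "tti_ms" (time to interactive, in milliseconds)
    target: float          # the metric must end at or below this value
    guardrails: List[str]  # boolean checks that must stay true, e.g. "tests_pass"

    def evaluate(self, measured: Dict[str, float], checks: Dict[str, bool]) -> List[str]:
        """Return unmet conditions; an empty list means the goal is satisfied."""
        problems: List[str] = []
        value = measured.get(self.metric)
        if value is None:
            problems.append(f"{self.metric} was not measured")
        elif value > self.target:
            problems.append(f"{self.metric}={value} exceeds target {self.target}")
        for name in self.guardrails:
            # A missing guardrail result counts as a failure, not a pass.
            if not checks.get(name, False):
                problems.append(f"guardrail failed: {name}")
        return problems
```

Treating an unreported guardrail as failed is a deliberate choice here: "make the tests pass" cannot quietly win just because nobody measured accessibility.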

This is where Codex starts to look more like Claude Code, Cursor, GitHub Copilot coding agent, and Google Antigravity. All of these products still talk about model quality, but the practical product differences increasingly come from planning, execution, verification, review, approval, interruption, and resumption. If a coding agent is only better autocomplete, it does not need a goal. If it is going to read a repository, run a test suite, and edit several files over tens of minutes, goals and completion criteria become core UI.

Feature | What Codex receives | Boundary teams must define
Appshots | Foreground app-window imagery and accessible text | Allowed apps, customer data, and internal-document scope
Goal mode | Persistent objective and completion criteria for long tasks | Success metrics, stop conditions, and review standards
In-app browser | Rendered pages, visual comments, and read-only inspection | Login-free pages, test accounts, and annotation scope
Locked use | Limited Computer Use progress behind a locked Mac | Allowed apps, sensitive tasks, and remote-approval policy

When The Browser Becomes A Verification Device

The browser-related changes in this update are not minor. OpenAI describes the in-app browser as a place where the user and Codex see the same rendered page and can leave visual comments. The user can enter annotation mode, select an element or region, and leave a comment. Styling feedback is more precise, with previews for values such as font, text, spacing, and color.

For frontend work, that is more than a nice interface. One weak point for coding agents is the result that looks plausible in code and wrong on screen. Tests can pass while a button label overflows, a tooltip hides data, or a card breaks on mobile. A human sees the screenshot and notices immediately. An agent reading only files can miss the failure. In-app browser annotations turn what the user sees into specific work instructions.
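An annotation is useful because it carries a location and a concrete change, not just a reaction. The sketch below shows one hypothetical way to represent a visual comment and flatten it into a work item. The `Annotation` fields and the instruction format are assumptions for illustration, not the in-app browser's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Annotation:
    """Hypothetical shape of one visual comment left on a rendered page."""
    page: str
    selector: str  # CSS selector of the element the user picked
    comment: str
    # Desired style values, e.g. {"font-size": "18px"}.
    style_changes: Dict[str, str] = field(default_factory=dict)


def to_instruction(a: Annotation) -> str:
    """Flatten an annotation into a concrete, reviewable work item."""
    parts = [f"On {a.page}, element `{a.selector}`: {a.comment}"]
    # Sort properties so the same annotation always yields the same text.
    for prop, value in sorted(a.style_changes.items()):
        parts.append(f"set {prop} to {value}")
    return "; ".join(parts)
```

"The card title looks cramped" becomes a selector, a page, and explicit property values that a reviewer can compare against the resulting diff.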

Browser use is the next step. Codex can open a local dev server, a file-backed preview, or a public page, then click, type, inspect, and capture screenshots. The documentation draws a clear line: signed-in pages, regular browser profiles, cookies, extensions, and existing tabs are not supported. It also tells users to treat page content as untrusted context and avoid pasting secrets into browser flows. That limitation matters. Browser agents bring prompt-injection risk, malicious-page risk, and account-misuse risk along with their convenience.
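Those documented limits suggest a simple pre-flight check a team could run on target URLs: loopback dev servers and `file:` previews are in scope, and anything else must be a public host the team has explicitly approved. A URL cannot reveal whether a page needs a login, so the allowlist in this hypothetical `classify_target` sketch is a human decision, not something the code infers.

```python
import ipaddress
from typing import AbstractSet
from urllib.parse import urlparse


def classify_target(url: str, public_allowlist: AbstractSet[str]) -> str:
    """Sort a URL into 'local', 'file', 'public', or 'blocked' (hypothetical team rule)."""
    parsed = urlparse(url)
    if parsed.scheme == "file":
        return "file"
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return "blocked"
    host = parsed.hostname  # urlparse lowercases the hostname and strips IPv6 brackets
    if host == "localhost":
        return "local"
    try:
        if ipaddress.ip_address(host).is_loopback:
            return "local"
    except ValueError:
        pass  # not an IP literal, so fall through to the allowlist
    return "public" if host in public_allowlist else "blocked"
```

Anything unrecognized is blocked by default, so a new host only becomes reachable after someone confirms it is a login-free public page.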

For development teams, this feature intersects with test strategy. If a project already has Playwright, Storybook, visual regression checks, or Lighthouse budgets, Codex browser use can become a human-readable verification loop on top of those systems. If the project has almost no visual testing, the agent may open the browser but still lack a standard for correctness. To decide whether "this page is right," the agent needs routes, states, expected layout, and forbidden behavior. Browser agents are not magic for teams without verification culture. They are another way of observing the work.
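Writing those expectations down per route and viewport gives a browser agent, or a human reviewer, something concrete to check against. The sketch assumes a hypothetical page snapshot with `visible_text` and `horizontal_overflow` keys, which in practice might be produced by an existing tool such as Playwright. `RouteCheck` and `verify` are illustrative names, not part of Codex.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RouteCheck:
    """Hypothetical statement of what 'this page is right' means for one route and viewport."""
    route: str
    viewport_width: int
    must_show: List[str] = field(default_factory=list)      # text that must be visible
    must_not_show: List[str] = field(default_factory=list)  # forbidden text, e.g. "undefined"
    allow_horizontal_overflow: bool = False


def verify(check: RouteCheck, observed: Dict[str, Any]) -> List[str]:
    """Compare an observed page snapshot against the check; return the failures."""
    where = f"{check.route}@{check.viewport_width}px"
    text = observed.get("visible_text", "")
    failures: List[str] = []
    for needle in check.must_show:
        if needle not in text:
            failures.append(f"{where} missing: {needle!r}")
    for needle in check.must_not_show:
        if needle in text:
            failures.append(f"{where} forbidden: {needle!r}")
    if observed.get("horizontal_overflow", False) and not check.allow_horizontal_overflow:
        failures.append(f"{where} has horizontal overflow")
    return failures
```

A team with no such specs can still let an agent open the browser, but it has nothing to compare the screenshot against.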

A Locked Mac Sits At The Crossroads Of Convenience And Permission

Locked computer use is the most sensitive feature in the bundle. The idea that Codex can continue Computer Use work after a Mac locks is genuinely useful. A developer might be supervising from mobile when a laptop locks. A GUI task might still need a desktop app, iOS simulator, internal browser flow, or design tool that cannot be exercised from the CLI alone. In those cases, stopping at the lock screen is frustrating.

OpenAI's documentation puts several constraints around the feature. The user must enable locked computer use in Codex settings under Computer Use. An Apple authorization plug-in participates in the macOS unlock flow. When Codex needs access after the Mac locks, it temporarily unlocks the machine while blocking local use and preserving locked-screen protections. Before unlocking, it checks that the request is part of an active trusted Computer Use turn. Outside that short window, automatic unlock is denied. The display is covered, and if local keyboard or pointer input is detected, the Mac relocks and automatic unlock stops.
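Read together, those documented conditions amount to a short decision procedure. The sketch below restates them as one Python function purely to make the rules explicit. It is not how OpenAI implements the feature (which involves an Apple authorization plug-in), the `UnlockRequest` fields and action names are hypothetical, and `WINDOW_SECONDS` is a placeholder because the article does not state the window length.

```python
from dataclasses import dataclass

# Placeholder only: the authorization window is described as short, with no length given.
WINDOW_SECONDS = 60.0
# Hypothetical labels for prompts the documentation says Codex cannot approve.
PERSONAL_ONLY_ACTIONS = {"approve_admin_auth", "approve_security_prompt"}


@dataclass
class UnlockRequest:
    setting_enabled: bool       # locked computer use was explicitly enabled in settings
    trusted_turn_active: bool   # request belongs to an active trusted Computer Use turn
    seconds_since_grant: float  # time since the scoped authorization window opened
    local_input_detected: bool  # keyboard or pointer input was seen at the Mac itself
    action: str                 # e.g. "continue_task" or "approve_admin_auth"


def decide_unlock(req: UnlockRequest) -> str:
    """Return a string starting with 'allow', 'deny', or 'relock' for one request."""
    if not req.setting_enabled:
        return "deny: locked computer use is not enabled"
    if req.local_input_detected:
        # Local input wins: relock and stop automatic unlocking.
        return "relock: local input detected"
    if not req.trusted_turn_active:
        return "deny: no active trusted Computer Use turn"
    if req.seconds_since_grant > WINDOW_SECONDS:
        return "deny: authorization window has closed"
    if req.action in PERSONAL_ONLY_ACTIONS:
        return "deny: requires the user in person"
    return "allow: continue with the display covered"
```

Every branch except the last refuses, which mirrors the documented posture: the feature continues an already-trusted turn and never widens what that turn may do.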

That design signals that OpenAI understands the risk. It does not remove the risk. Computer Use can involve screen contents, screenshots, windows, menus, keyboard input, and clipboard state. Inside allowed apps, clicks and inputs can happen under a real user account. OpenAI's documentation warns that websites may treat approved clicks or form submissions as actions from the user's account. Locked use should therefore not be treated as "convenient, leave it on." The better framing is "which exact tasks are allowed to use this?"

In a corporate development environment, that question gets harder. A personal Mac may have Slack, email, admin consoles, cloud dashboards, and customer data open at the same time. The agent's work surface needs to be narrowed by app, account, and route. OpenAI advises users to close sensitive apps, review permission prompts, and stay present for account, security, privacy, network, payment, and credential settings. Real operating rules need to be just as specific. Not "Codex is allowed." Instead: this app, this route, this test account, and this time window.
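That kind of rule can be written as data instead of prose. The `AgentGrant` below is a hypothetical sketch of a grant scoped to apps, route prefixes, one test account, and a daily time window. It is not a Codex setting, and a team would still need its own way to enforce it.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class AgentGrant:
    """Hypothetical 'this app, this route, this test account, this time window' rule."""
    apps: FrozenSet[str]
    route_prefixes: Tuple[str, ...]
    account: str
    start: time  # inclusive start of the daily window
    end: time    # inclusive end; assumes the window does not cross midnight

    def permits(self, app: str, route: str, account: str, at: datetime) -> bool:
        in_window = self.start <= at.time() <= self.end
        return (app in self.apps
                and any(route.startswith(p) for p in self.route_prefixes)
                and account == self.account
                and in_window)
```

The time check assumes a window within a single day; a window that crosses midnight would need two comparisons.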

The Competition Moves Toward Work Boundaries

This Codex update is not only an OpenAI story. Google Antigravity is moving toward an agent command center that joins AI Studio, Android, Firebase, and Google Cloud. GitHub Copilot is combining remote control, model routing, usage-based billing, and cloud agents. Anthropic's Claude Code is expanding around goals, plugins, MCP, compute limits, and managed agents. Cursor is pushing the IDE and cloud/background agents together.

Model performance still matters. But what users feel more often is the runtime. Which screen can the agent see? Which browser can it validate in? How does it connect local files to cloud work? What approvals does it require while the user is away? What logs and diffs does it leave after failure? Where are token cost and compute limits visible? Those questions increasingly decide product choice.

That explains why OpenAI is bundling Appshots, Goal mode, in-app browser work, and locked use together. Long tasks need goals. Durable goals need context. Frontend and GUI tasks need visual verification. Remote supervision needs a permission model for the lock screen. Each feature can look small in isolation. Together they redraw the working boundary of a coding agent.

For development teams, the practical question is larger than "Codex has new features." What screens can our team safely show an agent? Are our long-running tasks written with testable metrics and completion criteria? Do we have routes and test states that a browser agent can verify? If remote or post-lock work is allowed, which apps and accounts must be closed first? Adopting a coding agent no longer ends with installing an IDE extension. It means designing permission, context, verification, and stop conditions.

Conclusion

OpenAI's May 21 Codex update is not a flashy model launch. That is what makes it important. Once coding agents enter real development work, the working environment becomes as important as the model name. Appshots turns the screen a user is seeing into context. Goal mode raises the definition of done into the product surface. The in-app browser and annotations connect code changes to rendered results. Locked computer use productizes a difficult compromise between remote supervision and local permission boundaries.

All four features point in the same direction. AI coding tools are becoming execution environments rather than conversational helpers. Execution environments need more than stronger models. They need good goals, narrow permissions, verifiable screens, and careful remote boundaries. This Codex update shows those requirements moving from side features into the center of coding-agent competition.