Claude Code adds /goal so coding agents can work toward explicit stop conditions

Claude Code 2.1.139 adds /goal, a session-level completion loop where a separate evaluator decides when an agent has actually finished.

AI summary

What happened: Claude Code 2.1.139 added /goal.
- A user can set a session-level completion condition and let Claude continue across turns until that condition is met.
- After each turn, a separate evaluator checks the conversation history against the stated goal.
Why it matters: /goal moves coding-agent work from open-ended prompting toward explicit stop conditions.
- This fits work such as passing tests, completing migrations, clearing issue queues, or enforcing a bounded edit scope.
Watch: The evaluator does not inspect files or run commands on its own.

Anthropic added the /goal command in Claude Code 2.1.139, with the changelog dated May 11, 2026. On the surface, it is a small CLI command. In practice, it targets one of the hardest operational questions in coding agents: who decides that an autonomous coding task is finished?

The new command lets a user define a completion condition and have Claude keep working across multiple turns until that condition is satisfied. A developer might write that all auth tests must pass, lint must exit cleanly, and no files outside a specified directory may be changed. Claude then works toward that state. After each turn, a separate evaluator reads the goal and the conversation history, decides whether the condition has been met, and either stops the loop or feeds the reason for non-completion into the next turn.

Anthropic's documentation describes /goal as a session-scoped wrapper around a prompt-based Stop hook. That detail matters. The feature is not only "keep going." It separates the model doing the work from a smaller evaluator that judges whether the evidence shown in the conversation satisfies the user's stated condition. Anthropic says the default evaluator is Haiku. The product direction is clear: long-running coding agents are being framed less as a matter of longer answers and more as a matter of goals, evidence, evaluation, and stopping rules.

Why /goal now

Early AI coding-tool competition focused on model quality and editing experience. Could the assistant write a plausible patch? Could it read enough files? Did the IDE show a clean diff? Those still matter, but agentic coding exposes a different class of problems. When an agent keeps working for minutes or hours, the practical questions become: when should it stop, what proof did it collect, which tests actually ran, what files did it touch, and can a user return later and understand the current state?

Users have long been able to type "keep going until tests pass" into a coding assistant. That instruction is fragile. The model may decide that it has done enough, stop after a partial fix, summarize a failed test too generously, or choose a bad shortcut such as changing an assertion instead of fixing the behavior. The instruction also does not become durable session state. A user cannot easily query the active completion condition, the number of evaluated turns, the token budget consumed, or the evaluator's latest reason.

/goal turns that pattern into a CLI-level feature. The official docs say one goal can be active per session. Running /goal with a condition replaces any existing goal. Running /goal with no arguments shows the active condition, elapsed time, evaluated turn count, token usage, and the evaluator's most recent reasoning. Running /goal clear removes the goal early, and aliases such as stop, off, reset, none, and cancel are accepted for clearing.
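The command behavior described above can be modeled as a small piece of session state. The following is a toy sketch, not Claude Code's implementation: the `Session` and `Goal` classes, the field names, and the returned messages are all hypothetical. Only the one-goal-per-session rule, the replace-on-set behavior, the status fields, and the clearing words come from the description above.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Clearing words named in the article; everything else here is hypothetical.
CLEAR_WORDS = {"clear", "stop", "off", "reset", "none", "cancel"}


@dataclass
class Goal:
    condition: str
    started_at: float = field(default_factory=time.monotonic)
    turns_evaluated: int = 0
    tokens_used: int = 0
    last_reason: str = ""


class Session:
    """Toy model of a session that holds at most one active goal."""

    def __init__(self) -> None:
        self.goal: Optional[Goal] = None

    def handle_goal_command(self, args: str) -> str:
        text = args.strip()
        if not text:
            # No arguments: report the current state.
            return self.status()
        if text.lower() in CLEAR_WORDS:
            self.goal = None
            return "goal cleared"
        # A new condition replaces whatever goal was active.
        self.goal = Goal(condition=text)
        return f"goal set: {text}"

    def status(self) -> str:
        if self.goal is None:
            return "no active goal"
        g = self.goal
        elapsed = time.monotonic() - g.started_at
        return (
            f"condition: {g.condition}\n"
            f"elapsed: {elapsed:.0f}s\n"
            f"turns evaluated: {g.turns_evaluated}\n"
            f"tokens used: {g.tokens_used}\n"
            f"last reason: {g.last_reason or '(none yet)'}"
        )
```

The sketch makes the design point concrete: the goal is queryable state attached to the session, not a sentence buried in the chat history.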

Those details sound administrative, but administration is the point. Coding agents are no longer just chat sessions that happen to edit files. They are becoming execution units with state, status, evidence, and resumability.

/goal is not /loop

Claude Code already has /loop. Anthropic's docs distinguish /goal, /loop, and Stop hooks. /loop starts the next turn after a time interval. It is natural for checking a deployment every five minutes, watching logs, or producing a recurring report. A Stop hook runs after a turn and lets a custom script or prompt decide what happens next. With /goal, a model evaluator runs after each turn and judges whether the goal condition has been satisfied.

That distinction is important for code work. A migration is not done because ten minutes passed. It is done when the intended call sites were updated, the code compiles, the relevant tests pass, and the changed files remain within the intended scope. For that class of task, a completion condition is a better control primitive than a timer.

Mode	Next turn starts when	Loop stops when	Best fit
/goal	The previous turn finishes	A separate evaluator confirms the completion condition	Passing tests, finishing migrations, clearing issue queues
/loop	A configured time interval elapses	The user stops it or Claude judges the task complete	Deployment checks, periodic log review, recurring summaries
Stop hook	The previous turn finishes	User-defined script or prompt logic decides	Organization-specific checks, policy gates, custom evals

/goal does not replace Stop hooks. It productizes one common Stop-hook pattern for users who want a goal inside the current session without editing hook configuration. Teams that need deterministic checks, policy enforcement, or custom scripts still have reasons to use hooks. The new command lowers the entry cost for goal-driven agent runs.

How to write a usable completion condition

Anthropic's docs are specific about what makes a good goal. The evaluator does not read files or run commands independently. It judges only the goal text and the evidence Claude leaves in the conversation. A vague goal such as "finish the work" gives the evaluator little to verify. A strong goal names a measurable final state, the verification output that should prove it, and the constraints the agent must respect.

One useful shape is:

/goal all tests in test/auth pass, npm run lint exits 0, and no files outside src/auth and test/auth are modified

That condition has three parts. The final state is passing auth tests. The proof includes a lint command exiting successfully. The constraint is a bounded edit scope. The evaluator still will not run npm run lint itself, but if Claude runs the command and shows the output in the conversation, the evaluator can use that evidence.

The maximum condition length is 4,000 characters. For long-running tasks, the condition can include a stop rule such as "or stop after 20 turns and summarize remaining blockers." That turns cost control into part of the goal itself. A good completion condition should define both success and the point where continued autonomous work should pause for human review.
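A team that reuses this shape could assemble conditions with a small helper. The sketch below is illustrative only: `build_goal` and its parameters are hypothetical names, and it assumes the 4,000-character limit applies to the condition text rather than to the `/goal` prefix.

```python
from typing import Optional

MAX_CONDITION_CHARS = 4000  # limit stated in the article


def build_goal(final_state: str, proof: str, constraint: str,
               max_turns: Optional[int] = None) -> str:
    """Join the parts of a completion condition into one /goal line."""
    condition = f"{final_state}, {proof}, and {constraint}"
    if max_turns is not None:
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        # Optional stop rule, phrased as in the article's example.
        condition += (
            f", or stop after {max_turns} turns and summarize remaining blockers"
        )
    if len(condition) > MAX_CONDITION_CHARS:
        raise ValueError(
            f"condition is {len(condition)} characters; "
            f"limit is {MAX_CONDITION_CHARS}"
        )
    return f"/goal {condition}"


if __name__ == "__main__":
    print(build_goal(
        "all tests in test/auth pass",
        "npm run lint exits 0",
        "no files outside src/auth and test/auth are modified",
        max_turns=20,
    ))
```

Rejecting an over-long condition before it is pasted into the CLI is a convenience, not something the article says Claude Code requires of the user.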

The evaluator is the interesting part

The most important design choice in /goal is the separation between the worker model and the evaluator model. After each turn, the condition and conversation so far are sent to a small, fast model. It returns a yes-or-no decision with a short reason. If the answer is no, the reason becomes input for the next turn. If the answer is yes, Claude Code clears the goal and records that it was achieved.
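The worker and evaluator split can be sketched as a short loop. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `Verdict`, `run_goal_loop`, and the toy worker and evaluator are hypothetical, and the `max_turns` cap is an added safety assumption (the article describes putting a stop rule in the goal text instead).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    met: bool
    reason: str


# Hypothetical stand-ins: a worker takes (transcript, feedback) and returns
# the text of its turn; an evaluator takes (condition, transcript).
Worker = Callable[[List[str], Optional[str]], str]
Evaluator = Callable[[str, List[str]], Verdict]


def run_goal_loop(condition: str, worker: Worker, evaluator: Evaluator,
                  max_turns: int = 20) -> Verdict:
    transcript: List[str] = []
    feedback: Optional[str] = None
    for _ in range(max_turns):
        # The worker does one turn, seeing the evaluator's last reason.
        transcript.append(worker(transcript, feedback))
        # The evaluator sees only the condition and the conversation.
        verdict = evaluator(condition, transcript)
        if verdict.met:
            return verdict
        feedback = verdict.reason
    return Verdict(False, f"stopped after {max_turns} turns: {feedback}")


def toy_worker(transcript: List[str], feedback: Optional[str]) -> str:
    # Pretend the tests fail on the first turn and pass once feedback arrives.
    if feedback is None:
        return "pytest: 3 failed"
    return "pytest: 12 passed, 0 failed"


def toy_evaluator(condition: str, transcript: List[str]) -> Verdict:
    if ", 0 failed" in transcript[-1]:
        return Verdict(True, "latest test output shows no failures")
    return Verdict(False, "test output still shows failures")


if __name__ == "__main__":
    print(run_goal_loop("all tests pass", toy_worker, toy_evaluator))
```

Note that the evaluator function never touches the filesystem or a test runner. It can only judge what the worker chose to put in the transcript.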

This is not a full verification system. The evaluator cannot inspect the actual filesystem. It cannot independently run the test suite. If Claude's summary is inaccurate, the logs are incomplete, or the goal itself is ambiguous, the evaluation can be wrong. /goal should not be treated as a substitute for CI, type checking, linting, code review, or deployment gates.

The design still addresses a real failure mode. Coding agents often overrate their own patches because the same model that created the solution is also narrating why the solution is done. A separate evaluator does not eliminate that problem, but it removes the completion decision from the worker model's self-assessment. For agentic coding, even a lightweight separation between doing and judging is a useful control surface.

The same idea appears in Managed Agents outcomes

Claude Code /goal is a local CLI feature, but it points in the same direction as Anthropic's May 6 updates to Claude Managed Agents. Anthropic announced dreaming, outcomes, multi-agent orchestration, and webhooks. The relevant piece is outcomes: developers define success criteria as a rubric, the agent works toward those criteria, and a separate grader evaluates the result before sending gaps back to the agent.

The two systems are not identical. Managed Agents outcomes live in an API and managed runtime, with server-side structures such as rubrics, session events, outcome evaluation events, and output file retrieval. Claude Code /goal lives in a CLI session and uses a session-scoped completion condition wrapped around Stop-hook behavior. But the product concept is similar. Both move the agent from "answer this prompt" toward "iterate until this externalized success criterion is satisfied."

The user defines a completion condition
↓
The agent edits code, runs tests, and checks logs
↓
A separate evaluator judges the evidence shown in the conversation
↓
If evidence is missing, another turn starts; if complete, the goal ends

For AI developer tools, this is a meaningful shift. A more capable model is still useful, but unattended work needs repeatable loops and explicit stop conditions. In coding, "the file changed" is weaker than "the tests passed, the edit scope stayed bounded, and the final output is reviewable."

What changes for developers

For an individual developer, /goal is most useful when the desired end state is verifiable. Small refactors, broken tests, type errors, documentation migrations, dependency updates, and issue queues are good candidates. Product design, ambiguous UI judgment calls, external account changes, or any task with sensitive side effects require more caution. The ability to run unattended does not mean every task should be made unattended.

For teams, the more interesting change is that completion conditions can become shared operational patterns. A team can standardize goals such as "report the changed file list at the end," "show the exact test and lint output," "do not modify files outside these directories," or "after 20 turns, stop and summarize the blockers." That is not just prompt style. It is agent operations.

CI and human review do not become less important. They become the anchor for the agent loop. The more work an agent does before a person returns, the more important it is that the final turn expose concrete evidence: commands run, exit codes, changed files, skipped checks, and unresolved risks. A reviewer should be able to start from the completion condition and compare it with the evidence, not reconstruct the agent's path from a long chat transcript.
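One way to make that final turn reviewable is to give the evidence a fixed shape. The sketch below is a hypothetical structure, not a format Claude Code emits: the class and field names are illustrative and simply mirror the items listed above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommandRun:
    command: str
    exit_code: int


@dataclass
class EvidenceReport:
    condition: str
    commands: List[CommandRun] = field(default_factory=list)
    changed_files: List[str] = field(default_factory=list)
    skipped_checks: List[str] = field(default_factory=list)
    unresolved_risks: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the report as plain text, one section per evidence type."""
        def section(title: str, items: List[str]) -> List[str]:
            body = [f"  - {item}" for item in items] or ["  (none)"]
            return [f"{title}:"] + body

        lines = [f"Completion condition: {self.condition}"]
        lines += section(
            "Commands run",
            [f"{c.command} (exit {c.exit_code})" for c in self.commands],
        )
        lines += section("Changed files", sorted(self.changed_files))
        lines += section("Skipped checks", self.skipped_checks)
        lines += section("Unresolved risks", self.unresolved_risks)
        return "\n".join(lines)
```

Printing empty sections as "(none)" is deliberate: a reviewer can tell the difference between "no checks were skipped" and "the agent did not say."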

Community reaction mixed workflow excitement with control concerns

GeekNews shared the /goal feature on May 12, summarizing the key mechanics: Claude continues across turns until a goal is complete, a fast model evaluates after each turn, and the evaluator judges only the conversation rather than inspecting files or commands directly. That distinction is the practical limitation developers need to keep in mind.

On Reddit, the reaction was more direct. A thread in r/ClaudeAI framed /goal as a "run until done" mode and imagined workflows where multiple sessions keep working while the developer returns later. The same discussion surfaced the obvious risk: once goals can touch live websites, browser automation, external systems, or privileged repositories, guardrails, scoped tools, visible state, logs, and review points become more important than the command itself.

That concern is not theoretical. A badly written goal can scale a badly scoped task. If the condition is ambiguous, the evaluator may behave inconsistently. If the agent is allowed to touch external systems, repeated attempts can amplify side effects. Goal-driven loops need permission boundaries and stopping rules, not only better prompts.

The coding-agent race moves toward goals and observability

The next layer of competition in AI coding tools is unlikely to be only "which model writes more code." Developers can already generate code in many places. The harder product question is what loop produced the code. What was the objective? What did the evaluator see? Which tests actually ran? What permissions were requested? Why did the run stop?

Claude Code 2.1.139 also added an agent view via claude agents, showing running, blocked, and completed sessions in one list. That pairs naturally with /goal. One feature says what a session is trying to achieve. The other makes long-running sessions easier to observe. Together, they point toward a future where developers run multiple agent tasks and manage them by status, evidence, and completion criteria rather than by scrolling through individual chat windows.

2.1.139: Claude Code release that added /goal and agent view
4,000: maximum condition length in characters, per the official docs
1: active goal per session; a new goal replaces the old one

That pressure will not be limited to Claude Code. Codex, Coder, GitHub Copilot coding agent, Warp, Oz, and other agentic development environments all face the same questions. Enterprise users will ask less often whether an agent is "smart" in the abstract and more often how it stops, proves its work, resumes, logs state, and hands off to a reviewer.

The remaining limits

The biggest limit is still verification. Because the evaluator does not call tools, a goal such as "all tests pass" depends on Claude actually running the tests and surfacing the output. If a test result is omitted from the conversation, the evaluator cannot use it. If the agent paraphrases the result incorrectly, the evaluator may accept bad evidence.
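Paraphrase is the weak point, so it helps when verification results reach the conversation verbatim. The following sketch shows one way a wrapper script could do that. `run_and_report` is a hypothetical helper, and the demo command is a stand-in that uses the current Python interpreter rather than a real test suite.

```python
import shlex
import subprocess
import sys
from typing import List


def run_and_report(argv: List[str]) -> str:
    """Run a command and return a block that quotes its result verbatim."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return "\n".join([
        f"$ {shlex.join(argv)}",
        f"exit code: {result.returncode}",
        "--- stdout ---",
        result.stdout.rstrip(),
        "--- stderr ---",
        result.stderr.rstrip(),
    ])


if __name__ == "__main__":
    # Stand-in for a real test command so the sketch runs anywhere.
    print(run_and_report([sys.executable, "-c", "print('2 passed')"]))
```

The exit code is reported separately from the output because a summary line can look healthy while the process still fails.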

The second limit is goal quality. A broad goal can cause broad edits. A vague goal can lead to inconsistent evaluator decisions. A goal tied to external systems can create repeated side effects. The command makes autonomous continuation easier; it does not make the task safe by default.

The practical rules are straightforward. Put verification commands in the goal. Limit the edit scope. Add an explicit stop condition. Require a final summary with changed files and verification output. Avoid unattended goals for work that changes production data, billing settings, access control, or other sensitive external state unless the permission model and review point are clear.
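The edit-scope rule is one of the easier ones to check deterministically outside the evaluator, for example in CI or a custom hook. The sketch below assumes the changed paths are repository-relative and already normalized, such as the output of `git diff --name-only`. The function name is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def files_outside_scope(changed: Iterable[str],
                        allowed_dirs: Iterable[str]) -> List[str]:
    """Return changed paths that are not under any allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    outside = []
    for name in changed:
        path = PurePosixPath(name)
        # Compare whole path components, so "src/authz" does not match "src/auth".
        if not any(root == path or root in path.parents for root in allowed):
            outside.append(name)
    return outside


if __name__ == "__main__":
    changed = [
        "src/auth/login.ts",
        "test/auth/login.test.ts",
        "src/billing/invoice.ts",
    ]
    print(files_outside_scope(changed, ["src/auth", "test/auth"]))
```

Matching on path components rather than string prefixes avoids accepting a sibling directory whose name merely starts with an allowed one.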

Better stopping beats longer answers

The point of Claude Code /goal is not that Claude can produce a longer response. The point is that agent execution can be organized around a completion condition. Developers have already been building this pattern with natural-language prompts, scripts, loops, and hooks. Anthropic has now made one version of it a first-class command.

That is a useful marker for where coding agents are going. Model intelligence, context length, and tool count are not enough for production use. Practical agent work needs objectives, evidence, evaluators, stopping criteria, resumability, and observability. /goal gives CLI users direct control over two of those pieces: the objective and the evaluation loop.

For developers, the update changes the skill that matters. Good agent use is becoming less about writing a clever prompt and more about designing a precise completion condition with a verifiable loop. The better the stop condition, the easier it is to trust the work and review the result.