Stanford CS336 publishes CLAUDE.md, drawing a sharper line for AI coding assignments
Stanford CS336 uses a repo-level CLAUDE.md to keep coding agents in a teaching-assistant role, allowing concept help while banning solution code.
- What happened: Stanford CS336's assignment1-basics repository put a public CLAUDE.md at the center of its AI-agent policy.
- The file tells Claude Code, ChatGPT, Copilot, Cursor, and similar tools to act as a teaching assistant, not a solution generator.
- Allowed: concept explanations, references to course or official docs, review of student-written code, error-message interpretation, and sanity-check suggestions.
- Forbidden: writing Python or pseudocode, completing TODOs, editing the student repo, running bash, or implementing tokenizers, training loops, Triton kernels, and RL methods.
- Builder impact: files such as CLAUDE.md, AGENTS.md, and Cursor rules are becoming policy surfaces for education, onboarding, and agent governance.
The AI story that reached the Hacker News front page on June 1, 2026, was not a new model release. It was Stanford CS336's CLAUDE.md file inside the assignment1-basics repository. The HN submission, titled "AI Agent Guidelines for CS336 at Stanford," climbed past 400 points and drew more than 100 comments. The file is written for Claude Code, but its audience is broader: ChatGPT, GitHub Copilot, Cursor, and other AI coding assistants that might be opened inside a student assignment repo.
The file matters because of where it lives. CLAUDE.md is not a detached academic integrity memo on a course website. It sits at the root of the assignment repository, next to the code a student will inspect, edit, test, and submit. When a coding agent enters the workspace, the policy can be read alongside the project itself. Stanford is putting the AI-use boundary in the same operating context as the tool.
The first line narrows the agent's role: "Teaching Assistant, Not Solution Generator." In practical terms, the agent may explain a concept the student does not understand, point to lecture material or official documentation, review code the student already wrote, and explain errors in Python, PyTorch, CUDA, Triton, or distributed training. The agent steps outside the policy the moment it writes the answer.
The forbidden list is more specific than a generic warning against cheating. The CS336 file bars the assistant from writing Python code or pseudocode, solving assignment problems, completing TODO sections, editing the student's repository, running bash commands, performing large refactors, or turning assignment requirements into working code. It also names the core systems directly: tokenizers, transformer blocks, optimizers, training loops, Triton kernels, distributed training logic, scaling-law pipelines, data filtering and deduplication pipelines, and alignment or reinforcement-learning methods.
That hard boundary matches the CS336 course page. Spring 2026 CS336 is titled Language Modeling from Scratch. Students implement tokenization, model architecture, optimization, and minimal language-model training before moving into systems optimization, scaling laws, Common Crawl processing, alignment, and reasoning reinforcement learning. The course page says the class requires at least an order of magnitude more coding than other AI courses and provides limited scaffolding.
The course honor code draws the same line. Prompting LLMs is allowed for low-level programming questions or high-level language-model concepts. Asking an LLM to directly solve assignment problems is prohibited. The same policy strongly recommends disabling AI autocomplete tools such as Cursor Tab or GitHub Copilot while working on assignments, because autocomplete can make it hard for students to stay deeply involved with the implementation.
| Area | Allowed agent behavior | Forbidden agent behavior |
|---|---|---|
| Learning support | Explain concepts and connect students to lectures or official docs | Convert assignment requirements into answer code |
| Debugging | Interpret errors and suggest sanity checks or profiler investigations | Modify the student repo, run bash, or complete TODOs |
| Core implementation | Offer high-level approaches and ask diagnostic questions | Implement tokenizers, transformers, optimizers, Triton kernels, or RL code |
The HN response was large because this is no longer only an education problem. Coding agents in 2025 and 2026 have more authority than autocomplete. Claude Code, Codex, Copilot coding agent, and Cursor agent mode can read files, propose patches, run tests, and in some workflows prepare pull requests. If a course policy says only "you may receive help," it leaves open an action space that includes code edits, shell commands, test runs, and generated submissions.
CS336 closes that gap by spelling out concrete behaviors. "Concept questions are allowed" can be ambiguous for both the student and the agent. The repository instruction translates it into actions: ask questions, point to course materials, review what the student wrote, and suggest observations for debugging. The prohibited side is equally operational: do not write Python, do not write pseudocode, do not run bash, do not edit the repo, and do not fill in TODOs.
That design carries directly into junior developer onboarding. During a ramp-up period, an agent that produces a finished patch may hide the exact system knowledge a new engineer needs to build. A narrower assistant can explain how to read logs, design a minimal test, reason about a trace, or inspect a profiler output. A file such as CLAUDE.md gives the team a way to ship the agent's role with the codebase instead of relying on scattered norms.
Text instructions alone do not enforce policy. One of the stronger HN reactions argued that important requirements should be backed by hook scripts, session logs, or other executable controls rather than prompt text only. That criticism applies in production too. If an agent must not run shell commands, the environment should enforce permissions, sandboxing, allowlists, and audit logs. A repository instruction is a starting point, not a complete compliance system.
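To make the idea of executable controls concrete, here is a minimal sketch of an allowlist check that records every decision in an audit log. It is not how Claude Code or any other tool implements permissions: the `ActionGate` class and the action names (`explain_concept`, `run_shell`, and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical action names; real agent tools expose their own identifiers.
ALLOWED_ACTIONS = frozenset({"explain_concept", "read_file", "review_code"})


@dataclass
class ActionGate:
    """Checks requested actions against an allowlist and logs each decision."""

    allowed: frozenset = ALLOWED_ACTIONS
    audit_log: list = field(default_factory=list)

    def request(self, action: str, detail: str = "") -> bool:
        permitted = action in self.allowed
        # Log denied requests too, so reviewers can see what was attempted.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "detail": detail,
            "permitted": permitted,
        })
        return permitted


gate = ActionGate()
print(gate.request("explain_concept", "causal masking"))  # True
print(gate.request("run_shell", "pytest -x"))             # False
print(len(gate.audit_log))                                # 2
```

The point of the sketch is the division of labor: the instruction file states the intent, while a check like this, running outside the model, decides what actually executes and leaves a record either way.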
The Stanford policy also does not claim that "never write code" is the answer in every setting. CS336 uses a strict boundary because the assignment objective is implementation learning. Copying that exact rule into a company bug-fix sprint would slow down work for the wrong reason. In education, certification, interview take-homes, internal bootcamps, and security training, the process is part of the artifact. In those settings, the agent's questions and checks can be more valuable than the patch.
The assignments themselves explain why the distinction is sharp. Assignment 1 asks students to implement a tokenizer, model architecture, optimizer, and training loop for a language model. Assignment 2 expands into profiling, benchmarking, Triton-based FlashAttention2, and distributed training. Assignment 4 turns Common Crawl into pretraining data through filtering and deduplication. If the agent writes those pieces, the student skips the central learning loop.
The same policy still allows useful help. A student can ask why a causal mask appears to make training unstable. The CLAUDE.md examples steer the assistant toward checks: whether the mask is applied before softmax, whether it broadcasts over the score tensor shape, and whether masked positions are set to a very small (large negative) value instead of zero. The agent can suggest printing attention scores on a length-three toy sequence. It does not patch the implementation.
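For readers who want to see what the length-three check looks like, the following is a toy diagnostic in plain Python. It is not taken from the CS336 file and is not an assignment implementation; the function names and the all-zero score matrix are assumptions made for illustration. It contrasts masking with negative infinity before softmax against the buggy choice of masking with zero.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_weights(scores, masked_value=float("-inf")):
    """Replace future positions (column > row) before softmax, then normalize."""
    n = len(scores)
    masked = [
        [scores[i][j] if j <= i else masked_value for j in range(n)]
        for i in range(n)
    ]
    return [softmax(row) for row in masked]


# Toy length-three scores: all zeros, so any leak into the future is obvious.
scores = [[0.0] * 3 for _ in range(3)]

correct = causal_attention_weights(scores)
buggy = causal_attention_weights(scores, masked_value=0.0)

print(correct[0])  # [1.0, 0.0, 0.0]: position 0 attends only to itself
print(buggy[0])    # every entry is nonzero: position 0 sees the future
```

With the correct mask, everything above the diagonal is exactly zero after softmax. With a zero fill, the first row spreads weight evenly across all three positions, which is the kind of observation the policy wants the student to make for themselves.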
That example changes how an AI coding assistant is evaluated. Product demos often emphasize how fast an agent can produce a pull request. In the CS336 context, speed is not the metric. The relevant question is whether the assistant helps the student observe their own bug, test a hypothesis, and connect behavior back to the model implementation. The same model can behave like a proxy developer or a code-review teaching assistant depending on repo instructions, output style, and tool permissions.
The HN discussion also split over the instruction's length and enforceability. One educator said they were experimenting with AGENTS.md in a similar way and wanted students to leave prompt and response summaries in a .history folder when using AI. Another commenter argued that long instructions can fall out of a context window and that production agents may need a shorter system prompt plus stronger environment constraints. Others pointed to Claude Code's Learning output style, which can make the assistant explain the process instead of implementing on the user's behalf.
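The .history idea mentioned by the educator can be sketched in a few lines. This is one possible shape, not the commenter's actual setup: the function name `record_ai_exchange`, the JSON fields, and the file naming scheme are all assumptions.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_ai_exchange(repo_root, prompt_summary, response_summary):
    """Write one prompt/response summary as a JSON file under <repo_root>/.history."""
    history_dir = Path(repo_root) / ".history"
    history_dir.mkdir(parents=True, exist_ok=True)
    now = datetime.now(timezone.utc)
    entry = {
        "time": now.isoformat(),
        "prompt_summary": prompt_summary,
        "response_summary": response_summary,
    }
    # A timestamp in the file name keeps entries sorted by time.
    path = history_dir / f"{now.strftime('%Y%m%dT%H%M%S%f')}.json"
    path.write_text(json.dumps(entry, indent=2), encoding="utf-8")
    return path


# Usage: a temporary directory stands in for the assignment repo.
with tempfile.TemporaryDirectory() as tmp:
    saved = record_ai_exchange(
        tmp,
        "Asked why the causal mask made training unstable.",
        "Suggested checking mask order and printing scores on a toy input.",
    )
    print(saved.parent.name)  # .history
```

A log like this records what was asked and answered, but it depends on the student or the tool actually calling it, so it complements rather than replaces environment-level constraints.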
For developer teams, the immediate checklist is concrete. First, repo-level instructions are no longer decorative docs. If agents read the working directory, policy files become operational assets closer to review rules than wiki pages. Second, prohibitions should be written as actions, not product names. "No Copilot" is less useful than "do not edit TODOs, do not run shell commands, do not generate a PR, and do not use third-party implementations." Third, the policy should be paired with evaluation: permission settings, review checklists, audit trails, and an onboarding rubric.
CS336 combines those pieces with an honor code, assignment design, public Slack, office hours, Gradescope submission, and repo-level AI instructions. Companies face a similar coordination problem. They need to decide which repos allow autonomous patches, which repos permit only explanations, which tasks require human-authored diffs, and which agent actions must be logged. CLAUDE.md cannot solve that alone, but it makes the intended behavior visible at the place where the agent works.
The larger signal is not simply that coding agents have reached the classroom. The question has moved from "may students use AI?" to "which file defines the agent's role, which permissions are available, which artifacts are forbidden, and which questions are encouraged?" Stanford CS336's 74-line policy file looks small next to a model launch, but it shows the unit of governance that education programs and engineering teams will increasingly need as AI agents become part of assignments, onboarding, assessment, and code review.