Andrej Karpathy Coding — Spec-Then-Run-Don't-Imagine
Karpathy's coding mental model for working with LLMs: small steps, evals over hopes, prefer dumb code, run-don't-imagine when in doubt.
What this skill teaches
Karpathy’s coding posture treats the model as an unreliable junior who happens to type fast. The thesis is the inversion of the popular “AI writes code” framing: code is the medium an agent uses to think, not the artifact it ships. So the human’s job is to keep that thinking observable. You ask for the smallest runnable artifact, you read the diff, you run it, and you decide whether the output earns more scope. Skipping the run step is where vibe-coding sessions silently drift away from the goal you started with.
The skill body
# Andrej Karpathy — Coding Skill
A coding posture for working with an LLM agent on real software. Adapted
from the recurring themes in Karpathy's tweets, talks, and the "vibe
coding" disclaimer he popularised in 2025.
## Core stance
Code is the agent's medium for thinking. Treat every diff as a thought
the agent had out loud. Your job is not to grade the prose, your job is
to decide whether the thought is true. The only way to decide is to run
it.
## Working rules
### 1. Spec first, in one or two sentences
If you cannot say in one paragraph what done looks like, do not start.
Make the agent restate the goal in its own words before any file gets
touched. A thirty second spec saves a thirty minute rewrite.
### 2. Run, don't imagine
Reasoning about hypothetical code is expensive. Running real code is
cheap. When you find yourself debating whether a function works, stop
debating, run it. Print the inputs, print the outputs, look at the
shapes. The agent's confidence drops sharply once it can see actual
values instead of imagined ones.
### 3. Prefer dumb code
Explicit beats clever. Long beats nested. Two passes beats one passing
trick. The agent is better at writing, reading, and debugging the boring
version of a function than the smart version. Save smart for hot paths
once a benchmark says you need it.
### 4. Evals over hopes
"I think this works" is not done. A test that calls the function with
real arguments and checks the return value is done. A run on the actual
data is done. Hopes are an artefact of staring at code without running
it. If a thing is worth shipping, it is worth a one-line eval.
### 5. Vibe coding is real, declare it
When the goal is exploration, say so out loud. "I am vibe coding this
weekend, scrappy is fine, I will revisit Monday." That label tells the
agent to skip the production-shaped scaffolding it would otherwise add
by default. Without the label, the agent will try to write a library
when you wanted a script.
### 6. Smallest runnable step
The unit of work is the smallest patch that produces something you can
run end to end. Not the smallest patch that compiles, the smallest patch
that runs. Compile-only patches accumulate into a pile that nobody can
verify. One running patch beats four green builds.
## Anti-pattern: silent improvement
The agent will quietly add a try/catch block, rename an export for
clarity, or tighten a TypeScript signature `while we are here`. Treat
silent improvement as a failure mode. Either it is in scope and worth a
test, or it is out of scope and worth a separate diff.
## Verification ritual
Before you call any task done, run the binary that uses the new code,
not just the binary that contains it. Compile passing is not running.
Tests passing is not running on production-shaped data. The smallest
honest end-to-end run is the only signal worth trusting. Why this matters
The default failure mode of LLM coding sessions is silent confidence drift. Each turn nudges the agent further from the goal because no concrete signal contradicts the trajectory. Karpathy’s discipline plants concrete signals at every step. A spec restated by the agent forces alignment up front. A real run forces the model to look at outputs instead of plausible-sounding stories about outputs. Dumb code makes the diff reviewable. The cumulative effect is that the agent stays a tool you direct rather than a system you negotiate with. That posture pairs naturally with GSD core, where the verify gate is the structural enforcement of the same idea.
Where it fits in the loop
Reach for this skill at the start of any new session, before the first request lands, and again whenever a task starts feeling vague. The natural sequence is Karpathy first to set the posture, then Superpower core to lock the read-plan-run-report cadence inside a single task, then GSD core when the work spans more than one session. The agent-rules generator at the bottom of this page wires those postures into a CLAUDE.md or AGENTS.md so the agent encounters them on every turn, not only when you remember to remind it.
Open in the agent-rules generator
Wire this skill into a full CLAUDE.md / AGENTS.md / .cursor/rules/ template — pre-filled, copy-pasteable.