agent-agnostic lastVerified: 2026-05-09

Andrej Karpathy Coding — Spec-Then-Run-Don't-Imagine

Karpathy's coding mental model for working with LLMs: small steps, evals over hopes, prefer dumb code, run-don't-imagine when in doubt.

On this page
  1. What this skill teaches
  2. The skill body
  3. Why this matters
  4. Where it fits in the loop

What this skill teaches

Karpathy’s coding posture treats the model as an unreliable junior who happens to type fast. The thesis is the inversion of the popular “AI writes code” framing: code is the medium an agent uses to think, not the artifact it ships. So the human’s job is to keep that thinking observable. You ask for the smallest runnable artifact, you read the diff, you run it, and you decide whether the output earns more scope. Skipping the run step is where vibe-coding sessions silently drift away from the goal you started with.

The skill body

karpathy-coding.md
# Andrej Karpathy — Coding Skill

A coding posture for working with an LLM agent on real software. Adapted
from the recurring themes in Karpathy's tweets, talks, and the "vibe
coding" disclaimer he popularised in 2025.

## Core stance

Code is the agent's medium for thinking. Treat every diff as a thought
the agent had out loud. Your job is not to grade the prose, your job is
to decide whether the thought is true. The only way to decide is to run
it.

## Working rules

### 1. Spec first, in one or two sentences

If you cannot say in one paragraph what done looks like, do not start.
Make the agent restate the goal in its own words before any file gets
touched. A thirty second spec saves a thirty minute rewrite.

### 2. Run, don't imagine

Reasoning about hypothetical code is expensive. Running real code is
cheap. When you find yourself debating whether a function works, stop
debating, run it. Print the inputs, print the outputs, look at the
shapes. The agent's confidence drops sharply once it can see actual
values instead of imagined ones.

### 3. Prefer dumb code

Explicit beats clever. Long beats nested. Two passes beats one passing
trick. The agent is better at writing, reading, and debugging the boring
version of a function than the smart version. Save smart for hot paths
once a benchmark says you need it.

### 4. Evals over hopes

"I think this works" is not done. A test that calls the function with
real arguments and checks the return value is done. A run on the actual
data is done. Hopes are an artefact of staring at code without running
it. If a thing is worth shipping, it is worth a one-line eval.

### 5. Vibe coding is real, declare it

When the goal is exploration, say so out loud. "I am vibe coding this
weekend, scrappy is fine, I will revisit Monday." That label tells the
agent to skip the production-shaped scaffolding it would otherwise add
by default. Without the label, the agent will try to write a library
when you wanted a script.

### 6. Smallest runnable step

The unit of work is the smallest patch that produces something you can
run end to end. Not the smallest patch that compiles, the smallest patch
that runs. Compile-only patches accumulate into a pile that nobody can
verify. One running patch beats four green builds.

## Anti-pattern: silent improvement

The agent will quietly add a try/catch block, rename an export for
clarity, or tighten a TypeScript signature `while we are here`. Treat
silent improvement as a failure mode. Either it is in scope and worth a
test, or it is out of scope and worth a separate diff.

## Verification ritual

Before you call any task done, run the binary that uses the new code,
not just the binary that contains it. Compile passing is not running.
Tests passing is not running on production-shaped data. The smallest
honest end-to-end run is the only signal worth trusting.

Why this matters

The default failure mode of LLM coding sessions is silent confidence drift. Each turn nudges the agent further from the goal because no concrete signal contradicts the trajectory. Karpathy’s discipline plants concrete signals at every step. A spec restated by the agent forces alignment up front. A real run forces the model to look at outputs instead of plausible-sounding stories about outputs. Dumb code makes the diff reviewable. The cumulative effect is that the agent stays a tool you direct rather than a system you negotiate with. That posture pairs naturally with GSD core, where the verify gate is the structural enforcement of the same idea.

Where it fits in the loop

Reach for this skill at the start of any new session, before the first request lands, and again whenever a task starts feeling vague. The natural sequence is Karpathy first to set the posture, then Superpower core to lock the read-plan-run-report cadence inside a single task, then GSD core when the work spans more than one session. The agent-rules generator at the bottom of this page wires those postures into a CLAUDE.md or AGENTS.md so the agent encounters them on every turn, not only when you remember to remind it.