Five Claude Code hooks that make coding agents actually reliable

If you use Claude Code for anything past personal projects, hooks are the single highest-leverage feature. They are where "agent that sometimes does stupid things" becomes "agent that gets corrected before anything bad happens."

This is the hook set we actually run in production, what each one catches, and the trade-offs we made.

1. Pre-code permission hook (`PreToolUse`)

Fires on: every Bash, Write, or Edit invocation the agent attempts.

What it does: classifies the tool call by danger level. Read-only operations (ls, grep, cat, Read) pass through. Write operations (Edit, Write, git commit) require explicit approval — either an allowlist match or a human prompt.

What it catches: the agent confidently running rm -rf /tmp/build when you meant /tmp/build/stale. Or writing to .env.production when it meant .env.local.

The right shape is not "block all writes" — that makes the agent useless. It is "allow anything on this list, ask on everything else." The list grows as you find patterns you are happy with.

ALLOW_PATTERNS=(
  "^git (status|diff|log|branch)"
  "^npm (test|run build)"
  "^python3 .*test\.py$"
)
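A minimal sketch of the allow-or-ask decision, not the exact hook we run. It assumes the hook has already extracted the Bash command string from the JSON that Claude Code passes on stdin. `classify_command` is a hypothetical name. The patterns are stored one per line so the function runs under plain `sh` with `grep -E`. Because the patterns only anchor the start of the string, the sketch never auto-allows a command that contains chaining, substitution, or redirection characters.

```shell
# Allowlist patterns (extended regexes), one per line.
ALLOW_PATTERNS='^git (status|diff|log|branch)
^npm (test|run build)
^python3 .*test\.py$'

# A literal newline, used to reject multi-line commands.
NL='
'

# Hypothetical helper: prints "allow" when the command matches the
# allowlist, otherwise "ask" (meaning: hand the decision to a human).
classify_command() {
  cmd=$1
  # Never auto-allow chained, substituted, or redirected commands:
  # "git log; rm -rf /" starts with an allowed prefix but is not safe.
  case $cmd in
    *"$NL"*|*';'*|*'&'*|*'|'*|*'`'*|*'$('*|*'>'*|*'<'*)
      echo ask
      return 0
      ;;
  esac
  # grep treats newline-separated patterns as alternatives.
  if printf '%s\n' "$cmd" | grep -Eq -e "$ALLOW_PATTERNS"; then
    echo allow
  else
    echo ask
  fi
}
```

How "ask" is surfaced depends on your setup. As far as we know, a `PreToolUse` hook can block a call by exiting with code 2 and writing the reason to stderr, but check the current hooks documentation before relying on that. Note that the first pattern also matches `git branch -D <name>`, so you may want to narrow it.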

2. Post-code verification hook (`PostToolUse`)

Fires on: every Edit or Write, after the tool call completes.

What it does: runs python3 -m py_compile on any touched .py, tsc --noEmit on .ts, go vet on .go. If anything breaks, the hook prints the error back to the agent as its next context, so it sees its own mistake immediately rather than three tool-calls later when the test run finally surfaces it.

What it catches: typos, missing imports, undefined variables. Turns the agent's natural "write, then test, then fix" loop from "30 seconds of wasted tokens per mistake" into "agent sees error inline and fixes in the next response."

Cost: ~200ms per edit. Worth every millisecond.
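The dispatch by file type can be sketched as follows. This is an illustration under assumptions, not our production script: `checker_for` and `verify_file` are hypothetical names, and the touched file path is assumed to have been pulled out of the hook's JSON input already.

```shell
# Hypothetical helper: maps a file path to the name of its checker.
checker_for() {
  case $1 in
    *.py) echo py_compile ;;
    *.ts) echo tsc ;;
    *.go) echo go_vet ;;
    *)    echo none ;;
  esac
}

# Runs the matching checker and returns its exit status.
# Checker output (stdout and stderr) is merged so the caller can
# capture it in one variable.
verify_file() {
  file=$1
  case $(checker_for "$file") in
    py_compile) python3 -m py_compile "$file" ;;
    tsc)        tsc --noEmit "$file" ;;
    go_vet)     go vet "$file" ;;
    none)       return 0 ;;
  esac 2>&1
}
```

A hook script would then do something like `report=$(verify_file "$path") || { printf '%s\n' "$report" >&2; exit 2; }`, assuming exit code 2 is what feeds stderr back to the agent. One caveat: running `tsc` on a single file ignores `tsconfig.json`, so a real setup may need a project-level check instead.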

3. Session-memory-load hook (`SessionStart`)

Fires on: every new session.

What it does: queries the memory server for "last session on this project," "user preferences," and "any corrections from recent sessions." Injects the results as a system-reminder at session start.

What it catches: the "cold-start" problem. Without this, the agent re-learns every convention, re-asks every clarification, re-makes every mistake that was already corrected last week. With it, session N+1 feels like a continuation of session N.

The one thing to watch: do not dump all memory at session start. Curate what is relevant to this project and this user. Memory noise is worse than no memory.
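The curation step can be as simple as filtering by project and capping the number of entries. The sketch below assumes the memory server can export entries as tab-separated lines of `project`, `kind`, and `text`, oldest first. That export format and the `select_memories` name are hypothetical.

```shell
# Hypothetical helper: reads "project<TAB>kind<TAB>text" lines on stdin
# and prints the most recent $2 entries that belong to project $1.
select_memories() {
  project=$1
  limit=$2
  # Keep only this project's entries, then keep the newest few.
  awk -F '\t' -v p="$project" '$1 == p { print "- [" $2 "] " $3 }' |
    tail -n "$limit"
}
```

Usage would look like `memory_export | select_memories api 20`, where `memory_export` stands in for whatever your memory server provides. To our knowledge, whatever a `SessionStart` hook prints to stdout is added to the session context, but verify that against the hooks documentation.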

4. Pre-answer coherence check (`UserPromptSubmit`)

Fires on: every user message.

What it does: runs a semantic search against project memory for the user's prompt. If there are high-relevance hits, injects them as context. If the prompt contradicts a stored fact (e.g. user says "the DB is PostgreSQL" but memory says "we migrated to SQLite last week"), surfaces the conflict before the agent answers.

What it catches: stale mental models. In long-running projects, either the human or the AI will be slightly out of date on something. This hook makes that divergence visible instead of letting the wrong one propagate.
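The "high-relevance hits only" part is a threshold filter. The sketch below assumes the semantic search returns tab-separated lines of `score` and `text`, with scores between 0 and 1. Both that format and the `relevant_context` name are hypothetical, and the sketch does not attempt contradiction detection, which needs more than shell.

```shell
# Hypothetical helper: reads "score<TAB>text" lines on stdin and prints
# a context block containing only hits at or above the threshold $1.
# Prints nothing at all when no hit qualifies.
relevant_context() {
  awk -F '\t' -v min="$1" '
    $1 + 0 >= min + 0 {
      # Emit the header once, before the first qualifying hit.
      if (!seen++) print "Possibly relevant project memory:"
      print "- " $2
    }'
}
```

Printing nothing when no hit qualifies keeps the hook from adding noise to prompts that have no relevant history.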

5. Stop-on-failure hook (`Stop`)

Fires on: the agent finishing its turn.

What it does: checks whether any TODO items are still in progress, any tests are failing, or any claims in the agent's output are unverified. If so, it blocks the "stop" state and pushes the agent back to finish.

What it catches: the "looks-done-but-is-not" failure mode. Agent confidently says "feature implemented" — hook notices tests are red — agent goes back and fixes. No human has to babysit.

This one is the highest-leverage but also the trickiest. Too aggressive and the agent never stops. Too loose and nothing is enforced. Calibrate by starting loose and tightening gradually.
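A deliberately loose starting point for that calibration might look like the sketch below. `stop_decision` is a hypothetical name, and its three inputs are assumed to be gathered by the hook script beforehand. The third argument is meant to mirror the `stop_hook_active` field that, as far as we know, Claude Code includes in the `Stop` hook input when a stop has already been blocked once. The "unverified claims" check is left out because it does not reduce to a simple test.

```shell
# Hypothetical helper: prints "allow" or "block: <reason>".
#   $1  number of TODO items still in progress
#   $2  exit status of the last test run (0 means green)
#   $3  "true" if this stop attempt was already blocked once
stop_decision() {
  open_todos=$1
  tests_status=$2
  already_retrying=$3

  # Loop guard: after one forced retry, let the agent stop so it
  # cannot be pushed back forever.
  if [ "$already_retrying" = "true" ]; then
    echo allow
    return 0
  fi
  if [ "$open_todos" -gt 0 ]; then
    echo "block: $open_todos TODO item(s) still in progress"
    return 0
  fi
  if [ "$tests_status" -ne 0 ]; then
    echo "block: tests are failing"
    return 0
  fi
  echo allow
}
```

Allowing the stop after a single retry is the loose end of the spectrum. Tightening means replacing that guard with a retry counter or removing it for specific checks once you trust them.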

What we explicitly did not add

Per-file-type formatters. prettier --write on save is tempting but makes the agent's diffs unreviewable. Format on commit instead, not on every edit.
Rate limits. Useful as cost control but not as reliability control. Rate-limiting a correct agent helps nobody.
Automatic rollback on error. Sounds great until you realise the agent can revert work that was intentionally in progress. Make rollback explicit.

Why this compounds

The individual hooks are small. The compound effect is large: an agent that self-corrects typos, self-loads context, self-flags contradictions, and refuses to call itself done until it actually is. That is qualitatively different from "smart autocomplete."

Most of this can be built in a weekend of shell scripting. If you run Claude Code past the toy stage, the ROI is absurd.

We provide these as a reference hook set with EON SaaS — but the point is the pattern, not the product.
