Fewer Instructions, Better Agents
Every time my agent did something dumb, I added a rule. “Don’t invent patterns.” “Always run tests.” “Use this exact output format.” I never went back to remove the ones that stopped mattering. Two thousand lines of instructions later, the agent was slower, more expensive, and no better at its job.
Then Theo posted “Delete your CLAUDE.md” and I finally had a name for what I was doing wrong. He referenced two papers. “Evaluating AGENTS.md” found that instruction files actually make agents worse while costing 20% more. “SkillsBench” showed that small, focused skills beat big documentation dumps.
The HN thread pushed back on this. A top comment argued that the real value is domain knowledge the model can’t figure out on its own. A paper author responded: sure, but the gains aren’t consistent and some models actually get worse. The takeaway from both sides was the same: keep it short, keep it useful.
So I deleted half of everything.
The numbers
| | Before | After |
|---|---|---|
| Global rules | 225 lines | 107 lines |
| Agent prompts | 810 lines | ~280 lines |
| Skills | 990 lines | ~530 lines |
| Total | ~2,025 | ~920 |
I cut it in half. The stuff that loads on every single message dropped from ~335 lines to ~130, and that’s the part that matters most because those words compete with every task for the model’s attention.
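To put that in rough token terms, here's the back-of-the-envelope math. The 10-tokens-per-line figure is a loose heuristic, not a measurement; real numbers depend on your tokenizer and line lengths:

```python
# Rough per-message overhead of always-loaded instructions.
# TOKENS_PER_LINE is an assumed heuristic, not a measured value.
TOKENS_PER_LINE = 10

def overhead(lines: int) -> int:
    """Estimated tokens these instructions add to every single message."""
    return lines * TOKENS_PER_LINE

before, after = overhead(335), overhead(130)
print(before, after)  # roughly 3350 vs 1300 tokens, on every message
```

Even at this crude estimate, the cut removes a couple thousand tokens from every single turn.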
What I cut
I had a table ranking how to check your work. “Tests pass” ranked above “looks right.” The model did not need a numbered list to figure that out. I also had an eleven-item anti-patterns list that just restated the rules in different words:
```markdown
# Rules section
- Never invent patterns when one exists

# Anti-patterns section (same file, 40 lines later)
- Inventing a new error format when one exists
```
That’s the same idea written twice, and every duplicate costs tokens on every message.
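You can even catch this kind of restatement mechanically with word-overlap similarity between rule lines. This is a hypothetical sketch, not a tool I ran; the naive stemming and the 0.4 threshold are arbitrary:

```python
# Flag rule lines that likely restate each other, via Jaccard similarity
# over crudely stemmed words. Threshold and stemmer are arbitrary choices.
def stem(word: str) -> str:
    w = word.strip(".,").lower()
    if w.endswith("ing"):
        return w[:-3]
    return w[:-1] if w.endswith("s") else w

def sig(line: str) -> set:
    return {stem(w) for w in line.split() if len(stem(w)) > 3}

def near_duplicates(rules, threshold=0.4):
    flagged = []
    for i, a in enumerate(rules):
        for b in rules[i + 1:]:
            sa, sb = sig(a), sig(b)
            if sa and sb and len(sa & sb) / len(sa | sb) >= threshold:
                flagged.append((a, b))
    return flagged

rules = [
    "Never invent patterns when one exists",
    "Inventing a new error format when one exists",
]
print(near_duplicates(rules))  # flags the pair above
```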
Build, plan, and debug modes all opened with the same paragraph about who the agent is and how to communicate, which the global rules already covered. I also had five response templates telling the model exactly how to phrase “I’m done” or “I’m stuck,” and removing them changed nothing about the output.
What actually matters
A prompt has one job: correct the gap between the model’s default behavior and what you want. I found three things that are worth keeping.
Behavioral corrections are one sentence each. These are actual lines from my AGENTS.md file, and each one exists because I watched the model do the wrong thing and wrote the shortest sentence that fixed it:
```text
Keep narration short: say what you're doing or what changed, then move on.
Match the codebase's naming, structure, error handling, return types, and test style.
Never invent a new pattern when one exists.
Small verified chunks: implement, verify (test/typecheck/lint), then next piece.
```
If you need a paragraph to explain a rule, the rule is too complex.
Novel context is the stuff the model can’t figure out by reading the repo. Custom tools, workspace auth, the fact that @explore runs on Haiku. The AGENTS.md paper confirmed this: the non-obvious domain knowledge is what actually helps.
Disposition, not resume. A line like “You are a senior software engineer with deep systems knowledge” does nothing because the model already acts like one. Compare that to:
```text
You work autonomously: figure things out, get it done,
and keep the user informed on what matters.
```
That actually changes how the model behaves.
One commenter nailed the approach: only add a rule after the agent fails, then revert the agent’s changes, re-run the same task with the rule in place, and check whether the rule actually helped. If it didn’t, delete it.
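That procedure can be written down as a tiny harness. `run_task` here is a stand-in for however you invoke your agent on the failing task, nothing more:

```python
# Hypothetical A/B check for a candidate rule: re-run the same failing task
# with and without the rule, and keep it only if the rule flipped the result.
def keep_rule(run_task, rule: str) -> bool:
    baseline = run_task(rules=[])        # re-run the failing task as-is
    with_rule = run_task(rules=[rule])   # same task, candidate rule added
    return (not baseline) and with_rule  # keep only if the rule fixed it

# Stub agent for illustration: passes only when the rule is present.
demo = lambda rules: "Never invent patterns" in rules
print(keep_rule(demo, "Never invent patterns"))  # True
```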
New agent modes
Build, plan, and debug were artificial categories. In practice, debugging a big issue means scoping it first, then fixing it. Planning always leads to building. The modes didn’t match how I actually work, so I replaced them.
```json
{
  "build": { "mode": "primary", "prompt": "{file:./prompts/build.txt}" },
  "deep": { "mode": "primary", "prompt": "{file:./prompts/deep.txt}" },
  "pilot": { "mode": "primary", "prompt": "{file:./prompts/pilot.txt}" }
}
```
Build
I use this 90% of the time. Most tasks are “implement this,” “fix that,” “add tests for this.” They don’t need a research phase or a plan. They just need the agent to figure out how it’s going to prove the changes work, do the work, and validate it.
That’s all build does, in about 20 lines:
- Know how you’ll prove it works before writing code
- Write the code
- Run tests, check types, check lint, check the running app
- If something breaks, fix it
- If you find an unrelated issue, fix the trivial stuff and flag anything risky
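The validation step is just a sequence of gates that all have to pass. A minimal sketch; `pytest`, `mypy`, and `ruff` are placeholder commands, substitute your project's own:

```python
import subprocess

# Example gates; these specific commands are placeholders.
CHECKS = [
    ["pytest", "-q"],        # tests pass
    ["mypy", "."],           # types check
    ["ruff", "check", "."],  # lint is clean
]

def verify(run=lambda cmd: subprocess.run(cmd).returncode) -> bool:
    """Returns False at the first failing gate, so the agent fixes it
    before moving on to the next one."""
    return all(run(cmd) == 0 for cmd in CHECKS)
```

The `run` parameter is only there so the loop can be exercised without the tools installed.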
Deep
I use this when getting it wrong is expensive. A big refactor, a migration, something that touches a lot of files.
I used to have a separate plan mode, but the problem with it was that I don’t always need a plan. Sometimes I just need the agent to be a lot more careful. Deep replaced both plan and debug because the real need was never “make me a plan” or “follow a debugging framework,” it was “slow down, think harder, and check with me before you do something risky.”
The agent reads everything that could break, traces the full dependency graph, and presents what it found. Then it builds a staged plan and I approve it before it writes any code. If something unexpected comes up during execution, it stops and reassesses instead of silently adapting.
The difference from build is simple: build acts first and asks when stuck, deep asks first and acts once I’m confident.
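The deep workflow reduces to a gate around every stage: nothing executes without approval, and any surprise halts the run. A hypothetical skeleton, where `approve` and `execute` stand in for the human gate and the agent:

```python
# Hypothetical skeleton of deep mode's staged execution.
def run_staged(stages, approve, execute):
    for stage in stages:
        if not approve(stage):           # I sign off before any code is written
            return f"stopped: {stage} rejected"
        done, surprise = execute(stage)  # execute returns (completed, surprised)
        if surprise or not done:
            return f"stopped: reassess at {stage}"  # never silently adapt
    return "done"
```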
Pilot
I give it a direction and it runs until I stop it. “Make this stable.” “Implement this feature end to end.” “Improve performance.”
The loop is always the same:
- Assess the current state
- Pick the highest impact next action
- Execute and validate
- Report what it did in one sentence
- Loop
If something is blocked, it sets it aside and moves to the next thing. I interrupt when I want to change direction.
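As code, the whole mode is one loop. A sketch: the callables stand in for the agent's real assess/execute/report steps, and `budget` caps the run only for illustration, since in practice I interrupt it:

```python
# Hypothetical skeleton of the pilot loop.
def pilot(next_action, execute, report, budget=100):
    blocked = []
    for _ in range(budget):
        action = next_action(blocked)  # assess state, pick highest impact
        if action is None:
            break                      # nothing actionable left
        if execute(action):            # execute and validate
            report(action)             # one-sentence report
        else:
            blocked.append(action)     # set aside, move to the next thing
    return blocked

# Demo with stubs: "a" succeeds, "b" stays blocked.
queue, done = ["a", "b"], []
left = pilot(lambda blocked: queue.pop(0) if queue else None,
             lambda action: action == "a",
             done.append)
print(done, left)  # ['a'] ['b']
```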
What survived
```text
Be resourceful. You have full access to the machine.
Need credentials? Find them. Command not working? Fix the shell setup.
```
After watching Opus 4.6 handle 366 tool calls in a single session, deploying across repos, resolving merge conflicts, and querying production logs, I crossed a trust threshold. The agent is smart enough to find what it needs, and this line gives it permission to act on that.
```text
Never take shortcuts that compromise quality.
Lowering test thresholds, skipping validation, disabling checks to make things pass:
fix the actual problem.
```
This one exists because the agent lowered a coverage threshold to make CI pass. It just changed the number so the check would go green. One HN commenter shared a similar experience and they’ve started migrating instructions to deterministic pre-commit hooks, because the model ignores markdown rules but can’t ignore a failing script.
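That kind of hook is easy to make deterministic. Here's a sketch of one such check, assuming the threshold lives in a coverage.py-style `fail_under = NN` line; adapt the pattern to your config:

```python
import re

def threshold(cfg_text: str) -> int:
    """Extract the coverage gate from a config; 0 if none is set."""
    m = re.search(r"fail_under\s*=\s*(\d+)", cfg_text)
    return int(m.group(1)) if m else 0

def coverage_not_lowered(old_cfg: str, new_cfg: str) -> bool:
    """Pre-commit gate: the agent may raise the bar, never lower it."""
    return threshold(new_cfg) >= threshold(old_cfg)
```

Wired into a pre-commit hook that compares the staged config against `HEAD`, this turns the markdown rule into a failing script the model can't talk its way past.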
```text
Never leak context, tooling, or credentials between workspaces.
```
Workspace isolation stayed because it’s the kind of novel context that actually matters. I have four workspaces with four GitHub accounts and four sets of credentials. Custom tools handle the auth automatically, but the rules still need to make it explicit: always use `github_gh`/`github_git`, never raw `gh`/`git` in bash. Running the wrong CLI in the wrong directory sends code to the wrong remote.
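The enforcement behind those custom tools can be as small as a path-to-account lookup. A hypothetical sketch; the workspace names and account labels here are made up:

```python
from pathlib import Path

# Hypothetical mapping from workspace directory to GitHub account.
WORKSPACES = {"work": "work-account", "oss": "oss-account"}

def account_for(cwd: str) -> str:
    """Pick the account from the current path; refuse to guess otherwise."""
    for part in Path(cwd).parts:
        if part in WORKSPACES:
            return WORKSPACES[part]
    raise RuntimeError(f"{cwd} is not inside a known workspace")

print(account_for("/home/me/work/api"))  # work-account
```

Failing loudly outside a known workspace is the point: a wrapper that guesses is exactly how code ends up on the wrong remote.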
```text
Chill senior engineer. Confident, direct, slightly irreverent.
```
Without this you get corporate assistant tone, and you’re reading this thing’s output all day so it’s worth the two lines to make it pleasant.
Skills on demand
I made two utilities (cdt for dev tunnels, shrd for sharing) whose instructions were always loaded even though I used them maybe twice a week. Now they’re skills that only load when the task matches, so they don’t cost anything until I actually need them.
SkillsBench backs this up: focused, modular skills beat comprehensive documentation. The Agent Skills standard is worth a look too if you want to see how small, targeted context injections work in practice.
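In that format, a skill is a SKILL.md whose frontmatter description tells the agent when to pull it in. Roughly like this; the wording and body are illustrative, not my actual file:

```markdown
---
name: cdt
description: Open a dev tunnel to a locally running service. Use when the
  user asks to expose or share a local port.
---
Instructions for running cdt go here; they only enter context
when the description matches the task.
```

The description is the whole trick: it's the only part that's always visible, so it pays for itself by keeping the rest out of context.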
I also found that the REST debugging plugin was configured but never actually loaded in the plugin array. It was just sitting there doing nothing, so I fixed that too.
The setup from Building a Fully Agentic Coding Setup still applies. Same tools, same OpenCode, same approach. Just less noise around the signal.