A Mental Model for Building AI Harnesses

Friday, May 29, 2026

A useful mental model for building AI harnesses is to think of the model as the “engine” and the harness as the operating environment that decides what the model sees, what it can do, how it is checked, and when it stops.[2][6] In practice, harness design is about turning a general model into a reliable system by managing context, tools, state, permissions, validation, and feedback loops.[2][3]

What an AI harness is

A harness is the software scaffold around the model: it assembles prompts, exposes tools, tracks file or workspace state, applies edits, runs commands, manages permissions, caches stable prefixes, stores memory, and controls the loop of action and feedback.[2] Martin Fowler describes this as a mental model for building trust in coding agents, while Sebastian Raschka frames an agent harness as a control loop around the model that decides what to inspect next, which tools to call, how to update state, and when to stop.[5][2]
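
The control-loop framing can be made concrete with a small sketch. This is an illustrative toy, not any particular framework's API: the names `Step`, `Harness`, and `scripted_model` are hypothetical, and the "model" is a scripted function standing in for a real LLM call. Python 3.9 or later is assumed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """What the model asks the harness to do next (hypothetical shape)."""
    tool: str          # name of a tool to call, or "stop" to end the loop
    arg: str = ""


@dataclass
class Harness:
    model: Callable[[list[str]], Step]          # stand-in for an LLM call
    tools: dict[str, Callable[[str], str]]      # the only actions the model can take
    max_steps: int = 5                          # hard stop so the loop always ends
    history: list[str] = field(default_factory=list)

    def run(self, task: str) -> list[str]:
        self.history = [f"task: {task}"]
        for _ in range(self.max_steps):
            step = self.model(self.history)     # model decides what to do next
            if step.tool == "stop":
                break
            tool = self.tools.get(step.tool)
            if tool is None:
                observation = f"error: unknown tool {step.tool!r}"
            else:
                observation = tool(step.arg)
            # State update: the observation becomes context for the next decision.
            self.history.append(f"{step.tool}({step.arg}) -> {observation}")
        return self.history


def scripted_model(history: list[str]) -> Step:
    # Fake model: read one file, then stop.
    return Step("read_file", "README.md") if len(history) == 1 else Step("stop")


harness = Harness(
    model=scripted_model,
    tools={"read_file": lambda path: f"<contents of {path}>"},
)
log = harness.run("summarize the README")
```

Even in this toy, the harness rather than the model owns the step limit, the tool list, and the history that is fed back in.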

A concise way to distinguish the pieces is:

  • LLM: the raw model.[2]
  • Agent: the loop that uses a model plus tools, memory, and environment feedback.[2]
  • Agent harness: the software scaffold that manages context, tool use, prompts, state, and control flow.[2]
  • Coding harness: a task-specific agent harness for software engineering work.[2][4]

The core mental model

The best mental model is that the harness is not just “wrapper code”; it is the system that shapes model behavior.[6][5] Most reliability gains therefore come less from changing the model and more from changing the environment around it: what instructions are visible at each step, what tools are available, how much context is loaded, what constraints are enforced, and how mistakes are detected and corrected.[2][3][7]

This leads to four practical principles:

  • Progressive disclosure: do not dump everything into the prompt at once; reveal the right instructions and context at the right time.[1][3]
  • Deterministic guardrails: use validation, hooks, and permissions to keep the agent from taking unsafe or low-quality actions.[3][7]
  • Statefulness with boundaries: preserve useful memory, but limit noisy context so the model can reason effectively.[2][4]
  • Closed-loop correction: when the agent makes a mistake, add a harness mechanism that prevents that mistake from recurring.[3][5]

Why the harness matters more than the prompt alone

A model can only act on the context it receives, and coding agents are especially sensitive to context overload.[1][2] Harness engineering emphasizes making the task legible to the model in a form that fits its context window and decision process.[1][3] In other words, the harness should surface the right instructions at the right moment rather than front-loading every requirement.[1]

This is why long-running or multi-step work needs more than a good prompt. Anthropic’s guidance on long-running application development highlights the importance of design choices that support sustained execution, feedback, and reliability over time, not just one-shot generation.[7]

The main building blocks of a good harness

A practical harness usually includes these components:

  • Context assembly: gather repo state, relevant files, prior decisions, and task instructions.[2]
  • Prompt shaping: keep a stable prefix, then add task-specific context and memory as needed.[2]
  • Tooling layer: expose file edits, searches, tests, commands, and other operations through structured tools.[2][3]
  • Permissions and approvals: constrain risky actions and require confirmation where needed.[3][6]
  • Validation: run tests, linting, structural checks, or other evaluators before accepting output.[1][3]
  • Memory and resumption: store useful session state so work can continue without reloading everything.[2][4]
  • Observability: log actions, tool calls, failures, and outcomes so failures can be diagnosed and harness rules improved.[3][8]
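
As a rough illustration of how the tooling, permissions, and observability components can fit together, here is a minimal sketch. The `Tool` and `ToolLayer` classes, the `risky` flag, and the `approve` callback are assumptions made for this example, not part of any cited harness.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    risky: bool = False      # risky tools need explicit approval before running


class ToolLayer:
    """Single entry point for tool calls: checks permissions and logs everything."""

    def __init__(self, tools: list, approve: Callable[[str, str], bool]):
        self._tools = {tool.name: tool for tool in tools}
        self._approve = approve      # e.g. a human prompt or a policy function
        self.audit_log = []          # (name, arg, result) tuples for later diagnosis

    def call(self, name: str, arg: str) -> str:
        tool = self._tools.get(name)
        if tool is None:
            result = f"denied: unknown tool {name!r}"
        elif tool.risky and not self._approve(name, arg):
            result = f"denied: {name!r} requires approval"
        else:
            result = tool.run(arg)
        self.audit_log.append((name, arg, result))
        return result


layer = ToolLayer(
    tools=[
        Tool("search", lambda query: f"3 matches for {query!r}"),
        Tool("delete_file", lambda path: f"deleted {path}", risky=True),
    ],
    approve=lambda name, arg: False,   # deny every risky call in this demo
)
```

Routing every call through one method means denied calls are logged alongside successful ones, which is what makes later harness tuning possible.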

A useful design pattern: instruction timing

One of the most important harness ideas is timing. According to a transcript of Ryan Lopopolo’s talk, a good harness gives the model “text at the right time” and avoids overwhelming it with all instructions at once.[1] The model should be allowed to explore and prototype first, then receive stricter constraints at lint, test, or review time.[1]

That pattern often looks like this:

  • Early phase: let the agent explore the task and form a draft solution.[1]
  • Middle phase: supply task-relevant constraints, style rules, and structural preferences.[1]
  • Late phase: enforce quality gates with tests, lint rules, or review checks.[1]

This is especially effective when you want the final output to satisfy non-functional requirements such as code organization, modularity, statelessness, or repository conventions.[1]
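
One way to sketch this phased pattern is to key the instructions by phase and build each prompt from only the current phase's rules. The phase names, the example rules, and the `build_prompt` helper below are all hypothetical.

```python
from enum import Enum


class Phase(Enum):
    EXPLORE = "explore"
    CONSTRAIN = "constrain"
    ENFORCE = "enforce"


# Illustrative rules only; a real harness would load these from its own config.
INSTRUCTIONS = {
    Phase.EXPLORE: ["Explore the task and draft a solution."],
    Phase.CONSTRAIN: ["Keep modules stateless.", "Follow the repository layout."],
    Phase.ENFORCE: ["Fix every lint and test failure before finishing."],
}


def build_prompt(task: str, phase: Phase) -> str:
    """Assemble a prompt that contains only the rules relevant to this phase."""
    lines = [f"Task: {task}", f"Phase: {phase.value}"]
    lines += [f"- {rule}" for rule in INSTRUCTIONS[phase]]
    return "\n".join(lines)


early = build_prompt("add a CSV exporter", Phase.EXPLORE)
middle = build_prompt("add a CSV exporter", Phase.CONSTRAIN)
```

Here the statelessness rule never appears in the exploration prompt; it arrives only once the agent has a draft to apply it to.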

How harnesses improve reliability

Harnesses improve reliability by turning vague expectations into enforceable system behavior.[1][5] Instead of hoping the model “remembers” a convention, the harness can reintroduce the rule when needed, verify whether the rule was followed, and block or rewrite output that violates it.[1][3]

HumanLayer’s article summarizes this as a rule of practice: whenever the agent makes a mistake, fix the system so the mistake does not happen again.[3] That idea is central to harness design: each failure should feed back into the harness, making the correct behavior easier and the incorrect behavior harder.[3][5]
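
The "reintroduce, verify, block" behavior described above might look like the following sketch. The `Rule` type, the `run_with_checks` function, and the logging convention used in the demo are assumptions for illustration; `generate` stands in for a model call that receives the current reminders.

```python
from typing import Callable, NamedTuple, Optional


class Rule(NamedTuple):
    reminder: str                    # text shown to the model when the rule is broken
    check: Callable[[str], bool]     # deterministic test: True means the rule holds


def run_with_checks(
    generate: Callable[[list], str],
    rules: list,
    max_attempts: int = 3,
) -> Optional[str]:
    """Retry generation, feeding back reminders for failed rules; block if none pass."""
    reminders = []
    for _ in range(max_attempts):
        output = generate(reminders)
        failed = [rule for rule in rules if not rule.check(output)]
        if not failed:
            return output                                    # verified: accept
        reminders = [rule.reminder for rule in failed]       # reintroduce the rules
    return None                                              # blocked: never accepted


no_print = Rule("Use logging, not print().", lambda code: "print(" not in code)


def fake_model(reminders: list) -> str:
    # Scripted stand-in: violates the convention until it is reminded.
    return "logging.info('hi')" if reminders else "print('hi')"


accepted = run_with_checks(fake_model, [no_print])
blocked = run_with_checks(lambda reminders: "print('hi')", [no_print])
```

The check is ordinary code, so whether the convention was followed no longer depends on what the model remembers.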

Mental model for building one

If you are designing an AI harness, use this sequence:

  1. Define the job clearly: what the model should produce, what counts as success, and what must never happen.[1][3]
  2. Choose the smallest useful context: include only the repo state, memory, and instructions needed for the current step.[2][4]
  3. Expose tools deliberately: give the model exactly the actions it needs, not a raw shell unless that is necessary.[2][3]
  4. Add guardrails: permissions, approvals, validators, and tests should catch harmful or low-quality actions.[3][7]
  5. Separate exploration from enforcement: let the model draft, then use stricter checks before finalizing.[1]
  6. Record failures and refine the harness: convert recurring mistakes into new constraints, checks, or prompts.[3][5]
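
Step 6 can be sketched as a small ledger that promotes a recurring mistake into a standing rule. The `FailureLedger` class, its threshold of two occurrences, and the example failure kinds are hypothetical choices for this example.

```python
from collections import Counter


class FailureLedger:
    """Counts failures by kind and promotes repeated ones to harness rules."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold   # occurrences needed before a rule is added
        self.counts = Counter()
        self.rules = []              # rules to inject into prompts or validators

    def record(self, kind: str, proposed_rule: str) -> None:
        self.counts[kind] += 1
        recurring = self.counts[kind] >= self.threshold
        if recurring and proposed_rule not in self.rules:
            self.rules.append(proposed_rule)


ledger = FailureLedger()
ledger.record("edited-generated-file", "Never edit files under build/.")
ledger.record("missing-test", "Every new function needs a test.")
ledger.record("edited-generated-file", "Never edit files under build/.")
```

After these three calls, only the mistake that occurred twice has become a rule; the one-off is counted but not yet promoted.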

Common failure modes the harness should prevent

A good harness anticipates predictable failures such as:

  • Context overflow: too much irrelevant information crowding out useful reasoning.[2][4]
  • Instruction drift: the model ignores conventions because they were only stated once at the beginning.[1][3]
  • Unsafe actions: the model edits or runs something it should not.[3][6]
  • Unverified outputs: code that looks plausible but fails tests or violates structure.[1][7]
  • State loss: the agent forgets prior work when tasks span multiple steps or sessions.[2][4]
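
As an example of guarding against the first failure mode, context overflow, a harness can select context by priority under a fixed budget. In this sketch the `fit_context` function is hypothetical, and word count is used as a crude stand-in for a real token count.

```python
from typing import Callable


def fit_context(
    items: list,
    budget: int,
    cost: Callable[[str], int] = lambda text: len(text.split()),
) -> list:
    """Keep the highest-priority (priority, text) items that fit within the budget.

    Selection is greedy by priority; the kept items are returned in their
    original order so the prompt still reads naturally.
    """
    ranked = sorted(enumerate(items), key=lambda pair: -pair[1][0])
    kept = set()
    used = 0
    for index, (_, text) in ranked:
        item_cost = cost(text)
        if used + item_cost <= budget:
            kept.add(index)
            used += item_cost
    return [text for index, (_, text) in enumerate(items) if index in kept]


context = fit_context(
    [
        (1, "old chat log " * 50),                        # low priority, very long
        (3, "Task: fix the failing test in parser.py"),   # highest priority
        (2, "def parse(s): ..."),
    ],
    budget=20,
)
```

With a budget of 20 words, the long low-priority log is dropped while the task statement and the relevant code are kept.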

The simplest version of the mental model

If you want a one-sentence model, use this:

An AI harness is the system that turns a general model into a dependable worker by controlling context, tools, state, constraints, and feedback.[2][3][6]
