Frontend Harness for Development: Empowering Scalable and Efficient Frontend Workflows

Monday, April 27, 2026

There is a quiet, expensive assumption in most "agent + LLM" coding stacks: that the model is the thing producing structure. We treat the LLM as a draftsman, then bolt on linters, schemas, and prompts to coax the draftsman into producing something predictable. When the model is large (Opus, GPT-5, Gemini 3.1 Pro) the gap between "predictable" and "what the model wants to write" is small enough that the bolt-ons mostly work. When the model is small (Qwen3 32B, Gemma3 27B running on a laptop), the gap explodes. The model wants to invent file paths, hallucinate imports, abbreviate components, and generally express its own taste — and most of the time, that taste is wrong.

The conventional fix is to make the model bigger. The interesting fix is to make the work smaller.

This post is about an opencode plugin I just shipped — @elegant/opencode — that takes the second route. It produces a complete, working React + Redux app (the same 128-file tree as the chota-react-redux template) from a one-line spec like "build a todo app", with a 27B local model in the loop, and does so deterministically across 100 consecutive runs. The trick isn't the model. The trick is that the skills impose the structure, and the LLM only fills in the parts that genuinely vary.

The architecture, briefly

The Universal Frontend Architecture is a way of slicing a frontend codebase into thin, named layers — atoms, molecules, organisms, templates, pages, state types/initial/actions/reducer/selectors, store, containers, theme. Each layer has a one-page SKILL.md describing its responsibility, naming convention, file shape, and the invariants it must preserve. The reference implementation in grvpanchal/elegant ships these as 22 markdown files under skills/, plus a chota-react-redux template that is one byte-for-byte valid output of the architecture.
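
In file terms, the slicing looks roughly like the tree below. The directory names are inferred from paths that appear later in this post, so read it as an orientation map rather than the canonical layout:

src/
  ui/
    atoms/  molecules/  organisms/  skeletons/  templates/  pages/
  state/
    todo/        (types, initial, actions, reducer, selectors)
    store
  containers/
  theme/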

The first version of the plugin treated that template as a fixture: ship the template, edit the variable parts on top of it. It worked, but it was wrong. As the user put it bluntly:

Fixture is a terrible technique. The skills and agents with multiple skills has to impose the structure and not a copy of template.

Right. The skills are the contract. The fixture is just one valid output. So we threw it away.

The pipeline as 21 microtasks

The architecture decomposes into exactly 21 microtasks, each bound to one skill. Some are fixed — given an entity schema, there is exactly one correct output, so we encode that output as a pure JavaScript function. Some are variable — the entity's fields, operations, and shape change what gets emitted, so we delegate those to a small LLM. The split:

Microtask          Skill              Mode   Files emitted
entity-schema      —                  LLM    (seed; entity object)
ui-theme           ui-theme           fixed  5
app-shell          server-app-shell   fixed  9
state-types        state-actions      fixed  1
state-initial      state-reducer      fixed  1
state-actions      state-actions      fixed  2
state-reducer      state-reducer      fixed  2
state-selectors    state-selectors    LLM    2
filters-slice      state-crud         fixed  8
config-slice       state-crud         fixed  8
state-store        state-store        fixed  1
atomic-provider    server-app-shell   fixed  2
ui-base-atoms      ui-atom            fixed  22
ui-context-atoms   ui-atom            fixed  18
ui-domain-atom     ui-atom            LLM    6
ui-skeleton        ui-skeleton        fixed  9
ui-layout          ui-template        fixed  4
ui-molecule        ui-molecule        LLM    12
ui-organism        ui-organism        LLM    10
container          server-container   LLM    8
page               server-page        fixed  3

Fifteen fixed, six variable. The LLM is responsible for about a quarter of the calls, and only for the parts of the codebase that should change with the entity. The rest is pure code that has been thought about, written down, and tested once — and runs in zero LLM tokens forever.
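
What does a fixed microtask look like? A minimal sketch of one emitter, with the body and file path guessed from the naming conventions in this post rather than copied from src/emitters/:

function emitStateTypes(entity) {
  // Pure function: entity schema in, envelope out. No LLM tokens spent.
  const name = entity.name;        // e.g. "todo"
  const NAME = name.toUpperCase(); // e.g. "TODO"
  const source = entity.operations
    .map((op) => `export const ${op.toUpperCase()}_${NAME} = "${op.toUpperCase()}_${NAME}";`)
    .join("\n");
  // state-types owns exactly one file, per the table above.
  return { files: { [`src/state/${name}/${name}.types.js`]: source + "\n" } };
}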

This is the central move. A small model is bad at "produce a 128-file React app." It is genuinely good at "produce a TodoList organism that takes {todoData, events} props, renders a <ListSkeleton/> while loading, an <Alert/> on error, and an <AddTodoForm/> plus <TodoItems/> on success." The harness's job is to compress every variable microtask into a prompt that small.

The universal envelope

Every code-emitting microtask — fixed or variable — returns the same JSON shape:

{ "files": { "src/state/todo/todo.selectors.js": "...", "src/state/todo/todo.selectors.test.js": "..." } }

A relPath → source map. Nothing else. The orchestrator merges all 21 maps and writes them verbatim to disk. If two microtasks claim the same path, that's a bug — it throws loudly. There is no "merge logic," no "post-processing pass," no template engine. Each skill owns a disjoint slice of the file tree, and the harness is a faithful pipe.

The simplicity matters. When the agent is asked to produce one of these maps, it cannot get creative about file naming, directory layout, or which test framework to use — because every file path it returns must already be one of the paths the manifest knows about, and the manifest is derived directly from the SKILL.md.
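
The merge itself can be a dozen lines. A sketch, where mergeEnvelopes is my name, not the plugin's:

function mergeEnvelopes(envelopes) {
  const merged = {};
  for (const { files } of envelopes) {
    for (const [relPath, source] of Object.entries(files)) {
      // Disjoint ownership is the invariant: a duplicate path means two
      // skills claim the same file, so fail loudly rather than merge.
      if (relPath in merged) throw new Error(`duplicate relPath: ${relPath}`);
      merged[relPath] = source;
    }
  }
  return merged; // relPath -> source, written verbatim to disk
}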

Two-layer validation: Ajv + manifest

Every microtask output passes through two checks before the orchestrator accepts it:

  1. Envelope (Ajv). A tiny JSON Schema (files.schema.json) confirms the output is { files: { <pathlike-string>: <non-empty-string> } }. Any prose preamble, trailing fences, missing braces, or non-string values fail here.
  2. Manifest. A per-microtask JS module (src/file-manifest.js) returns, for any given entity, the exact set of relPaths the microtask must produce, plus a list of structural invariants per file ("must contain export default", "must import useSelector and useDispatch", etc.). The manifest is computed deterministically from the entity schema.
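
For concreteness, here is roughly what the envelope schema could look like; the pattern and constraints are my assumptions about files.schema.json, not its actual contents:

{
  "type": "object",
  "required": ["files"],
  "additionalProperties": false,
  "properties": {
    "files": {
      "type": "object",
      "minProperties": 1,
      "propertyNames": { "pattern": "^[A-Za-z0-9._/-]+$" },
      "additionalProperties": { "type": "string", "minLength": 1 }
    }
  }
}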

If either layer fails, the orchestrator builds a repair prompt — the original task plus a list of error strings — and asks the agent to re-emit. It does not echo the previous output back. Small models tend to copy-paste their broken output and call it fixed; better to make them produce a fresh attempt against a list of explicit failures. Three attempts max. After that the pipeline halts and surfaces the validation errors to the user.
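
In code, the loop stays small. A sketch with hypothetical callAgent and validate helpers; the real orchestrator's names differ:

async function runMicrotask(task, callAgent, validate) {
  let errors = [];
  for (let attempt = 1; attempt <= 3; attempt++) {
    // Repair prompts carry the task plus error strings, never the
    // previous broken output.
    const prompt = errors.length === 0
      ? task.prompt
      : task.prompt + "\n\n# Previous attempt failed validation\n" +
        errors.map((e) => `- ${e}`).join("\n") +
        "\nRe-emit the full JSON envelope from scratch.";
    const output = await callAgent(prompt);
    errors = validate(output); // layer 1: Ajv envelope, layer 2: manifest
    if (errors.length === 0) return output;
  }
  throw new Error("validation failed after 3 attempts:\n" + errors.join("\n"));
}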

The manifest doubles as documentation. The agent prompt template includes:

# Required Files
You MUST emit EXACTLY these relPaths — no more, no fewer:
  - src/ui/organisms/TodoList/TodoList.component.jsx
  - src/ui/organisms/TodoList/TodoList.stories.js
  - src/ui/organisms/TodoList/TodoList.test.jsx
  ...

# Structural Invariants
  - src/ui/organisms/TodoList/TodoList.component.jsx must contain: "export default"

The model is not picking paths. It is filling in source for paths the architecture has already committed to.

Context surgery

Every variable subagent receives a context that is, in this order:

  1. Role + universal invariants.
  2. The compacted SKILL.md slice for its skill (Key Principles + Code Patterns sections).
  3. One concrete in-repo exemplar — a single file's worth of code, illustrating the shape.
  4. The exact relPath set + structural invariants from the manifest.
  5. The entity schema.
  6. Only the upstream microtask outputs it depends on, summarized to relPath lists — never full source.
  7. The Ajv response schema.

That last point is the difference between a harness that converges and one that thrashes. A container agent does not need to read the 12 files the molecules emitted; it needs to know they exist, what they're called, and where they live. So the context-builder summarizes upstream {files: {...}} payloads down to their key sets:

function summariseUpstream(slice) {
  const out = {};
  for (const [k, v] of Object.entries(slice)) {
    if (v && typeof v === "object" && v.files && typeof v.files === "object") {
      // Envelope payloads collapse to a sorted list of relPaths.
      out[k] = { files: Object.keys(v.files).sort() };
    } else {
      // Non-envelope values (e.g. the entity schema) pass through intact.
      out[k] = v;
    }
  }
  return out;
}

Twelve molecule files become twelve strings. The prompt stays well inside a 32k context window even when running on Gemma3 27B at num_ctx=32768. No history, no stray files, no global codebase context. Subagents run with tools: { read:false, write:false, bash:false, task:false } — they have no escape hatches and nothing to be tempted by.

Adversarial simulation, not happy-path testing

The hardest part of building this kind of harness is convincing yourself it actually works under realistic small-model behavior. Local models do not return clean JSON. They wrap output in code fences, prepend "Sure! Here's the JSON:", drop required files, return empty strings for fields they got bored of, and occasionally invent extra files that shouldn't exist.

So the test harness simulates exactly that. The simulator (test/sim-llm.js) takes the per-skill emitter's canonical output for the canonical entity, then injects realistic failure modes by attempt index (a sketch of the injector follows the list):

  • Attempt 1: ~80% chance of noise. Code fences, prose preambles, dropped files, empty content, stray junk files, miscased slices.
  • Attempt 2: ~20% chance of noise.
  • Attempt 3: clean.
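
A sketch of that injector, with assumed helper names and only a few of the mutations listed above:

function corrupt(cleanJson, attempt, rng = Math.random) {
  const noiseChance = { 1: 0.8, 2: 0.2 }[attempt] ?? 0;
  if (rng() >= noiseChance) return cleanJson; // this attempt stays clean
  const mutations = [
    (s) => "```json\n" + s + "\n```",        // wrap in code fences
    (s) => "Sure! Here's the JSON:\n" + s,   // prose preamble
    (s) => {                                 // drop a required file
      const obj = JSON.parse(s);
      delete obj.files[Object.keys(obj.files)[0]];
      return JSON.stringify(obj);
    },
    (s) => {                                 // empty a file's content
      const obj = JSON.parse(s);
      obj.files[Object.keys(obj.files)[0]] = "";
      return JSON.stringify(obj);
    },
  ];
  return mutations[Math.floor(rng() * mutations.length)](cleanJson);
}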

The repair loop has to climb from 80% noise to validated output in three attempts. If it can't, the pipeline fails. The stability test is the contract. N=100 runs:

{
  "runs": 100,
  "failures": 0,
  "stableTreeAcrossRuns": true,
  "treeHash": "ac00e9a75dc6860c40031af734ecb04d9d16d68baa83c65dc819289a6dd9d955",
  "llmCalls": 500,
  "detCalls": 1500,
  "llmShare": 0.25,
  "totalRepairAttempts": 870,
  "recoveredAfterRepair": 245,
  "perRunFiles": 128
}

Every run produced the same 128-file tree, byte-identical to the chota-react-redux reference template. The repair loop fired 870 times across the 100 runs, and every failing microtask recovered within the three-attempt budget. Zero hard failures.

A second test (test/different-entity.test.js) feeds the pipeline a Comment entity (no toggle operation) and verifies the same skills project onto a different shape — CommentItem atom, CommentList organism, CommentListContainer, commentItems field threaded through every selector. The architecture is generic; only the entity changes.
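
For reference, the entity seed is the only input that differs between those two runs. Its exact shape is defined by the plugin; what follows is a guess at the Comment seed, with field names assumed:

{
  "name": "comment",
  "fields": [
    { "name": "text", "type": "string" },
    { "name": "author", "type": "string" }
  ],
  "operations": ["create", "update", "delete"]
}

The Todo seed would add a "toggle" operation; the manifests, emitters, and agent prompts all project off this one object.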

Why this is the interesting bit

There's a class of problem in agentic codegen where everyone is trying to do the same thing: pick a frontier model, give it the whole repo, hope it threads the needle. The cost is enormous and the failure mode is silent — the model produces something plausible-looking that nobody can be sure is correct.

The skills-driven approach inverts that. The architecture is small enough to write down. The skills are small enough to encode. The variable parts are small enough that a 27B model running on a laptop can do them with temperature: 0.05 and a tightly-scoped prompt. You move the structural decisions out of the model and into the harness, and the model gets to do what it's actually good at: filling in JSX bodies that match a stated contract.

A few specific things fell out of that move that I didn't expect:

  • Naming conventions are stable across runs without anyone enforcing them. When state-types emits CREATE_TODO and state-actions emits createTodo, no LLM had to remember the convention; both came from the same _naming.js helper used by both fixed emitters (see the sketch after this list). The variable agents inherit the same names through the entity schema. There's no source of drift.
  • The reference template is not a goal, it's a side effect. I did not set out to produce a template-identical output. I set out to encode the skills. The byte-identical 128-file tree fell out of the encoding being faithful. That gives me a regression test I can run forever for free.
  • Repair prompts work better when they don't include the previous attempt. Counterintuitive, but: if you echo the model's broken output back, it tends to copy-paste 80% of it and call it fixed. If you just list the failures and ask for a fresh attempt, it actually thinks. This was the single largest jump in repair-loop success rate during development.
  • Schemas alone are not enough. Ajv catches 60% of small-model failures (malformed envelopes). The other 40% — missing files, empty content, missing structural anchors — require a domain-aware manifest. The manifest is the bridge between "the JSON validates" and "the code compiles."
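
To make the first point concrete, here is the kind of helper _naming.js could be; the file exists in the plugin, but this body is guessed:

// Shared by every fixed emitter, and reflected into the entity schema
// that the variable agents receive.
function actionType(entityName, verb) {
  return `${verb}_${entityName}`.toUpperCase(); // ("todo", "create") -> "CREATE_TODO"
}

function actionCreator(entityName, verb) {
  // ("todo", "create") -> "createTodo"
  return verb + entityName.charAt(0).toUpperCase() + entityName.slice(1);
}

module.exports = { actionType, actionCreator };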

What this is, and what it isn't

This is not a general-purpose code-generation system. It generates one architecture (Universal Frontend Architecture) targeting one stack (React + classic Redux + chota CSS + vitest + Storybook) for one app shape (single-entity CRUD with filters). Adding a new framework means rewriting src/emitters/*.js and updating src/file-manifest.js — the microtask graph, agent prompts, and JSON contract are framework-agnostic, but the bodies of the emitters are not.

What it is: a working argument that for any architecture you can write down, you can produce a small-model harness that emits frontier-quality code from it, deterministically, in zero-history sessions, at a fraction of the cost of frontier inference. The skills do the heavy lifting. The model just paints inside the lines.

If you want to see it run:

git clone https://github.com/grvpanchal/elegant-opencode
cd elegant-opencode && npm install
node scripts/run-todo-demo.js
N=100 node test/stability.test.js

The plugin lives at github.com/grvpanchal/elegant-opencode. The architecture and skills live at github.com/grvpanchal/elegant. They're meant to be read together — the plugin is just one runtime for the skills, and a 27B model is just one consumer of the prompts. The interesting part is the encoding.
