Orchestrator and Workhorse
Claude orchestrates. Codex builds. Humans review. o8 is the multi-harness control plane: the layer above the agent IDEs, coordinating across tools that are each a harness in their own right.
Cursor and Claude Code are harnesses. o8 is the layer above them.
In a recent piece on the architecture of AI coding tools, Aparna Sinha named the nine components every “harness” implements — context manager, iteration loop, tool router, memory, evaluator, and so on. Cursor is one harness. Claude Code is another. Cline, Aider, and Devin are each their own. Each terminates at one model API; each owns its full loop end-to-end.
o8 implements roughly zero of those nine components. We don't ship our own context manager; the harness already has one. We don't ship our own evaluator; Codex's does that work. We don't reimplement what Cursor and Claude Code already do well. That's the feature. We sit one layer above them and add what no single-harness vendor will build: the coordination, governance, and memory that live between harnesses, not inside them.
o8 is the multi-harness control plane. It doesn't reimplement the harness layer — it coordinates across runtimes and adds the inter-harness governance, memory, and approval surface that no single-harness vendor will build.
That decision — that the layer between harnesses is the thing worth designing for — is the load-bearing call. It's why o8 ships a multi-runtime adapter system rather than a Claude or a Codex integration. It's why every dispatch flows through the orchestrator instead of straight to a worker. It's why the governance layer exists as a first-class surface and not an afterthought.
Six stations from intent to merge. The sixth feeds back into the first.
The user says “fix the packet metadata flickering on mobile.” What happens next is six handoffs.
- USER → ORCHESTRATOR. Claude reads the relevant code, asks the Brain (cortex.ask) for any directives that constrain the answer, drafts a plan, and proposes a packet: branch name, scope, runtime to dispatch, model tier.
- ORCHESTRATOR → WORKER. The packet spawns a worktree and a worker (Codex / Gemini / opencode). The worker runs end-to-end inside that worktree. The orchestrator does not touch the diff while the worker is working. They're isolated by design.
- WORKER → REVIEW. When the worker finishes, its diff goes into the review pane. The orchestrator reads the diff first; this is a trust layer before the human ever sees it. If the orchestrator finds something off, it reprompts the worker without bothering the operator.
- REVIEW → OPERATOR. Once the orchestrator is satisfied, the operator (you) sees the diff in the inbox. One tap on phone, one click on desktop: approve or reject. This is the only step where a human is required.
- OPERATOR → MERGE. Approval triggers the merge. main moves forward.
- MERGE → LEDGER + BRAIN → ORCHESTRATOR. The session outcome (what shipped, how long it took, how many attempts, what the reviewer flagged) gets logged to the outcomes ledger. The Brain ingests the merge as a new fact. The next dispatch on this codebase will retrieve that outcome as context. The loop closes.
That sixth handoff is the one nobody else builds. It's why your project gets smarter with every dispatch instead of just being worked on by an agent that forgets its history the next session.
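The six handoffs can be read as a small state machine. The sketch below is illustrative only: the `Station` and `Signal` names and the `nextStation` function are hypothetical, and sending an operator rejection back to the worker is an assumption, since the text only says the operator can approve or reject.

```typescript
// Hypothetical station names mirroring the handoffs described above.
type Station = "user" | "orchestrator" | "worker" | "review" | "operator" | "merge";

// "reprompt": the orchestrator found something off during review.
// "reject": the operator sent the diff back (assumed to return to the worker).
type Signal = "ok" | "reprompt" | "reject";

function nextStation(current: Station, signal: Signal = "ok"): Station {
  switch (current) {
    case "user":
      return "orchestrator";
    case "orchestrator":
      return "worker";
    case "worker":
      return "review";
    case "review":
      return signal === "reprompt" ? "worker" : "operator";
    case "operator":
      return signal === "reject" ? "worker" : "merge";
    case "merge":
      // Sixth handoff: the outcome is logged, then the loop returns to the orchestrator.
      return "orchestrator";
  }
}

// Walk the happy path: six handoffs, ending back at the orchestrator.
let station: Station = "user";
const path: Station[] = [station];
for (let i = 0; i < 6; i++) {
  station = nextStation(station);
  path.push(station);
}
```

The only branches are at review and at the operator, which matches the text: every other handoff is unconditional.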
One subscription is a single point of failure. Two subscriptions running in parallel is leverage.
The first reason is quality of fit. Claude is excellent at reading code, asking clarifying questions, holding a long-running plan in working memory, and reviewing diffs with taste. Codex (and the GPT-5.4 family generally) is excellent at executing scoped patches quickly across many files in one shot. Asking either to do the other's job degrades both. Routing each to its strength is where most of the apparent “smartness” gain comes from, not from any single model upgrade.
The second reason is compute. If you have a Claude Max sub, you can keep an orchestrator running for hours without burning through quota, because orchestration is mostly reading and short structured replies, not generation. Meanwhile, your ChatGPT Plus or Codex sub is doing the actual generation work in the background. You end up with two parallel pipelines, each covered by a subscription you already pay for, instead of one rate-limited path.
The third reason is independence. When OpenAI ships a model update that changes Codex's behavior, your orchestrator doesn't notice. When Anthropic rate-limits Claude, Codex keeps building. When one provider has an outage, the other still ships. Splitting the control plane across two suppliers makes the whole system more durable than either alone.
Two CLIs running in parallel is leverage you can't buy by paying more for one.
Four CLIs today. Adding a fifth is a six-file patch.
Every runtime ships through a single interface in src/lib/runtimes/types.ts. The same six methods regardless of which CLI is behind them: discover (find owned + discovered sessions), readTranscript, launch, resume, interrupt, reviewDiffs.
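As a rough picture of that contract, here is a minimal sketch of a six-method adapter interface. Only the six method names come from the text; the parameter and return types, the `SessionRef` and `Packet` shapes, and the `InMemoryAdapter` class are assumptions made for illustration, not the contents of src/lib/runtimes/types.ts.

```typescript
type RuntimeId = "claude-code" | "codex" | "gemini" | "opencode";

// Hypothetical shapes; the real types are not shown in this document.
interface SessionRef {
  id: string;
  runtime: RuntimeId;
  owned: boolean;
}

interface Packet {
  branch: string;
  scope: string;
  modelTier: string;
}

// The six methods named in the text, with assumed signatures.
interface RuntimeAdapter {
  readonly id: RuntimeId;
  discover(): Promise<SessionRef[]>;
  readTranscript(sessionId: string): Promise<string[]>;
  launch(packet: Packet): Promise<SessionRef>;
  resume(sessionId: string): Promise<void>;
  interrupt(sessionId: string): Promise<void>;
  reviewDiffs(sessionId: string): Promise<string>;
}

// In-memory stand-in so the contract can be exercised without a real CLI.
class InMemoryAdapter implements RuntimeAdapter {
  private sessions = new Map<string, SessionRef>();

  constructor(readonly id: RuntimeId) {}

  async discover(): Promise<SessionRef[]> {
    return Array.from(this.sessions.values());
  }

  async readTranscript(sessionId: string): Promise<string[]> {
    return [`transcript for ${sessionId}`];
  }

  async launch(packet: Packet): Promise<SessionRef> {
    const ref: SessionRef = { id: `${this.id}:${packet.branch}`, runtime: this.id, owned: true };
    this.sessions.set(ref.id, ref);
    return ref;
  }

  async resume(_sessionId: string): Promise<void> {}

  async interrupt(sessionId: string): Promise<void> {
    this.sessions.delete(sessionId);
  }

  async reviewDiffs(_sessionId: string): Promise<string> {
    return "";
  }
}
```

Anything that implements `RuntimeAdapter` can stand behind the same call sites, which is the property the rest of this section relies on.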
The UI never talks to a runtime directly. It always routes through the registry, which fans out the call to whichever adapter applies. discoverAllSessions() runs all four adapters in parallel via Promise.allSettled; routeAction() dispatches resume/interrupt to the correct runtime by session id.
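A sketch of how such a registry might look. The function names `discoverAllSessions` and `routeAction` and the use of `Promise.allSettled` come from the text; the `Registry` class, the trimmed `Adapter` interface, and the convention that a session id is prefixed with its runtime (for example `codex:abc`) are assumptions for illustration.

```typescript
type RuntimeId = "claude-code" | "codex" | "gemini" | "opencode";

interface SessionRef {
  id: string;
  runtime: RuntimeId;
}

// Trimmed-down adapter: only the methods this sketch needs.
interface Adapter {
  discover(): Promise<SessionRef[]>;
  resume(sessionId: string): Promise<void>;
  interrupt(sessionId: string): Promise<void>;
}

class Registry {
  private adapters = new Map<RuntimeId, Adapter>();

  register(id: RuntimeId, adapter: Adapter): void {
    this.adapters.set(id, adapter);
  }

  // Fan out to every adapter in parallel. allSettled means one failing
  // runtime does not hide the sessions found by the others.
  async discoverAllSessions(): Promise<SessionRef[]> {
    const results = await Promise.allSettled(
      Array.from(this.adapters.values()).map((adapter) => adapter.discover()),
    );
    const sessions: SessionRef[] = [];
    for (const result of results) {
      if (result.status === "fulfilled") sessions.push(...result.value);
    }
    return sessions;
  }

  // Assumption: the runtime is encoded as the prefix of the session id.
  async routeAction(action: "resume" | "interrupt", sessionId: string): Promise<void> {
    const runtime = sessionId.split(":")[0] as RuntimeId;
    const adapter = this.adapters.get(runtime);
    if (!adapter) throw new Error(`no adapter registered for session ${sessionId}`);
    await adapter[action](sessionId);
  }
}
```

Using `Promise.allSettled` rather than `Promise.all` is the design choice that matters here: a CLI that is missing or misconfigured on the user's machine degrades the inventory instead of emptying it.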
Why this matters: when the next CLI worth integrating ships, we don't rewrite the orchestrator. We add a new file under src/lib/runtimes/, add its literal to the union type, register it, and the rest of the system — mission control, the inbox, the review pane, the mobile inbox — works without modification. We've done it four times. We know the shape.
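One way to make "add its literal to the union type, register it" hard to get wrong is to type the registry as a record keyed by the union. This is a sketch under assumptions: `fifth-cli` is a placeholder name, and `stubAdapter` and the one-method `RuntimeAdapter` shape are hypothetical stand-ins for the real adapter files.

```typescript
// Step 1: add the new literal to the union. "fifth-cli" is a placeholder.
type RuntimeId = "claude-code" | "codex" | "gemini" | "opencode" | "fifth-cli";

interface RuntimeAdapter {
  readonly id: RuntimeId;
  discover(): Promise<string[]>;
}

// Stand-in factory so the sketch runs without any real CLI installed.
function stubAdapter(id: RuntimeId): RuntimeAdapter {
  return { id, discover: async () => [] };
}

// Step 2: register it. Because the registry is typed as Record<RuntimeId, ...>,
// leaving out the new key is a compile-time error rather than a runtime surprise.
const adapters: Record<RuntimeId, RuntimeAdapter> = {
  "claude-code": stubAdapter("claude-code"),
  codex: stubAdapter("codex"),
  gemini: stubAdapter("gemini"),
  opencode: stubAdapter("opencode"),
  "fifth-cli": stubAdapter("fifth-cli"),
};

const registered = Object.keys(adapters);
```

With this shape, the compiler points at the registry as soon as the union grows, so consumers that only iterate the registry need no changes.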
Today's policy: Claude Code is read-only (it discovers and drives orchestrator sessions but is no longer dispatched as a worker because of fan-out reliability data from April 2026 — see issue #650). Native Claude sub-agents stay available inline via the Agent tool. Codex, Gemini, and opencode are the three dispatchable workhorses.
A session you started is one you can interrupt. A session you found is one you can read.
The runtime adapter draws a hard line between two kinds of sessions. Owned sessions are ones o8 spawned itself: through the dispatch flow, with a worktree, a packet, a tracked process id. The orchestrator can interrupt them, resume them, kill them, replay them.
Discovered sessions are ones the user started themselves in their terminal: a Claude Code run they kicked off in a tmux pane, a Codex thread they're working in via the CLI directly. o8 finds those sessions on disk and shows them in the inventory, but only in read-only mode. We can show you the transcript, surface the live status, even let the orchestrator reference its plan, but we don't reach in and steer it. That's the user's session. Touching it would be rude.
Owned sessions get full control. Discovered sessions get read-only respect. The line stays clean because crossing it turns o8 from an operator surface into a babysitter that breaks your other tools.
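The owned/discovered line maps naturally onto a discriminated union with a capability check. This is a minimal sketch, not o8's actual types: the field names, the `Action` list, and both helper functions are assumptions based on the description above.

```typescript
// Hypothetical session shapes based on the description above.
interface OwnedSession {
  kind: "owned";
  id: string;
  pid: number; // tracked process id
  worktree: string;
  packetId: string;
}

interface DiscoveredSession {
  kind: "discovered";
  id: string;
  transcriptPath: string; // where the session was found on disk
}

type Session = OwnedSession | DiscoveredSession;
type Action = "readTranscript" | "interrupt" | "resume" | "kill" | "replay";

// Owned sessions get full control; discovered sessions are read-only.
function allowedActions(session: Session): Action[] {
  return session.kind === "owned"
    ? ["readTranscript", "interrupt", "resume", "kill", "replay"]
    : ["readTranscript"];
}

// Guard to call before any action is routed to a runtime.
function assertAllowed(session: Session, action: Action): void {
  if (allowedActions(session).indexOf(action) === -1) {
    throw new Error(`${action} is not permitted on a ${session.kind} session (${session.id})`);
  }
}
```

Keeping the check in one function means the UI and the orchestrator cannot drift apart on what a discovered session permits.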
Before any agent merge, the orchestrator reads the diff. That's the trust layer before GitHub bots.
A pull request from an autonomous worker is not a pull request from a human. The reviewer can't assume the worker understood the constraints. The reviewer can't tell at a glance whether the diff is one chunk of clean work or three chunks of clean work bolted onto a fourth chunk of speculative rewrite. Human review is built for human authors.
o8's answer: the orchestrator reads the diff first. It fetches the directives that constrain the change, the recent outcomes that touched adjacent code, the symbol graph for the files involved — and then it judges whether the worker satisfied the packet. If anything's off, the orchestrator reprompts the worker, often inside the same packet. If the diff is clean, the orchestrator advances it to the operator inbox with a one-line summary.
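The review step described above amounts to a bounded judge-and-reprompt loop. The sketch below assumes hypothetical `judge` and `reprompt` hooks and a `maxAttempts` cap; the text does not say what happens when reprompting keeps failing, so escalating after the cap is an assumption.

```typescript
interface Verdict {
  ok: boolean;
  note: string; // what to fix, or a one-line summary when ok
}

// Hypothetical hooks: how a diff is judged and how a worker is reprompted.
interface ReviewDeps {
  judge(diff: string, directives: string[]): Verdict;
  reprompt(diff: string, note: string): string; // returns the worker's revised diff
}

type GateResult =
  | { status: "advanced"; summary: string; attempts: number }
  | { status: "escalated"; reason: string; attempts: number };

function reviewGate(
  diff: string,
  directives: string[],
  deps: ReviewDeps,
  maxAttempts = 3, // assumed cap, not from the source
): GateResult {
  let current = diff;
  let lastNote = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const verdict = deps.judge(current, directives);
    if (verdict.ok) {
      // Clean diff: advance to the operator inbox with a one-line summary.
      return { status: "advanced", summary: verdict.note, attempts: attempt };
    }
    lastNote = verdict.note;
    if (attempt < maxAttempts) current = deps.reprompt(current, verdict.note);
  }
  return { status: "escalated", reason: lastNote, attempts: maxAttempts };
}

// Toy run: the judge rejects TODO comments, the "worker" strips them.
const result = reviewGate(
  "+ const x = 1; // TODO remove",
  ["no TODO comments in merged code"],
  {
    judge: (d) =>
      d.indexOf("TODO") === -1
        ? { ok: true, note: "clean: no leftover TODOs" }
        : { ok: false, note: "remove the TODO comment" },
    reprompt: (d) => d.replace(" // TODO remove", ""),
  },
);
```

In this toy run the first attempt fails, the worker is reprompted once, and the second attempt advances, so the operator only ever sees the revised diff.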
This is the layer that makes autonomy comfortable. You're not approving raw diffs from agents you barely watched. You're approving diffs that have already been read by Claude with the full context of what was supposed to happen. The operator decision becomes binary: ship it or send it back. The cognitive load is bounded.
The trust layer isn't the operator. The operator is the ratification step. The trust layer is Claude reading the diff before the human ever sees it.
If you have a Claude Max sub, the orchestrator is free. We will not build a product that costs the user money to run a daily loop.
Every runtime in o8 talks to its model through the model's official CLI — not its API. Claude Code through claude. Codex through codex exec. Gemini through gemini --yolo. opencode through opencode run.
This is a deliberate cost decision. CLIs ride the user's own subscription: Claude Max for Claude, ChatGPT Plus for Codex, Google AI Studio for Gemini. The user pays the model provider directly, not us, and not on top of us. We don't mark up their LLM bill, we don't proxy their requests through our infrastructure, and a power user with a Claude Max sub can run o8 all day at exactly the price of that one subscription.
What we lose in the API-vs-CLI trade: streaming latency on the first token (CLI bootstrap is 6–90s depending on the provider), structured outputs (we parse JSON out of streamed text instead of getting it natively), and some advanced features that only ship through the API. We've worked around all of these. The CLI bootstrap is amortized by long-running sessions; structured parsing is reliable enough to power the indexer; advanced features we don't use yet aren't a tradeoff.
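To make "we parse JSON out of streamed text" concrete, here is one possible approach: buffer the chunks and pull out balanced top-level `{...}` spans, tracking string literals so braces inside strings are ignored. The `JsonStreamExtractor` class is hypothetical and is not o8's parser; it has known limits, such as a stray unbalanced `{` in prose holding the buffer open.

```typescript
class JsonStreamExtractor {
  private buffer = "";

  // Feed one streamed chunk; returns any complete JSON objects found so far.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const found: unknown[] = [];
    let start = -1; // index of the opening brace of the current candidate
    let depth = 0;
    let inString = false;
    let escaped = false;

    for (let i = 0; i < this.buffer.length; i++) {
      const ch = this.buffer[i];
      if (start === -1) {
        // Outside any candidate: skip prose until an opening brace.
        if (ch === "{") {
          start = i;
          depth = 1;
        }
        continue;
      }
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') {
        inString = true;
      } else if (ch === "{") {
        depth++;
      } else if (ch === "}") {
        depth--;
        if (depth === 0) {
          const candidate = this.buffer.slice(start, i + 1);
          try {
            found.push(JSON.parse(candidate));
          } catch {
            // Balanced braces but not valid JSON (e.g. prose like "{this}"): skip it.
          }
          start = -1;
        }
      }
    }

    // Keep only an unfinished candidate; it is rescanned on the next push.
    this.buffer = start === -1 ? "" : this.buffer.slice(start);
    return found;
  }
}

const extractor = new JsonStreamExtractor();
const first = extractor.push('Thinking... {"status":');
const second = extractor.push(' "done", "note": "uses } inside a string"} trailing text');
```

Here `first` is empty because the object is still incomplete, and `second` contains the one parsed object once the closing brace arrives.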
What we gain: a product that costs you nothing to run if you already have the subscriptions. That's the wedge that lets power users adopt o8 without a procurement decision.
Each is a harness. We sit above them.
Every tool below is a harness in Aparna Sinha's sense — it terminates at one model API and owns its full loop end-to-end. We don't compete with their loops. We coordinate across them.
| Harness | What it owns end-to-end | Where that leaves o8 |
|---|---|---|
| Editor-grade IDE | Context manager + iteration loop + tool router around one model. | We don't build an editor. We coordinate across editors when more than one model needs to ship a feature. |
| Codebase indexers | Index the codebase semantically. Surface relevant files at request time. | We index the decisions, not the code. The Brain is downstream of the codebase, not over it. |
| Autonomous agents | One model, one long-running session, ship a feature end-to-end. | We split the loop. Operator approval is a feature, not a bottleneck. Orchestrator review catches the misses. |
| CLI assistants | Power users live in the terminal. Don't fight that. | We agree. We just let the orchestrator manage many CLIs at once instead of one at a time. |
| Cloud-only agents | Agents run on a remote VM. The user reviews via web UI. | We run locally. Worktrees on the user's machine. The repo never leaves their disk. |
| Org-search products | Federate workplace search. Index Slack, Notion, Drive. | Different problem. We index your engineering decisions. Theirs is a substrate of human-written documents; ours is written by humans and AI together. |
This isn't exhaustive. It's the shape. Every harness above bet on something we declined to build. The thing none of them built, and none of them will build because each is committed to one model and one loop, is the seam between harnesses, with a governance layer in the middle and a shared memory underneath. That's the gap we're shipping into.
Where the orchestrator lives, where the worker lives, where the operator lives.
o8 has three surfaces, each tuned to a different role:
- Desktop IDE. Native Tauri app. Where the orchestrator does its planning and review work. Where the workers are watched in real time. Where directives get edited. This is the thinking surface.
- Mobile. A remote control, not a scaled-down desktop. The inbox is the only screen that matters. Push notifications when an operator decision is needed. One tap to approve, one tap to reject, one tap to ask the orchestrator a question. This is the ratification surface.
- MCP server. The orchestrator can be driven from outside o8 entirely: from Claude Desktop, from a terminal Claude Code session, from any agent that speaks MCP. Every operator action exposed by the desktop is exposed as an MCP tool. This is the platform surface, and it's why o8 isn't a closed system.
The MCP surface is the one that's easiest to underestimate. It means a third-party agent on a model we don't even support can drive an o8 session, dispatch packets, and approve merges — through the same governance layer that protects everything else. The control plane accepts other harnesses as input. That's the long-term position.
The orchestrator is shipping. The next bets are about routing, learning, and reach.
- Learned dispatch routing. Today the orchestrator picks a runtime by hand or by simple heuristic. The substrate of session outcomes is rich enough now that the orchestrator can learn which runtime/model wins on which kinds of packets. Issue #518 in the Cursor parity epic.
- Cloud worker fan-out. Local worktrees are the default and will stay so. But for big sweeps (one packet across fifty repos) we'll add a cloud worker adapter that runs the same six-method contract on a remote VM and reports back through the same review pane. Issue #514.
- A custom-tuned worker. Open question, not a committed bet: a model fine-tuned on the orchestrator-worker handoff, trained on the outcome ledger of every dispatch we see. Could materially raise the floor. Could be obsoleted by the next general-purpose Codex update. We're watching.
- Composer-style worker. Some tasks want agentic editing inside an existing IDE rather than a worktree. An adapter that drives an in-IDE worker through the same governance pipeline is on the longer-term board.
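For the first bet in the list, learned dispatch routing, one simple baseline is to pick the runtime with the best smoothed merge rate for a given kind of packet. This is a speculative sketch and not a description of what o8 will ship: the `Outcome` shape, the `packetKind` label, the `pickRuntime` function, and the use of Laplace smoothing are all assumptions.

```typescript
type RuntimeId = "codex" | "gemini" | "opencode";

// Hypothetical slice of an outcomes-ledger entry.
interface Outcome {
  runtime: RuntimeId;
  packetKind: string; // e.g. "ui-fix", "refactor"
  merged: boolean;
}

function pickRuntime(ledger: Outcome[], packetKind: string, fallback: RuntimeId): RuntimeId {
  const stats = new Map<RuntimeId, { wins: number; total: number }>();
  for (const outcome of ledger) {
    if (outcome.packetKind !== packetKind) continue;
    const s = stats.get(outcome.runtime) ?? { wins: 0, total: 0 };
    s.total += 1;
    if (outcome.merged) s.wins += 1;
    stats.set(outcome.runtime, s);
  }

  let best = fallback; // with no history for this kind, keep the heuristic choice
  let bestScore = -1;
  stats.forEach((s, runtime) => {
    // Laplace smoothing keeps a single lucky merge from dominating.
    const score = (s.wins + 1) / (s.total + 2);
    if (score > bestScore) {
      best = runtime;
      bestScore = score;
    }
  });
  return best;
}

const demoLedger: Outcome[] = [
  { runtime: "codex", packetKind: "ui-fix", merged: true },
  { runtime: "codex", packetKind: "ui-fix", merged: true },
  { runtime: "gemini", packetKind: "ui-fix", merged: true },
  { runtime: "gemini", packetKind: "ui-fix", merged: false },
  { runtime: "gemini", packetKind: "ui-fix", merged: false },
];
const choice = pickRuntime(demoLedger, "ui-fix", "opencode");
```

With this demo ledger, codex scores (2+1)/(2+2) = 0.75 and gemini scores (1+1)/(3+2) = 0.4, so `choice` is `"codex"`; a packet kind with no history falls back to the supplied default.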
The throughline: every one of those bets is on the layer above the harnesses, not on the harnesses themselves. We don't need to win the model fight, and we don't need to win the harness fight. We need the control plane to be the right shape for the loop you're actually running across them.