Note 01 (BRAIN)

The Engineering Brain

Cursor reads files. Augment indexes a codebase. The Engineering Brain is the substrate underneath both — a queryable map of how your project actually works, written as much by the agents shipping code as by the humans deciding what gets shipped.

01 Why It Exists (THE WEDGE)

Most products have a read-only relationship with your code. The Brain is the first half of a feedback loop.

You ask a question; an embedding-based search returns what it finds. That's the standard pattern. The Brain is different. Every PR an agent merges, every directive a reviewer codifies, every outcome the orchestrator approves: those events feed back into the substrate, distilled into typed facts with citations to the source. The next time someone (or something) asks a question, the answer cites the org's own decisions, not a generic LLM hallucination.

We built it because we kept watching agents make the same mistake twice. A worker would forget the directive that said “no native form controls in packets” and ship a <select> for the third time that month. The directive existed. It was checked into a seed-*.md file that everyone could read. The agent just didn't retrieve it at the right moment.

That's a substrate problem, not a model problem. No amount of better prompting fixes a missing retrieval layer.

So we shipped one.

02 What It Is (SUBSTRATE)

A SQLite database at ~/.o8/cortex-ide.db containing roughly two thousand typed facts distilled from six sources of project truth.

| Source | How It Gets In | Authority | Example |
|---|---|---|---|
| Directives | SQL promotion (zero-LLM) | 1.00 | Packet metadata rows must use Issues-style clickable rows, not native <select> or <input> controls. |
| Merged PRs | SQL promotion (zero-LLM) | 0.95 | PR #958 pinned top-6 facts above other retrieval rows. |
| Outcomes ledger | SQL promotion (zero-LLM) | 0.90 | The orchestrator-empty-state-copy outcome shipped successfully but the reviewer flagged it. |
| Closed issues | SQL promotion (zero-LLM) | 0.85 | Issue #496 closed: refactor to decompose RepoRegistrySection.tsx. |
| Docs | Batched LLM distillation | 0.85 | Before every commit, npx tsc --noEmit must be run as a type-check step and pass clean. |
| Comments | Per-item LLM distillation | 0.70 | The MCP webview surface invariants require three-name alignment. |

Each fact carries: a kind enum (decision · spec · process · incident · ownership · cross_repo · directive), a confidence score, a source_authority tier, a fingerprint hash for dedup, and a verbatim excerpt back to the parent row.
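To make the shape of a fact concrete, here is a minimal sketch of a table holding the fields listed above. The column names, the SHA-256 fingerprint, and the `insert_fact` helper are assumptions for illustration; the note does not show the real `cortex-ide.db` schema.

```python
import hashlib
import sqlite3

# Hypothetical schema: the columns mirror the fields named in the text,
# but the real table layout is not shown in this note.
SCHEMA = """
CREATE TABLE facts (
    id               INTEGER PRIMARY KEY,
    kind             TEXT NOT NULL CHECK (kind IN (
                         'decision', 'spec', 'process', 'incident',
                         'ownership', 'cross_repo', 'directive')),
    content          TEXT NOT NULL,
    confidence       REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1),
    source_authority REAL NOT NULL CHECK (source_authority BETWEEN 0 AND 1),
    fingerprint      TEXT NOT NULL UNIQUE,
    excerpt          TEXT NOT NULL,
    source_id        TEXT NOT NULL
);
"""


def fingerprint(content: str) -> str:
    """Hash of case- and whitespace-normalized content (assumed dedup key)."""
    normalized = " ".join(content.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def insert_fact(conn, kind, content, confidence, authority, excerpt, source_id):
    """Insert a fact; return False when the same fingerprint already exists."""
    before = conn.total_changes
    conn.execute(
        "INSERT INTO facts (kind, content, confidence, source_authority, "
        "fingerprint, excerpt, source_id) VALUES (?, ?, ?, ?, ?, ?, ?) "
        "ON CONFLICT(fingerprint) DO NOTHING",
        (kind, content, confidence, authority,
         fingerprint(content), excerpt, source_id),
    )
    return conn.total_changes > before


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

The `CHECK` constraint makes an unknown `kind` fail loudly, while the unique fingerprint lets a re-run skip facts that are already present.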

The point of the typing is not theoretical purity. It's that when a reviewer or an agent asks “who decides what fonts we use?”, the retriever can preferentially weight kind: decision rows over kind: process rows. When a worker is dispatched and asks “is there a precedent for this kind of refactor?”, it can pull kind: process rows scoped to similar repos. The hierarchy isn't decoration. It's how the answer gets shaped before any LLM ever sees it.

03 How It's Built (PIPELINE)

Three indexer paths, picked by the shape of the source. Most of the value comes from the SQL one.

FIG. 01: BRAIN PIPELINE. Sources of truth → indexer paths → substrate. Solid line = primary flow; dashed = trigger or queue dependency. The authority value sets retrieval priority.

  • Sources: directives (authority 1.00), merged PRs (0.95), outcomes ledger (0.90), closed issues (0.85), docs / *.md (0.85), comments (0.70).
  • Indexers: SQL promotion (0 LLM calls), batched LLM distill (8 chunks / call), per-item LLM distill (concurrency 4).
  • Substrate: facts table (schema v19), facts_fts (BM25 index), facts_queue (resumable jobs).

We started by writing an LLM-distillation pipeline for everything: comments, docs, issues, PRs. It worked. It also took 25 minutes for ~700 comments and would have taken hours for a full project. Worse, it was paying an LLM to extract facts from rows that already were facts. A directive titled “800-line file ceiling — decompose before adding” doesn't need a model to summarize it. The model would just paraphrase it badly.

So we split the pipeline:

  • Directives, outcomes, PR bodies, issue bodies → straight SQL promotion. 551 facts in 0.16 seconds. No model calls, no cost, no quality variance. The composer LLM layers reasoning on top at query time.
  • Long-form prose (comments, docs) → LLM distillation, but batched. Eight chunks per call. The bootstrap cost amortizes 8x. Resumable via a checkpoint table so an interrupted run picks up exactly where it left off.

Most of “your project's knowledge” is already structured. Spend LLM calls only on the parts that aren't.
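The zero-LLM path can be pictured as a single `INSERT ... SELECT`. The sketch below assumes a hypothetical `directives` table and a simplified `facts` table; the table layouts, the `directive:` id prefix, and the sample rows are illustrative, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE directives (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
CREATE TABLE facts (
    source_id        TEXT PRIMARY KEY,
    kind             TEXT,
    content          TEXT,
    source_authority REAL
);
-- Sample rows for illustration only.
INSERT INTO directives (title, body) VALUES
    ('800-line file ceiling', 'Decompose before adding.'),
    ('No native form controls', 'Use Issues-style clickable rows.');
""")


def promote_directives(conn: sqlite3.Connection) -> int:
    """Copy directive rows into facts in one statement, with no model calls.

    Returns the number of newly promoted rows; rows already promoted
    are skipped via the primary key on source_id.
    """
    before = conn.total_changes
    conn.execute("""
        INSERT OR IGNORE INTO facts (source_id, kind, content, source_authority)
        SELECT 'directive:' || id, 'directive', title || ': ' || body, 1.00
        FROM directives
    """)
    conn.commit()
    return conn.total_changes - before


promoted = promote_directives(conn)
```

Because the statement is idempotent, running the promotion again after nothing has changed inserts zero rows.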

This is the wedge that makes the Brain economically viable for end-users. A naive “embed everything” indexer costs $5–20 per repo per index. Ours is closer to $0.05, and most of it is free for anyone with a Claude Max subscription because we route through the local CLI.

04 How Questions Get Answered (RETRIEVAL)

Classify, retrieve in parallel, compose with hierarchy awareness, cite or recant.

When someone asks “What is the regression-gate threshold for the Q&A eval harness?”, either directly through the chat or via the cortex.ask MCP tool, the pipeline runs four steps:

  • Classify. A model tags the question as Class A (factual lookup, expects a one-sentence answer) or Class B (reasoning, expects multi-fact synthesis). Same call produces five BM25 query variants for retrieval recall.
  • Retrieve in parallel. Four retrievers run simultaneously: facts (BM25 over facts_fts, top-6 pinned), FTS (BM25 over directives + comments + PRs + issues + docs), SQL (project membership + ownership signals), graph (symbol relationships). Results are RRF-merged into a single ranked list, capped at 30 rows.
  • Compose with hierarchy awareness. The composer LLM gets the top 30 rows including each row's full content (not just a BM25 snippet; that was a bug we caught when the snippet truncated “...drops below 70%; if any category falls below that threshold...” to “...drops below «eval»...” and the LLM said “I don't have that information yet.”). Each row carries its source_authority. The prompt instructs: When two rows contradict, cite the higher-authority one. Directives (1.0) outrank comment opinions (0.7).
  • Cite or recant. The LLM either answers with [CITATION:fact-...] handles inline (translated server-side into pill components), or returns the exact string “I don't have that information yet.” — the no-hedge rule. There's no middle ground. If a row matches, we cite it. If nothing does, we admit it.
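The merge in step two can be sketched with standard reciprocal rank fusion (RRF), where each row scores the sum of 1 / (k + rank) over the lists it appears in. The constant `k = 60` is the conventional default and an assumption here, as are the row ids; the top-6 pinning of fact rows is not modeled.

```python
from collections import defaultdict


def rrf_merge(ranked_lists, k=60, cap=30):
    """Fuse several ranked lists of row ids into one list of at most `cap` ids."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, row_id in enumerate(ranking, start=1):
            scores[row_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by row id for determinism.
    ordered = sorted(scores, key=lambda row_id: (-scores[row_id], row_id))
    return ordered[:cap]


# Hypothetical output of the four retrievers (ids are made up).
facts_hits = ["f1", "f2", "f3"]
fts_hits = ["f2", "d9", "f1"]
sql_hits = ["o4"]
graph_hits = ["f2", "g7"]

merged = rrf_merge([facts_hits, fts_hits, sql_hits, graph_hits])
```

A row that several retrievers agree on, such as `f2` above, rises to the top even when no single retriever ranked it first.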

End-to-end p50 latency: roughly eight seconds. The slow part is bootstrap of the LLM tier, not retrieval — BM25 over 2k rows is sub-100ms.
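The cite-or-recant rule from step four lends itself to a mechanical check on the composer's output. This is a minimal sketch: the refusal string comes from the text, but the exact shape of the id after `fact-` and the `check_answer` helper are assumptions.

```python
import re

REFUSAL = "I don't have that information yet."
# Assumed handle format: "fact-" followed by letters, digits, "_" or "-".
CITATION = re.compile(r"\[CITATION:(fact-[A-Za-z0-9_-]+)\]")


def check_answer(answer: str, retrieved_ids: set) -> str:
    """Classify a composer answer as 'cited', 'recanted', or 'invalid'."""
    text = answer.strip()
    if text == REFUSAL:
        return "recanted"
    cited = CITATION.findall(text)
    # Every handle must point at a row that was actually retrieved.
    if cited and all(handle in retrieved_ids for handle in cited):
        return "cited"
    return "invalid"
```

An answer that hedges without a citation, or cites a row that was never retrieved, comes back as `invalid` and can be rejected before it reaches the user.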

05 Keeping It Honest (FRESHNESS)

Two mechanisms keep the substrate from drifting into a memory of the loudest voice.

Source-update detection. When a comment, PR, or issue gets edited upstream (a typo fix, a body rewrite), the indexer queue re-enqueues that source. Stale facts get replaced; the substrate never drifts. The seeder script also runs an INSERT OR REPLACE with a freshness check so re-running it is cheap (zero work if nothing changed).
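One way to implement such a freshness check is to compare a content hash before writing. The sketch below is an assumption about how it could work: the `sources` table, the columns of `facts_queue`, and the `sync_source` helper are hypothetical.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sources (source_id TEXT PRIMARY KEY, body TEXT, body_hash TEXT);
CREATE TABLE facts_queue (source_id TEXT PRIMARY KEY);
""")


def sync_source(conn: sqlite3.Connection, source_id: str, body: str) -> bool:
    """Store the latest body and enqueue re-indexing only if it changed."""
    new_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT body_hash FROM sources WHERE source_id = ?", (source_id,)
    ).fetchone()
    if row is not None and row[0] == new_hash:
        return False  # unchanged upstream: zero work
    conn.execute(
        "INSERT OR REPLACE INTO sources (source_id, body, body_hash) "
        "VALUES (?, ?, ?)",
        (source_id, body, new_hash),
    )
    # A source already waiting in the queue is not enqueued twice.
    conn.execute(
        "INSERT OR IGNORE INTO facts_queue (source_id) VALUES (?)", (source_id,)
    )
    conn.commit()
    return True
```

Calling `sync_source` twice with the same body does nothing the second time, while even a one-character upstream edit re-enqueues the source.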

Source-of-truth hierarchy. Every fact carries a source_authority from 0 to 1. Directives are 1.0: the project's rules, written deliberately, never decay. Merged PRs are 0.95. Closed outcomes 0.9. Closed issues 0.85. Open PRs 0.8. Open issues 0.75. Comments 0.7. The composer is told this hierarchy explicitly. When a directive says “use Plus Jakarta Sans” and an old comment says “we should switch to Inter,” the directive wins.
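The hierarchy is small enough to write down as a lookup table. In the Brain the composer LLM applies it through its prompt; the sketch below is a deterministic stand-in for the same rule, and the key names and `resolve_contradiction` helper are assumptions.

```python
# Authority tiers as stated in the text; the key names are hypothetical.
SOURCE_AUTHORITY = {
    "directive": 1.00,
    "merged_pr": 0.95,
    "closed_outcome": 0.90,
    "closed_issue": 0.85,
    "open_pr": 0.80,
    "open_issue": 0.75,
    "comment": 0.70,
}


def authority(row: dict) -> float:
    """Look up the authority tier for a retrieved row."""
    return SOURCE_AUTHORITY[row["source"]]


def resolve_contradiction(rows: list) -> dict:
    """Return the row to cite when rows disagree: the highest-authority one."""
    return max(rows, key=authority)


rows = [
    {"source": "comment", "content": "we should switch to Inter"},
    {"source": "directive", "content": "use Plus Jakarta Sans"},
]
winner = resolve_contradiction(rows)
```

Here `winner` is the directive row, matching the font example above.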

A memory that never forgets is just as broken as a memory that never learns. The Brain has both mechanisms: source-update detection so it forgets stale facts, and an authority hierarchy so the noisy ones don't drown out the deliberate ones.

06 The Compactor (MAINTENANCE)

A nightly job that keeps signal-to-noise above the line.

FIG. 02: COMPACTOR JOBS. Nightly at 03:00 local; elapsed < 1s at 2K facts. Six maintenance jobs, each idempotent and wrapped in transactions, invoked by launchd at 03:00 local; live-tested removing 63 dupes in 0.77s on a 2,030-fact substrate.

  • 01 GC orphans: DELETE facts whose source_id is missing upstream.
  • 02 Drop low-conf: DELETE where confidence < 0.30.
  • 03 Exact dedup: MERGE same content, +0.05 corroboration.
  • 04 Time-decay: × 0.90 when age > 90d and source untouched.
  • 05 Jaccard merge: MERGE when token-set similarity ≥ 0.85.
  • 06 Contradictions: REPORT only, opt-in, no writes.

The substrate would rot without maintenance. Bot-template comments would repeat across issues. Old facts would crowd out new ones. Two facts saying opposite things would both surface in the same answer.

A compactor runs nightly at 3 AM via a launchd job. Six jobs — orphan GC, low-confidence drop, exact-content dedup, time-decay, token-Jaccard near-duplicate merge, and an opt-in contradiction surfacing pass. Live-applied on the founder's substrate, the compactor took 2,030 facts down to 1,967 (63 collapsed paraphrases) in 0.77 seconds. Idempotent. Wrapped in transactions. Re-runnable any time.
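Two of the six jobs are easy to show in a few lines: the token-Jaccard merge and time-decay. This sketch uses the 0.85, 0.90, and 90-day values from the figure; the tokenizer, the greedy keep-first strategy, and the function names are assumptions.

```python
import re


def tokens(text: str) -> set:
    """Lowercased alphanumeric token set (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set similarity: |A ∩ B| / |A ∪ B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def merge_near_duplicates(facts: list, threshold: float = 0.85) -> list:
    """Greedy O(n^2) pass that keeps the first fact of each near-duplicate group."""
    kept = []
    for fact in facts:
        if any(jaccard(fact, existing) >= threshold for existing in kept):
            continue  # paraphrase of a fact we already kept
        kept.append(fact)
    return kept


def decay(confidence: float, age_days: int,
          factor: float = 0.90, max_age_days: int = 90) -> float:
    """Multiply confidence by `factor` once a fact is older than `max_age_days`."""
    return confidence * factor if age_days > max_age_days else confidence
```

For example, two copies of the tsc directive that differ only in "every" versus "each" share 16 of 18 distinct tokens (about 0.89), so the second is collapsed into the first.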

07 The Economics (COST)

Free in the daily-driver loop. The hosted path is for everyone else.

| Path | LLM | Cost (founder daily-driver) |
|---|---|---|
| Comment distillation | Claude CLI | Free (Claude Max) |
| Doc distillation | Claude CLI | Free (Claude Max) |
| SQL promotion (4 sources) | none | Free always |
| User chat (Class A) | Haiku CLI | Free (Claude Max) |
| User chat (Class B) | Sonnet CLI (300s timeout) | Free (Claude Max) |
| Production fallthrough | OpenRouter | Paid only when CLI tiers fail |
| Smoke / eval gate | Sonnet 4.6 OpenRouter | ~$0.026/run dev cost |
| Compactor | none | Free always |

Nothing in the daily-driver loop costs money for a Claude Max user. The economic story for everyone else: bring your own OpenRouter key (beta), or pay us for a hosted plan that aggregates spend with quotas (the long-term path).

We're not selling LLM access. We're selling the substrate that makes any LLM useful for your project specifically.

08 The Smoke Gate (SHIP CRITERIA)

Six representative questions, one per category. Pure substring rubric. 6/6 or it doesn't ship.

Every change to the Brain pipeline gets verified against six representative questions, one per category — ownership, decisions, processes, incidents, specs, cross-repo. The harness is a pure substring rubric (no LLM judge, ~50s total) so it's cheap enough to run on every commit. Pass requires 6/6 across at least three runs.
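A pure substring rubric needs very little code. The sketch below reads "6/6 across at least three runs" as every one of at least three runs scoring full marks, which is an interpretation; the rubric strings are placeholders, not the real expected answers.

```python
CATEGORIES = ("ownership", "decisions", "processes",
              "incidents", "specs", "cross-repo")

# Placeholder rubric: one required substring per category (made up).
RUBRIC = {category: [f"expected-{category}"] for category in CATEGORIES}


def case_passes(answer: str, required: list) -> bool:
    """Every required substring must appear in the answer, case-insensitively."""
    lowered = answer.lower()
    return all(needle.lower() in lowered for needle in required)


def run_score(answers: dict, rubric: dict) -> int:
    """Number of categories whose answer satisfies the rubric."""
    return sum(
        case_passes(answers.get(category, ""), required)
        for category, required in rubric.items()
    )


def gate_passes(runs: list, rubric: dict, min_runs: int = 3) -> bool:
    """Ship only if there are enough runs and each one scores full marks."""
    if len(runs) < min_runs:
        return False
    return all(run_score(run, rubric) == len(rubric) for run in runs)
```

With no LLM judge in the loop, the gate is deterministic for a given set of answers, so a drop below 6/6 points at the pipeline rather than at the grader.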

This is the contract: if a change to retrieval, the indexer, or the composer drops the smoke below 6/6, it doesn't ship. We held that line through the entire build. The smoke is what made us catch the FTS-snippet truncation bug, the timeout-poisoning of long comments, and the Haiku CLI hedging behavior that grok-4.1-fast amplified into deterministic refusals.

09 What It Isn't (Yet) (ROADMAP)

Things we don't ship in v1. Listed honestly. Each tracked as a GitHub issue.

  • Embeddings. Pure BM25 retrieval today. We catch lexical paraphrases via Jaccard; we miss semantic ones. Issue #962.
  • Live-feed from commits. Facts arrive when the indexer runs, not when you commit. A post-commit hook is queued. Issue #963.
  • A first-class UI. Today the Brain is reachable through the orchestrator and via the chat. A /ask slash-command + a Recall Card sidebar are the next surfaces. Issues #965, #966.
  • Per-user quotas, hosted billing, BYOK. The cost story above only works for the founder today. Multi-tenant economics are a separate epic. Issues #960, #967.
  • A regression dashboard. Smoke is 6 cases. The full eval is 30. Historical tracking with visual regression detection is on the list. Issue #969.

The roadmap is the GitHub issues tagged epic-915. Each one is small, scoped, and has an acceptance criterion.

10 Why It Matters (MOAT)

The Brain is the seam where the AI-shipping-code feedback loop closes.

The Brain is one of o8's three load-bearing pieces — the other two are the multi-agent orchestrator and the operator approval surface. Each one alone is interesting. Together they make a product nobody else is building:

  • Editor-grade products have great editors. They do not write back to the codebase's organizational memory.
  • Codebase indexers index codebases. They do not ingest your decision history, your reviewers' rules, or the outcomes of past dispatches.
  • Org-search products federate workplace search. They do not know about your code, and their substrate is built by humans only.
  • Agent-memory products persist memory across sessions. Their unit is the agent, not the project. They don't have a notion of “this org's rules outrank that comment thread.”

o8's wedge is the loop: AI ships code → ledger captures the outcome → Brain absorbs the outcome → next dispatch is smarter. The substrate is the seam where that loop closes. Without it, you have agents that re-litigate the same decisions every conversation. With it, your project gets a memory that compounds with every agent-week of work.

That's why we built it. That's why it's the first thing in /notes.