Free Resource
OpenAI's forward-deployed engineers spent six months embedded inside Thrive Holdings' 30-firm Crete network to build a 97% self-improving tax agent. You cannot replicate that at any price. But the encoding loop the agent runs on sits one layer down the stack — and it fits in your sprint. This kit is the toolkit your AI Champion uses to stand up the first loop in 90 days.
The Problem
Three pieces of what Thrive built sit outside any 10-person firm's reach. The 7,000-return training corpus assembled across 30 firms. Six months of embedded OpenAI engineers writing evaluation suites and code modifications. A Codex-written self-improving update mechanism. Chasing the cheap version of that stack is the fastest way to waste a quarter.
The mechanism Crete's flywheel runs on is structural, not technological. Corrections become captured production traces, traces become evaluations, evaluations become an updated agent, and the cycle repeats. At your scale, the same loop runs one layer down: corrections become captured rows in a shared sheet, the sheet becomes encoded firm methodology in Claude skills, prompts, and SOPs, and the encoded methodology runs against commodity AI that gets meaningfully better every month on the work you actually do. Discipline, not capital — and that's a fight your firm can pick.
What's Inside
Everything your Champion needs to stand up the first encoding loop and finish the quarter with rules competitors can't copy.
A branded diagram placing the Crete/Thrive vendor-scale loop next to the firm-scale loop on the same architecture. Use it internally to explain that you're running the same shape at a different layer of the stack.
Three working sheets — the Correction Capture Log with Pattern key + Times seen auto-counting; the Vertical Narrowing Worksheet (volume × similarity scoring); the Encoding Backlog with status tracking. Multi-client. Pre-populated with worked CAS examples.
A one-page role definition framing the Champion as the firm's forward-deployed engineer — the OpenAI Deployment Company / Tomoro / PCAOB / PwC institutional pattern at firm scale. Usable as an internal hiring spec or role-clarification document.
Three 30-day phases with weekly checkpoints and numeric exit criteria. Phase 1 capture, Phase 2 encode the first three, Phase 3 measure and defend the role. Explicit kill criteria so a vertical that isn't producing encodable signal doesn't quietly absorb a year.
Binds the four artifacts together. Loop explainer, the four-step run guide, capture discipline, encoding placements (Claude skill / prompt / context file / SOP), measurement, and a full cross-reference to the QC Starter Kit so the two kits work as one operating system.
The Champion role spec and 90-day roadmap are also included as plain markdown — drop them straight into your firm's wiki, edit them in place, and version them with your other operating docs.
How It Works
Score 3–5 candidate workflows on frequency, repetition across clients, and pass/fail clarity. The highest scorer is your first loop. The instinct to encode the whole firm is the wrong instinct — Crete's 7,000 returns weren't 7,000 different things, they were 1040s and 1041s, repeated.
Every correction logged in real time, tagged with a short Pattern key. The biggest failure mode is "I'll log it later" — you won't. Be specific in "Why the AI was wrong" and "Firm-specific rule it missed." Generic categories don't produce encoded rules.
One-offs stay in the log as evidence. Recurring catches (the auto-counted Times seen column) earn an encoded rule — a Claude skill, a prompt template, a client context file, or a referenced SOP. The senior on the vertical signs off before each rule ships. This is the moat.
Two numbers: corrections-per-job (leading indicator — trends down as rules accumulate) and time-on-job (lagging indicator — follows by 2–4 weeks). Both flat after 6 weeks = kill criteria triggered. Both trending down = open a second vertical in Q2.
Two halves of one operating system
If you're running the QC Starter Kit, each catch in its Correction Log becomes a candidate row in this kit's Correction Capture Log. QC Kit = Layer 6 (catching errors review by review). This kit = Layer 4 (turning those catches into encoded methodology that makes the same correction stop recurring). Same loop, different surface.
Stay Current
Subscribe to The AI Accountant newsletter and get the Encoding Loop Starter Kit delivered to your inbox, along with weekly analysis of the AI developments that matter for your practice.
Your Move
A firm running its own encoding loop builds a moat that thickens every month. A firm waiting for the vendor-built agent generates the training data the vendor will eventually sell back to it. The difference in eighteen months isn't a benchmark point on a model release page — it's whether the work your team does still belongs to your firm.