A motion-to-dismiss for $1.67 (or $4.55 if you're sloppy). The 2.7x lever isn't the model.
Same workflow, same output, three routing strategies. The bill changes by almost three times. Where the money actually goes inside an AI-drafted brief, why cite-checking is 1.6% of it, and what the math means for how firms should price AI-augmented work.
Most firms running AI today are paying the all-Opus bill and never see the $1.67 version, because nobody wired the router.
This piece is about where the money actually goes inside an AI-drafted brief, why the cite-check stage is 1.6% of the bill (and should stay there), and what the math means for how firms ought to price AI work to clients.
What we measured (and what we didn't)
Honest disclosure first, because the numbers are load-bearing.
These figures are not from instrumenting a live customer matter. They come from a small open harness - a thin wrapper around the LLM SDK that logs per-call token counts, model, stage label, and computed cost - driven against three *modeled* legal scenarios:
- Motion-to-dismiss (research, draft, cite-check, revise)
- NDA review (single contract pass)
- Discovery first-pass (100 documents, relevance + privilege flag)
We're modeling rather than measuring because we're still wiring instrumentation into production. That's coming. For now: the model prices are real (verified against vendor pricing pages on 2026-05-05). The cost structure is real. The exact token volumes are realistic estimates based on what 15-page motions, mid-sized research memos, and one-paragraph cite-check verdicts look like.
The lever this article describes - model routing - does not depend on the exact volume. It depends on the cost ratios across stages. Halve the volumes and the picture is the same. Triple them and it is the same. The conclusion holds regardless.
The motion-to-dismiss bill, line by line
The harness ran 16 calls across four stages. Total bill: **$1.6742**.
Two findings here that should reorganize how you think about your AI budget.
**First: the draft is 63% of the entire bill.** One call. Long context in (research memos plus the firm's template plus the client's facts), long output (a 15-page motion). That single Opus invocation costs more than the other 15 calls combined. If you want to make this workflow cheaper, you have exactly one place worth optimizing - and it's the place you can't downshift without quality consequences.
**Second: cite-checking is 1.6% of the bill.** Eight calls, $0.0272. Verifying citations against retrieved authorities is mechanical, structured, repeatable, and easy to verify after the fact. Haiku does it well. Running the same eight calls on Opus would multiply this stage's cost by roughly 16x for output that, in our testing, is no better.
Most firms reading this are doing the opposite. They picked one model for everything - usually Opus, because 'use the best model' is the default story when nobody on the team is doing the cost math. They are paying Opus rates for cite-checking. That is an unforced error.
The 2.7x lever
Same workflow. Same prompts. Same input token counts. Three routing strategies.
**All-Opus is 2.7x the mixed bill.** This is the default for most 'we use AI' firms today. Pick one model, plug it into every stage, ship. The result is a 171.5% premium over a routed pipeline that produces functionally identical output.
**All-Haiku is 6.9x cheaper than the mixed bill - and it is wrong.** Haiku, asked to draft a 15-page federal motion, will hallucinate citations. It will miss subtle pleading-standard distinctions. It is a great cite-checker and a bad first-chair. The cost-obsessed engineer reaches for all-Haiku and ships malpractice exposure. Don't do this.
**The cite-check downshift is asymmetric.** Moving cite-check from Haiku to Sonnet costs you 4.5%. Moving it to Opus costs you 28.8%. You are paying Opus prices for a task that has near-perfect ground truth (does the citation exist? does the pinpoint match?). There is no premium model lift to capture there.
The middle path is not a compromise. It's the rational architecture. Opus where it has to be - generative reasoning under latitude. Sonnet where you need solid mid-tier judgment - research synthesis, focused revisions. Haiku where the task is structured and verifiable - citation lookup, relevance verdicts, classification.
Where Haiku is good enough (and where it isn't)
The temptation after seeing those numbers is to push everything down a tier. Resist it. The routing decision is per-stage, per-task - and it requires actual judgment about what the model is being asked to do.
**Haiku is good enough for** citation verification against a retrieved authority, first-pass relevance review on a discovery batch, privilege flagging triage, structured extraction (parties, effective date, governing law), and intake classification.
**Haiku is not good enough for** drafting argumentative prose under a pleading standard, synthesizing case law into a strategic memo, spotting the issue that *isn't* in the rubric, or anything where the failure mode is 'plausible-looking nonsense.'
**Sonnet is the right default for** research synthesis (read five cases, return one focused memo), focused revisions, and mid-stakes contract review where Haiku is too thin and Opus is overkill.
**Opus earns its rate on** the first draft of the actual brief, novel issue-spotting in unfamiliar fact patterns, and the handful of tasks where you genuinely cannot afford 'close to right.'
The other scenarios - different unit economics
The motion-to-dismiss is the marquee number. The other two scenarios show how badly 'average AI cost per matter' misleads you.
**Single NDA review.** One Sonnet call. 5,000 input tokens, 800 output. Cost: **$0.027.** Two and a half cents. A firm doing 200 of these a month is spending $5.40 in inference for a workstream that previously consumed paralegal and associate hours.
**100-document discovery first-pass.** 100 Haiku calls, ~4,000 input each, ~150 output. Cost: **$0.38.** Less than a coffee for what used to be a half-day of contract-attorney triage. At a thousand documents this is $3.80. At ten thousand, $38.
The deep-work scenario (motion) and the high-volume triage scenarios (NDA, discovery) live in different cost neighborhoods. A '$1 per matter' pricing card does not work, because $1 is 60% margin on a discovery batch and a 40% loss on a contested motion. You need per-output pricing tiers that match the underlying inference shape.
What this means for legal pricing
The hourly model assumes the lawyer's time is the cost. AI inverts that. The cost of producing a competent first draft drops by three orders of magnitude. The cost of reviewing it stays the same.
That breaks hourly billing as a pricing mechanism for AI-augmented work. Not as a unit of internal accounting - capacity planning still needs hours - but as the thing on the client's invoice.
What survives:
- Flat fee per matter type. $499 for a motion-to-dismiss. $99 for an NDA review. $1,499 for a diligence pack. Margin floor is the AI cost (small). Margin ceiling is the client's willingness to pay (much larger). Both ends are knowable.
- Per-seat subscription. Firm pays per attorney per month. Inference cost is amortized across the whole roster.
- Outcome / contingency. Lower base, success fee. AI cost is a rounding error against an outcome.
What dies: billing the client by the hour for a brief the model wrote in 90 seconds. Not because clients won't pay it - sophisticated GCs already won't - but because a competitor will quote the same brief at a flat fee that beats your hourly bill by 50% and they will still keep 95% margin.
What firms should do this week
If you take one thing from this piece, take an instrumentation habit. The fix is small.
- Wrap every AI call with a tracker. Stage label, model, input tokens, output tokens, cost, latency, timestamp. JSONL, one record per line. Twenty lines of code. You cannot route what you do not measure.
- Default cite-check to Haiku, drafting to Opus, research and revisions to Sonnet. This is the single highest-leverage configuration change in the stack. Audit it monthly.
- Quote AI-augmented matters as fixed fee, not hourly.
- Publish your model routing as a transparency artifact. Clients will start asking which model touched their work.
- Recompute monthly. Vendor prices move. New model generations land.
What's coming next from us
The numbers in this piece are modeled. They are realistic, the cost structure is real, and the routing finding is robust. But 'modeled' is not 'measured.' We are wiring the same tracker into HAQQ Legal AI in production this month. In T+30 days we'll publish the follow-up: same scenarios, real customer matters, anonymized logs, full per-stage and per-model breakdown.
If the numbers come in materially different from these models, that is itself the most useful thing we can publish. Either way it'll be honest.