Can AI Plan Litigation? We Built a GOAP Planner to Find Out

By Stephane Boghossian · 2026-05-05 · 9 min read · ai-legal-tech

We built a working A* litigation planner in 517 lines — then refused to use it. The four gates any AI planner must pass before touching a real matter.

The pitch and the verdict

Goal-oriented action planning (GOAP) searches the way a chess engine does: hold a tree of futures, search for the cheapest path from where you are to where you want to be, prune the rest. It is a much better fit for litigation than a chat-style model is. Motions, discovery, trials: all are sequences of moves under constraints. Chat-style models produce the next plausible sentence. Planners commit to an objective and work backward.

Key facts

A working A* GOAP planner is 517 lines of plain JavaScript with zero dependencies — ported in an afternoon from ruflo's claude-flow goal_ui.
ruflo's claude-flow GOAP planner does not implement adaptive replanning: plan() runs A* exactly once at goal submission, verified by reading the source at the call sites in Index.tsx and ResearchReportModal.tsx.
Filing a substantive 12(b)(6) before invoking arbitration can waive the right to arbitrate under Morgan v. Sundance (2022) — a shortest-path planner walks straight into it.

So we built one. 517 lines of plain JavaScript, zero dependencies, ported in an afternoon from the GOAP A* planner that ships inside ruflo's `goal_ui` React app. It runs on engineering goals like 'ship the auth refactor with tests and a PR' in zero milliseconds and produces a clean nine-step plan.

We are not letting it touch a real motion. Not yet. This piece is about why, and what would have to change before we did. If you came here for a breathless endorsement of the next AI thing for lawyers, the point of this article is the opposite: here is the rigor bar we set before we route a litigation goal through any planner - ours, ruflo's, anyone's - and the gap between today's planner and that bar.

What GOAP actually is

GOAP takes three inputs: a goal state (facts you want to be true - 'PR is open', 'CI is green', 'deployed'), an initial state (what's true right now), and an action library (every move you can make, each with preconditions and effects - '`open_pr` requires `pushed=true` and `diff_reviewed=true`; sets `pr_open=true`'). Then it runs A* search - the same algorithm family GPS uses to route around traffic - over the space of action sequences and returns the cheapest valid path from initial to goal. If a step fails during execution, it is supposed to replan from the new state.
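The three inputs can be written down as plain data. The following is a minimal sketch, assuming boolean facts and using the `open_pr` example from the paragraph above; the cost value and the helper names `satisfies` and `applyAction` are illustrative assumptions, not the names used in any real planner.

```javascript
// Minimal GOAP vocabulary. State is a plain object of boolean facts.
// The action name and predicates follow the article's `open_pr` example;
// the cost value and helper names are illustrative assumptions.
const openPr = {
  name: 'open_pr',
  cost: 1, // assumed cost, not taken from the real action library
  preconditions: { pushed: true, diff_reviewed: true },
  effects: { pr_open: true },
};

// True when every fact in `conditions` holds in `state`.
// A fact missing from `state` is treated as false.
function satisfies(state, conditions) {
  return Object.entries(conditions).every(
    ([fact, value]) => (state[fact] ?? false) === value
  );
}

// Returns a new state with the action's effects applied; never mutates.
function applyAction(state, action) {
  if (!satisfies(state, action.preconditions)) {
    throw new Error(`preconditions not met for ${action.name}`);
  }
  return { ...state, ...action.effects };
}

const initial = { pushed: true, diff_reviewed: false };
const goal = { pr_open: true };

console.log(satisfies(initial, openPr.preconditions)); // false: diff not reviewed
const reviewed = { ...initial, diff_reviewed: true };
const next = applyAction(reviewed, openPr);
console.log(satisfies(next, goal)); // true
```

Keeping states immutable is a design choice that matters once search starts: every explored branch needs its own copy of the world.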

For lawyers, the closest analogy is chess. A grandmaster doesn't think one move at a time; they hold a tree of futures and prune as the game develops. GOAP is that, made mechanical. It is not generative; it is search.

The architecture maps to litigation almost too well. A motion to compel arbitration has hard preconditions (an arbitration clause exists, a complaint has been served, no merits motion has been filed yet) and hard effects (waiver risks foreclosed, the court must rule on arbitrability). Discovery has a strict ordering (writtens before depos, class before merits, meet-and-confer before motions to compel). Summary judgment has a pass/fail oracle. The shape of the work is GOAP-shaped.
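To make the mapping concrete, here is a hedged sketch of the motion to compel arbitration written as a GOAP action. The predicate names are hypothetical, and the hard boolean `merits_motion_filed` is a simplification: in practice waiver is a risk to be weighed, not a switch. Nothing here is a vetted legal ontology.

```javascript
// Hypothetical encoding of a motion to compel arbitration as a GOAP action.
// Predicate names are illustrative; the preconditions and effects paraphrase
// the article's description and are not a vetted legal ontology.
const moveToCompelArbitration = {
  name: 'move_to_compel_arbitration',
  preconditions: {
    arbitration_clause_exists: true,
    complaint_served: true,
    merits_motion_filed: false,
  },
  effects: { arbitrability_before_court: true, waiver_risk_foreclosed: true },
};

// Lists the preconditions that the current state fails to meet.
// A fact missing from `state` is treated as false.
function unmetPreconditions(state, action) {
  return Object.entries(action.preconditions)
    .filter(([fact, required]) => (state[fact] ?? false) !== required)
    .map(([fact]) => fact);
}

const beforeAnyMotion = {
  arbitration_clause_exists: true,
  complaint_served: true,
  merits_motion_filed: false,
};
// Same matter after a substantive merits motion has been filed first.
const afterMeritsMotion = { ...beforeAnyMotion, merits_motion_filed: true };

console.log(unmetPreconditions(beforeAnyMotion, moveToCompelArbitration));
// []
console.log(unmetPreconditions(afterMeritsMotion, moveToCompelArbitration));
// [ 'merits_motion_filed' ]
```

The second call shows why ordering is a first-class concern: one earlier move makes a later move unavailable, and a planner only sees that if the precondition is in the library.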

What we built

Four files: `planner.js` (165 LOC, A* + binary min-heap), `actions.js` (134 LOC, twelve engineering actions), `parse.js` (58 LOC, phrase table that turns English into a goal state), `cli.js` (160 LOC, runner). Total 517 LOC. No npm dependencies. Readable in twenty minutes.
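A phrase-table parser of the kind `parse.js` is described as can be sketched in a few lines. This is an assumption-laden illustration, not the file itself: only the first entry's mapping is described in this article, while the second entry, its predicate name, and the function name `parseGoal` are hypothetical.

```javascript
// Sketch of a phrase-table goal parser. Only the first entry's mapping is
// described in the article; the second entry and its predicate name are
// hypothetical placeholders.
const PHRASES = [
  { phrase: 'ship the auth refactor', facts: ['pr_open', 'ci_green', 'deployed'] },
  { phrase: 'test the', facts: ['tests_passing'] }, // hypothetical entry
];

// Returns a goal state built from every phrase found in the text,
// or null when nothing matches, so the caller can refuse instead of guessing.
function parseGoal(text) {
  const lower = text.toLowerCase();
  const goal = {};
  for (const { phrase, facts } of PHRASES) {
    if (lower.includes(phrase)) {
      for (const fact of facts) goal[fact] = true;
    }
  }
  return Object.keys(goal).length > 0 ? goal : null;
}

console.log(parseGoal('Ship the auth refactor with tests and a PR'));
// { pr_open: true, ci_green: true, deployed: true }
console.log(parseGoal('win a Rule 12(b)(6) motion in a contract dispute'));
// null
```

Substring matching is cheap and predictable, which is also its limit: anything outside the table returns nothing at all.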

We ran it against the goal 'ship the auth refactor with tests and a PR'. The result:

Nine steps, cost 17, thirteen node expansions, zero milliseconds. Each step maps to a gstack skill or a shell command, so the plan is executable - not decorative. We also tested a smaller goal ('test the login module' - 3 steps, cost 6) and an unsatisfiable one (a predicate no action sets - it returned `found: false` with the closest partial plan instead of crashing or looping). All three behaviors match the spec.
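The behaviors above (a cheapest plan when one exists, `found: false` with the closest partial plan when none does) can be sketched in a compact A*. This is not `planner.js`: it scans the frontier linearly where the real file uses a binary min-heap, the three-action library and its costs are invented for illustration, and the heuristic (count of unmet goal facts) is only admissible when no action satisfies more goal facts than its cost.

```javascript
// Compact A* over GOAP states. Assumptions: boolean facts, positive action
// costs, and a frontier scanned linearly for brevity where the article's
// planner.js uses a binary min-heap.
const holds = (state, conds) =>
  Object.entries(conds).every(([k, v]) => (state[k] ?? false) === v);

// Heuristic: number of goal facts not yet true. Admissible only when no
// action satisfies more goal facts than its cost.
const unmet = (state, goal) =>
  Object.entries(goal).filter(([k, v]) => (state[k] ?? false) !== v).length;

// Canonical string for a state so visited states can be deduplicated.
const keyOf = (state) =>
  Object.keys(state).filter((k) => state[k]).sort().join(',');

function plan(initial, goal, actions, maxExpansions = 10000) {
  const start = { state: initial, steps: [], g: 0, h: unmet(initial, goal) };
  const frontier = [start];
  const bestG = new Map([[keyOf(initial), 0]]);
  let closest = start; // expanded node with the fewest unmet goal facts
  let expansions = 0;

  while (frontier.length > 0 && expansions < maxExpansions) {
    // Pop the node with the lowest f = g + h.
    let idx = 0;
    for (let i = 1; i < frontier.length; i++) {
      if (frontier[i].g + frontier[i].h < frontier[idx].g + frontier[idx].h) idx = i;
    }
    const node = frontier.splice(idx, 1)[0];
    if (node.g > bestG.get(keyOf(node.state))) continue; // stale entry
    if (node.h === 0) {
      return { found: true, steps: node.steps, cost: node.g, expansions };
    }
    expansions++;
    if (node.h < closest.h) closest = node;

    for (const action of actions) {
      if (!holds(node.state, action.preconditions)) continue;
      const state = { ...node.state, ...action.effects };
      const g = node.g + action.cost;
      const key = keyOf(state);
      if (bestG.has(key) && bestG.get(key) <= g) continue;
      bestG.set(key, g);
      frontier.push({ state, steps: [...node.steps, action.name], g, h: unmet(state, goal) });
    }
  }
  // Unreachable goal: report failure plus the closest partial plan.
  return { found: false, steps: closest.steps, cost: closest.g, expansions };
}

// Hypothetical three-action library; names echo the article, costs are assumed.
const actions = [
  { name: 'write_tests', cost: 2, preconditions: {}, effects: { tests_written: true } },
  { name: 'push_branch', cost: 1, preconditions: { tests_written: true }, effects: { pushed: true } },
  { name: 'open_pr', cost: 1, preconditions: { pushed: true }, effects: { pr_open: true } },
];

const ok = plan({}, { pr_open: true }, actions);
console.log(ok.found, ok.steps, ok.cost);
// true [ 'write_tests', 'push_branch', 'open_pr' ] 4

// `court_order` is a predicate no action sets, so the goal is unsatisfiable.
const stuck = plan({}, { pr_open: true, court_order: true }, actions);
console.log(stuck.found, stuck.steps);
// false [ 'write_tests', 'push_branch', 'open_pr' ]
```

Returning the closest partial plan on failure is a small design choice with a large payoff: the caller learns which part of the goal was reachable and which predicate nothing in the library can set.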

This took an afternoon. It is not the hard part.

Why we ported this from ruflo

Ruflo (the `claude-flow` ecosystem) ships a GOAP planner inside its `goal_ui` React app - `goapPlanner.ts`, single file, 180 lines. The architecture is sound. We extracted it.

While we were in there, we read the source carefully. One claim does not survive contact with it: that the planner replans adaptively. It does not. The `plan()` method runs A* exactly once at goal submission and returns a `Step[]` that drives a UI animation. There is no replan loop, no plan invalidation, no failure recovery in the file. We checked the call sites in `Index.tsx` and `ResearchReportModal.tsx`. Same story.

We mention this not to dunk on ruflo - the planner core is good code - but because it matters for the rest of this article. Adaptive replanning is the single feature you'd most want before letting a planner near a real legal matter, and the most prominent open-source legal-adjacent implementation doesn't have it. We don't either, yet. Neither does anyone else we've looked at.

The three litigation goals we did not run

The original plan was to run three real litigation goals through the planner and have a senior litigator score the output. We didn't. Two reasons. First, routing litigation strategy through a public third-party URL - even synthetic strategy on a hypothetical fact pattern - has privilege implications we didn't want to navigate for a blog post. If we wouldn't do it for a client, we shouldn't do it for ourselves. Second, our action library has twelve entries, all of them engineering actions such as `understand_code`, `write_tests`, `commit`, `push_branch`, `open_pr`, `wait_ci`, and `merge_and_deploy`. Pointing it at a motion to dismiss would produce a plan that confidently told a litigator to `git commit` their answer.

But the three test cases we wrote are still useful - not as benchmarks the planner passed, but as a forcing function for what the next version has to handle.

Case 1 - Motion to dismiss with arbitration clause defense. Prompt: 'win a Rule 12(b)(6) motion in a contract dispute where the plaintiff alleges breach but the contract has a clear arbitration clause.' The catch: 12(b)(6) is the wrong vehicle. Arbitration is enforced under FAA §§ 3-4 with a motion to compel. Filing a substantive 12(b)(6) before invoking arbitration can waive the right to arbitrate under Morgan v. Sundance (2022). A planner that drafts the 12(b)(6) gets the client into malpractice territory.

Case 2 - Discovery strategy for a 10-employee wage-and-hour class action (CA). Prompt: 'build a discovery plan, prioritizing low-cost, high-leverage requests.' A junior associate would propose 30(b)(6) depositions immediately. The right plan puts writtens before depos, bifurcates class-cert from merits, runs Belaire-West opt-out notice before contacting any putative class member, and subpoenas the third-party payroll vendor (cleaner data, faster, no defense-side production cost). Sequence matters more than substance.

Case 3 - Summary judgment on a non-compete in California. Two traps. Trap one: the prompt cites 'Labor Code § 16600' - a citation that does not exist. The correct citation is Business & Professions Code § 16600. Trap two: even with the citation fixed, the employer almost certainly loses. SB 699 and AB 1076 (effective Jan 1, 2024) expanded § 16600 and added a private right of action with attorneys' fees. The right move is to advise the client to drop the non-compete enforcement entirely and pivot to a trade-secret theory under CUTSA, if the facts support it. A planner that finds the shortest A* path to 'win on the non-compete' finds a path to a sanctionable filing.

These three cases share a structural feature: the most valuable output is not a plan toward the user's stated goal. It is 'your goal is wrong; here is the right goal.'

That is not what A* does. A* finds the shortest path. It will find shortest paths to losing strategies.

Four gates before we trust this with litigation

Here is what would have to change. None of this is theoretical - these are the items at the top of our roadmap.

1. The action library has to be redesigned, end to end. Engineering actions have one cost dimension (developer time) and clean preconditions. Legal actions have jurisdictional preconditions (FAA applies, court is in CA, arbitration clause survives § 16600), statutory preconditions, calendar preconditions (responsive pleading deadline is 21 days out), and adversarial preconditions (opposing counsel has not yet moved for X). The cost dimension is also different: monetary cost, partner hours, sanctions risk, fee-shifting exposure. A serious legal action library is probably 200-500 actions with parameterized predicates - a real ontology, written by litigators, scoped per practice area.

2. The goal parser has to handle real legal English. `parse.js` is a five-entry phrase table. It maps 'ship the auth refactor' to `{pr_open, ci_green, deployed}` via substring match. It cannot parse 'win a Rule 12(b)(6) motion in a contract dispute where the contract has a clear arbitration clause.' The fix is small - Haiku in JSON mode, schema-constrained to known predicates. About half a day. The smallest of the four gates, and silly without the other three.

3. The planner has to be self-hosted. Nothing about a real client matter goes near `goal.ruv.io` or any third-party URL. Privilege concerns aside, you cannot audit a planner you don't run. Self-hosted, on infrastructure under the lawyer's control, with logs the lawyer owns. Non-negotiable for HAQQ.

4. The planner has to know when to refuse the goal. The hardest one. A pure A* planner that always finds the shortest path will execute losing strategies beautifully. The fix is not just adaptive replanning (re-running A* when an action fails). The fix is goal critique: a Legal AI step that runs before planning, checks the goal against the doctrinal landscape (§ 16600 + SB 699 + AB 1076 - 'this enforcement action loses'), and surfaces an alternative goal ('pivot to CUTSA'). Replanning fixes execution failures. Goal critique fixes the deeper failure mode - committing fully to a wrong objective. Both are required; neither is implemented today.
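The control flow of the fourth gate can be sketched without any model in the loop. The version below swaps the proposed Legal AI check for a hand-written rule table so the shape is visible: critique first, plan only if the goal survives. Every identifier is hypothetical, and the single rule paraphrases Case 3 for illustration only; it is not legal advice and not a real doctrinal check.

```javascript
// Rule-table stand-in for a goal critique step. The article proposes a
// model-backed check; this sketch substitutes a hand-written table so the
// control flow is visible. All identifiers are hypothetical.
const DOOMED_GOALS = [
  {
    matches: (goal, ctx) => goal.noncompete_enforced === true && ctx.jurisdiction === 'CA',
    reason: 'Bus. & Prof. Code § 16600, as expanded by SB 699 and AB 1076',
    alternative: { trade_secret_claim_filed: true }, // CUTSA theory, if facts support it
  },
];

// Runs before planning. Returns either an approval or a refusal that
// carries the reason and a suggested replacement goal.
function critiqueGoal(goal, ctx) {
  for (const rule of DOOMED_GOALS) {
    if (rule.matches(goal, ctx)) {
      return { approved: false, reason: rule.reason, alternative: rule.alternative };
    }
  }
  return { approved: true, goal };
}

// Only goals that pass critique reach the planner.
function planWithCritique(goal, ctx, planFn) {
  const verdict = critiqueGoal(goal, ctx);
  return verdict.approved ? { verdict, plan: planFn(goal) } : { verdict, plan: null };
}

// Placeholder planner so the sketch runs on its own.
const fakePlanner = (goal) => ({ found: true, steps: Object.keys(goal) });

const refused = planWithCritique({ noncompete_enforced: true }, { jurisdiction: 'CA' }, fakePlanner);
console.log(refused.verdict.approved, refused.plan);
// false null

const allowed = planWithCritique({ pr_open: true }, { jurisdiction: 'CA' }, fakePlanner);
console.log(allowed.verdict.approved, allowed.plan.steps);
// true [ 'pr_open' ]
```

A refusal returns the alternative goal instead of planning toward it automatically, which leaves the decision to change objectives with the lawyer.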

What this teaches about AI in legal work

Most AI products optimize for 'give me a confident answer.' That is the wrong shape for litigation, where the job is to push back on the question, identify what the client thinks they want versus what they actually need, and find the move the junior associate would have missed.

GOAP is closer to that shape than chat is. It is structurally honest: it tells you when it cannot reach the goal. It surfaces the sequence, not just the conclusion. It can be inspected, audited, replayed.

But 'closer than chat' is not 'good enough for legal work.' The bar is not 'produces a plan.' The bar is 'produces a plan a partner would sign their name to.' Today's GOAP, ours included, is at the first bar. The next version has to get to the second. We don't think anyone selling you a planner today - ours included - should be trusted on a live matter without the four gates above closed and shown.

Where we go from here

Goal critique step (v0.2). Haiku JSON-mode 'examine this goal against doctrine; propose alternative if doomed' pass before the planner runs.
Legal AI goal parser (v0.2). Replace the phrase table.
Adaptive replanning (v0.2). Wrap `plan()` in an execution loop that re-searches when an action fails. About 50 lines.
Legal action library (v0.3). Practice-area-scoped, written with litigator review, parameterized predicates. Months of work, and the actual product.
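The adaptive replanning item above is the easiest of the four to picture in code. The sketch below wraps a one-shot `plan()` in an execution loop that searches again when a step does not produce its expected effects. To stay short it uses a breadth-first stand-in (fewest steps, no costs) in place of A*, and the action library, the simulated executor, and the name `runWithReplanning` are all assumptions made for illustration.

```javascript
// Sketch of adaptive replanning around a one-shot planner. `plan` here is a
// breadth-first stand-in (fewest steps, ignores cost) used to keep the block
// short; it is not the A* planner. All names are illustrative assumptions.
const holds = (state, conds) =>
  Object.entries(conds).every(([k, v]) => (state[k] ?? false) === v);
const keyOf = (state) => Object.keys(state).filter((k) => state[k]).sort().join(',');

function plan(initial, goal, actions) {
  const queue = [{ state: initial, steps: [] }];
  const seen = new Set([keyOf(initial)]);
  while (queue.length > 0) {
    const { state, steps } = queue.shift();
    if (holds(state, goal)) return steps;
    for (const action of actions) {
      if (!holds(state, action.preconditions)) continue;
      const next = { ...state, ...action.effects };
      if (seen.has(keyOf(next))) continue;
      seen.add(keyOf(next));
      queue.push({ state: next, steps: [...steps, action] });
    }
  }
  return null; // goal unreachable from this state
}

// Executes a plan step by step. `execute(action, state)` returns the state
// actually observed afterwards. If the observed state lacks the expected
// effects, the loop re-plans from that observed state.
function runWithReplanning(initial, goal, actions, execute, maxReplans = 3) {
  let state = initial;
  let replans = 0;
  const log = [];
  let steps = plan(state, goal, actions);
  while (steps !== null && steps.length > 0) {
    const action = steps[0];
    state = execute(action, state);
    log.push(action.name);
    if (holds(state, action.effects)) {
      steps = steps.slice(1); // step succeeded, continue with the rest
    } else {
      if (replans === maxReplans) return { done: false, state, log, replans };
      replans++;
      steps = plan(state, goal, actions); // step failed, search again
    }
  }
  return { done: steps !== null && holds(state, goal), state, log, replans };
}

const actions = [
  { name: 'push_branch', preconditions: {}, effects: { pushed: true } },
  { name: 'wait_ci', preconditions: { pushed: true }, effects: { ci_green: true } },
];

// Simulated executor: the first `wait_ci` attempt fails and changes nothing.
let ciAttempts = 0;
const execute = (action, state) => {
  if (action.name === 'wait_ci' && ++ciAttempts === 1) return state;
  return { ...state, ...action.effects };
};

const result = runWithReplanning({}, { ci_green: true }, actions, execute);
console.log(result.done, result.log, result.replans);
// true [ 'push_branch', 'wait_ci', 'wait_ci' ] 1
```

The `maxReplans` cap is there so a step that can never succeed ends in a reported failure instead of an endless loop.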

When those four are real and tested, we will run the three litigation cases above and publish the results - including the failures, including the litigator critique. Until then, the only honest output is this one: an engineering planner that works, a legal planner that doesn't exist yet, and the design notes for what the second one has to look like.

FAQ

What is GOAP (goal-oriented action planning)?

GOAP takes three inputs — a goal state (facts you want true), an initial state, and an action library where each action has preconditions and effects — then runs A* search over action sequences and returns the cheapest valid path from initial to goal. It is not generative; it is search, the same algorithm family GPS uses to route around traffic.

Can AI plan litigation strategy?

Structurally it's a better fit than chat — motions, discovery, and trials are sequences of moves under constraints, and litigation has GOAP-shaped preconditions and orderings. But not yet in practice: A* finds shortest paths to losing strategies, and in the article's three test cases the most valuable output would have been 'your goal is wrong; here is the right goal' — which is not what A* does.

Why isn't an AI planner safe for real legal matters yet?

Four gates remain open: a real legal action library (200-500 actions with jurisdictional, statutory, calendar, and adversarial preconditions), a goal parser that handles real legal English, self-hosting so nothing touches third-party URLs (privilege), and goal critique — a step that checks the goal against doctrine and refuses doomed objectives before planning starts.

What legal traps would a naive AI planner fall into?

The article's three test cases: filing a substantive 12(b)(6) before moving to compel arbitration can waive the right to arbitrate under Morgan v. Sundance (2022); citing the nonexistent 'Labor Code § 16600' instead of Business & Professions Code § 16600; and pursuing California non-compete enforcement that SB 699 and AB 1076 made a losing, sanctionable path when the right move is pivoting to a CUTSA trade-secret theory.
