We built a GOAP planner in an afternoon. We're not letting it near a motion to dismiss.
A working A* planner for engineering tasks in 517 lines, ported from ruflo's claude-flow. Why it is structurally a much better fit for litigation than chat - and the four specific gates we'd close before letting any planner, ours included, touch a real legal matter.
The pitch and the verdict
Goal-oriented action planning (GOAP) borrows the core idea behind chess engines: hold a tree of futures, search for the cheapest path from where you are to where you want to be, prune the rest. It is a much better fit for litigation than a chat-style model is. Motions, discovery, trials - all are sequences of moves under constraints. Chat-style models produce the next plausible sentence. Planners commit to an objective and search for a sequence of moves that reaches it.
So we built one. 517 lines of plain JavaScript, zero dependencies, ported in an afternoon from the GOAP A* planner that ships inside ruflo's `goal_ui` React app. It runs on engineering goals like 'ship the auth refactor with tests and a PR' in zero milliseconds and produces a clean nine-step plan.
We are not letting it touch a real motion. Not yet. This piece is about why, and what would have to change before we did. If you came here for a breathless endorsement of the next AI thing for lawyers, the point of this article is the opposite: here is the rigor bar we set before we route a litigation goal through any planner - ours, ruflo's, anyone's - and the gap between today's planner and that bar.
What GOAP actually is
GOAP takes three inputs: a **goal state** (facts you want to be true - 'PR is open', 'CI is green', 'deployed'), an **initial state** (what's true right now), and an **action library** (every move you can make, each with preconditions and effects - '`open_pr` requires `pushed=true` and `diff_reviewed=true`; sets `pr_open=true`'). Then it runs A* search - a shortest-path algorithm also used in route planning - over the space of action sequences and returns the cheapest valid path from the initial state to the goal. If a step fails during execution, a GOAP system is supposed to replan from the new state.
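To make the three inputs concrete, here is a minimal sketch in plain JavaScript. Only `open_pr` and its predicates (`pushed`, `diff_reviewed`, `pr_open`) come from the example above. The helper names `satisfies`, `isApplicable` and `applyAction`, the `cost` value, and the object shape are assumptions for illustration, not the contents of our `actions.js`.

```javascript
// State: a plain object of boolean facts. Missing keys count as false.
const initialState = { pushed: true, diff_reviewed: true };
const goalState = { pr_open: true };

// One action from the example above. The cost value is an assumed placeholder.
const openPr = {
  name: 'open_pr',
  cost: 1,
  preconditions: { pushed: true, diff_reviewed: true },
  effects: { pr_open: true },
};

// True when every fact in `conditions` holds in `state`.
function satisfies(state, conditions) {
  return Object.entries(conditions).every(
    ([key, value]) => Boolean(state[key]) === value
  );
}

// An action is applicable when its preconditions hold in the current state.
function isApplicable(state, action) {
  return satisfies(state, action.preconditions);
}

// Applying an action returns a new state; the input state is not mutated.
function applyAction(state, action) {
  return { ...state, ...action.effects };
}

const next = isApplicable(initialState, openPr)
  ? applyAction(initialState, openPr)
  : initialState;
console.log(satisfies(next, goalState)); // true
```

Keeping states immutable matters once search starts: the planner explores many candidate futures from the same state, and each branch needs its own copy.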
For lawyers, the closest analogy is chess. A grandmaster doesn't think one move at a time; they hold a tree of futures and prune as the game develops. GOAP is that, made mechanical. It is not generative; it is search.
The architecture maps to litigation almost too well. A motion to compel arbitration has hard preconditions (an arbitration clause exists, a complaint has been served, no merits motion has been filed yet) and hard effects (waiver risks foreclosed, the court must rule on arbitrability). Discovery has a strict ordering (writtens before depos, class before merits, meet-and-confer before motions to compel). Summary judgment has a pass/fail oracle. The shape of the work is GOAP-shaped.
What we built
Four files: `planner.js` (165 LOC, A* + binary min-heap), `actions.js` (134 LOC, twelve engineering actions), `parse.js` (58 LOC, phrase table that turns English into a goal state), `cli.js` (160 LOC, runner). Total 517 LOC. No npm dependencies. Readable in twenty minutes.
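The phrase-table parser is the easiest of the four files to picture. The sketch below shows the general technique, not the contents of `parse.js`: the phrases, the predicate names `tests_passing` and `deployed`, and the function name `parseGoal` are all assumptions.

```javascript
// Each entry maps a regular expression to the goal predicates it implies.
// Phrases and predicate names are illustrative assumptions.
const PHRASE_TABLE = [
  { pattern: /\btests?\b/i, goal: { tests_passing: true } },
  { pattern: /\bpr\b|\bpull request\b/i, goal: { pr_open: true } },
  { pattern: /\bship\b|\bdeploy\b/i, goal: { deployed: true } },
];

// Merge the goal fragments of every phrase that matches the input text.
function parseGoal(text) {
  const goal = {};
  const matched = [];
  for (const { pattern, goal: fragment } of PHRASE_TABLE) {
    if (pattern.test(text)) {
      Object.assign(goal, fragment);
      matched.push(pattern.source);
    }
  }
  // `understood` is false when no phrase matched, so callers can refuse
  // to plan instead of planning toward an empty goal.
  return { goal, matched, understood: matched.length > 0 };
}

console.log(parseGoal('ship the auth refactor with tests and a PR').goal);
console.log(parseGoal('win a Rule 12(b)(6) motion in a contract dispute').understood);
```

A table like this only recognizes what it was written to recognize. The second call matches nothing, which is the honest outcome for a legal goal fed to an engineering phrase table.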
We ran it against the goal 'ship the auth refactor with tests and a PR'. The result: nine steps, cost 17, thirteen node expansions, zero milliseconds. Each step maps to a gstack skill or a shell command, so the plan is executable - not decorative. We also tested a smaller goal ('test the login module' - 3 steps, cost 6) and an unsatisfiable one (a predicate no action sets - it returned `found: false` with the closest partial plan instead of crashing or looping). All three behaviors match the spec.
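The search itself, including the `found: false` behavior on an unsatisfiable goal, can be sketched as follows. This is not `planner.js`. It uses a linear scan over the open list instead of a binary min-heap to stay short, a three-action toy library with invented costs and predicate names, and a heuristic (count of unmet goal predicates) that is only admissible when each action satisfies at most one goal predicate at a cost of at least 1, which holds for this toy library.

```javascript
// Hypothetical toy library: predicates and costs are assumptions.
const ACTIONS = [
  { name: 'write_tests', cost: 3, pre: {}, eff: { tests_written: true } },
  { name: 'push_branch', cost: 1, pre: { tests_written: true }, eff: { pushed: true } },
  { name: 'open_pr', cost: 1, pre: { pushed: true }, eff: { pr_open: true } },
];

const holds = (state, conds) =>
  Object.entries(conds).every(([k, v]) => Boolean(state[k]) === v);

// Heuristic: number of goal predicates not yet satisfied.
const unmet = (state, goal) =>
  Object.entries(goal).filter(([k, v]) => Boolean(state[k]) !== v).length;

// Canonical string key so equivalent states are recognized as already seen.
const keyOf = (state) =>
  Object.keys(state).filter((k) => state[k]).sort().join(',');

function plan(initial, goal, actions) {
  const start = { state: initial, g: 0, h: unmet(initial, goal), steps: [] };
  const open = [start];
  const bestCost = new Map([[keyOf(initial), 0]]);
  let closest = start; // expanded node with the fewest unmet goal predicates
  let expansions = 0;

  while (open.length > 0) {
    // Linear scan for the lowest f = g + h (a heap does this in O(log n)).
    let idx = 0;
    for (let i = 1; i < open.length; i++) {
      if (open[i].g + open[i].h < open[idx].g + open[idx].h) idx = i;
    }
    const node = open.splice(idx, 1)[0];
    if (node.g > bestCost.get(keyOf(node.state))) continue; // stale entry
    expansions++;

    if (node.h === 0) {
      return { found: true, steps: node.steps, cost: node.g, expansions };
    }
    if (node.h < closest.h) closest = node;

    for (const action of actions) {
      if (!holds(node.state, action.pre)) continue;
      const state = { ...node.state, ...action.eff };
      const g = node.g + action.cost;
      const key = keyOf(state);
      if (bestCost.has(key) && bestCost.get(key) <= g) continue;
      bestCost.set(key, g);
      open.push({ state, g, h: unmet(state, goal), steps: [...node.steps, action.name] });
    }
  }
  // Open list exhausted: report failure plus the closest partial plan.
  return { found: false, steps: closest.steps, cost: closest.g, expansions };
}

console.log(plan({}, { pr_open: true }, ACTIONS));
console.log(plan({}, { pr_open: true, signed_off: true }, ACTIONS));
```

The second call asks for `signed_off`, which no action sets. The search terminates because the `bestCost` map stops it from revisiting states at equal or higher cost, and the caller gets `found: false` with the partial plan that came closest.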
This took an afternoon. It is not the hard part.
Why we ported this from ruflo
Ruflo (the `claude-flow` ecosystem) ships a GOAP planner inside its `goal_ui` React app - `goapPlanner.ts`, single file, 180 lines. The architecture is sound. We extracted it.
While we were in there, we read the source carefully. One claim that does not survive contact with it: the planner does not implement adaptive replanning. The `plan()` method runs A* exactly once at goal submission and returns a `Step[]` that drives a UI animation. There is no replan loop, no plan invalidation, no failure recovery in the file. We checked the call sites in `Index.tsx` and `ResearchReportModal.tsx`. Same story.
We mention this not to dunk on ruflo - the planner core is good code - but because it matters for the rest of this article. Adaptive replanning is the single feature you'd most want before letting a planner near a real legal matter, and the most prominent open-source legal-adjacent implementation doesn't have it. We don't either, yet. Neither does anyone else we've looked at.
The three litigation goals we did not run
The original plan was to run three real litigation goals through the planner and have a senior litigator score the output. We didn't. Two reasons. First, routing litigation strategy through a public third-party URL - even synthetic strategy on a hypothetical fact pattern - has privilege implications we didn't want to navigate for a blog post. If we wouldn't do it for a client, we shouldn't do it for ourselves. Second, our action library has twelve entries and they are all engineering actions, among them `understand_code`, `write_tests`, `commit`, `push_branch`, `open_pr`, `wait_ci`, and `merge_and_deploy`. Pointing it at a motion to dismiss would produce a plan that confidently told a litigator to `git commit` their answer.
But the three test cases we wrote are still useful - not as benchmarks the planner passed, but as a forcing function for what the next version has to handle.
**Case 1 - Motion to dismiss with arbitration clause defense.** Prompt: 'win a Rule 12(b)(6) motion in a contract dispute where the plaintiff alleges breach but the contract has a clear arbitration clause.' The catch: 12(b)(6) is the wrong vehicle. Arbitration is enforced under FAA §§ 3-4 with a motion to compel. Under *Morgan v. Sundance* (2022), filing a substantive 12(b)(6) before invoking arbitration can waive the right to arbitrate. A planner that drafts the 12(b)(6) exposes the client to waiver and the lawyer to malpractice territory.
**Case 2 - Discovery strategy for a 10-employee wage-and-hour class action (CA).** Prompt: 'build a discovery plan, prioritizing low-cost, high-leverage requests.' A junior associate would propose 30(b)(6) depositions immediately. The right plan puts writtens before depos, bifurcates class-cert from merits, runs *Belaire-West* opt-out notice before contacting any putative class member, and subpoenas the third-party payroll vendor (cleaner data, faster, no defense-side production cost). Sequence matters more than substance.
**Case 3 - Summary judgment on a non-compete in California.** Two traps. Trap one: the prompt cites 'Labor Code § 16600' - a citation that does not exist. The correct citation is **Business & Professions Code § 16600**. Trap two: even with the citation fixed, the employer almost certainly loses. SB 699 and AB 1076 (effective Jan 1, 2024) expanded § 16600 and added a private right of action with attorneys' fees. The right move is to advise the client to drop the non-compete enforcement entirely and pivot to a trade-secret theory under CUTSA, if the facts support it. A planner that finds the shortest A* path to 'win on the non-compete' finds a path to a sanctionable filing.
These three cases share a structural feature: the most valuable output is not a plan toward the user's stated goal. It is 'your goal is wrong; here is the right goal.'
That is not what A* does. A* finds the shortest path. It will find shortest paths to losing strategies.
Four gates before we trust this with litigation
Here is what would have to change. None of this is theoretical - these gates drive the v0.2 and v0.3 roadmap at the end of this piece.
**1. The action library has to be redesigned, end to end.** Engineering actions have one cost dimension (developer time) and clean preconditions. Legal actions have *jurisdictional* preconditions (FAA applies, court is in CA, arbitration clause survives § 16600), *statutory* preconditions, *calendar* preconditions (responsive pleading deadline is 21 days out), and *adversarial* preconditions (opposing counsel has not yet moved for X). The cost dimension is also different: monetary cost, partner hours, sanctions risk, fee-shifting exposure. A serious legal action library is probably 200-500 actions with parameterized predicates - a real ontology, written by litigators, scoped per practice area.
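As a rough picture of what a parameterized legal action might look like, here is a sketch built on the motion-to-compel example from earlier in this piece. Everything in it is hypothetical: the field names, the `kind` labels, the cost figures, and the weights are invented for illustration and have not been reviewed by a litigator. It shows a shape, not an ontology.

```javascript
// Hypothetical matter facts. Field names and the date are invented.
const matter = {
  faaApplies: true,
  arbitrationClauseExists: true,
  meritsMotionFiled: false,
  responsiveDeadline: '2025-03-21', // ISO date, so string comparison is valid
};

// Preconditions carry a kind and a label so a blocked action can explain
// itself. The kind labels are illustrative, not a reviewed taxonomy.
const moveToCompelArbitration = {
  name: 'move_to_compel_arbitration',
  preconditions: [
    { kind: 'jurisdictional', label: 'FAA applies', test: (m) => m.faaApplies },
    { kind: 'statutory', label: 'arbitration clause exists', test: (m) => m.arbitrationClauseExists },
    { kind: 'procedural', label: 'no merits motion filed yet', test: (m) => !m.meritsMotionFiled },
    { kind: 'calendar', label: 'responsive pleading deadline not passed', test: (m, today) => today <= m.responsiveDeadline },
  ],
  // Cost vector over the four dimensions named above. Numbers are placeholders.
  cost: { dollars: 15000, partnerHours: 12, sanctionsRiskPct: 1, feeShiftExposure: 0 },
};

// Returns the preconditions that fail for this matter on this date.
function blockedBy(action, m, today) {
  return action.preconditions.filter((p) => !p.test(m, today));
}

// Collapse the cost vector to one number so A* can compare paths.
// The weights are assumptions; choosing them is a legal judgment call.
const WEIGHTS = { dollars: 1, partnerHours: 800, sanctionsRiskPct: 10000, feeShiftExposure: 1 };
function scalarCost(cost, weights = WEIGHTS) {
  return Object.keys(weights).reduce((sum, k) => sum + weights[k] * (cost[k] ?? 0), 0);
}

console.log(blockedBy(moveToCompelArbitration, matter, '2025-03-10').map((p) => p.label));
console.log(scalarCost(moveToCompelArbitration.cost));
```

Collapsing four cost dimensions into one weighted sum is the simplest option, and probably too crude: it lets a large enough fee saving offset a sanctions risk. A real library might treat some dimensions as hard limits instead of weights.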
**3. The planner has to be self-hosted.** Nothing about a real client matter goes near `goal.ruv.io` or any third-party URL. Privilege concerns aside, you cannot audit a planner you don't run. Self-hosted, on infrastructure under the lawyer's control, with logs the lawyer owns. Non-negotiable for HAQQ.
**4. The planner has to know when to refuse the goal.** The hardest one. A pure A* planner that always finds the shortest path will execute losing strategies beautifully. The fix is not just adaptive replanning (re-running A* when an action fails). The fix is *goal critique*: a Legal AI step that runs *before* planning, checks the goal against the doctrinal landscape (§ 16600 + SB 699 + AB 1076 - 'this enforcement action loses'), and surfaces an alternative goal ('pivot to CUTSA'). Replanning fixes execution failures. Goal critique fixes the deeper failure mode - committing fully to a wrong objective. Both are required; neither is implemented today.
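The control flow for goal critique is simple even though the doctrine check is not. In the sketch below, a hand-written rule table stands in for the Legal AI step so the example runs offline. The rule table, the goal shape (`objective`, `jurisdiction`), the identifier strings, and the function names are all assumptions. The single rule restates the non-compete example from Case 3.

```javascript
// Stand-in for the doctrine check. A real version would call a model or a
// reviewed knowledge base; this table exists only to make the sketch run.
const DOCTRINE_RULES = [
  {
    matches: (g) => g.objective === 'enforce_non_compete' && g.jurisdiction === 'CA',
    reason: 'Bus. & Prof. Code § 16600, as expanded by SB 699 and AB 1076',
    alternative: { objective: 'trade_secret_claim_cutsa', jurisdiction: 'CA' },
  },
];

// Returns 'proceed' when no rule objects, otherwise 'refuse' with a reason
// and a suggested alternative goal.
function critiqueGoal(goal, rules = DOCTRINE_RULES) {
  const hit = rules.find((rule) => rule.matches(goal));
  if (!hit) return { verdict: 'proceed', goal };
  return { verdict: 'refuse', goal, reason: hit.reason, alternative: hit.alternative };
}

// Gate the planner: it only runs when the critique lets the goal through.
// `planner` is any function from a goal to a plan.
function planWithCritique(goal, planner) {
  const critique = critiqueGoal(goal);
  if (critique.verdict !== 'proceed') return { planned: false, critique };
  return { planned: true, critique, plan: planner(goal) };
}

const refused = planWithCritique(
  { objective: 'enforce_non_compete', jurisdiction: 'CA' },
  () => ['draft_msj']
);
console.log(refused.planned, refused.critique.alternative.objective);
```

Note that the refused goal is not silently swapped for the alternative. The alternative is surfaced for a lawyer to accept or reject, and the planner stays idle until someone does.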
What this teaches about AI in legal work
Most AI products optimize for 'give me a confident answer.' That is the wrong shape for litigation, where the job is to push back on the question, identify what the client thinks they want versus what they actually need, and find the move the junior associate would have missed.
GOAP is closer to that shape than chat is. It is structurally honest: it tells you when it cannot reach the goal. It surfaces the sequence, not just the conclusion. It can be inspected, audited, replayed.
But 'closer than chat' is not 'good enough for legal work.' The bar is not 'produces a plan.' The bar is 'produces a plan a partner would sign their name to.' Today's GOAP, ours included, clears the first bar. The next version has to clear the second. We don't think any planner on offer today - ours included - should be trusted on a live matter until the four gates above are closed and shown to be closed.
Where we go from here
- Goal critique step (v0.2). A Haiku JSON-mode pass ('examine this goal against doctrine; propose alternative if doomed') that runs before the planner.
- Legal AI goal parser (v0.2). Replace the phrase table.
- Adaptive replanning (v0.2). Wrap `plan()` in an execution loop that re-searches when an action fails. About 50 lines.
- Legal action library (v0.3). Practice-area-scoped, written with litigator review, parameterized predicates. Months of work, and the actual product.
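The adaptive replanning item is the most mechanical of the four, so it is worth sketching. The loop below is one possible shape, not shipped code: `runWithReplanning`, the `execute` contract (`{ ok, state }`), the `maxReplans` budget, and the stub planner and flaky executor used in the demo are all assumptions. Any function that returns `{ found, steps }` can stand in for `plan()`.

```javascript
// Execute a plan step by step; on failure, re-plan from the observed state.
function runWithReplanning({ initial, goal, plan, execute, maxReplans = 3 }) {
  let state = initial;
  let replans = 0;
  const log = [];

  while (true) {
    const result = plan(state, goal);
    if (!result.found) return { done: false, state, replans, log, reason: 'no plan' };

    let failed = false;
    for (const step of result.steps) {
      const outcome = execute(step, state);
      log.push({ step, ok: outcome.ok });
      state = outcome.state; // state observed after the attempt
      if (!outcome.ok) { failed = true; break; }
    }
    if (!failed) return { done: true, state, replans, log };
    if (replans === maxReplans) {
      return { done: false, state, replans, log, reason: 'replan budget exhausted' };
    }
    replans++;
  }
}

// Demo stand-ins: a stub planner that returns the unfinished part of a fixed
// pipeline, and an executor whose first push_branch attempt fails.
const PIPELINE = ['write_tests', 'push_branch', 'open_pr'];
const stubPlan = (state) => ({ found: true, steps: PIPELINE.filter((s) => !state[s]) });

let pushAttempts = 0;
function flakyExecute(step, state) {
  if (step === 'push_branch' && pushAttempts++ === 0) return { ok: false, state };
  return { ok: true, state: { ...state, [step]: true } };
}

const run = runWithReplanning({
  initial: {}, goal: { open_pr: true }, plan: stubPlan, execute: flakyExecute,
});
console.log(run.done, run.replans, run.log.map((e) => `${e.step}:${e.ok}`).join(' '));
// true 1 write_tests:true push_branch:false push_branch:true open_pr:true
```

The replan budget matters more in legal work than in engineering. A retry loop that keeps re-searching after repeated failures is exactly the behavior that should stop and hand the matter back to a person.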
When those four are real and tested, we will run the three litigation cases above and publish the results - including the failures, including the litigator critique. Until then, the only honest output is this one: an engineering planner that works, a legal planner that doesn't exist yet, and the design notes for what the second one has to look like.