
We Ran 3 Parallel Simulations with 72 AI Agents to Predict Legal AI's Future. Here Are the Probability Scores.

By HAQQ Team · 26 min read · AI Legal Tech

Using MiroFish, an open-source multi-agent simulation framework, we created 20 agent personas (including 6 adversarial types such as a malpractice insurer, a Legal AI VC, and a retired federal judge) and ran 3 independent simulations of 96 rounds each. The resulting 1,543 interactions produced cross-validated probability predictions on Harvey's IPO, the first $10M AI malpractice settlement, and BigLaw workforce disruption.

Most Legal AI market reports are written the same way: an analyst reads a stack of vendor press releases, adds a few Gartner citations, and wraps it in a confident forecast. The methodology is structurally biased toward whoever is loudest.

We tried something different: a multi-agent simulation, run three times over.

We fed three rich source documents into MiroFish, an open-source multi-agent social simulation framework built on the OASIS framework, with Zep Cloud providing graph-based persistent memory. The documents were proprietary HAQQ legal workflow data, detailed persona profiles of 20 Legal AI stakeholders across 9 countries, and a comprehensive industry brief tracking more than $1B in VC funding. From them, the system generated 20 distinct agent personas: BigLaw partners, startup founders, in-house GCs, boutique practitioners, junior associates, legal ops leaders, legal tech investors, academic researchers, a malpractice insurance underwriter, a Legal AI VC, a law school dean, a legal aid director, a retired federal judge, and a pharma General Counsel. The personas are based in New York, London, Paris, Dubai, Lagos, Bangalore, Singapore, Bucharest, Chicago, and Toronto.

We then ran 3 parallel simulations of 96 rounds each, using Google Gemini 2.0 Flash (1M-token context window, accessed via OpenRouter) as the LLM backbone. The three independent runs produced 467, 527, and 549 agent actions respectively: 1,543 total interactions across 72 active agent instances. Comparing the runs let us separate predictions that held up in every run from those that did not.

This article explains the experiment, translates the findings, and draws out what they mean for anyone building in or buying Legal AI in a market projected to grow from $1.2B to $6.4B by 2030.
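That projection implies a steep compound annual growth rate. The post does not state the baseline year for the $1.2B figure, so the worked example below assumes, for illustration only, a 2024 baseline (six years to 2030):

```latex
\mathrm{CAGR} = \left(\frac{V_{2030}}{V_{2024}}\right)^{1/6} - 1
              = \left(\frac{6.4}{1.2}\right)^{1/6} - 1
              \approx 1.322 - 1 \approx 32\%
```

Under a 2025 baseline (five years) the same endpoints would imply roughly 40% a year.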

How MiroFish Works: The Technical Setup

MiroFish is not a summarization tool or a RAG pipeline. It is a social simulation engine built on the OASIS multi-agent framework, with Zep Cloud providing graph-based persistent memory for each agent. Understanding the architecture matters for interpreting the outputs.

The pipeline ran in five stages:

Stage 1 — Ontology Extraction

We uploaded three source files: a proprietary HAQQ legal document, a 20-persona stakeholder brief (covering geographies from Lagos to Singapore to Bucharest), and a 3,000-word industry intelligence brief with funding data, competitive profiles, and regulatory analysis. MiroFish used Google Gemini 2.0 Flash (via OpenRouter) to extract a typed knowledge ontology with 10 entity types: BigLawPartner, InHouseCounsel, BoutiqueFirmPractitioner, JuniorAssociate, LegalOpsLeader, LegalAIStartupFounder, LegalAIResearcher, StartupFounder, AdversarialExpert, and Organization.
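To make Stage 1 concrete, here is a minimal Python sketch of a typed ontology with the 10 entity types above, plus a check that sorts LLM-extracted entities into those that fit the ontology and those that don't. `EntityType`, `ExtractedEntity`, and `validate_entities` are hypothetical names for illustration, not MiroFish's actual schema or extraction code, and the sample entity "State Bar" is invented.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    """The 10 entity types named in Stage 1 (hypothetical encoding, not MiroFish's)."""
    BIG_LAW_PARTNER = "BigLawPartner"
    IN_HOUSE_COUNSEL = "InHouseCounsel"
    BOUTIQUE_FIRM_PRACTITIONER = "BoutiqueFirmPractitioner"
    JUNIOR_ASSOCIATE = "JuniorAssociate"
    LEGAL_OPS_LEADER = "LegalOpsLeader"
    LEGAL_AI_STARTUP_FOUNDER = "LegalAIStartupFounder"
    LEGAL_AI_RESEARCHER = "LegalAIResearcher"
    STARTUP_FOUNDER = "StartupFounder"
    ADVERSARIAL_EXPERT = "AdversarialExpert"
    ORGANIZATION = "Organization"


@dataclass(frozen=True)
class ExtractedEntity:
    name: str
    type_label: str  # raw type label as returned by the extraction LLM


def validate_entities(
    raw: list[ExtractedEntity],
) -> tuple[list[tuple[str, EntityType]], list[ExtractedEntity]]:
    """Split extracted entities into (accepted, rejected) against the closed ontology."""
    allowed = {t.value: t for t in EntityType}
    accepted, rejected = [], []
    for entity in raw:
        if entity.type_label in allowed:
            accepted.append((entity.name, allowed[entity.type_label]))
        else:
            rejected.append(entity)
    return accepted, rejected


accepted, rejected = validate_entities([
    ExtractedEntity("Marcus Chen", "BigLawPartner"),
    ExtractedEntity("Victoria Reyes", "AdversarialExpert"),
    ExtractedEntity("State Bar", "Regulator"),  # invented example; not one of the 10 types
])
print([name for name, _ in accepted])  # ['Marcus Chen', 'Victoria Reyes']
print([e.name for e in rejected])      # ['State Bar']
```

Keeping the type set closed means every node that reaches the knowledge graph carries one of a known set of labels.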

Stage 2 — Knowledge Graph Construction

The ontology was pushed to Zep Cloud's graph memory system, building a live knowledge graph populated with specific entities: Marcus Chen (partner at a top-tier Wall Street firm), Aisha Okafor (fintech unicorn GC), Tom Nakamura (a legal AI startup founder), David Kowalski (associate at a major global law firm), Rebecca Morrison (Fortune 500 CLO), Victoria Reyes (malpractice underwriter), Michael Osei (Legal AI VC), Patricia Walsh (law school dean), Kofi Agyeman (legal aid director), Marcus Holloway (retired federal judge), Amara Singh (pharma GC), and 9 others. Each entity carries attributes, relationship edges, and embedded context from the source material.
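A rough in-memory picture of Stage 2 is typed entities as nodes with labeled, directed relationship edges between them. The edges below are taken from facts stated later in this post (David Kowalski uses Harvey, Sarah Kim refers startup clients to HAQQ, Priya Sharma uses Spellbook and HAQQ); the entity type assigned to Sarah Kim is our assumption. The `KnowledgeGraph` class is a hypothetical sketch and does not reflect Zep Cloud's actual API or storage model.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    name: str
    entity_type: str  # an ontology type such as "JuniorAssociate" or "Organization"
    attributes: dict[str, str] = field(default_factory=dict)


class KnowledgeGraph:
    """Hypothetical minimal graph: nodes keyed by name, directed labeled edges."""

    def __init__(self) -> None:
        self.nodes: dict[str, Entity] = {}
        self.edges = defaultdict(list)  # source name -> [(relation, target name)]

    def add_entity(self, entity: Entity) -> None:
        self.nodes[entity.name] = entity

    def relate(self, source: str, relation: str, target: str) -> None:
        # Both endpoints must already exist so that no edge dangles.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError(f"unknown entity in edge {source!r} -> {target!r}")
        self.edges[source].append((relation, target))

    def neighbors(self, name: str, relation: Optional[str] = None) -> list[str]:
        """Targets reachable from `name`, optionally filtered by relation label."""
        return [t for r, t in self.edges.get(name, []) if relation is None or r == relation]


kg = KnowledgeGraph()
for name, etype in [("David Kowalski", "JuniorAssociate"), ("Sarah Kim", "BigLawPartner"),
                    ("Priya Sharma", "InHouseCounsel"), ("Harvey", "Organization"),
                    ("HAQQ", "Organization"), ("Spellbook", "Organization")]:
    kg.add_entity(Entity(name, etype))

# Edges drawn from facts stated elsewhere in this post.
kg.relate("David Kowalski", "uses", "Harvey")
kg.relate("Sarah Kim", "refers_clients_to", "HAQQ")
kg.relate("Priya Sharma", "uses", "Spellbook")
kg.relate("Priya Sharma", "uses", "HAQQ")

print(kg.neighbors("Priya Sharma", "uses"))  # ['Spellbook', 'HAQQ']
```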

Stage 3 — Agent Profile Generation

From the knowledge graph, MiroFish generated 20 OASIS-compatible agent profiles with distinct backstories, professional opinions, trust networks, and behavioral dispositions. The 6 new adversarial agents were specifically designed to challenge consensus: Victoria Reyes prices AI risk into insurance premiums, Michael Osei evaluates Legal AI startups for investment, Patricia Walsh faces declining law school enrollment while mandating AI curriculum, Kofi Agyeman fights the access-to-justice gap, Marcus Holloway has ruled on AI-generated evidence, and Amara Singh manages hallucination risk in FDA-regulated pharma submissions.
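Stage 3 turns graph entities into agent profiles. A minimal sketch of such a profile as a Python dataclass, serialized to JSON, might look like the following. The field names (`stance`, `trusts`, `adversarial`) are assumptions chosen to mirror the description above, and the trust edge from Marcus Chen to Victoria Reyes is invented for illustration; OASIS's actual profile format may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentProfile:
    name: str
    role: str
    backstory: str
    stance: str                                      # professional opinion the agent starts from
    trusts: list[str] = field(default_factory=list)  # names in the agent's trust network
    adversarial: bool = False                        # designed to challenge consensus


def to_json(profiles: list[AgentProfile]) -> str:
    """Serialize profiles for a simulation run (hypothetical format)."""
    return json.dumps([asdict(p) for p in profiles], indent=2)


reyes = AgentProfile(
    name="Victoria Reyes",
    role="Malpractice insurance underwriter",
    backstory="Prices AI risk into professional liability premiums.",
    stance="Firms without documented AI governance are higher risk.",
    adversarial=True,
)
chen = AgentProfile(
    name="Marcus Chen",
    role="M&A partner",
    backstory="22-year partner; experienced a hallucinated citation incident.",
    stance="All AI output requires manual verification.",
    trusts=["Victoria Reyes"],  # hypothetical trust edge, for illustration only
)

adversaries = [p.name for p in (reyes, chen) if p.adversarial]
print(adversaries)  # ['Victoria Reyes']
```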

Stage 4 — Multi-Platform Social Simulation (3× Parallel)

The 20 agent personas ran simultaneously across synthetic Twitter and Reddit environments — three independent times. Each run executed 96 simulation rounds, producing 467, 527, and 549 actions respectively. Agents responded to each other, agreed, disagreed, shifted positions, and formed emergent coalitions of opinion. Running 3 parallel simulations on identical seed data allowed us to distinguish robust consensus from stochastic noise.
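Why run identical seed data three times? Because agent choices are sampled, two runs on the same inputs can diverge. The toy loop below illustrates only that point, using a seeded random number generator. The agent names come from this post, but the action set and weights are invented and bear no relation to how OASIS actually schedules or samples agent actions.

```python
import random
from collections import Counter

AGENTS = ["Marcus Chen", "Aisha Okafor", "Tom Nakamura", "Victoria Reyes", "Kofi Agyeman"]
# Hypothetical per-round action distribution; "idle" means the agent does nothing.
ACTIONS = ["post", "reply", "agree", "disagree", "idle"]
WEIGHTS = [0.10, 0.15, 0.10, 0.10, 0.55]


def run_simulation(seed: int, rounds: int = 96) -> Counter:
    """Run one toy simulation and count non-idle actions by type."""
    rng = random.Random(seed)  # one independent RNG per run, so each run is reproducible
    counts: Counter = Counter()
    for _ in range(rounds):
        for _agent in AGENTS:
            action = rng.choices(ACTIONS, weights=WEIGHTS, k=1)[0]
            if action != "idle":
                counts[action] += 1
    return counts


# Same inputs, different seeds: the totals will usually differ from run to run.
for seed in (1, 2, 3):
    print(seed, sum(run_simulation(seed).values()))
```

Fixing the seed per run makes any single run repeatable, while varying it across runs exposes how much of an outcome is noise.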

Stage 5 — Report Synthesis & Cross-Run Validation

A dedicated report agent ran deep-retrieval passes against the knowledge graph and agent memory from all 3 runs, synthesizing a structured prediction report. Predictions that appeared consistently across all 3 runs were flagged as high-confidence consensus. Predictions where runs diverged by more than 15 percentage points were flagged as 'split' — genuinely uncertain outcomes where the agents themselves disagreed.
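The split rule in Stage 5 can be sketched in a few lines. Two assumptions here are ours, not the post's: "diverged by more than 15 percentage points" is read as the max-minus-min spread of the three run-level scores, and the headline consensus figure is taken as the midpoint of that range. The post does not say how run-level scores are combined; the midpoint happens to reproduce the 50% reported for the retired judge's disclosure prediction in Finding 6 (40%, 60%, 60%), whereas a plain mean would give about 53%.

```python
from dataclasses import dataclass

SPLIT_THRESHOLD_PP = 15  # runs diverging by more than this are flagged 'split'


@dataclass
class CrossRunResult:
    question: str
    scores: tuple[int, ...]  # one 0-100 probability per run
    spread: int              # max - min, in percentage points
    consensus: float         # assumed aggregation: midpoint of the range
    status: str              # "consensus" or "split"


def validate(question: str, scores: tuple[int, ...]) -> CrossRunResult:
    if not scores or any(not 0 <= s <= 100 for s in scores):
        raise ValueError("each run must report a score between 0 and 100")
    spread = max(scores) - min(scores)
    consensus = (max(scores) + min(scores)) / 2  # assumption, see lead-in
    status = "split" if spread > SPLIT_THRESHOLD_PP else "consensus"
    return CrossRunResult(question, scores, spread, consensus, status)


judge = validate("Mandatory federal AI disclosure for court filings by 2027", (40, 60, 60))
print(judge.status, judge.consensus)  # split 50.0
```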

The 20 Agent Personas

The simulation's strength comes from the diversity and specificity of its agents. Each persona was constructed from real-world archetypes with detailed backstories, professional contexts, and behavioral dispositions:

The 6 Adversarial Agents (New in v3)

These agents were specifically designed to challenge the optimistic consensus that dominated v2:

Finding 1: Geography Is the Most Underappreciated Variable in Legal AI

The most striking output of running a geographically diverse simulation was how sharply geography splits the Legal AI story.

The agent representing Aisha Okafor (GC of a fintech unicorn operating across 35 African countries, based in Lagos and San Francisco), an aggressive AI adopter who uses HAQQ, Harvey, and custom prompt libraries, consistently flagged that virtually every tool on the market performs poorly on African regulatory frameworks. The simulation's agents converged on this as a systemic training data problem: models trained on US and UK legal corpora simply do not understand Nigerian company law, NDPR data protection rules, or the multi-jurisdictional complexity of operating across dozens of African markets.

The agent representing Ahmed Al-Rashidi (GC of a $2B MENA private equity fund, Dubai) surfaced the same dynamic: Arabic language capability is effectively absent from mainstream Legal AI tools, and frameworks such as UAE Commercial Companies Law, Saudi Vision 2030 regulations, and Egyptian investment law are not meaningfully represented in any major tool's training data. In the simulation, his persona is in talks with HAQQ about deploying it across portfolio companies.

The Tom Nakamura agent (founder of a Singapore legal AI startup building contract intelligence for Southeast Asia, with $4M in seed funding and jurisdiction-specific fine-tuning) was the most pointed. In his view, the assumption that Legal AI built for the US market scales globally is not just wrong; it is a strategic error that is actively creating white space for regional competitors in Southeast Asia. He is frustrated that investors keep making that assumption.

Emmanuel Dubois (solo immigration practitioner, Toronto) reinforced the pattern from another angle entirely. A solo lawyer serving French-speaking West African clients, he has tried multiple AI tools, and all of them fail on immigration law because regulatory changes outpace their training data. He uses ChatGPT only for French-language client letters. He represents the underserved long tail of solo practitioners that the industry ignores.

The first vendor to seriously invest in jurisdiction-diverse training data for MENA, Africa, and Southeast Asia isn't entering a niche market. They are entering the majority of the world's legal activity by volume.

Finding 2: The EU AI Act Is Reshaping Product Roadmaps Right Now

The Sophie Laurent agent (partner at a leading European law firm, Paris) was the simulation's most influential voice on regulation. On principle, she does not use AI in her own client work, and she is currently drafting Paris Bar Association guidance on AI usage. Her agent consistently argued that Legal AI vendors systematically underestimate what EU AI Act compliance actually requires for high-risk AI systems in legal contexts.

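The simulation's agents converged on a specific insight: EU AI Act compliance is not a future obligation; it is already reshaping product decisions today. Under Annex III of the Act, AI systems intended to assist judicial authorities (or alternative dispute resolution bodies) in researching and interpreting facts and the law, and in applying the law to facts, are classified as high-risk, with obligations applying from August 2026.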

Harvey and Luminance, per the agents' knowledge graph, are already building explainability and audit trail features specifically to satisfy EU compliance requirements ahead of the August 2026 deadline. This is not being announced loudly in their marketing materials. It is happening in their engineering roadmaps.

The Dr. Lena Fischer agent (a leading European research institute, Munich) — a legal AI researcher who has published benchmarks showing commercial tools significantly outperform humans on routine tasks but underperform on complex legal reasoning — added a critical nuance: vendor accuracy claims are systematically overstated, and EU policymakers she consults are aware of this gap.

The ABA's Formal Opinion 512 — requiring attorney competence, confidentiality protocols, supervision, and candor when using AI in client matters — was referenced repeatedly by the US-based agents as the baseline that every firm is now trying to operationalize.

Finding 3: BigLaw Adoption Is Accelerating, but the Learning Curve Is Hollow

The David Kowalski agent (3rd-year associate, a major global law firm) generated some of the simulation's most interesting dynamics. As a power user of Harvey, he provided the inside view that no vendor white paper ever captures.

The Kowalski agent's position: he estimates he completes work in 30–40% of the time it would take without Harvey. His efficiency is real and measurable. But he also flagged something the vendors don't talk about: he feels he is learning less. The foundational work that builds legal judgment in years 1–3 (close reading of contracts, the pattern recognition that comes from doing the same analysis 50 times) is being compressed or skipped. His firm's summer associate class is already 15% smaller than it was three years ago.

The Marcus Chen agent (22-year M&A partner, a top-tier Wall Street firm) reinforced this from the partner perspective: deeply skeptical of AI after a hallucinated citation incident, he requires manual verification of all AI output. His firm has a Harvey license, but adoption is inconsistent across practice groups. His primary concern is the junior associate talent pipeline — if associates aren't developing judgment, who will be the partners in 15 years?

Patricia Walsh (dean of a top New York law school), one of the v3 adversarial agents, added hard data to this concern: bar exam pass rates are down 4.2%, law school enrollment is declining, and she is mandating AI curriculum even as the profession questions whether it needs as many lawyers. Across all 3 simulation runs, the consensus probability that BigLaw associate classes will be at least 30% smaller by 2028 was 55%: not inevitable, but trending that way.

The Sarah Kim agent (partner at a major Silicon Valley firm, Palo Alto) offered the counterpoint: she views AI as capacity expansion, not a threat, and refers startup clients to HAQQ for day-to-day legal needs. She sits on the advisory boards of two Legal AI companies and believes the unbundled law firm model is real and that firms embracing it early will thrive.

The emergent consensus from the BigLaw agents: 68% of Am Law 200 firms have at least one Legal AI tool deployed. Adoption is real and accelerating, concentrated in due diligence, contract review, and legal research. Thomson Reuters CoCounsel, Harvey, and Luminance dominate this segment. But adoption is uneven across practice groups and skewed toward associates rather than partners. Meanwhile, 20% of Am Law 200 firms now offer flat-fee alternatives for routine matters — up from 8% in 2021.

Finding 4: Hallucination Liability Is a Product Design Problem

The simulation's agents were remarkably consistent on this point, across all geographies and persona types — and the v3 adversarial agents made it sharper.

The vendors' current approach — prominent disclaimers, 'not legal advice' footers, and contractual liability waivers — is what the James Whitfield agent (founding partner, 6-person IP boutique, London) called 'legal cover, not legal safety.' Whitfield has no AI tools deployed — he is concerned about UK GDPR and client confidentiality on SaaS servers. His perspective represents the sizeable fraction of boutique firms poorly served by enterprise-priced vendors.

Victoria Reyes (underwriter at a leading malpractice insurer), one of the v3 adversarial agents, brought the insurance industry perspective that was entirely missing from v2: malpractice insurers are already pricing AI risk into renewals. Firms without documented AI governance policies face 8–15% premium surcharges. She described the first $10M+ AI malpractice settlement as 'not if, but which case.' The sanctions in Mata v. Avianca were $5,000; the next case will be orders of magnitude larger.

Amara Singh (GC of a leading pharmaceutical company) added the regulated-industry dimension: in pharmaceutical submissions to the FDA, a single AI hallucination in a regulatory filing can trigger consequences far beyond malpractice — including clinical trial delays, product recalls, and criminal liability. She requires human verification of every AI-generated paragraph in regulatory documents.

The Rebecca Morrison agent (CLO of a Fortune 500 tech company, San Jose) — managing a 120-lawyer global legal team with 18 months of systematic AI deployment across Harvey (M&A diligence), Ironclad (CLM), and a custom fine-tuned model for IP — articulated what enterprise buyers actually want: audit trails as first-class product features. Not logging as an afterthought. A system of record that answers, for any given AI-assisted output: who reviewed it, when, what they changed, and why. Her key insight: ROI takes 12–18 months because change management is the bottleneck, not the technology.
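The "system of record" Morrison describes maps naturally onto an append-only log of review events keyed by output. A minimal sketch, assuming a hypothetical `ReviewEvent` schema with one field per question she lists (who, when, what changed, why); the identifiers and example values are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a recorded event cannot be modified afterwards
class ReviewEvent:
    output_id: str         # identifier of the AI-assisted output (hypothetical scheme)
    reviewer: str          # who reviewed it
    reviewed_at: datetime  # when
    change_summary: str    # what they changed
    rationale: str         # why


class AuditTrail:
    """Append-only by design: it exposes no update or delete methods."""

    def __init__(self) -> None:
        self._events: list[ReviewEvent] = []

    def record(self, event: ReviewEvent) -> None:
        self._events.append(event)

    def history(self, output_id: str) -> list[ReviewEvent]:
        """All review events for one output, oldest first."""
        return sorted((e for e in self._events if e.output_id == output_id),
                      key=lambda e: e.reviewed_at)


trail = AuditTrail()
trail.record(ReviewEvent(
    "diligence-memo-17", "associate@example.com",
    datetime(2025, 3, 3, 14, 5, tzinfo=timezone.utc),
    "Replaced an unverified case citation", "Citation could not be located",
))
for event in trail.history("diligence-memo-17"):
    print(event.reviewer, event.reviewed_at.isoformat(), event.change_summary)
```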

The agents' emergent framework for hallucination risk management converged on three layers:

Finding 5: In-House Teams Are the Real Disruption Vector

This was the most surprising finding, because it emerged from cross-agent debate rather than being stated in any source document.

The conventional Legal AI disruption narrative focuses on law firms: AI makes associates more efficient, compresses billable hours, disrupts the hourly billing model. This is real but slow.

The faster and more structural disruption is happening in in-house legal teams. The Jordan Blake agent (VP Legal Ops at a Fortune 500 enterprise company, running legal ops for a 200+ lawyer department that has deployed Ironclad, Harvey, and Luminance) is metric-driven: in 18 months, he reduced outside counsel spend on routine contracts by 60% and eliminated 2 FTE equivalents. He is now focused on AI governance policy.

The Priya Sharma agent (Head of Legal at a major Indian fintech, Bangalore), a 31-year-old tech-native running a 3-person legal team, is under a CFO mandate to reduce outside counsel spend by 40% this year. She uses Spellbook and HAQQ. Her key risk: data localization compliance across India, Singapore, and Malaysia.

The Nadia Popescu agent (B2B SaaS founder, Bucharest) represents the most extreme case: she has no legal background and no in-house counsel, and uses HAQQ for all of her company's legal infrastructure. She cut time spent on legal work from 2+ days per month to 4 hours and is a vocal HAQQ advocate in Eastern European founder communities.

Finding 6: Insurance Is the Hidden Enforcement Mechanism

This finding emerged entirely from the v3 adversarial agents and was invisible in our previous simulation runs.

Victoria Reyes (malpractice insurance underwriter) revealed that the malpractice insurance industry is not waiting for regulation; it is already acting as a de facto enforcement mechanism. Firms without documented AI governance policies are seeing 8–15% premium surcharges on their professional liability renewals. This is happening now, quietly, without any regulatory mandate.

The mechanism is straightforward: insurers are reviewing whether firms have written AI usage policies, human verification protocols, and audit trails. Firms that cannot demonstrate these controls are classified as higher risk — just as firms without cybersecurity policies were reclassified 5 years ago.
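The arithmetic of that surcharge is simple to sketch. The three controls and the 8–15% band come from this post; the rule that any missing control triggers the full band, and the $250,000 base premium, are assumptions for illustration, and actual underwriting is presumably more granular.

```python
# The three controls insurers are described as reviewing.
REQUIRED_CONTROLS = ("written AI usage policy", "human verification protocol", "audit trail")
SURCHARGE_RANGE_PCT = (8, 15)  # reported band for firms lacking documented AI governance


def renewal_premium_range(base_premium: int, controls: set[str]) -> tuple[int, int]:
    """Return the (low, high) renewal premium in whole dollars.

    Assumption: any missing control puts the firm in the full 8-15% surcharge band.
    """
    missing = [c for c in REQUIRED_CONTROLS if c not in controls]
    if not missing:
        return base_premium, base_premium
    low_pct, high_pct = SURCHARGE_RANGE_PCT
    # Integer arithmetic avoids floating-point rounding on currency amounts.
    return (base_premium * (100 + low_pct) // 100,
            base_premium * (100 + high_pct) // 100)


# Hypothetical firm with a $250,000 premium and only a written policy in place.
print(renewal_premium_range(250_000, {"written AI usage policy"}))  # (270000, 287500)
```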

Marcus Holloway (retired federal judge) reinforced this from the judiciary side: courts are increasingly requiring disclosure of AI tool usage in filings. His prediction — mandatory federal AI disclosure for court filings by 2027 — was one of the most contentious across the 3 simulation runs, with probability estimates ranging from 40% (Run A) to 60% (Runs B/C), yielding a 50% consensus marked as 'split.'

Finding 7: The Access-to-Justice Gap Is Widening

Kofi Agyeman (legal aid director, Chicago) captured what no enterprise-focused agent said, and what most Legal AI market reports ignore entirely.

For low-income clients, the choice isn't 'AI vs. a good lawyer.' It's 'AI vs. nothing.' The 80% of Americans who cannot afford legal representation don't care whether Harvey's latest model scores 94% on contract review benchmarks. They need basic access to legal guidance that is accurate enough to be useful and affordable enough to be accessible.

The simulation's cross-run consensus probability that a Legal AI tool trained on African, MENA, or Southeast Asian law will raise a Series A by 2026 was just 38%, reflecting genuine capital-market hesitation rather than a lack of need. Michael Osei (investor at a prominent legal tech VC fund) confirmed this: VC investors are focused on enterprise customers with high ARR potential, not legal aid organizations with constrained budgets.

This is the structural blind spot of the Legal AI industry: the market that needs AI tools the most is the market that can afford them the least.

Finding 8: The 2025–2030 Endgame — Three Scenarios

The simulation's agents converged on three plausible scenarios for Legal AI by 2030:

Scenario A: Commoditization Cascades Upward (Most Likely)

Standard NDA review, employment agreements, routine compliance assessments, and first-pass due diligence become commodity functions priced at SaaS rates. By 2027–2028, mid-complexity transactions are also commoditized. Law firms that built their revenue base on billing hours for this work face structural compression.

Scenario B: The Unbundled Law Firm (Probable)

Law firms restructure around a three-tier model: commodity layer (AI-native, SaaS-priced), review layer (human-in-the-loop at flat-fee rates), and strategic layer (complex transactions, premium-priced). Firms that deliberately position in the strategic layer survive and thrive.

Scenario C: Agent-Native Legal Workflows (2028–2030)

By 2028, the leading Legal AI tools are workflow orchestrators. An agent monitors your contract repository, flags renewal dates and change-of-control triggers, drafts the appropriate response, routes it to the right attorney for review, tracks the negotiation, and closes the loop. The winner is whoever has the broadest trusted workflow footprint, not whoever has the best underlying model.
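The first steps of that loop (monitor, flag, route) can be sketched as a simple triage pass over a contract repository. Everything here is hypothetical: the `Contract` fields, the 60-day renewal window, the routing table, and the sample contracts. Drafting and negotiation tracking are left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Contract:
    contract_id: str
    practice_area: str          # used for routing, e.g. "commercial" or "m&a"
    renewal_date: date
    has_change_of_control: bool


# Hypothetical routing table: practice area -> reviewing attorney queue.
ROUTING = {"commercial": "commercial-counsel", "m&a": "ma-partner"}


def triage(contracts: list[Contract], today: date, window_days: int = 60) -> list[dict]:
    """Flag contracts needing attention and route each one to a reviewer."""
    horizon = today + timedelta(days=window_days)
    tasks = []
    for c in contracts:
        reasons = []
        if today <= c.renewal_date <= horizon:
            reasons.append("renewal due")
        if c.has_change_of_control:
            reasons.append("change-of-control clause")
        if reasons:
            tasks.append({
                "contract": c.contract_id,
                "reasons": reasons,
                # Unknown practice areas fall back to a general queue.
                "route_to": ROUTING.get(c.practice_area, "general-counsel-queue"),
                "status": "awaiting_review",
            })
    return tasks


today = date(2025, 6, 1)
book = [
    Contract("MSA-001", "commercial", date(2025, 7, 15), False),
    Contract("SPA-014", "m&a", date(2027, 1, 1), True),
    Contract("NDA-203", "commercial", date(2026, 3, 1), False),
]
for task in triage(book, today):
    print(task["contract"], task["reasons"], task["route_to"])
```

The routing table is where the "trusted workflow footprint" lives: the model that drafts the response matters less than knowing which attorney must see it.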

Scenario A is already underway. Scenario B will be mainstream by 2027. Scenario C will separate the category winners from the feature companies by 2030.

Cross-Run Probability Predictions

The agents were given 10 structured prediction questions requiring explicit 0–100 confidence scores. Running 3 independent simulations allowed us to identify which predictions are robust consensus vs. stochastic noise:

The Competitive Landscape — Where Each Vendor Sits

The simulation's agents organically discussed and compared 10 Legal AI vendors across the competitive landscape. Their emergent positioning analysis — mapped by ICP focus (SME vs. Enterprise) and platform ambition (single feature vs. full legal OS) — reveals a striking pattern: the SME + high-platform-ambition quadrant is almost entirely unoccupied.

Harvey dominates enterprise Legal AI with a $100M Series C round and proprietary legal training data, but its $50k+ customer acquisition cost makes serving SMEs structurally uneconomic. Ironclad ($100M Series D) owns mid-market CLM but requires complex implementation. Clio has 150,000 users deeply embedded in small-firm workflows, but its AI (Clio Duo) is an add-on, not native. Spellbook competes on SMB contract drafting through its Microsoft Word integration, but its moat against GPT-4 wrappers is thin.

Michael Osei (investor at a prominent legal tech VC fund), the v3 VC agent, provided the investor lens: 'The companies that raised in 2023 on GPT wrapper narratives are hitting a wall. The next funding cycle will reward defensible data moats and workflow depth, not prompt engineering.' He put the probability of a Harvey acquisition or IPO at 65%, adding that Harvey is 'likely acqui-hired by a legal information giant rather than a standalone IPO.'

The agents identified HAQQ's competitive position as structurally defensible: AI-native at SME pricing, with full-platform ambition. The primary competition for HAQQ is not Harvey or Ironclad — it is Excel spreadsheets and deferred legal work. The weakness is brand awareness, not product positioning.

What This Means for HAQQ

We built HAQQ as a legal OS for the companies that the Legal AI industry's venture capital and go-to-market attention systematically ignores: startups, SMEs, and early-stage founders who need legal infrastructure but can't afford BigLaw rates or enterprise CLM contracts.

The simulation validates four core strategic bets:

Strategic risks the simulation surfaced: geographic concentration in training data, the insurance industry's quiet enforcement of AI governance, and the need for first-class trust infrastructure — audit trails and human review workflows — as competitive differentiators.

Running This Experiment Yourself

MiroFish is open-source. You need an OpenRouter API key, a Zep Cloud account (free tier is sufficient), and about 45 minutes of setup time.

The key variable that determines output quality is not the LLM — it is the richness and diversity of your seed documents. A single PDF produces 2 agents and thin emergent behavior. Three documents with explicit persona diversity produce 20 agent personas and genuine emergent debate. Running 3 parallel simulations costs roughly 3× the API credits but produces dramatically more reliable predictions.
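Because the three runs share seed documents but nothing else, they can run concurrently: API cost grows roughly linearly with the number of runs, while wall-clock time need not, provided the API rate limits allow it. A minimal sketch with Python's `concurrent.futures`; `run_simulation` is a hypothetical placeholder that returns a fake action count, not MiroFish's actual entry point.

```python
import concurrent.futures as cf
import random


def run_simulation(seed: int, rounds: int = 96) -> int:
    """Hypothetical stand-in for one full simulation run; returns an action count."""
    rng = random.Random(seed)
    # 20 personas per round, each acting with an invented 25% probability.
    return sum(rng.random() < 0.25 for _ in range(rounds * 20))


def run_parallel(seeds: list[int]) -> dict[int, int]:
    """Run one simulation per seed concurrently and map seed -> action count."""
    # Threads suit this because a real run would mostly wait on LLM API calls.
    with cf.ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        return dict(zip(seeds, pool.map(run_simulation, seeds)))


print(run_parallel([101, 202, 303]))
```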

Technical notes: Simulation run on MiroFish v3 · 20 agent personas · 3 parallel runs × 96 rounds each · 72 agent instances · 1,543 total interactions · LLM: Google Gemini 2.0 Flash (1M context window) via OpenRouter · Graph memory: Zep Cloud · Source documents: 3 files · Total simulation runtime: ~45 minutes · Report generation: ~8 minutes