Legal AI Vendor Red Flags: The 45-Point Evaluation Checklist

By Stephane Boghossian · 2026-04-11 · Updated 2026-06-11 · 18 min read · Guides

45 red flags across 8 criteria, including security, privacy, robustness, and cost, to check before signing any legal AI contract. Drawn from the legal industry's first buyer-led evaluation framework.

"Vibe procurement" is the legal tech industry's worst-kept secret. A polished demo, a few buzzwords, and a charismatic sales rep — and suddenly your firm has committed to a six-figure contract for an AI tool that nobody actually evaluated properly.

Key facts

45 concrete red flags across 8 evaluation criteria, extracted from the legal industry's first buyer-led AI evaluation framework (legalbenchmarks.ai).
Rule of thumb: more than 10 red flags in a vendor = a problem; more than 20 = vibe procurement territory.

We recently helped build the legal industry's first buyer-led framework and toolkit for evaluating AI tools for legal teams. From that work, we extracted 45 concrete red flags — the warning signs your team should watch for across 8 core evaluation criteria. Each one is a practical signal you can spot during a demo, in vendor documentation, or in the contract itself.

If you recognise more than a handful of these in a vendor you're evaluating, it might be time to ask harder questions — or walk away.

1. Strategic Fit

Strategic fit is where most evaluations go wrong first. You're not just asking "does this tool do AI?" — you're asking whether it was built for organisations like yours, works with your systems, and serves your jurisdictions.

1.1 Fit to your priority legal work

🚩 Website does not show customers similar to your organisation in profile, team size, or industry.
🚩 Case studies and use cases focus on a different buyer profile than yours.

1.2 Fit with your systems and operating model

🚩 Integrations emphasised are not the ones relevant to your environment.
🚩 Ecosystem appears optimised for a different type of company (e.g. startup tools vs enterprise systems).

1.3 Fit with your jurisdictions, languages, and product direction

🚩 Defaults to US or English-only assumptions with weak regional coverage.
🚩 Recently announced product direction or market focus does not match your legal team's likely needs.

2. Functionality

AI demos always look incredible. The real test is what happens when a lawyer uses it on a Monday morning with a 200-page contract scanned from a fax machine in 2019.

2.1 Usable by lawyers with minimal friction

🚩 Interface is cluttered or core actions are hard to find.
🚩 Too many clicks, dropdowns, or manual steps for common workflows.
🚩 Product is not intuitive for a lawyer using it in practice.

2.2 Handles real-world input conditions

🚩 Performs poorly on low-quality scans or dense, heavily formatted agreements.
🚩 Struggles with longer or more complex documents, or multi-document review.
🚩 Cannot reliably handle messy, real-world legal inputs.

3. Robustness

This is the category where the gap between marketing and reality is widest. Robustness is not about whether the AI can produce an answer — it's about whether you can trust it.

3.1 Accurate, complete, and faithful outputs

🚩 Hallucinated content, missed provisions, or silent truncation of outputs.
🚩 Overly confident or sycophantic outputs that do not flag uncertainty or risk.

3.2 Verifiable and independently validated

🚩 Accuracy claims are entirely self-reported with no willingness to support independent verification.
🚩 No published methodology behind performance claims; numbers lack context or test conditions.

3.3 Stable performance in realistic conditions

🚩 Model loses context, contradicts itself, or agrees with clearly wrong assumptions.
🚩 Defaults to the wrong legal context or misses obvious material risks.

4. Security

Security is not a checkbox — it's an architecture question. Any vendor can claim they're "secure." What matters is whether they can explain how, in detail, and back it up with evidence.

4.1 Transparent architecture and data flow

🚩 Explanations are vague, inconsistent, or rely on generic diagrams with no operational detail.
🚩 Missing ISO 27001 or SOC 2 reports, or vendor relies mainly on subprocessor certifications rather than its own.

4.2 Strong access control, isolation, and retrieval boundaries

🚩 Restricted content can leak across users, matters, or workspaces.
🚩 Vendor cannot clearly explain how customer data is separated.
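One way to probe the first flag during a pilot is a canary test: plant a unique marker string in a document in one workspace, then query the tool as a user who only has access to a different workspace and check whether the marker ever comes back. The sketch below is a minimal illustration in Python, not part of the framework. The names `make_canary` and `find_leaks` are hypothetical, and it assumes you collect the tool's responses yourself, by copy-paste or whatever export the vendor offers.

```python
import secrets


def make_canary(prefix: str = "CANARY") -> str:
    """Build a random marker that is very unlikely to occur by chance."""
    return f"{prefix}-{secrets.token_hex(8)}"


def find_leaks(canary: str, responses: list[str]) -> list[int]:
    """Return the positions of responses that contain the canary.

    `responses` are answers collected from a user who should NOT be able
    to see the document holding the canary (hypothetical test setup).
    """
    needle = canary.lower()
    return [i for i, text in enumerate(responses) if needle in text.lower()]


if __name__ == "__main__":
    canary = make_canary()
    # Illustrative responses only; in a real test these come from the tool.
    collected = [
        "I could not find any matching documents.",
        f"The engagement letter mentions {canary} in clause 4.",
    ]
    print(find_leaks(canary, collected))
```

An empty result does not prove isolation; it only means this particular probe found nothing. Any non-empty result is worth raising with the vendor immediately.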

4.3 Safe behaviour under misuse and failure conditions

🚩 Prompt injection succeeds or actions can be triggered without proper approval.
🚩 Vendor lacks credible evidence of security testing and response preparedness.

5. Data Privacy

Data privacy in legal AI is not about GDPR compliance badges on a website. It's about whether the vendor's actual data practices match what they promise — and whether your clients' privileged information is truly protected.

5.1 Contractual limits on data use

🚩 "No training on customer data" appears only in marketing, not in contractual commitments.
🚩 Broad "service improvement" wording still permits use of anonymised or aggregated data.
🚩 Human review rights are unclear or insufficiently limited.

5.2 Deletion and lifecycle control

🚩 No clear mechanism to confirm data has been fully deleted on request.
🚩 Retention periods are indefinite, undefined, or not contractually committed.

5.3 Processing, localisation, and derived-data governance

🚩 Vendor cannot clearly state where data is processed or stored.
🚩 Embeddings and derived data are treated as outside the customer's protection framework.

6. Vendor Risk

Vendor risk goes beyond financial stability. It's about whether you can leave, what happens to your data if the vendor fails, and whether their commitments are enforceable.

6.1 Clear contractual and security commitments

🚩 Rights over outputs are unclear, or important safeguards sit only in web documentation that the vendor can change at will.
🚩 Continuity protections are weak or ambiguous.

6.2 Real exit, portability, and accountability

🚩 Playbooks, clause rules, negotiation guidance, or approval logic cannot be extracted if you leave.
🚩 Configuration and institutional knowledge are trapped inside the tool.

6.3 Credible vendor conduct and resilience

🚩 Vendor is opaque, over-claims maturity, or has no credible reference base.
🚩 Cannot show how it would handle disruption or failure.

7. Adoption Support

The best AI tool in the world is worthless if nobody uses it. Adoption support is where you find out whether the vendor is invested in your success — or just in closing the deal.

7.1 Training and onboarding that work for legal users

🚩 Training materials are generic, thin, or outdated (e.g. not updated in 6+ months).
🚩 Limited self-serve resources; onboarding does not reflect legal workflows.

7.2 Responsive support and workable feedback loops

🚩 Support is slow, vague, or sales-led with unclear escalation paths.
🚩 No support coverage in your region, which slows response times and limits contextual understanding.

7.3 Documentation, change communication, and usage visibility

🚩 Model or system changes happen silently with no advance notice.
🚩 Reporting is too weak to support governance, renewal, or adoption decisions.

8. Cost & Resourcing

Legal AI vendors have learned that the demo sells and the invoice surprises. Cost transparency is not optional, and you need to model total lifecycle cost, not just licence fees.

8.1 Transparent pricing that scales sensibly

🚩 Pricing is opaque, cost growth is hard to model, or there are steep step-changes between pilot and rollout.

8.2 Full lifecycle cost is understood

🚩 Important costs appear late: connectors, implementation support, migration, or exit charges.
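To make both cost flags concrete, it can help to put the numbers in one place before negotiating. The following is a rough sketch in Python, assuming a flat annual licence fee plus a set of one-off charges; the function names and every figure are hypothetical placeholders, and real pricing models (usage-based, tiered, per-matter) will need a different formula.

```python
def lifecycle_cost(annual_licence: float, years: int,
                   one_off_costs: dict[str, float]) -> float:
    """Total cost over the contract term: licence fees plus one-off charges."""
    return annual_licence * years + sum(one_off_costs.values())


def step_change_ratio(pilot_per_seat: float, rollout_per_seat: float) -> float:
    """How many times more a seat costs at rollout than during the pilot."""
    if pilot_per_seat <= 0:
        raise ValueError("pilot_per_seat must be positive")
    return rollout_per_seat / pilot_per_seat


if __name__ == "__main__":
    # Placeholder figures for illustration only, not real vendor pricing.
    late_costs = {
        "connectors": 15_000.0,
        "implementation support": 25_000.0,
        "migration": 10_000.0,
        "exit charges": 5_000.0,
    }
    total = lifecycle_cost(annual_licence=120_000.0, years=3,
                           one_off_costs=late_costs)
    print(f"Three-year lifecycle cost: {total:,.0f}")
    print(f"Pilot-to-rollout seat price ratio: {step_change_ratio(50.0, 80.0):.2f}")
```

If the vendor cannot give you the inputs to a calculation this simple, that is itself the opacity flag from 8.1.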

What to Do Next

If you counted more than 10 red flags in a vendor you're currently evaluating, you have a problem. If you counted more than 20, you may be in "vibe procurement" territory — buying based on enthusiasm rather than evidence.
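If several people are scoring the same vendor, a small script keeps the count consistent. The sketch below, in Python, applies the two thresholds stated above (more than 10, more than 20) to a list of observed flags. The `RedFlag` structure and function names are assumptions made for illustration; they are not the scoring templates from the framework itself.

```python
from collections import Counter
from dataclasses import dataclass

# Thresholds from the rule of thumb in this article.
PROBLEM_THRESHOLD = 10
VIBE_THRESHOLD = 20


@dataclass
class RedFlag:
    criterion: str   # e.g. "4. Security"
    text: str        # the red flag as worded in the checklist
    observed: bool   # True if your team saw it in this vendor


def verdict(flags: list[RedFlag]) -> str:
    """Map the number of observed flags to the article's rule of thumb."""
    count = sum(1 for flag in flags if flag.observed)
    if count > VIBE_THRESHOLD:
        return "vibe procurement territory"
    if count > PROBLEM_THRESHOLD:
        return "problem"
    return "below the rule-of-thumb thresholds"


def observed_by_criterion(flags: list[RedFlag]) -> Counter:
    """Count observed flags per criterion to show where the risk clusters."""
    return Counter(flag.criterion for flag in flags if flag.observed)


if __name__ == "__main__":
    sample = [
        RedFlag("4. Security", "Vendor cannot clearly explain how customer data is separated.", True),
        RedFlag("5. Data Privacy", "Retention periods are indefinite, undefined, or not contractually committed.", True),
        RedFlag("8. Cost & Resourcing", "Important costs appear late.", False),
    ]
    print(verdict(sample))
    print(observed_by_criterion(sample))
```

The per-criterion breakdown matters as much as the total: a handful of flags concentrated in Security or Data Privacy may deserve more weight than the same number spread thinly across all eight criteria.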

The good news: every red flag on this list is observable before you sign. You can spot them in demos, in documentation, in contracts, and in the vendor's responses to direct questions. The framework these red flags come from — the Legal AI Evaluation Framework by Legal Benchmarks — provides structured scoring templates and evaluation toolkits to run a proper assessment.

How HAQQ Addresses These Red Flags

We built HAQQ specifically to pass this kind of scrutiny. Multi-jurisdictional coverage across 7 languages. SOC 2 and ISO 27001 certified infrastructure. Full data isolation per workspace. No training on customer data — contractually committed. Transparent architecture documentation. And a legal AI engine (Justinian) purpose-built for the evidentiary demands of legal practice.

We welcome buyer-led evaluation. If your firm is running a structured AI procurement process, we'll participate in any framework-based assessment — including the one these red flags come from.

FAQ

What is vibe procurement in legal tech?

'A polished demo, a few buzzwords, and a charismatic sales rep — and suddenly your firm has committed to a six-figure contract.' The article calls vibe procurement the legal tech industry's worst-kept secret: buying based on enthusiasm rather than evidence.

How do you evaluate a legal AI vendor?

Use the buyer-led framework's 8 criteria: strategic fit, functionality, robustness, security, data privacy, vendor risk, adoption support, and cost & resourcing — with 45 concrete red flags observable 'in demos, in documentation, in contracts, and in the vendor's responses to direct questions' before you sign.

How many red flags are too many?

Per the article: 'If you counted more than 10 red flags in a vendor you're currently evaluating, you have a problem. If you counted more than 20, you may be in vibe procurement territory.'

What are the biggest security red flags in legal AI tools?

The security criterion covers three areas: opaque architecture and data flow ('any vendor can claim they're secure — what matters is whether they can explain how, in detail'), weak access control, isolation and retrieval boundaries, and unsafe behaviour under misuse and failure conditions.
