
The Arabic Legal AI Gap: Not Missing Content, Missing Retrieval

By HAQQ Team · 8 min read · MENA

We tested Arabic vs English legal search. Arabic returned nine primary-law sources to English's one, plus dangerous jurisdiction errors. The Arabic legal AI gap is retrieval, not content.

The Experiment

We expected to prove the obvious: that AI and search cover Arabic legal questions far worse than English ones. So we tested it properly. We took four legal topics (UAE labour notice periods, Saudi company formation, Lebanese eviction, Egyptian contract breach) and queried each once in English and once in Arabic, for eight queries in total. For every query we counted how many of the returned sources were primary law: actual statutes, court rulings, or government texts, as opposed to blog posts and marketing pages.
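
For readers who want to repeat the count, here is a minimal sketch of the tally in Python. It assumes each returned source has already been labelled by hand; the `Result` class, the label names, and the sample rows are hypothetical placeholders, not the data from our test.

```python
from collections import Counter
from dataclasses import dataclass

# Source types counted as primary law, following the definition above.
PRIMARY_TYPES = {"statute", "court_ruling", "government_text"}


@dataclass(frozen=True)
class Result:
    topic: str
    language: str     # "en" or "ar"
    source_type: str  # hand-assigned label (hypothetical scheme)


def count_primary(results):
    """Count primary-law results per query language."""
    counts = Counter()
    for r in results:
        if r.source_type in PRIMARY_TYPES:
            counts[r.language] += 1
    return counts


# Placeholder rows for illustration only; not the experiment's data.
sample = [
    Result("Egyptian contract breach", "ar", "statute"),
    Result("Egyptian contract breach", "ar", "court_ruling"),
    Result("Egyptian contract breach", "en", "blog_post"),
    Result("Lebanese eviction", "en", "marketing_page"),
]

counts = count_primary(sample)
print(counts["ar"], counts["en"])  # prints "2 0"
```

The labelling step is the part that needs a human who can read the source; the counting itself is trivial.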

We went in expecting Arabic to lose. It did not.

The Surprise

Across the four topics, English returned exactly one primary-law source. Arabic returned nine. Arabic surfaced full Egyptian Commercial Code PDFs, a Court of Cassation ruling, and the Lebanese lease law — primary texts English simply did not reach.

Why did English do so badly? Partly a quirk that is also a lesson: the English query for 'Lebanon eviction' drowned in results about Lebanon, Ohio and Lebanon, Tennessee. The search engine could not tell the US towns from the country. The Arabic query had no such ambiguity, and it went straight to the statute.

The Dangerous Part

If the content exists, where is the gap? In retrieval — and in safety. Two failures stood out, and both are invisible to a non-Arabic reader.

First, jurisdiction contamination. The Arabic query about UAE labour law returned results about Jordanian and Saudi labour law, with no flag that the country was wrong. To a user who cannot read Arabic — or who trusts the answer — that is a silent, confident error about which country's law applies. It is the hallucination problem wearing a more dangerous disguise.
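
A jurisdiction check of the kind that was missing could start as simply as the sketch below. It is an assumption-heavy illustration, not how any search engine or HAQQ works: the keyword lists and the `flag_mismatch` helper are hypothetical, and plain substring matching ignores Arabic spelling variants that a real system would have to normalise.

```python
# Hypothetical keyword lists; a real system would need far richer signals.
JURISDICTION_TERMS = {
    "UAE": ["الإمارات", "United Arab Emirates", "UAE"],
    "Saudi Arabia": ["السعودي", "Saudi"],
    "Jordan": ["الأردن", "Jordan"],
}


def detect_jurisdictions(text):
    """Return every jurisdiction whose keywords appear in the text."""
    lowered = text.casefold()
    return {
        name
        for name, terms in JURISDICTION_TERMS.items()
        if any(term.casefold() in lowered for term in terms)
    }


def flag_mismatch(expected, text):
    """Label a result 'ok', 'mismatch', or 'unknown' for the expected country."""
    found = detect_jurisdictions(text)
    if not found:
        return "unknown"
    return "ok" if expected in found else "mismatch"


# A UAE query that returns a result about Jordanian labour law gets flagged.
print(flag_mismatch("UAE", "قانون العمل الأردني"))  # prints "mismatch"
print(flag_mismatch("UAE", "قانون العمل في الإمارات العربية المتحدة"))  # prints "ok"
```

Even a crude flag like this turns a silent error into a visible one, which is the point: the user sees that the country is wrong before relying on the answer.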

Second, discoverability. Arabic primary law exists, but it lives on bare-IP parliament servers, university file dumps, and loose PDFs — not the clean, indexed pages English law enjoys. The law is there. Nothing is organising it.

What It Means

This changed how we talk about our own product. The pitch is not 'we found Arabic legal content nobody else has.' The content is public. The pitch is that raw availability is not access. A buried PDF on a government server is not usable law until something retrieves the right one, confirms the jurisdiction, and cites it back to you.

That is the layer worth building: retrieval that knows the difference between UAE and Saudi labour law, that prefers a statute to a Facebook post, and that shows its sources. The Arabic legal gap is not a content gap. It is an engineering gap — which is far better news for the region, and exactly the problem HAQQ exists to solve.
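
To make that layer concrete, here is a toy sketch of the ranking idea: drop results from the wrong jurisdiction, then put primary law ahead of everything else, keeping each source URL so it can be cited. The weights, the `Candidate` class, and the example.com URLs are illustrative assumptions, not a description of HAQQ's actual system.

```python
from dataclasses import dataclass

# Hypothetical weights: primary law outranks commentary and social posts.
SOURCE_WEIGHT = {
    "statute": 3,
    "court_ruling": 3,
    "government_text": 2,
    "blog_post": 1,
    "social_media": 0,
}


@dataclass(frozen=True)
class Candidate:
    url: str
    jurisdiction: str
    source_type: str


def rank(candidates, jurisdiction):
    """Keep only the requested jurisdiction, then sort by source weight."""
    in_scope = [c for c in candidates if c.jurisdiction == jurisdiction]
    return sorted(
        in_scope,
        key=lambda c: SOURCE_WEIGHT.get(c.source_type, 0),
        reverse=True,
    )


# Placeholder candidates with made-up URLs.
candidates = [
    Candidate("https://example.com/uae-blog", "UAE", "blog_post"),
    Candidate("https://example.com/saudi-labour-law.pdf", "Saudi Arabia", "statute"),
    Candidate("https://example.com/uae-labour-law.pdf", "UAE", "statute"),
]

ranked = rank(candidates, "UAE")
for c in ranked:
    print(c.source_type, c.url)
```

In this sketch the Saudi statute never reaches the user asking about the UAE, and the UAE statute is listed before the UAE blog post. Filtering before ranking is a deliberate choice: a high-quality source from the wrong country is still the wrong answer.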
