
AI in the Legal Industry: The Trust Problem, the Real Risks, and What Is Actually Safe to Use Today


When a Federal Court Brief Cites Cases That Do Not Exist

In 2023, a New York attorney submitted a court brief containing case citations that looked entirely credible — complete with case numbers, judge names, and court references. Opposing counsel could not locate them. The cases did not exist.

The attorney had used ChatGPT to draft the brief and had not verified the citations. The court sanctioned him. The case became known as Mata v. Avianca — the first high-profile signal that AI in legal practice carried real operational liability, not just theoretical risk.

By mid-2025, French researcher Damien Charlotin’s tracking database had recorded over 600 documented AI hallucination cases in court filings, involving 128 lawyers across jurisdictions. The rate had increased from roughly two incidents per week in 2023 to two or three per day by mid-2025.

In the first two weeks of August 2025 alone, three separate federal courts sanctioned lawyers for AI-generated hallucinations. In one of those cases, the attorney had used a recognized legal AI research platform — not a consumer chatbot — and still submitted fabricated citations.

This is the factual starting point for any honest conversation about AI in law.


AI Hallucination in Legal Context: A Precise Definition

A hallucination, in large language models (LLMs), is output that is internally coherent and superficially plausible but factually wrong or entirely fabricated. In legal practice, this means:

  • Case citations to cases that do not exist
  • Quotations attributed to real judges that were never written
  • References to statutes with incorrect provisions or incorrect application
  • Misstatements of procedural rules in specific jurisdictions

A 2024 paper co-authored by researchers at Yale and Stanford — Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models — found that general-purpose LLMs hallucinate legal citations at rates far higher than acceptable for responsible legal practice.

The problem is structural. LLMs are trained to generate plausible text, not to verify facts against external databases. When asked for case law, a model that has not retrieved a real source will generate something that looks like a citation — because that is what the pattern demands. This applies equally to AI agents built on general-purpose models if they lack a retrieval-augmented generation (RAG) architecture tied to verified legal databases.
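The retrieval gate a RAG architecture provides can be reduced to one invariant: no citation reaches the output unless it resolves against a verified corpus. The sketch below is a minimal, hypothetical illustration of that invariant, not a production system; the citation strings and the in-memory index are stand-ins for a real legal database such as Westlaw or Lexis.

```python
# Illustrative only: an in-memory dict stands in for a verified legal
# database. All citation strings here are hypothetical examples.
VERIFIED_INDEX = {
    "575 U.S. 320": "Full retrievable opinion text ...",
}

def gate_citations(generated_citations):
    """Split model-generated citations into verified and flagged sets.

    Anything that does not resolve against the verified index is flagged
    for human review; it is never silently passed through to a filing.
    """
    verified, flagged = [], []
    for cite in generated_citations:
        if cite in VERIFIED_INDEX:
            verified.append(cite)
        else:
            flagged.append(cite)
    return verified, flagged

# One plausible-looking but unverifiable citation gets flagged.
ok, suspect = gate_citations(["575 U.S. 320", "123 F.4th 456"])
```

The point of the sketch is the asymmetry: a citation that fails retrieval is treated as fabricated until a human confirms it, which is the opposite default from a general-purpose LLM that emits whatever the text pattern demands.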


Why This Failure Mode Is Specific to Law

Most industries can absorb AI errors through internal review cycles. A marketing team catches a factual mistake in a draft before publishing. A logistics team flags a routing anomaly before the truck leaves. The cost of an error is the iteration time.

In law, the cost structure is different, on three levels.

1. Professional liability attaches immediately

Under Federal Rule of Civil Procedure 11(b)(2), when a lawyer signs and submits a filing, they certify that all legal contentions are supported by existing law. That certification is personal. It does not transfer to the AI platform that generated the content. Courts have been explicit: even unintentional reliance on AI that produces false citations is sanctionable.

2. The harm propagates

A hallucinated citation that is not caught can appear in a court order and influence subsequent decisions. Judges and opposing counsel spend hours tracking references that do not exist. In Noland v. Land of the Free, L.P. (California Court of Appeal, 2025), the court noted that fake citations had required it to spend excessive time on what was otherwise a straightforward appeal.

3. Sanctions are escalating

In Johnson v. Dunn (N.D. Ala., July 2025), a federal court disqualified an entire Nashville law firm from the case, referred the attorneys to state bar associations in every jurisdiction where they held licenses, and required them to file a copy of the sanctions order in every pending matter where they served as counsel of record.

The American Bar Association’s Model Rule 1.1 requires lawyers to provide competent representation. Comment 8 explicitly extends this to understanding the benefits and risks associated with relevant technology. Using AI without understanding how it produces outputs is now a professional competence issue, not a technology management issue.

What Is Currently Safe to Use: Seven Tool Categories

The following distinguishes tool categories by risk profile, not by brand ranking. Safe use in each category depends on a structured human review step.


1. Legal Research Platforms with Source-Linked AI

Westlaw Precision with CoCounsel and Lexis+ AI (Protégé) operate on closed legal corpora with citation verification built into the output.

  • Westlaw holds the federal judiciary contract (announced April 2025), covering over 25,000 federal legal professionals. Its KeyCite tool verifies citation validity. CoCounsel 2.0 integrates with Microsoft 365.
  • Lexis+ AI provides Shepard’s citation verification in real time, with simultaneous analysis and citation integrity checks. In 2025, the Protégé assistant added voice interaction and a dual-mode system separating Legal AI from General AI tasks.

Condition for safe use: the lawyer reviews all cited sources before submission. The tool reduces search time. It does not replace verification responsibility.

2. Contract Review and Redlining Tools

AI-powered contract review operates on uploaded documents, not on legal reasoning generated from model memory. The model extracts, flags, and compares clauses within the document provided. Hallucination risk is substantially lower because the source material is explicit.

  • Harvey AI — valued at $8 billion as of 2025, 337 clients across 53 countries — focuses on document analysis, due diligence, and complex multi-document workflows.
  • LegalOn targets contract review with Microsoft Word integration, automating redlines against firm-defined playbooks.
  • DraftWise uses a firm’s historical deal data to generate context-aware contract suggestions.

Condition for safe use: contract review output is a first-pass flagging system. A lawyer reviews every flagged and unflagged clause for high-stakes transactions. AI does not sign off.

3. Document Review and eDiscovery

For litigation and M&A involving large document sets, AI-assisted review has a different risk profile. The task is prioritization and pattern detection, not legal argument generation. Errors in prioritization may cause relevant documents to be missed, but outputs are not submitted directly to courts.

Condition for safe use: AI review output is treated as a triage mechanism. A legal professional reviews the flagged set before production.

4. Practice Management AI

Clio Manage AI and similar platforms automate billing entries, calendar management, client communication, and task tracking. This category has significantly lower hallucination exposure because outputs do not appear in court filings.

The risk here is different: over-automation of client communication without attorney oversight, and data security if client-privileged information is processed through external servers without proper data governance. For DACH-region firms, GDPR (DSGVO)-compliant data processing and compliance with the revised Swiss Data Protection Act (revDSG) are additional hard requirements.

Condition for safe use: Confirm the vendor’s data processing terms explicitly prohibit training on client data. ISO 27001 or SOC 2 Type II certification is the minimum standard.

5. Drafting Assistance with Human-Initiated Structure

Tools like Spellbook (Microsoft Word integration) and Harvey’s drafting capabilities assist with first drafts when the lawyer defines the structure and inputs. Risk is materially lower when the lawyer writes the outline, provides the facts, and uses AI to accelerate prose — rather than asking an AI agent to generate legal argument from scratch.

Condition for safe use: the lawyer drafts the structure. AI fills within it. Every legal claim is verified before submission.

6. General-Purpose LLMs — The Highest-Risk Category

ChatGPT, Claude, Gemini, and similar AI platforms are not designed for legal citation accuracy. They are useful for summarizing documents the user provides, drafting communication, and brainstorming argument structures. They are not appropriate for generating case law references without RAG-based retrieval from a verified legal database.

The sanctions record confirms this. Mata v. Avianca, Gauthier v. Goodyear (Claude used without verification), and a Social Security brief in which 12 of 19 citations were fabricated all involved general-purpose LLM output submitted without source verification.

Condition for use: treat output as a thinking tool, never as a source. Never submit AI-generated citations without retrieving and reading the actual document.

7. Custom AI Agents Built for Legal Operations

Off-the-shelf AI platforms were designed for general use cases. They were not built for a specific firm’s jurisdiction, workflows, data governance requirements, or practice management stack.

Lab51 builds custom AI agents for legal and compliance operations — workflows designed around specific use cases, with LLM integration and RAG architecture that reduce hallucination risk, and data handling that does not expose client-privileged information to third-party training sets. Concrete examples:

  • AI-assisted intake workflows that route matters based on practice area and urgency
  • Contract analysis systems integrated with existing document management tools
  • Knowledge base agents trained on internal precedents and firm-specific playbooks
  • Multi-platform deployment using Model Context Protocol (MCP) for consistent behaviour across channels

The outcome is not a generic chatbot deployment. It is a system designed for how the firm actually works, with the governance layer that professional liability requires.


The Risk Matrix: Task by Tool by Exposure

Task | Appropriate Tool | Risk If Misused
Case law research | Westlaw / CoCounsel, Lexis+ AI | Sanctions, fabricated citations in filings
Contract review (own documents) | Harvey, LegalOn, DraftWise | Missed risk clauses, deal exposure
Large-scale document review | Relativity, Everlaw | Missed evidence, discovery failure
Draft generation (from lawyer outline) | Spellbook, Harvey, Lexis+ AI | Unverified legal argument errors
Client communication drafts | Clio AI, general LLMs (with review) | Confidentiality risk, misstatements
Citation verification | Westlaw KeyCite, Shepard’s | Not a shortcut; must be used every time
Autonomous legal argument from scratch | None currently appropriate | Professional sanctions, malpractice

Why the Window for a Passive Approach Has Closed

Courts are now actively asking about AI use during sanctions proceedings. The pattern courts and bar associations have identified is consistent: wholesale abdication of verification responsibilities. Firms that treat AI deployment as a tool rollout rather than a governance decision are accumulating liability.

Three things changed in 2025 that make inaction increasingly expensive:

  • The ABA’s 2024 ethics guidance explicitly extends Model Rule 1.1 competence requirements to AI tool use. Understanding failure modes is now a professional obligation, not optional.
  • Enterprise AI contracts now include data processing addenda. Firms that skipped this step when adopting early tools may be processing client-privileged data in vendor training environments without knowing it.
  • Competitors who build structured AI governance frameworks today will be able to demonstrate due diligence in the proceedings that eventually come. Those who do not will not.

The question is not whether to use AI. It is whether to build the operating model that makes AI use defensible.


The Honest Summary

AI in law is not uniformly untrustworthy. It is unevenly trustworthy — depending on the task, the architecture, and the human oversight structure around it.

For document retrieval, contract review, eDiscovery triage, and drafting assistance with verified structure, AI tools are delivering measurable value today. For autonomous legal reasoning, citation generation from general-purpose LLMs, and any output submitted to a court without expert verification, the liability record is clear and growing.

Over 600 documented hallucination cases and 128 sanctioned lawyers in three years is not a technology teething problem. It is a signal that the profession adopted speed without building the discipline to manage what that speed introduces.

The firms that get this right will not be the ones that use the most AI. They will be the ones that know precisely where AI ends and professional judgment begins.


Building AI workflows for a legal or compliance team? Lab51 designs custom AI agents with the data governance and verification architecture that professional liability requires. No generic tooling. No shortcuts on data security. Contact us.
