
Catching AI Hallucinations Before They Reach Users

A verification framework using Natural Language Inference to validate AI-generated citations, improving accuracy from 73% to 97%.

97.3%
Citation accuracy after verification
Up from 73% baseline without verification

The problem with AI citations

Large language models are remarkably good at sounding authoritative. They'll cite specific sections, quote exact figures, and reference particular documents—all with complete confidence.

The problem is that sometimes those citations don't exist. The section numbers are fabricated. The figures are wrong. The documents are real, but they don't say what the AI claims they say.

When we analyzed responses from leading AI systems on regulatory compliance questions, we found error rates ranging from 18% to 47% depending on the model and domain. For an industry where compliance failures carry penalties in the millions of dollars, that's untenable.

Five ways AI citations fail

Phantom citations

References to documents, sections, or paragraphs that simply don't exist. The AI invents a plausible-sounding citation.

Attribution errors

The information is correct, but it's attributed to the wrong source. Section 4.3 actually says something else entirely.

Quotation fabrication

The AI puts text in quotes and attributes it to a document, but the exact quote doesn't appear anywhere in the source.

Semantic drift

Paraphrasing that subtly (or not so subtly) changes the meaning. "May consider" becomes "must implement."

Currency errors

Citing requirements that have been superseded without acknowledging they're no longer in effect.

How we verify citations

Our approach uses Natural Language Inference (NLI) models—systems trained specifically to determine whether one piece of text supports, contradicts, or is neutral toward another.

1. Extract claims: break the response into atomic, verifiable claims.
2. Retrieve sources: fetch the actual cited passages from the documents.
3. NLI verification: check whether each source actually supports its claim.
4. Flag or pass: mark verified claims; surface problems.
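The four steps above can be sketched end to end. This is a minimal illustration, not the authors' implementation: the claim extractor is a naive bracket-citation parser, the document store is a toy dictionary, and `nli_supports` is a crude word-overlap stand-in for a real NLI classifier.

```python
from dataclasses import dataclass

@dataclass
class VerifiedClaim:
    claim: str
    citation: str
    supported: bool

# Toy corpus standing in for the real document store (assumption).
SOURCES = {
    "Sec 4.3": "Institutions may consider additional monitoring controls.",
}

def extract_claims(response):
    # Step 1: naively split into (claim, citation) pairs; a real system
    # would produce atomic claims with a parser or an LLM.
    claims = []
    for line in response.splitlines():
        if "[" in line and line.rstrip().endswith("]"):
            claim, _, rest = line.partition("[")
            claims.append((claim.strip(), rest.rstrip().rstrip("]").strip()))
    return claims

def retrieve_source(citation):
    # Step 2: look up the cited passage; None means a phantom citation.
    return SOURCES.get(citation)

def nli_supports(premise, hypothesis):
    # Step 3: placeholder for an NLI entailment model. Here, crude
    # word overlap; a real system would run entailment classification.
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1) > 0.5

def verify(response):
    # Step 4: flag or pass each claim.
    results = []
    for claim, citation in extract_claims(response):
        source = retrieve_source(citation)
        supported = source is not None and nli_supports(source, claim)
        results.append(VerifiedClaim(claim, citation, supported))
    return results
```

Note that a phantom citation fails at step 2 before NLI ever runs, which mirrors how the failure taxonomy above separates nonexistent sources from misrepresented ones.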

The key insight is that NLI models are different from the generative models that produce hallucinations. They're trained on a simpler task (does text A support text B?) and they're good at it. We use an ensemble of three NLI classifiers to catch cases that any single model might miss.
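The ensemble decision can be sketched as a majority vote. This assumes each classifier emits one of the standard NLI labels ("entailment", "neutral", "contradiction"); the fail-safe tie-breaking is an illustrative choice, not necessarily the authors' rule.

```python
from collections import Counter

def ensemble_verdict(labels):
    # A claim passes only if a strict majority of the NLI classifiers
    # say the source entails it; ties and disagreement fail safe to "flag".
    counts = Counter(labels)
    return "pass" if counts["entailment"] * 2 > len(labels) else "flag"
```

Failing safe matters here: in compliance settings a false "flag" costs a review, while a false "pass" costs a compliance error.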

What we found

We evaluated the framework across 4,200 AI-generated compliance responses covering banking, privacy, anti-money laundering, and information security regulations.

Model      | Without Verification | With Verification
GPT-4      | 68.2%                | 96.8%
Claude 3   | 73.8%                | 97.3%
Gemini Pro | 65.1%                | 96.4%
Llama 70B  | 52.6%                | 94.7%

The framework catches 94% of fabricated citations before they reach users, while correctly passing through legitimate citations 97% of the time.
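The two figures quoted above are the standard recall/specificity pair, computed from a confusion matrix over flagged citations. The counts below are made up purely to illustrate the arithmetic:

```python
def detection_rates(tp, fn, tn, fp):
    # tp: fabricated citations correctly flagged
    # fn: fabricated citations that slipped through
    # tn: legitimate citations correctly passed
    # fp: legitimate citations wrongly flagged
    recall = tp / (tp + fn)       # share of fabrications caught (94%)
    specificity = tn / (tn + fp)  # share of legitimate passed (97%)
    return recall, specificity

# e.g. 94 of 100 fabrications caught, 97 of 100 legitimate passed:
# detection_rates(94, 6, 97, 3) returns (0.94, 0.97)
```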

What still gets through

The remaining 3% of errors fall into identifiable categories:

Semantic ambiguity (38%)

Regulatory text that's genuinely ambiguous; both the claim and its opposite could be reasonable interpretations.

Cross-reference gaps (27%)

Claims that require synthesizing multiple sections we didn't retrieve together.

Temporal misalignment (19%)

Claims that were accurate when the source was indexed but have since been superseded.

Edge cases (16%)

Unusual formatting or phrasing that trips up the extraction or matching steps.

Practical considerations

Verification isn't free. It adds 400-800ms latency to each response, and requires access to the source documents being cited. But for applications where accuracy matters—compliance, legal, medical, financial—that tradeoff is usually worth it.

We've found three integration patterns work well:

Inline verification

Only show content that passes verification. Strict but safe—users never see unverified claims.

Visible indicators

Show all content, but mark each citation with its verification status. Let users see everything while knowing what's been confirmed.

Human-in-the-loop

Route low-confidence responses to human reviewers. Good for high-stakes decisions where automation isn't enough.
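The three integration patterns reduce to a small routing decision per claim. This sketch assumes each claim carries a verification confidence in [0, 1]; the threshold value and return conventions are illustrative, not prescribed by the framework.

```python
def route(claim_text, confidence, mode, threshold=0.8):
    # Dispatch a verified claim according to the chosen integration pattern.
    if mode == "inline":
        # Strict but safe: suppress anything that didn't pass verification.
        return claim_text if confidence >= threshold else None
    if mode == "indicator":
        # Show everything, annotated with verification status.
        mark = "verified" if confidence >= threshold else "unverified"
        return f"{claim_text} [{mark}]"
    if mode == "human":
        # Escalate low-confidence claims to a reviewer queue.
        return claim_text if confidence >= threshold else "QUEUED_FOR_REVIEW"
    raise ValueError(f"unknown mode: {mode}")
```

In practice the mode can be chosen per deployment: inline for user-facing compliance answers, indicators for analyst tooling, human review for high-stakes decisions.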

Questions we get asked

Can't RAG solve this?

RAG (retrieval-augmented generation) helps by grounding the AI in real documents, but it doesn't guarantee the AI accurately represents what it retrieved. We've seen RAG systems confidently misquote the documents they just pulled up. Verification is a separate step that catches these errors.

What about multilingual documents?

Current implementation is English-only. Multilingual NLI is an active research area, and we're watching for models that achieve cross-lingual transfer without significant accuracy degradation.

Does this work for any domain?

We've validated on regulatory compliance. The approach should generalize to any domain with citable sources—legal, medical, academic, technical—but we haven't yet run the numbers.

The takeaway

AI systems will confidently cite sources that don't exist and quote passages that were never written. This isn't a bug that will be fixed in the next model version—it's a fundamental property of how these systems work.

For applications where citation accuracy matters, verification isn't optional. The good news is that it works. NLI-based verification catches the vast majority of fabricated citations while adding acceptable latency. For enterprise deployment in regulated industries, we think it should be standard.
