Catching AI Hallucinations Before They Reach Users
A verification framework using Natural Language Inference to validate AI-generated citations, improving accuracy from 73% to 97%.
The problem with AI citations
Large language models are remarkably good at sounding authoritative. They'll cite specific sections, quote exact figures, and reference particular documents—all with complete confidence.
The problem is that sometimes those citations don't exist. The section numbers are fabricated. The figures are wrong. The documents are real, but they don't say what the AI claims they say.
When we analyzed responses from leading AI systems on regulatory compliance questions, we found error rates ranging from 18% to 47% depending on the model and domain. For an industry where compliance failures carry penalties in the millions of dollars, that's untenable.
Five ways AI citations fail
Phantom citations
References to documents, sections, or paragraphs that simply don't exist. The AI invents a plausible-sounding citation.
Attribution errors
The information is correct, but it's attributed to the wrong source. Section 4.3 actually says something else entirely.
Quotation fabrication
The AI puts text in quotes and attributes it to a document, but the exact quote doesn't appear anywhere in the source.
Semantic drift
Paraphrasing that subtly (or not so subtly) changes the meaning. "May consider" becomes "must implement."
Currency errors
Citing requirements that have been superseded without acknowledging they're no longer in effect.
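Of these five failure modes, quotation fabrication is the most mechanical to detect: if text is in quotes, it should appear verbatim in the source. A minimal sketch of that check, using whitespace and quote-mark normalization (function names are illustrative, not part of any specific framework):

```python
import re

def normalize(text: str) -> str:
    """Lowercase, unify curly quotes, and collapse whitespace for matching."""
    text = text.lower()
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip()

def quote_appears(quote: str, source: str) -> bool:
    """Return True if the quoted text appears verbatim (after normalization)."""
    return normalize(quote) in normalize(source)

source = "Institutions may consider additional controls where risk warrants."

# The real quote passes; the semantically drifted version does not.
assert quote_appears("may consider additional controls", source)
assert not quote_appears("must implement additional controls", source)
```

A check like this only covers exact quotes; the other failure modes (attribution errors, semantic drift) need the NLI-based approach described below.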
How we verify citations
Our approach uses Natural Language Inference (NLI) models—systems trained specifically to determine whether one piece of text supports, contradicts, or is neutral toward another.
The key insight is that NLI models are different from the generative models that produce hallucinations. They're trained on a simpler task—does text A support text B?—and they're good at it. We use an ensemble of three NLI classifiers to catch cases that any single model might miss.
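The ensemble-voting step can be sketched as follows. This is a hedged illustration, not the production system: the stand-in classifiers below replace real NLI models (which would each map a premise/hypothesis pair to entailment, contradiction, or neutral), and the tie-breaking rule is an assumption.

```python
from collections import Counter
from typing import Callable, List

# Each classifier maps (premise, hypothesis) -> an NLI label:
# "entailment" | "contradiction" | "neutral".
Classifier = Callable[[str, str], str]

def verify_citation(source_text: str, claim: str,
                    classifiers: List[Classifier]) -> str:
    """Majority vote across the ensemble. Ties resolve to 'neutral'
    so ambiguous claims are flagged rather than passed through."""
    votes = Counter(clf(source_text, claim) for clf in classifiers)
    label, count = votes.most_common(1)[0]
    return label if count > len(classifiers) // 2 else "neutral"

# Stand-ins for three independently trained NLI models.
def always_entail(p, h): return "entailment"
def cautious(p, h): return "entailment" if h in p else "neutral"
def strict(p, h): return "entailment" if h in p else "contradiction"

claim = "firms may consider additional controls"
source = "firms may consider additional controls where risk warrants"
print(verify_citation(source, claim, [always_entail, cautious, strict]))
# prints "entailment": all three models agree the source supports the claim
```

Requiring a strict majority, and defaulting to "neutral" on disagreement, biases the system toward flagging borderline citations for review rather than silently passing them.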
What we found
We evaluated the framework across 4,200 AI-generated compliance responses covering banking, privacy, anti-money laundering, and information security regulations.
| Model | Accuracy without verification | Accuracy with verification |
|---|---|---|
| GPT-4 | 68.2% | 96.8% |
| Claude 3 | 73.8% | 97.3% |
| Gemini Pro | 65.1% | 96.4% |
| Llama 70B | 52.6% | 94.7% |
The framework catches 94% of fabricated citations before they reach users, while correctly passing through legitimate citations 97% of the time.
What still gets through
The remaining 3% of errors fall into identifiable categories:
Semantic ambiguity
Regulatory text that's genuinely ambiguous—both the claim and its opposite could be reasonable interpretations.
Cross-reference gaps
Claims that require synthesizing multiple sections we didn't retrieve together.
Temporal misalignment
Claims that were accurate when the source was indexed but have since been superseded.
Edge cases
Unusual formatting or phrasing that trips up the extraction or matching steps.
Practical considerations
Verification isn't free. It adds 400–800 ms of latency to each response and requires access to the source documents being cited. But for applications where accuracy matters—compliance, legal, medical, financial—that tradeoff is usually worth it.
We've found three integration patterns work well:
Inline verification
Only show content that passes verification. Strict but safe—users never see unverified claims.
Visible indicators
Show all content, but mark each citation with its verification status. Let users see everything while knowing what's been confirmed.
Human-in-the-loop
Route low-confidence responses to human reviewers. Good for high-stakes decisions where automation isn't enough.
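The three patterns above amount to one dispatch on verification label and confidence. A sketch, under the assumption that verification produces a label plus an ensemble-agreement score (the 0.8 review threshold and all names are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Verified:
    citation: str
    label: str         # "entailment" | "contradiction" | "neutral"
    confidence: float  # ensemble agreement score, 0..1

def route(result: Verified, mode: str) -> str:
    """Dispatch a verified citation according to the integration pattern."""
    if mode == "inline":
        # Strict but safe: suppress anything not verified.
        return result.citation if result.label == "entailment" else "[withheld]"
    if mode == "indicators":
        # Show everything, annotated with verification status.
        marker = {"entailment": "verified", "contradiction": "refuted",
                  "neutral": "unverified"}[result.label]
        return f"{result.citation} [{marker}]"
    if mode == "human_review":
        # Escalate low-confidence results to a reviewer queue.
        if result.confidence < 0.8:
            return "queued for human review"
        return result.citation
    raise ValueError(f"unknown mode: {mode}")
```

In practice the mode can vary per deployment or even per query: an indicators UI for analysts, inline suppression for end-user chat, human review for filings.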
Questions we get asked
Can't RAG solve this?
RAG (retrieval-augmented generation) helps by grounding the AI in real documents, but it doesn't guarantee the AI accurately represents what it retrieved. We've seen RAG systems confidently misquote the documents they just pulled up. Verification is a separate step that catches these errors.
What about multilingual documents?
Current implementation is English-only. Multilingual NLI is an active research area, and we're watching for models that achieve cross-lingual transfer without significant accuracy degradation.
Does this work for any domain?
We've validated on regulatory compliance. The approach should generalize to any domain with citable sources—legal, medical, academic, technical—but we haven't yet run the numbers.
The takeaway
AI systems will confidently cite sources that don't exist and quote passages that were never written. This isn't a bug that will be fixed in the next model version—it's a fundamental property of how these systems work.
For applications where citation accuracy matters, verification isn't optional. The good news is that it works. NLI-based verification catches the vast majority of fabricated citations while adding acceptable latency. For enterprise deployment in regulated industries, we think it should be standard.
Validate AI outputs with multiple models
Onyx Legion's multi-AI deliberation catches errors single models miss.