AI Hallucination — IT definition
A generative AI model producing false or invented information with the same confidence as a correct answer.
An AI hallucination is the production by a GenAI model — typically an LLM — of false, invented, or unverifiable information, presented with the same confidence as a correct answer. The model does not "lie" in the human sense: it samples from a probability space without ground truth to check the output.
The problem is huge: a 2024 Stanford study on legal LLMs found that 17% to 33% of responses contained factual hallucinations, and a 2024 NEJM AI review measured up to 28% clinical errors in non-specialized public models. For a CIO, hallucinations are today the main obstacle to industrializing GenAI on critical use cases.
Why LLMs hallucinate
The causes are structural:
- Statistical learning: the model predicts the most probable next token, not the most truthful one (a minimal sketch follows this list).
- Noisy training data: web corpora contain errors, opinions, and fiction.
- Knowledge cutoff: models know nothing past their training date.
- Out-of-distribution queries: on under-represented topics, the model invents plausible-sounding content.
- Compression: a 70B-parameter model cannot memorize all of the internet, so it interpolates.
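To make the first cause concrete, here is a minimal, model-agnostic sketch of next-token sampling. The tokens and logits are invented for illustration; the point is that the choice is driven purely by learned probability, with no check on factual truth.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution over candidate tokens."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}

# Hypothetical logits for the continuation of "The capital of Australia is ..."
logits = {"Sydney": 2.1, "Canberra": 1.9, "Melbourne": 0.4}
probs = softmax(logits)

# The plausible-but-wrong "Sydney" is the single most likely pick, because the
# training corpus mentions it more often, not because it is correct. Nothing
# here verifies the sampled token against ground truth.
token = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
print(probs, "->", token)
```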
Types of hallucinations
- Factual hallucinations: an invented fact (date, quote, name).
- Reasoning hallucinations: an incorrect but plausibly coherent chain of logic.
- Instruction hallucinations: failing to follow the exact instruction (format change, dropped constraint).
- Source hallucinations: citing a paper, case, or book that doesn't exist.
- Capability hallucinations: claiming to have done an action the model couldn't (common in AI agents).
Detecting and reducing hallucinations
Several stackable strategies:
- [RAG](/en/glossary/rag): ground answers in verified internal sources. Drastically reduces factual hallucinations.
- Fine-tuning: train the model on quality domain data.
- Defensive prompt engineering: "answer only if confident", "cite your sources", "say I don't know if needed".
- Self-consistency / chain-of-thought: generate multiple answers and check convergence (sketched in code after this list).
- LLM-as-judge: a second model verifies the first (also sketched below).
- Mandatory citation: every assertion must reference a source.
- Human-in-the-loop: human review on critical uses.
- Specialized models: a legal-tuned model hallucinates less than generalist GPT-4 on law.
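As an illustration of the self-consistency lever, here is a minimal sketch. The `generate` function is a placeholder for whichever model API you use (with sampling enabled), and the 0.6 agreement threshold is an arbitrary example value.

```python
from collections import Counter

def generate(prompt: str) -> str:
    """Placeholder: wrap your actual model call (with sampling enabled) here."""
    raise NotImplementedError

def self_consistent_answer(prompt: str, k: int = 5, min_agreement: float = 0.6):
    # Sample k independent answers and keep the majority one only if they converge.
    answers = [generate(prompt).strip() for _ in range(k)]
    best, count = Counter(answers).most_common(1)[0]
    agreement = count / k
    if agreement < min_agreement:
        return None, agreement  # low convergence: flag as a likely hallucination
    return best, agreement
```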
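And a minimal LLM-as-judge sketch along the same lines: a second model checks whether the first model's answer is supported by the retrieved sources. The prompt wording and the `generate` placeholder are assumptions, not any specific product's API.

```python
JUDGE_PROMPT = """You are a strict fact-checker.

Sources:
{sources}

Answer to verify:
{answer}

Reply SUPPORTED if every claim in the answer is backed by the sources.
Otherwise reply UNSUPPORTED and list the unsupported claims."""

def judge(answer: str, sources: list[str], generate) -> bool:
    # A second model grades the first model's answer against retrieved sources.
    verdict = generate(JUDGE_PROMPT.format(sources="\n---\n".join(sources), answer=answer))
    return verdict.strip().upper().startswith("SUPPORTED")
```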
Measuring hallucination rate
Public benchmarks exist:
- TruthfulQA: 817 trick questions on common misconceptions.
- HaluEval: dedicated to summary, QA, and dialogue hallucinations.
- HELM (Stanford): holistic evaluation suite.
- Vectara HHEM (Hughes Hallucination Evaluation Model): a model that detects hallucinations.
In-house, the common practice is to build an eval set specific to the business — questions with known answers — and measure error rates over time.
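A minimal sketch of such an in-house harness, assuming a CSV eval set with `question` and `expected_answer` columns and the same `generate` placeholder as above. The exact-match grading is deliberately naive; real setups usually swap in an LLM judge or domain-specific scoring.

```python
import csv
import datetime

def run_eval(eval_set_path: str, generate) -> float:
    """Eval set CSV columns: question, expected_answer. Returns the error rate."""
    total, errors = 0, 0
    with open(eval_set_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            total += 1
            answer = generate(row["question"])
            # Naive grading: the known answer must appear in the model output.
            if row["expected_answer"].lower() not in answer.lower():
                errors += 1
    rate = errors / total if total else 0.0
    print(f"{datetime.date.today()}: {rate:.1%} errors on {total} questions")
    return rate
```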
Hallucinations and responsibility
Recent case law confirmed that users remain responsible for generated content. In 2023, a US lawyer was sanctioned for citing court cases… fabricated by ChatGPT. GDPR also requires accuracy of personal data, which raises the question of hallucinations about real people.
The EU AI Act and ISO 42001 now require providers and deployers to measure, mitigate, and communicate about hallucination risk.
Kabeen and verified context
Giving an LLM or AI agent accurate context about the IT estate (which applications exist, who owns them, what their real usage is) eliminates most business hallucinations. Kabeen exposes that live context, through API or MCP, to the models and agents deployed across the company.
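The pattern can be sketched as follows; note that the endpoint name and response shape below are hypothetical placeholders for illustration, not Kabeen's actual API.

```python
import json
from urllib.request import urlopen

def grounded_prompt(question: str, api_base: str) -> str:
    # Hypothetical endpoint returning the verified application inventory.
    with urlopen(f"{api_base}/applications") as resp:
        applications = json.load(resp)
    inventory = "\n".join(
        f"- {app['name']} (owner: {app['owner']}, usage: {app['usage']})"
        for app in applications
    )
    # Inject the verified context and constrain the model to it.
    return (
        "Answer using only the application inventory below. "
        "If it does not contain the answer, say you don't know.\n\n"
        f"{inventory}\n\nQuestion: {question}"
    )
```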
Frequently asked questions
What is an AI hallucination?
A hallucination is the production by a generative AI model — typically an LLM — of false, invented, or unverifiable information presented with the same confidence as a correct answer. The model does not lie: it samples from a probability space without ground truth. This is the main obstacle to industrializing GenAI on critical use cases.
Why do LLMs hallucinate?
The causes are structural: statistical learning (the model predicts the most probable next token, not the most true), noisy training data, knowledge cutoff, out-of-distribution topics, and inherent compression (a 70B-parameter model cannot memorize all of the internet, so it interpolates).
How do you reduce hallucinations?
Five stackable levers: (1) RAG to ground answers on verified sources, (2) fine-tuning on quality domain data, (3) defensive prompt engineering ("cite your sources", "say I don't know if needed"), (4) LLM-as-judge, where a second model verifies the first model's output, (5) human-in-the-loop on critical uses. None fully eliminates the risk; combined, they reduce it drastically.
Who is liable for an AI hallucination?
Most recent case law puts liability on the user or deployer of the model, not the model vendor. A US lawyer was sanctioned in 2023 for citing court cases fabricated by ChatGPT. GDPR also requires accuracy of personal data, and the EU AI Act and ISO 42001 now require providers and deployers to put hallucination measurement and mitigation programs in place.