AI models keep hallucinating
Current generations of Large Language Models (LLMs) are not very reliable. It’s been known from the start that they can easily make up all kinds of factually wrong information (hallucinations). Here’s an excerpt from “Familiarity is the enemy” (April 24th, 2026):
In June 2024, Stanford’s Regulation, Evaluation, and Governance Lab benchmarked Lexis+ AI and Thomson Reuters’ Westlaw AI-Assisted Research – two of the most expensive commercial legal-RAG systems in the world. Hallucination rates were 17% for Lexis+ AI and over 34% for Westlaw, on authoritative corpora, with citations attached.
The citations – it turned out – pointed to the chunk that was retrieved, not to the evidence for the claim. The language model, under context load, would ignore the retrieved chunk and fabricate from its parametric memory instead. Sometimes the citation was real and the summary was invented. Sometimes the citation was invented and the summary was internally coherent. Sometimes the whole thing was right. But there was no reliable way to tell which was which.
Of course, state-of-the-art models have improved by leaps and bounds since then. They’ll continue to improve, no doubt, but hallucination is a feature, not a bug. Systems that work in high-stakes environments – like our legal system – need to be built to account for that, rather than pretending it’s not there. But it’s a difficult problem to solve.
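To make the failure mode from the quote concrete: the cited chunk and the generated claim can drift apart, so one partial mitigation is a post-hoc check that the claim is actually supported by the text it cites. The sketch below is purely illustrative and is not how Lexis+ AI or Westlaw work; it uses crude token overlap as a stand-in for a real entailment check, and all function names and thresholds are hypothetical.

```python
# Illustrative sketch only: flag generated claims whose cited chunk shows
# weak lexical support, so a human reviews them before relying on them.
# Token overlap is a crude proxy; real systems would use an entailment
# model or manual verification. All names/thresholds here are hypothetical.
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, skipping very short (stopword-like) tokens."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) > 3}


def support_score(claim: str, cited_chunk: str) -> float:
    """Fraction of the claim's content words that appear in the cited chunk."""
    claim_toks = tokens(claim)
    if not claim_toks:
        return 0.0
    return len(claim_toks & tokens(cited_chunk)) / len(claim_toks)


def flag_unsupported(claims_with_citations, threshold: float = 0.6):
    """Return (claim, chunk) pairs whose cited text weakly supports the claim."""
    return [
        (claim, chunk)
        for claim, chunk in claims_with_citations
        if support_score(claim, chunk) < threshold
    ]


if __name__ == "__main__":
    pairs = [
        ("The court held the contract void for lack of consideration.",
         "The appellate court affirmed, holding the contract void because "
         "no consideration was exchanged."),
        ("The statute of limitations was extended to ten years in 2019.",
         "The opinion discusses venue requirements for filing in federal court."),
    ]
    for claim, chunk in flag_unsupported(pairs):
        print("NEEDS REVIEW:", claim)
```

A check like this only catches claims that are obviously untethered from their citation; it says nothing about claims that paraphrase the chunk incorrectly, which is why human review remains essential.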
While LLMs can be useful, it’s very important to double-check all the facts yourself. That’s quite apart from the biases and censorship that get baked into the LLMs by the big corporations during their training phase.