Hallucination in AI
When AI works, it's like magic. Often, the magical success of a machine performing seemingly impossible tasks is cited as anecdotal evidence that AI will change everything. I agree with the portrayal of AI as magic and believe that generative AI will transform our future much faster and more profoundly than the by-now traditional machine learning AI. However, everyone has also experienced it not working, giving irritating and blatantly wrong responses, known as hallucinations. A mostly harmless but still annoying example is Joanna Stern's recent run-in with "Apple Intelligence" referring to her imaginary husband:
[...] Summaries of messages from my wife often include the word “husband”—usually when she’s talking about one of our sons. With a tap, you see the original notifications, so the confusion abates quickly, but it can still be shocking to see something so wrong. [...]
(At least) Two Problems With Hallucination
The mindset of technologists is often: "Hey, it mostly works and is already so useful (to my eyes), so let's release it! An 80% solution is enough for now, and we can fix the rest later." I believe, however, that this glorious promise of the future widens yet another gap between technology and society, as tech often overlooks its impact on society and individuals.
Specifically, I see two major problems:
Currently: it really frustrates regular users, consumers, and anyone affected by a "hallucination." The focus here is on how ordinary people experience AI, not on developers who are amazed by AI's power for prototyping, learning aids, or MVP generators. And if regular people aren't frustrated by AI because they don't even notice the errors, it might be even more dangerous. This is a significant issue, and I wonder if unconditional proponents of AI, captivated by its magic, are falling into the trap of some version of the anecdotal evidence fallacy. I admit it's very tempting to accept what AI generates at face value. You might perform some spot checks, but eventually you just accept the results, implicitly accepting the risk of having overlooked a flaw.
For the future: who guarantees that we can minimize these AI-specific annoyances and deal with the inevitable errors? What is an acceptable error rate? Which use cases are fine with an 80% (or 90%, 99%...) solution, and where do we need more?
Of course, there are attempts to control hallucinations. One example is self-verification, where a very simple technique is to ask in the prompt for a confidence level between 0 and 1 that indicates how sure the model is about its response. Other methods include external retrieval via RAG or multiple AI cross-checks via chain-of-verification. But what does "hallucination control" mean, and how far can we get?
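To make the self-verification idea concrete, here is a minimal sketch in Python. The `ask` callable, the `CONFIDENCE:` line format, and the 0.8 threshold in the usage comment are illustrative assumptions, not any standard API; self-reported confidence is also known to be poorly calibrated, so treat it as a weak signal at best.

```python
from typing import Callable, Tuple

def ask_with_confidence(ask: Callable[[str], str], question: str) -> Tuple[str, float]:
    """Ask a model a question and have it self-report a confidence score in [0, 1].

    `ask` is any function that sends a prompt to a model and returns the reply text
    (e.g. a thin wrapper around your provider's chat API).
    """
    prompt = (
        f"{question}\n\n"
        "After your answer, add a final line of the form\n"
        "CONFIDENCE: <number between 0 and 1>\n"
        "indicating how sure you are that the answer is correct."
    )
    reply = ask(prompt)

    answer, confidence = reply, 0.0
    for line in reply.splitlines():
        if line.strip().upper().startswith("CONFIDENCE:"):
            try:
                confidence = float(line.split(":", 1)[1])
            except ValueError:
                pass  # model ignored the format; keep the pessimistic default
            answer = reply[: reply.rfind(line)].strip()
    return answer, confidence

# Usage: route low-confidence answers to a human or an external check.
# answer, conf = ask_with_confidence(my_llm_call, "Summarize this message thread.")
# if conf < 0.8:
#     print("Low self-reported confidence, verify before trusting:", answer)
```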
A Visual Analogy for "Hallucination Control"
I started to compare reasoning with an LLM to a simple geodetic network as a straightforward visual framework for the hallucination problem.1 This might seem a bit complex, but bear with me:
Points: In this analogy, each point represents a statement or piece of information produced by the LLM. Its uncertainty (the "error ellipse" or circle) reflects the confidence the model (or the observer) has in that statement.
Lines: Lines connecting points are akin to interactions: user inputs and corresponding LLM responses. The length of a line may indicate how far the new statement is from the previous one in terms of information content. A sequence of line segments corresponds to a dialog, where the risk of hallucination may increase with every step.
Hallucination Risks: Propagated errors in the geodetic analogy correspond to compounding errors in the LLM's reasoning process, or "hallucinations," where it generates unreliable or incorrect information. The risks are like "confidence circles" around a point.
Just as in geodesy, where fixing particular points as a gauge reduces error, grounding an LLM with reliable reference points (accurate context or external data) may prevent hallucinations. Of course, geodetic error propagation follows clear mathematical laws, whereas AI reasoning errors can be chaotic, contextual, and don't necessarily "propagate" in a measurable way. The analogy oversimplifies the complex nature of AI reasoning errors, yet it provides a valuable visual understanding of the dynamics of errors in interacting with LLMs.
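To give the analogy a bit of arithmetic, here is a toy numerical sketch, not a real geodetic adjustment: per-step uncertainties are assumed independent and are added in quadrature along the dialog chain, and a grounded statement is treated like a fixed point that resets the accumulated uncertainty to zero. The step values are made up purely for illustration.

```python
import math

def chain_uncertainty(step_sigmas, grounded):
    """Toy traverse analogy: uncertainty (std. dev.) of each 'statement' in a
    dialog, with independent per-step errors added in quadrature, and grounded
    statements (reliable context, external data) acting like fixed points."""
    variance = 0.0
    result = []
    for sigma, is_grounded in zip(step_sigmas, grounded):
        if is_grounded:
            variance = 0.0          # fixing a point: propagation is cut off
        else:
            variance += sigma ** 2  # otherwise errors compound step by step
        result.append(round(math.sqrt(variance), 2))
    return result

# Five dialog turns with equal per-step noise; turn 4 is grounded by a citation.
print(chain_uncertainty([0.3] * 5, [False, False, False, True, False]))
# -> [0.3, 0.42, 0.52, 0.0, 0.3]: circles grow, then a grounded point resets them
```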
If you're interested in experimenting with geodetic networks, my former PhD advisor, Wolfgang Förstner, has an amazing collection of Cinderella animations for anyone curious about geometry and statistics. Two animations related to geodetic networks and gauge choice are:
- Gauge Choice and Loop Closing of Uncertain Polygon: This is where the screenshot above was taken from. Click "start" on the left to fix the first two points, then move the other points around or add new ones to see how the error distributes across the points.
- Gauge Choice for a Geodetic Network: A more complex setup with a network instead of just one polygon line. In the analogy, multiple interdependent sources can constrain or even amplify the error.
Click here for an interactive diagram (click the "Start" button and move the points around).
What to consider
The parallel between AI hallucination control and geodetic networks points to three key considerations:
- Acceptable Error Thresholds: Different applications require different accuracy levels, from casual chatbots to critical systems.
- User Awareness: Understanding AI limitations and treating hallucinations as manageable characteristics rather than system failures.
- Technical Solutions: Developing better control mechanisms through reliable external data sources and verification systems; a minimal cross-check sketch follows below.
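As one concrete, deliberately simple example of such a verification system, here is a sketch of a chain-of-verification style cross-check. The `ask` wrapper, the number of check questions, and the prompt wording are assumptions for illustration only; the actual chain-of-verification technique describes a more elaborate pipeline.

```python
from typing import Callable, List

def chain_of_verification(ask: Callable[[str], str], question: str) -> str:
    """Sketch of a chain-of-verification cross-check:
    1. draft an answer,
    2. let the model generate verification questions about its own draft,
    3. answer those questions independently (without showing the draft),
    4. revise the draft against the verification answers.
    `ask` is any prompt -> reply-text wrapper around a chat model.
    """
    draft = ask(question)

    raw = ask(
        "List three short, independent factual questions that would verify "
        f"the claims in the following answer, one per line:\n\n{draft}"
    )
    check_questions: List[str] = [q.strip() for q in raw.splitlines() if q.strip()]

    # Answer each check question on its own, so the draft cannot bias the check.
    check_answers = [ask(q) for q in check_questions]

    verification = "\n".join(
        f"- {q} -> {a}" for q, a in zip(check_questions, check_answers)
    )
    return ask(
        f"Original question: {question}\n"
        f"Draft answer: {draft}\n"
        f"Verification Q&A:\n{verification}\n"
        "Revise the draft so it is consistent with the verification answers "
        "and drop any claim they do not support."
    )
```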
Just as surveyors developed robust methods to control error propagation, the AI field needs similar systematic frameworks. The challenge ahead lies not in eliminating hallucinations entirely, but in developing reliable methods to detect, contain, and manage them effectively. But the ultimate challenge is the "social implementation" of the technology: how will we as a society manage these errors?
Any thoughts? Feel free to get in touch with me in real life, on social media, or by just sending a message.
1. I cannot deny my background in photogrammetry and geomatics. ↩