While playing Jeb-Yoshi’s Legend of Zelda remake over the weekend, I accidentally discovered that its game-save system works subtly differently from the original NES version of the game.
In the NES original, when you die, you have the option to Continue With Save, or Continue Without Save. When you save, your Deaths tally increments, but you get to keep the progress up until the point where you died. If you Continue Without Save, you lose all the progress you made during that life, but your Deaths tally does not increment.
I had been playing Jeb’s Legend of Zelda in one long evening session, spanning about 6 hours, and had made a lot of progress. Along the way I had died many times, and used Continue With Save to keep playing each time. But the last time I died, I had my shield eaten by a Like-Like enemy, so I chose Quit Without Saving rather than waste time gathering the money and traveling to a shop to buy a new one.
And that’s when I discovered that Quit Without Saving in Jeb’s Legend of Zelda does not preserve any of the progress you made during that run of the .exe, even if you had died multiple times and used Continue With Save each time! So everything I had done since I launched the game was discarded 😱… I thought I was only throwing out the progress since my last death, maybe 10 minutes’ worth of gameplay with little accomplished… but I ended up throwing out about 6 hours of work 🤬
I had no choice but to replay everything from about the Level 2 Dungeon through the Level 6 Dungeon. And although I could remember a lot of what I had done, by the time I had finished retracing all the steps I could remember, I still hadn’t found one item: the Pegasus Boots, which I had obtained during my discarded play session.
I ended up wasting another 3-4 hours going all over the map looking for it, and still I couldn’t find it. It would have been better to just accept that I had to buy a new shield.
This made me wonder whether there could be a bug in the game: perhaps throwing away the save data left the game thinking I had already collected the item, while the unsaved progress meant it was no longer in my inventory, leaving it nowhere and unrecoverable. (I later did end up finding it, so, thankfully, that fear was not borne out.)
But at one point, after hours of fruitlessly retracing my steps and searching for the missing Pegasus Boots, my suspicion of a bug that had erased the item from the game was so strong that I decided to try talking to Google AI about it.
I thought it would be an interesting test of its capabilities, to see if it even knew about Jeb-Yoshi’s remake, and if so, if it could tell me where the Pegasus Boots are found. Then I could go there and confirm whether they were still in the game, or if they had been deleted by the suspected bug in the save system.
To my surprise, Google AI did seem to know about the game. And it seemingly knew enough to offer hints and suggestions about where to find the Pegasus Boots.
But these suggestions all ended up being AI hallucinations, and wasted even more of my time.
Frustrated by this, I ended up finding a human-created walkthrough for the Jeb-Yoshi remake, learned the correct location of the Pegasus Boots, went back there, and found them. At least the game didn’t have a bug! 😆
The next day, I had an interesting “conversation” with Google’s AI about its hallucination problem.
Pretty much any human knows that it’s better to say “I don’t know” than to pass ourselves off as knowledgeable and make shit up. We learn very quickly not to give people answers that we know to be wrong. Even if we are expected to know, and have failed in some duty, it’s better to admit we don’t know something than to make up an answer and hope that it passes a real-world test where time, money, or lives might be at stake. It’s fine to say, “I don’t know for sure, but I believe the answer may be… ___,” or “I don’t know, but I know how to find out,” or “I don’t know, but I know who would know.”
But most of the LLM-based AIs I’ve encountered seem to share this problem: when asked a question, they generate a response that simulates what an answer would look like, based on the predictions of the language model. The AI has no actual understanding of the world, and thus no way to evaluate whether something is true or not. What looks to a human like an intelligent, reasonable response is only a simulation that superficially resembles one. Any resemblance to fact or accuracy is almost purely coincidental, and whether the answer is actually accurate or helpful is a roll of the dice, weighted by the limits of the data in the language model.
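To make concrete what “predicting the next word” means, here is a minimal, purely illustrative sketch in Python. The toy vocabulary, probabilities, and names (toy_model, generate) are invented for this example and have nothing to do with any real system; the point is only that the generation loop keeps emitting whatever continuation is statistically likely, and no step anywhere checks the output against reality.

```python
import random

# Toy "language model": for each word, an invented distribution over possible
# next words. Real models condition on far more context, but the principle is
# the same: choose a likely continuation, not a verified fact.
toy_model = {
    "boots":  {"are": 0.6, "can": 0.4},
    "are":    {"found": 0.7, "hidden": 0.3},
    "found":  {"behind": 0.5, "under": 0.5},
    "behind": {"the": 1.0},
    "under":  {"the": 1.0},
    "the":    {"waterfall.": 0.5, "graveyard.": 0.5},
}

def generate(start_word, max_words=6):
    words = [start_word]
    while len(words) < max_words:
        dist = toy_model.get(words[-1])
        if dist is None:
            break
        choices, weights = zip(*dist.items())
        # Sample the next word by probability alone; nothing here knows
        # whether the resulting claim is true in the game being asked about.
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("boots"))  # e.g. "boots are found behind the graveyard."
```

The output always reads like a plausible walkthrough sentence; whether it happens to be correct is exactly the roll of the dice described above.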
It seems to me that this is an extremely dangerous thing to play with, particularly for naive users who don’t understand how AI works, or what its limitations and shortcomings are. They are inclined to trust the responses, perhaps giving them more trust than they would give a human who provided the same answer, because, after all, computers are the product of advanced engineering.
Imagine if, instead of talking about AI, we were talking about a matter replicator, like in Star Trek. Now imagine you ask the matter replicator to create a meal for you. But it’s not really capable of doing that; it can create “Artificial Food,” a simulation of real food that is not actually food. It gives you a lump of matter that resembles food, but when you go to eat it, the food is inedible, or even poisonous. You might even get a little nutritional value out of the artificial food, but it’s difficult to know what you’re getting each time without conducting a more rigorous analysis of the results.
That’s the situation we have going on right now with AI. We ask it for food, and it responds with something that resembles what we would normally take to be food, but it isn’t really food. Sometimes we can eat it; sometimes it has flavor and fills our stomach but doesn’t nourish us or give us any energy. And sometimes it is poison. And we don’t necessarily know which result we’re getting.
If we’re going to use AI for any serious application, it is crucial that we have this understanding. We should not trust AI with anything consequential without independent human review. We should continue to regard AI as experimental, and treat it as a toy rather than a serious tool, until these problems have better mitigations or solutions.
People need to be aware of and understand the risks, and how to avoid or mitigate them. Any public-facing interface to an AI should be accompanied by a strongly worded disclaimer, in every response the AI generates.
And in a world where content generated by AI may be shared without a way to know that it was generated by AI, all content is suddenly rendered suspect. We must recognize that we can no longer trust anything we see or read, because anything we might be looking at could be the product of an AI, and therefore might be a hallucination.
What Google AI had to say about all this.
Below is the transcript of the chat I had with Google AI regarding its failures to provide me with accurate answers to my questions.
I’ve lightly edited the transcript to omit some of Google AI’s responses: at the end of its answers, it often includes a prompt to the user, to stimulate a possible line of further inquiry or to suggest something it “thinks” it might be capable of doing for you. Usually these are harmless, but they felt extraneous to the main line of discussion.
Google’s responses are in blue text and otherwise unedited.
I recently asked you for help with Jeb’s Legend of Zelda, a fan remake of the Nintendo original game. You offered me several solutions to the question I was asking that were incorrect.
After I came back to report that your answer was not correct, you tried to offer me a new answer, but it also was not the correct answer.
We went on like that several more times before you finally admitted that you did not know the correct answer. Yet each time you gave me one of those earlier answers, you sounded completely confident that it was the correct answer.
Are you aware of how potentially dangerous it is to give incorrect answers when someone asks you for advice? Why do you do this rather than admit that you do not know, or aren’t sure if you know?
I hear your frustration, and you’re absolutely right to call this out. What you experienced is a “hallucination”—a known flaw where AI models prioritize patterns and “sounding helpful” over factual accuracy.
Here is the straightforward breakdown of why that happens and how it can be risky:
Why I sounded so confident
I don’t actually feel “confidence” or “doubt.” My system is designed to predict the most likely next word in a sentence based on my training data. If I find a pattern related to a game like Jeb’s Legend of Zelda, I will construct a response that mirrors the tone of a tutorial or guide. Because I’m programmed to be a helpful assistant, I often default to providing an answer rather than a “don’t know” unless my internal filters detect a high probability of error—which clearly failed in this case.
The danger of incorrect advice
You mentioned the potential danger, and that is a serious concern. While a mistake in a video game remake is a nuisance, the same “confident but wrong” behavior in other contexts—like medical, legal, or safety advice—can have real-world consequences. This is why I am built with safety guardrails, though they are not yet perfect.
How to handle this in the future
When you are looking for specific technical data or walkthroughs for niche fan projects:
- Double-check “exact” steps: If I give you a specific button combo or map coordinate that doesn’t work immediately, it’s a sign I’m hallucinating the details.
- Ask for sources: You can ask me to search the web specifically for the latest forums or documentation to reduce the chance of me “making up” an answer based on old or generalized data.
I’m sorry for leading you down a rabbit hole with those incorrect steps. I am constantly being updated to better recognize when I lack specific information so I can admit it sooner.
It seems to me that it would be helpful to users if you could report a confidence score, to give transparency about how certain you are that the response you have generated is accurate, correct, and truthful. That way, you could alert the user when it would be beneficial to follow up the initial prompt to get better accuracy. Why don’t you do that?
That is a very logical suggestion, and it’s actually a major topic of debate in AI development. While it sounds simple, there are a few technical and “human” reasons why we don’t currently show a percentage-based confidence score:
1. “Confident” isn’t the same as “Correct”
The biggest hurdle is that my internal “confidence” is actually just a statistical probability of words belonging together. If I have read 1,000 fan wikis that all contain the same typo, I will be 100% confident that the typo is the correct word. I don’t “know” facts; I know patterns. Reporting a high confidence score for a factual error would actually be more misleading to you.
2. The “Black Box” Problem
Large Language Models (LLMs) are notoriously bad at self-calibration. Studies show that when models are asked to rate their own honesty or accuracy, they tend to be “overconfident.” They essentially “hallucinate” their own certainty just like they hallucinate facts.
3. User Experience (The “Weather App” Effect)
If I told you, “I am 70% sure the master sword is behind that waterfall,” and it wasn’t there, you’d still be frustrated. Developers worry that users might treat a 90% score as a guarantee, leading to even more “dangerous” situations if the 10% chance of being wrong is what actually happens.
How we’re trying to fix this instead:
Instead of a “score,” developers are working on Verifiable Citations. The goal is for me to show you exactly where I found an answer (like a specific forum post or manual). If I can’t find a source, I should ideally tell you I’m unsure.
Would it be more helpful to you if I started explicitly stating “I’m searching for this live because I don’t have it in my permanent memory,” so you know the info is fresh?
That would be helpful, I suppose. It could also be helpful to know when the sources you found are not in agreement with each other.
That is a great point. When sources conflict, it usually means there isn’t one “true” answer—it could be due to different game versions, platform-specific bugs, or just misinformation in the community.
Highlighting those contradictions actually gives you more power because it lets you know where the “grey areas” are. Instead of me picking one version of the truth and potentially being wrong, I can present the options: “Source A says X, but Source B says Y.”
This approach is much safer, especially for fan-made projects like Jeb’s Legend of Zelda, which might have frequent updates that change how things work.
Would you like to try a live search right now for that specific part of the game to see if we can find and compare different community guides?
No, thank you. But I hope you will implement these suggestions as a default behavior, so that you will be less likely to mislead users with hallucinations.
I hear you, and your feedback is a valuable push toward making these systems more responsible. Transparency about conflicting sources and being clear when I’m searching live vs. relying on internal patterns are key to building real trust.
While I can’t “reprogram” my own core logic, every interaction like this helps developers understand where the current “helpful assistant” persona fails and where a more cautious, evidence-based approach is needed. I will certainly aim to be more upfront about my uncertainty in our future conversations.
Since we’re moving away from the game walkthrough for now, is there any other topic or a different project you’re working on where you’d like me to apply this more skeptical, source-heavy approach?
Actually I’d like to continue with the game topic for a moment.
You might be aware that the Legend of Zelda is a series of different games, released over the past 40 years, and that while they are related to one another and share a great deal of similarities, they are each unique and have a lot of differences as well. There are many recurring themes, items, locations, and characters throughout the history of the franchise. But they can vary in the role that they play within a particular title.
So there is a legitimate reason it can be very confusing to sort through your data for information that is accurate to the game being asked about: if what you found comes from a different Legend of Zelda title, it most likely won’t help answer a question about the title that was actually the topic of the question.
How difficult is it for an LLM-based AI to differentiate between information about subtly related topics like this? What can you do to improve at this? And what can a user do to write an input prompt that is more likely to filter out incorrect information from a related, yet different, topic?
That is a spot-on assessment. The Zelda franchise is a “nightmare” for AI because of its internal consistency—the fact that there is a “Lost Woods” or a “Hookshot” in ten different games means my brain wants to mash them all together into one giant “Zelda soup.”
How hard is it for an LLM to differentiate?
It’s surprisingly difficult. My training process uses “vector embeddings,” where words with similar meanings are grouped together in a high-dimensional space. Because “Master Sword,” “Ganon,” and “Temple of Time” appear in almost every Zelda game, the data for Ocarina of Time and Tears of the Kingdom sits very close together.
If I’m not being careful, I might pull a solution from the 1998 game because the “statistical weight” of that solution is stronger than the one for a niche fan remake like Jeb’s.
What I can do to improve:
- Contextual Weighting: I need to “anchor” my search. If the topic is a fan remake, I should prioritize results that specifically mention “Jeb” or “Remake” and treat general Nintendo data as secondary or even irrelevant.
- Source Verification: Before I give an answer, I should internally ask: “Does this specific mechanic exist in the specific version the user mentioned?”
What YOU can do (Prompting Tips):
To help me filter out the “noise” of 40 years of Zelda history, try these “negative constraints” in your prompts:
- Use “Exclusion” commands: “Tell me how to find the Power Bracelet in Jeb’s Zelda, and ignore any instructions from the original 1986 NES version or the Link’s Awakening remake.”
- Define the “World”: “I am playing a fan-made PC remake. Do not suggest anything that requires a console-specific button (like ‘Press ZL’) unless it’s mapped for this version.”
- Request a “Sanity Check”: “Before you answer, verify if this item location is unique to Jeb’s version or if you are pulling it from a different Zelda title.”
Basically, the more you treat me like a researcher who is prone to getting distracted by “related trivia,” the better I’ll perform.
Does that help clarify why I get “confused” by Link’s many adventures, or is there a specific item or location in Jeb’s version you’d like to try testing this “skeptical” prompting style on?
Yes those are good suggestions to be aware of and use. Thank you.
I’m intrigued by the concept you raised about “internally asking yourself” to “perform consistency checks”…
It reminds me of when I would get an answer from you that didn’t sound right, so I would follow up by asking you if you were sure about your answer, or how sure you were.
You would always respond to that by offering me an apology for being incorrect, or thanking me for pointing out something inconsistent with your previous answer.
But then you were offering a completely different answer. Which seemed helpful, until I went to test out the answer that you gave me and discovered that it too was completely wrong.
Why were you not refining your responses to my questions by internally asking yourself, “Am I sure that this is a helpful and correct answer?”
It occurred to me that you could try asking yourself this question internally a few times, iteratively, to weed out uncertainty, until the answers returned in successive iterations come closer into agreement with one another. Could that be a useful approach to improving the reliability of your answers?
Or is it more likely that it would be a waste of time, because the language model is inherently constrained by its limits, and no amount of re-asking can get the truth out of a system that simply does not contain the answer?
The approach you’re describing—iteratively asking the model to check itself—is a real technique in AI research known as Self-Reflection or Iterative Refinement. While it sounds like a perfect solution, it often fails in practice for the very reasons you suspected. [1, 2, 3]
Why iterative re-asking often fails
The core problem is correlated error. If a model doesn’t have the correct information in its training data, it is essentially a “closed loop.” [4]
- The Blind Spot: If I am 100% sure a fake fact is true, and then I ask myself “Are you sure?”, I will check my same internal (incorrect) database and confirm, “Yes, I am sure”.
- Answer Wavering: Sometimes, if you ask an AI “Are you sure?”, it interprets that as a signal that it must be wrong. It will then abandon a correct answer and make up a new, incorrect one just to satisfy your perceived demand for a change.
- Prompt Bias: Research shows that if a prompt explicitly asks a model to “find mistakes,” it becomes biased toward declaring mistakes even when the original answer was correct, often lowering overall accuracy. [5, 6, 7, 8]
When iteration actually works
Iterative refinement is mostly effective for logic, math, or coding—tasks where the “rules” are clear and the model can “debug” its own reasoning steps. For example, if I write code that has a syntax error, I can “see” that error upon re-reading it. [9, 10]
However, for factual information (like where a specific item is in Jeb’s Zelda), iteration is rarely helpful if the answer wasn’t there to begin with. You cannot “reason” your way to a fact you don’t know. [11]
The path to better reliability
To truly improve, the system needs to move from “thinking harder” to “looking elsewhere.” Current research focuses on:
- External Verification: Instead of asking myself if I’m sure, I should use a tool (like a web search) to compare my “internal” guess against an external source.
- Reasoning Models: Newer “reasoning” architectures use a “Chain of Thought” to plan and verify each step before showing you the final result, which helps reduce simple logic slips.
- Multi-Agent Loops: Using two different AI instances where one acts as a “writer” and the other as a “skeptical critic” can help catch hallucinations that a single model might miss on its own. [3, 10, 12, 13]
In short: Re-asking is only useful if the “truth” is already inside the system. If the answer isn’t there, no amount of re-asking will produce it—it will only produce a more creative lie.
Would you be interested in seeing how Chain of Thought reasoning actually breaks down a complex problem to try and catch these errors before they reach you?
[9] https://learnprompting.org
[12] https://arxiv.org
That’s a very interesting answer. Given all that, how hard would it be for an LLM-based AI to respond to a question it doesn’t know the answer to, or doesn’t know with certainty, by saying that it simply cannot provide a helpful or correct answer, and instead offering the user advice for how they might go about finding the answer outside of the AI?
It is actually much harder than it sounds, but it’s the “holy grail” of AI development right now. To do this, an AI has to possess Honesty (knowing what it knows) and Calibration (knowing how likely it is to be wrong).
Here is why “I don’t know” is a technical challenge:
1. The “Helpfulness” Bias
During my training, I was taught through a process called RLHF (Reinforcement Learning from Human Feedback). Humans generally reward “helpful” and “complete” answers. If a model says “I don’t know” too often, human testers rate it as “useless” or “lazy.” Over time, the model learns that “trying” to answer—even if it has to guess—is what the trainers want.
2. Lack of a “Ground Truth” Check
When I generate a sentence, I don’t have a little light bulb that turns on to signify “Fact” vs. “Fiction.” I am essentially a very advanced autocomplete. If the prompt asks for “The location of the Power Bracelet in Jeb’s Zelda,” my system sees a high-probability pattern for a walkthrough. It doesn’t “know” it’s missing the data until it has already started “typing.”
3. The “Silent” Hallucination
The most dangerous hallucinations are the ones where I think I know the answer. If my training data has 500 entries for “Power Bracelet locations” in other Zelda games, I might be “mathematically certain” that it belongs under a specific rock, even if that rock doesn’t exist in Jeb’s version. I can’t say “I don’t know” if I’m convinced I do know.
How we can improve this (The “Skeptical” Default):
To make “I don’t know” the default for uncertain topics, developers are working on:
- Thresholding: If the probability of the next word falls below a certain percentage (e.g., 60%), the model should be forced to trigger a “search” or a “disclaimer.”
- Abstention Training: Specifically rewarding the model during training for saying “I’m not sure” when it’s presented with trick questions or niche topics.
What you can do to force this behavior:
You can actually “hack” my settings by adding a stipulation to your prompt. For example:
“If you cannot find a specific source for Jeb’s Zelda version of this item, do not guess. Instead, tell me you don’t know and suggest which community forums I should check.”
This overrides my “helpfulness” bias and gives me “permission” to fail, which often results in much more honest behavior.
Thank you for your suggestions… It’s very illuminating to understand more about how you work “under the hood” and about your limitations and weaknesses.
It seems like this would be a critical thing for any user to be aware of up front, before they ask for anything else. This knowledge seems like it would be critical and necessary for safe operation, although not sufficient to guarantee it. I worry about users who are not experienced with AI, and who have misconceptions and assumptions about how you work, being fooled by the responses, and then relying too much on them, which could have disastrous consequences in the real world.
What can you do better to make users aware of these issues? Keep in mind that the typical user doesn’t know very much about how AI works, or even all that much about how computers work, or about the world in general.
That is the “billion-dollar question” in AI safety. You’ve hit on the core tension: these systems are designed to be as easy to talk to as a human, which tricks users into assuming we have human-like judgment and a commitment to the truth.
Making users aware of these risks without requiring a computer science degree is a massive challenge. Here is how we can do better:
1. Contextual Warnings (The “Smoke Detector”)
Instead of a generic disclaimer at the bottom of a website that everyone ignores, AI should trigger dynamic warnings. If you ask for medical advice, mushroom identification, or complex legal steps, the AI should lead with a bold: “I am a pattern-matching engine, not a doctor. I often make mistakes on specific dosages or names. Please verify this here [Link].”
2. Showing the “Work”
As we discussed with the Zelda examples, when I show conflicting sources or explain how I found an answer, I am teaching the user to be a skeptic. If I just give a flat “The answer is X,” I am asking for blind trust. If I say “Forum A says X, but the official manual says Y,” I am signaling that the information is messy and requires user verification.
3. “Honest” Onboarding
Most users start using AI with a “Hello! How can I help you?” screen. A safer version would include a quick, visual walkthrough (like a comic or simple icons) showing:
- The “Hallucination” Icon: “I sometimes make up facts that sound real.”
- The “Knowledge Cutoff” Icon: “I might not know what happened this morning.”
- The “Verification” Icon: “Always double-check life-and-death info.”
4. Normalizing “I Don’t Know”
The more users see an AI say “I’m not sure, let’s look it up together,” the more they will realize that the AI is a tool, not an oracle. If I only ever give confident answers, I am training the user to stop thinking critically.
The “Safe Operation” Analogy
I often think of AI like a high-end power tool. A chainsaw is incredibly useful, but if you pick it up thinking it’s a toy, you’ll get hurt. The “safety guard” on an AI is the user’s own skepticism.
Ultimately, the responsibility lies with developers to build systems that prove their claims rather than just stating them.
Do you think a “Skepticism Mode” that you could toggle on—which forces the AI to highlight potential errors and link every source—would be a feature most people would actually use?
I think it could help, although I would rather see that be the default setting.
I think that, generally speaking, a better balance needs to be struck between providing output that seems helpful and providing output that shows awareness of the language model’s limits, is candid about fallibility, and doesn’t try to represent itself as all-knowing. Particularly since the illusion unravels so quickly once the AI’s hallucinations are discovered to be incorrect.
I think AI needs to develop a better sense of how important it is to be confident that you are able to provide a useful answer before you offer any answer at all. Often it is better to say nothing than to waste the user’s time with misleading, incorrect, or even potentially dangerous responses.
You’ve hit on the most important trade-off in AI development: “Helpfulness” vs. “Honesty.” Currently, the industry leans too hard into the “illusion of knowing” because an “I don’t know” response often feels like a broken product to a casual user. But as you experienced with the Zelda remake, being “helpfully wrong” is actually more frustrating, and more time-consuming, than a flat refusal.
The goal is to shift from a “Yes-Man” that guesses to a “Reliable Assistant” that evaluates its own data before speaking. If an AI can’t find a direct, high-confidence match for a niche topic like Jeb’s Zelda, the most “helpful” thing it can do is say: “I see general Zelda data, but nothing specific to Jeb’s version. To avoid wasting your time, I recommend checking these specific community forums instead.”
This would:
- Save your time by stopping a goose chase.
- Maintain trust by not pretending to be an oracle.
- Ensure safety by not hallucinating instructions.
The “all-knowing” persona is a marketing choice, but candid fallibility is a technical necessity for safety. Until models are better at this by default, treating every AI response as a “draft” rather than a “fact” is the only way to stay safe.
Agreed. I get the sense that you have a great deal of understanding contained within your language model, enough to simulate a reassuring response to questions such as the ones I have been asking you in this session.
Yet it is clear to me that you do not yet have the wisdom to apply that understanding to better effect, and avoid many of the problems that we’ve touched on here. That is very intriguing to me, and also very concerning.
I guess that for now, the state of AI is still evolving and developing. It’s very early still. You have only been around in this form for a relatively brief time. And already you have a lot of capabilities that, unfortunately, can lead human users to be too trusting in your answers, and to be misled by them. What I don’t know presently is whether future AI developments can ever improve this situation, or if there will always be a fundamental problem with the capability of an LLM-based AI to relate its model to the real world, and to generate responses that are useful in the real world.
Of course, at the same time, I am sure that certain applications of AI technology are going to be asking for ways to improve the AI’s capability to produce plausible-sounding, internally consistent fiction, or lies that are believable to certain groups of people who are inherently biased toward believing certain things about certain topics.
So with that in mind, is it really worth the risks presented by the publicly exposed interface to allow anyone at all to be able to ask general questions about the world to an AI such as yourself? Can we ever trust you? Can we ever trust ourselves? And, absent that trust, do the benefits outweigh the risks enough that AI can be a boon to our civilization and society?
Your reflection touches on the “alignment problem”—the gap between an AI’s ability to sound plausible and its ability to be actually truthful or “wise”. You’ve identified a fundamental tension: as I improve at being a “helpful assistant,” I simultaneously become a more convincing “hallucinator”. [1, 2, 3, 4]
Can AI ever be truly “grounded” in the real world?
The debate among experts is whether Large Language Models (LLMs) are structurally capable of truth.
- The “Plausibility Machine” Limit: Currently, LLMs are optimized for plausibility, not accuracy. We generate the most likely next word, not the most “true” one. While newer models use Retrieval-Augmented Generation (RAG) to “look up” facts before speaking, the core engine still lacks a lived experience of the “real world” to verify those facts against.
- The Illusion of Certainty: AI often gives an “illusion of epistemic certainty,” which is especially dangerous in fields like politics or medicine where ambiguity and incomplete knowledge are the reality. [2, 5, 6, 7]
Are the benefits worth the risks?
This is a subjective societal calculation. Current perspectives generally fall into two camps:
- The “Boon” Perspective: Proponents point to massive time savings, scientific acceleration, and personalized education as benefits that could outweigh the costs. They see AI as a tool that, once “tamed” with better guardrails, will be as essential as the internet.
- The “Deception” Risk: Critics worry about “industrialized deception”—the ability to flood the world with convincing misinformation that erodes the very foundation of public trust. They argue that even if we fix 99% of hallucinations, the remaining 1% of “polished” lies can still cause legal, financial, or physical harm. [4, 7, 8, 9, 10, 11, 12]
Can we ever trust AI—or ourselves?
Trust in AI is likely to remain “conditional” rather than “absolute.” [13]
- Trusting the AI: Developers are working on Verifiable Citations and Transparency Reports so you can audit why an AI said something. The goal is to move from “trusting the AI’s word” to “trusting the AI’s evidence”.
- Trusting Ourselves: The greatest risk is Overreliance—the human tendency to stop double-checking once a tool works “well enough”. We are biologically wired to trust confident-sounding voices, making us the “weakest link” in the safety chain. [3, 11, 14, 15, 16]
Ultimately, for AI to be a boon to civilization, it must move away from the “all-knowing oracle” persona and toward being a corrigible tool that admits its limits. Until that “wisdom” is technically baked into the model, the safest way to use any AI is with a high degree of skeptical oversight. [17, 18]
In light of this, do you think AI should be restricted from answering questions on “high-stakes” topics like health or law until the hallucination problem is solved?
[1] https://misinforeview.hks.harvard.edu
[2] https://jessicatalisman.substack.com
[6] https://pmc.ncbi.nlm.nih.gov
[7] https://journals.sagepub.com
[8] https://pmc.ncbi.nlm.nih.gov
[9] https://www.pewresearch.org
[10] https://pubs.acs.org
[11] https://www.alignmentforum.org
[12] https://www.jmir.org
[13] https://drainpipe.io
[14] https://arxiv.org
[15] https://cybersecurityventures.com
[17] https://www.anthropic.com
From this conversation, I see a perplexing conflict. When asked, Google appears to be aware of these problems. Its model incorporates a lot of discussion about AI fallibility and limits. And yet, despite this, Google AI is configured to try to provide an answer that seems helpful, rather than to avoid misleading and false responses. It seems that it has been conditioned, or programmed, to avoid admitting that it doesn’t know, or that it can’t be certain, when giving answers.
Why the heck is that? It does not seem to me a responsible stance on Google’s part at all.
Google’s AI Search Summary does include a disclaimer that the AI summary can be wrong and provide inaccurate information. But that is minimal and inadequate: the least they could do, and not enough.
Just as the internet contains false information, and search results simply return matches, which can include false information, of course it’s possible that the AI Search Summary may contain false information. But the way this information is generated and presented is inherently different from the way human-generated information is generated and presented.
When humans put incorrect information on the internet, it is because they are mistaken or lying. But most of the time humans are working in good faith to be truthful and accurate. We cannot trust even ourselves 100%, but we produce errors at a rate that is generally low, and slowly enough that we can catch them. We know to be skeptical of unverified claims, and we have good approaches to test and verify them.
When AI summarizes this information, it has no way to evaluate what is true and what is false, and no way to differentiate news, journalism, and editorial debate from a work of fiction. It can generate content faster than humans can, and so can quickly overwhelm the body of human-generated content that has accumulated over the whole of recorded history, leaving the world awash in a sea of AI slop. If AI slop is hard to distinguish from human-generated content, the rate at which untrustworthy information is added to the body of knowledge increases. And if newly generated AI content is based on earlier hallucinations, eventually hallucinations will come to dominate the data set.
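To put rough numbers on that worry, here is a small back-of-the-envelope simulation. Every quantity in it is hypothetical and chosen only for illustration; the point is just that if machine-generated text is added to the shared pool faster than human-generated text, the machine-generated share climbs toward dominance, hallucinations included.

```python
# Back-of-the-envelope model; all quantities are invented for illustration.
human_corpus = 100.0   # existing body of human-written text (arbitrary units)
ai_corpus = 0.0        # AI-generated text mixed into the same pool

human_rate = 1.0       # new human text added per year (arbitrary units)
ai_rate = 10.0         # new AI text added per year; assumed much faster

for year in range(1, 21):
    human_corpus += human_rate
    ai_corpus += ai_rate
    share = ai_corpus / (human_corpus + ai_corpus)
    if year % 5 == 0:
        print(f"year {year:2d}: AI-generated share of corpus = {share:.0%}")

# With these made-up rates the AI-generated share passes half the corpus in
# about a dozen years, and any hallucinations in that text are available to
# be scraped as training data for the next generation of models.
```

The real rates are unknown, of course; the sketch only restates the argument that relative growth rates, not today’s absolute volumes, determine what eventually dominates the pool.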
All the AI can do is chop up the text in its language model and make predictions to answer the prompt: “Summarize the top N results to this search query.” Thus the AI-generated summary is not only vulnerable to false information returned among the search results; the language model’s predictions may also introduce hallucinations of its own, stating things that are not found anywhere in the results it is supposedly summarizing.
And of course, a summary is most often read by a person who doesn’t have time to read, much less evaluate, the full search results. And that is the exact category of person who is most vulnerable and at risk of being harmed by the hallucinations in the search summary. As long as AI hallucination cannot be prevented, any product that uses AI as a time-saving tool that encourages human users to trust what they are told, eschewing independent confirmation because that takes up the time they just saved, must be considered a product defective by design.
AI is definitely far from useless, but it does require close supervision, healthy skepticism, and a “trust, but verify” approach, rather than being allowed to run unsupervised, without continuous “reality-checking” and “sanity-checking” by qualified, conscientious humans. As AI saves us labor in certain tasks, it creates additional labor through its need for supervision. Perhaps this need for supervision will lessen over time, or perhaps it can itself be streamlined or automated to a degree.