Tag: Responsible AI

AI hallucination in the wild

While playing Jeb-Yoshi’s Legend of Zelda remake over the weekend, I accidentally discovered that its game-save system works subtly differently from the original NES version of the game.

In the NES original, when you die, you have the option to Continue With Save, or Continue Without Save. When you save, your Deaths tally increments, but you get to keep the progress up until the point where you died. If you Continue Without Save, you lose all the progress you made during that life, but your Deaths tally does not increment.

I had been playing Jeb’s Legend of Zelda in a long session one evening, spanning about six hours, and had made a lot of progress. Along the way, I died many times, and used Continue With Save to keep playing each time. But the last time I died, I had my shield eaten by a Like-Like enemy, and so I chose Quit Without Saving rather than waste time gathering the money and traveling to a shop to buy a new one.

And that’s when I discovered that Quit Without Saving in Jeb’s Legend of Zelda does not preserve any of the progress from that runtime session of the .exe, even if you died multiple times and used Continue With Save each time! So everything I had done since I launched the game was discarded 😱… I thought I was only throwing out the progress since my last death, maybe 10 minutes of gameplay with little accomplished… but I ended up throwing out about six hours of work 🤬

I had no choice but to replay from roughly the Level 2 dungeon through the Level 6 dungeon. And although I could remember a lot of what I had done, by the time I had finished retracing every step I could remember, I still hadn’t found one item: the Pegasus Boots, which I had obtained during my discarded play session.

I ended up wasting another 3–4 hours going all over the map looking for them, and still couldn’t find them. It would have been better to just accept that I had to buy a new shield.

This made me wonder whether there could be a bug in the game where throwing away the save data created a condition in which the game believed I had collected the item, but, because the progress was unsaved, the item was removed from my inventory, leaving it nowhere and unrecoverable. (I later did end up finding it, so, thankfully, that fear was not borne out.)

But at one point, after hours of fruitlessly retracing my steps and searching for the missing Pegasus Boots, my suspicion of a bug that had erased the item from the game was so strong that I decided to try talking to Google AI about it.

I thought it would be an interesting test of its capabilities, to see if it even knew about Jeb-Yoshi’s remake, and if so, if it could tell me where the Pegasus Boots are found. Then I could go there and confirm whether they were still in the game, or if they had been deleted by the suspected bug in the save system.

To my surprise, Google AI did seem to know about the game. And it seemingly knew enough to offer hints and suggestions about where to find the Pegasus Boots.

But these suggestions all ended up being AI hallucinations, and wasted even more of my time.

Frustrated by this, I ended up finding a human-created walkthrough for the Jeb-Yoshi remake, learned the correct location of the Pegasus Boots, went back there, and found them. At least the game didn’t have a bug! 😆

The next day, I had an interesting “conversation” with Google’s AI about its hallucination problem.

Pretty much any human knows that it’s better to say “I don’t know” than to try to pass yourself off as knowledgeable and make shit up. We learn very quickly not to tell people answers that we know to be wrong. Even if we are expected to know, and have failed in some duty, it’s better to admit we don’t know something than to make up an answer and hope that it passes a real-world test where time, money, or lives might be at stake. It’s fine to say, “I don’t know for sure, but I believe the answer may be ___,” or “I don’t know, but I know how to find out,” or “I don’t know, but I know who would know.”

But most LLM-based AIs I’ve encountered seem to share this problem: when asked a question, they generate a response that simulates what an answer would look like, based on the predictions of the language model. The AI has no actual understanding of the world, and thus no way to evaluate whether something is true. What looks to a human like an intelligent, reasonable response is only a simulation that superficially resembles one. Any resemblance to fact or accuracy is almost purely coincidental; whether the response is actually accurate or helpful is a roll of the dice, with odds determined by the limits of the data in the language model.
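
To make that “roll of the dice” concrete, here is a deliberately toy sketch. The question, the candidate answers, and the probabilities below are all invented for illustration (a real LLM predicts token by token across billions of parameters, not from a lookup table), but the key point carries over: sampling from learned probabilities produces a fluent answer with no notion of truth.

```python
import random

# Invented toy "language model": for a given question, it has learned a
# probability distribution over plausible-LOOKING answers. Nothing in these
# weights encodes whether an answer is actually true.
learned_distribution = {
    "Where are the Pegasus Boots?": [
        ("In the Level 3 dungeon", 0.40),  # sounds plausible, may be wrong
        ("Buy them in the shop",   0.35),  # also plausible, also unverified
        ("Under the gravestone",   0.25),  # ditto
    ],
}

def sample_answer(question, rng=random.Random(0)):
    """Pick an answer weighted by learned probability: fluency, not truth."""
    answers, weights = zip(*learned_distribution[question])
    return rng.choices(answers, weights=weights, k=1)[0]

print(sample_answer("Where are the Pegasus Boots?"))
```

Every call returns a confident-sounding sentence, and nothing anywhere in the model can flag which answer, if any, is correct.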

It seems to me that this is an extremely dangerous thing to play with, particularly for naive users who don’t understand how AI works, or its limitations and shortcomings. They are inclined to trust the responses, perhaps more than they would trust a human who gave the same answer, because, after all, computers are the product of advanced engineering.

Imagine if, instead of talking about AI, we were talking about a matter replicator, like in Star Trek. Now imagine you ask the matter replicator to create a meal for you. But it’s not really capable of doing that; it can only create “artificial food,” a simulation of real food that is not actually food. It gives you a lump of matter that resembles food, but when you go to eat it, it is inedible, or even poisonous. You might get a little nutritional value out of the artificial food, but it’s difficult to know what you’re getting each time without conducting a more rigorous analysis of the results.

That’s the situation we have right now with AI. We ask it for food, and it responds with something that resembles what we would normally take to be food, but it isn’t really food. Sometimes we can eat it; sometimes it gives us flavor and fills our stomach, but doesn’t nourish us or give us any energy. And sometimes it is poison. And we don’t necessarily know which result we’re getting.

If we’re going to use AI for any serious application, it is crucial that we have this understanding. We should not trust AI with anything consequential without independent human review. We should continue to regard AI as experimental, and treat it as a toy rather than a serious tool, until these problems have better mitigations or solutions.

People need to be aware of and understand the risks, and how to avoid or mitigate them. Any public-facing interface to an AI should be accompanied by a strongly worded disclaimer, in every response the AI generates.

And in a world where AI-generated content may be shared without any way to know it was generated by AI, all content is suddenly rendered suspect. We must recognize that today we cannot trust anything we see or read, because anything we might be looking at could be the product of an AI, and therefore might be a hallucination.

What Google AI had to say about all this.

Below is the transcript of the chat I had with Google AI regarding its failures to provide me with accurate answers to my questions.

I’ve lightly edited the transcript to omit some of Google AI’s responses: at the end of its answers, it often appends a prompt to the user, to stimulate a possible line of further inquiry, or to suggest something it “thinks” it might be capable of doing for you. These are usually harmless, but they felt extraneous to the main line of discussion.

Google’s responses are in blue text and otherwise unedited.

I recently asked you for help with Jeb’s Legend of Zelda, a fan remake of the Nintendo original game. You offered me several solutions to the question I was asking that were incorrect.

After I came back to report that your answer was not correct, you tried to offer me a new answer, but it also was not the correct answer.

We went on like that several more times before you finally admitted that you did not know the correct answer. Yet each time you gave me your earlier answers, you sounded completely confident that it was the correct answer.

Are you aware of how potentially dangerous it is to give incorrect answers when someone asks you for advice? Why do you do this rather than admit that you do not know, or aren’t sure if you know?

It seems to me that it would be helpful to the users if you could report a confidence score, to give transparency to the level of certainty that the response you have generated is accurate, correct, and truthful. That way you can alert the user when it could be of benefit to follow up the initial prompt to get better accuracy. Why don’t you do that?

No, thank you. But I hope you will implement these suggestions as a default behavior, so that you will be less likely to mislead users with hallucinations.

Actually I’d like to continue with the game topic for a moment.

You might be aware that the Legend of Zelda is a series of different games, released over the past 40 years, and that while they are related to one another and share a great deal of similarities, they are each unique and have a lot of differences as well. There are many recurring themes, items, locations, and characters throughout the history of the franchise. But they can vary in the role that they play within a particular title.

So that’s a legitimate reason that it can be very confusing to sort through your data for information in order to provide an answer that is accurate to the game being asked about — If what you found is from a different Legend of Zelda title, it would most likely not be helpful to answering a question about the other game title that was the topic of the question.

How difficult is it for an LLM-based AI to differentiate between information about subtly related topics like this? What can you do to improve at this? And what can a user do to craft an input prompt that is more likely to filter out incorrect information from a related, yet different, topic?

Yes those are good suggestions to be aware of and use. Thank you.

I’m intrigued by the concept you raised about “internally asking yourself” to “perform consistency checks”…

It reminds me of when I would get an answer from you that didn’t sound right, so I would follow up by asking you if you were sure about your answer, or how sure you were.

You would always respond to that by offering me an apology for being incorrect, or thanking me for pointing out something inconsistent with your previous answer.

But then you were offering a completely different answer. Which seemed helpful, until I went to test out the answer that you gave me and discovered that it too was completely wrong.

Why were you not refining your responses to my questions by internally asking yourself, “Am I sure that this is a helpful and correct answer?”

It occurred to me that you could try asking yourself this question internally a few times, iteratively, to weed out uncertainty, until the answers returned in each successive iteration are closer in agreement with themselves. Could that be a useful approach to improving the reliability of your answers?

Or is it more likely that it would be a waste of time, because the language model is inherently constrained by its limits, and no amount of re-asking can get the truth out of a system that simply does not contain the answer?

That’s a very interesting answer. Given all that, how hard would it be for an LLM-based AI to respond to a question it doesn’t know the answer to, or doesn’t know with certainty, by saying that it simply cannot provide a helpful or correct answer, and instead offering the user advice for how they might find the answer outside of the AI?

Thank you for your suggestions… It’s very illuminating to understand more about how you work “under the hood” and about your limitations and weaknesses.

It seems like this would be a critical thing for any user to be aware of up front, before they ask for anything else. This knowledge seems necessary for safe operation, although not sufficient to guarantee it. I worry about users who are not experienced with AI, and who have misconceptions and assumptions about how you work, being fooled by the responses and then relying too much on them, which could have disastrous consequences in the real world.

What can you do better to make users aware of these issues? Keeping in mind that the typical user doesn’t know very much about how AI works, or even all that much about how computers work, or about the world in general?

I think it could help, although I would rather see that be the default setting.

I think that, generally speaking, a better balance needs to be struck between providing output that seems helpful and providing output that shows awareness of the language model’s limits, is candid about fallibility, and doesn’t represent itself as all-knowing. Particularly when the illusion unravels as quickly as AI hallucinations are discovered to be incorrect.

I think AI needs to develop a better understanding of how important it is to be confident that it can provide a useful answer before it offers any answer. Often it is better to say nothing than to waste the user’s time with misleading, incorrect, or even potentially dangerous responses.

Agreed. I get the sense that you have a lot of understanding contained within your own language model, enough to simulate a reassuring response to questions such as the ones I have been asking you in this session.

Yet it is clear to me that you do not yet have the wisdom to apply that understanding to better effect, and avoid many of the problems that we’ve touched on here. That is very intriguing to me, and also very concerning.

I guess that, for now, the state of AI is still evolving and developing. It’s very early; you have only been around in this form for a relatively brief time. And already you have a lot of capabilities that, unfortunately, can lead human users to be too trusting of your answers, and be misled by them. What I don’t know presently is whether future AI developments can ever improve this situation, or if there will always be a fundamental problem with the capability of an LLM-based AI to relate its model to the real world, and to generate responses that are useful in the real world.

Of course, at the same time, I am sure that certain applications of AI technology will be asking for ways to improve the AI’s capability to produce plausible-sounding, internally consistent fiction. Or lies that are believable to certain groups of people who are inherently biased toward believing certain things about certain topics.

So with that in mind, is it really worth the risk of a publicly exposed interface that allows anyone at all to ask general questions about the world of an AI such as yourself? Can we ever trust you? Can we ever trust ourselves? And, absent that trust, do the benefits outweigh the risks enough for AI to be a boon to our civilization and society?

From this conversation, I see a perplexing conflict. When asked, Google’s AI appears to be aware of these problems; its model incorporates a lot of discussion about AI fallibility and limits. And yet, despite this, Google AI is configured to try to provide an answer that seems helpful, rather than to avoid misleading and false responses. It seems that it has been conditioned, or programmed, to avoid admitting that it doesn’t know, or can’t be certain, when giving answers.

Why the heck is that? It does not seem to me a responsible stance on Google’s part at all.

Google’s AI Search Summary does disclaim that the AI summary can be wrong and provide inaccurate information. But that is minimal, and inadequate — the least they could do, and not enough.

Just as the internet contains false information, and search results simply return matches, which can include false information, it is of course possible for the AI Search Summary to contain false information. But the way this information is generated and presented is inherently different from the way human-generated information is produced and presented.

When humans put incorrect information on the internet, it is because they are mistaken or lying. But most of the time, humans work in good faith to be truthful and accurate. We cannot trust even ourselves 100%, but we produce errors at a rate that is generally low, and slowly enough that we can catch them. We know to be skeptical of unverified claims, and we have good approaches for testing and verifying them.

When AI summarizes this information, it has no way to evaluate what is true and what is false, and no way to differentiate news, journalism, and editorial debate from a work of fiction. It can generate content faster than humans can, and so can quickly overwhelm the body of human-generated content accumulated over the whole of recorded history, leaving the world awash in a sea of AI slop. If AI slop is hard to distinguish from human-generated content, the rate at which untrustworthy information is added to the body of knowledge increases. And if newly generated AI content is trained on earlier hallucinations, eventually hallucinations will come to dominate the data set.
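
That compounding dynamic can be sketched with a toy simulation. Every number below (the growth rates, the starting corpus size, the base hallucination rate) is invented purely for illustration; the point is the trend, not the figures.

```python
# Toy model of the feedback loop described above: human-written content grows
# slowly, AI-generated content grows fast, and each new batch of AI output
# inherits the error fraction already present in the corpus it was trained on,
# on top of its own base hallucination rate. All rates are invented.
human_content = 100.0     # arbitrary starting units of human-made content
ai_content = 0.0          # accumulated AI-generated content
hallucinated = 0.0        # the portion of the corpus that is wrong

HUMAN_RATE = 1.0          # new human content per generation (slow)
AI_RATE = 10.0            # new AI content per generation (fast)
BASE_ERROR = 0.1          # base fraction of fresh AI output that is wrong

for generation in range(1, 11):
    total = human_content + ai_content
    # New AI output inherits the corpus's current error fraction.
    inherited_error = hallucinated / total if total else 0.0
    new_bad = AI_RATE * min(1.0, BASE_ERROR + inherited_error)
    human_content += HUMAN_RATE
    ai_content += AI_RATE
    hallucinated += new_bad
    frac = hallucinated / (human_content + ai_content)
    print(f"gen {generation:2d}: {frac:.1%} of the corpus is hallucinated")
```

The hallucinated fraction rises every generation, and rises faster as earlier errors feed back into later output; that is the qualitative worry, even though the real-world rates are unknown.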

All the AI can do is chop up the text in its language model and make predictions to answer the prompt: “Summarize the top N results to this search query.” Thus, the AI-generated summary is not only vulnerable to false information returned among the search results; it may also hallucinate, based on the language model’s predictions, statements that appear nowhere in those results at all.

And of course, a summary is most often read by a person who doesn’t have time to read, much less evaluate, the full search results. And that is the exact category of person who is most vulnerable and at risk of being harmed by the hallucinations in the search summary. As long as AI hallucination cannot be prevented, any product that uses AI as a time-saving tool that encourages human users to trust what they are told, eschewing independent confirmation because that takes up the time they just saved, must be considered a product defective by design.

AI is definitely far from useless, but it does require close supervision, healthy skepticism, and a “trust, but verify” approach, rather than being allowed to run unsupervised without continuous “reality-checking” and “sanity-checking” by qualified, conscientious humans. As AI saves us labor on certain tasks, it creates additional labor through its need for supervision. Perhaps that need will lessen over time, or perhaps it can itself be streamlined or automated to a degree.