The challenges of identifying AI hallucinations
Generative AI is everywhere right now. People are talking about it, trying it out, and sometimes getting it to do their homework (or actual work) for them. But generative AI chatbots have a weakness that’s starting to get more attention, and it’s called hallucination.
When you ask a generative AI chatbot like ChatGPT, Google Bard, or Microsoft Bing a question, you get a confident, helpful-sounding answer in reply. That answer might be right, partially right, or completely wrong. This can happen if the AI was trained on incorrect data, but AIs have also been known to fabricate answers that seem reasonable based on the data they’ve seen. These confidently incorrect responses are termed hallucinations.
The challenge with hallucinations is that they can be hard to spot, especially on topics that require background knowledge to evaluate the answer. AI chatbots can come up with something that is close enough to be convincing but isn’t correct. For a highly specific example, a Reddit user asked ChatGPT to describe a ballet jump called a sissonne. The reply:
“A sissonne is a jump that starts from two feet and lands on one foot. The working leg brushes through the floor and extends into the air, while the other leg pushes off the ground and lifts into the air to meet the working leg in mid-air. The landing leg then bends to absorb the landing and bring the body down onto the floor.”
If you’re not familiar with ballet, this sounds plausible. The motions described are definitely a jump, and ChatGPT did in fact describe a real ballet jump. However, the jump it described is most likely a cabriole, not a sissonne, though the description is vague enough to be open to interpretation.
I asked ChatGPT again about the sissonne to see if it had learned the difference since the previous request. This time, it described a pas de ciseaux (scissor step) and said the name sissonne comes from the French word for scissors—a common misconception due to the similar sounds. It’s actually named for the Comte de Sissonne, who created the jump.
Why did ChatGPT get it wrong? The second answer may be due to the sources on which the AI was trained. For example, Wikipedia’s glossary of ballet discusses the scissor misconception and the pas de ciseaux under the entry for sissonne, creating a plausible link. However, a third request produced yet another description and attributed the jump to a different dancer who lived about 100 years later. A Google search turned up nothing relevant, so this may be a case of ChatGPT getting creative.
Google Bard gave a somewhat better answer, mixing correct, vague, and incorrect information. Microsoft’s Bing chatbot gave me a short, vague text answer but linked to sources and a tutorial video.
You probably weren’t expecting a lesson on ballet terminology in a post about AI hallucination, but the point is that dance isn’t the only category where AIs will provide an answer that sounds good but is actually wrong. A lawyer recently found himself in hot water over a brief that cited at least six fictional legal cases after he used ChatGPT to help with his research [1]. When the lawyer questioned the AI about the validity of the cases, it cited reputable legal databases as the sources, even though the cases do not exist in those databases.
Generative AIs are very convincing liars because they have no idea they’re presenting false information. They also can’t tell us how they arrived at an answer, which makes the problem harder to solve. Researchers are working on ways to improve AI accuracy, such as having a second AI engine check the answer (sketched below), limiting the sources an AI is trained on, or using another type of AI to verify results for output like code. OpenAI, the maker of ChatGPT, is working on process supervision, which provides feedback on each step in the AI’s chain of reasoning rather than only on the final outcome [2]. OpenAI’s research paper focuses on solving math problems, but the researchers believe process supervision may help produce more correct answers on other subjects in the future.
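To make the “second AI checks the answer” idea a little more concrete, here is a minimal sketch of that pattern using the OpenAI Python library. The model name, prompt wording, and the `ask`/`verify` helpers are my own illustrative assumptions, not OpenAI’s process-supervision method or any specific research system, and a critique pass like this can reduce but not eliminate hallucinations.

```python
# Illustrative sketch only: one model drafts an answer, a second pass critiques it.
# Assumes the openai Python package (v1+) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def ask(question: str) -> str:
    """Get a draft answer from the primary model."""
    response = client.chat.completions.create(
        model="gpt-4",  # hypothetical choice of model for this sketch
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

def verify(question: str, answer: str) -> str:
    """Ask a second pass to flag claims in the draft that look unsupported or fabricated."""
    critique_prompt = (
        "Review the following answer for factual errors or claims that cannot be "
        "verified. List anything that looks fabricated.\n\n"
        f"Question: {question}\n\nAnswer: {answer}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": critique_prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    question = "In ballet, what is a sissonne?"
    draft = ask(question)
    critique = verify(question, draft)
    print(draft)
    print("--- critique ---")
    print(critique)
```

The catch, of course, is that the reviewing model can hallucinate too, which is why approaches like process supervision and grounding answers in verifiable sources are active research areas rather than solved problems.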
For now, AIs like ChatGPT, Google Bard, and Microsoft’s Bing chatbot are fun to play with and are genuinely helping some people be more productive, but take every answer with a heavy dose of skepticism. It’s a good reminder to seek out reputable sources to fight misinformation.
1. CNN, Lawyer apologizes for fake court citations from ChatGPT, May 2023
2. OpenAI, Improving mathematical reasoning with process supervision, May 2023