One reason LLMs hallucinate is because we incentivise them to give any answer over saying they don't know
A paper by Kalai et al., “Why Language Models Hallucinate”, offers one explanation for why large language models such as ChatGPT, no matter how advanced, tend to “hallucinate”, i.e. confidently tell you things that are not true.
And a pretty basic reason at that. Models tend to be trained and evaluated on a binary “did it answer this question correctly, yes or no?” basis. So when a model optimises itself to score as highly as possible, it does exactly what we tell school kids to do on exam questions they don’t feel confident they know the answer to: it makes a best guess. Or simply any guess. On a multiple-choice exam you’re always better off ticking something at random than ticking nothing, right?
The solution to this particular issue, which the authors go into, is of course to develop a way of scoring that penalises incorrect answers more heavily than “don’t know”. Question answering isn’t really a binary matter; some “not 100% correct” answers are better than others.
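To make the incentive concrete, here’s a minimal expected-score sketch in Python. The specific numbers (a four-option multiple-choice question, a three-point penalty for wrong answers) are hypothetical illustrations, not figures from the paper:

```python
def expected_score(p_correct: float, penalty: float) -> float:
    """Expected exam score for answering: a right answer scores 1,
    a wrong answer scores -penalty, and "I don't know" always scores 0."""
    return p_correct * 1.0 - (1 - p_correct) * penalty

# Binary grading (no penalty): even a blind 1-in-4 guess beats abstaining.
print(expected_score(0.25, penalty=0.0))  # 0.25 > 0, so always guess

# Penalised grading (hypothetical 3-point penalty for a wrong answer):
# guessing now only pays off above a confidence threshold of
# penalty / (1 + penalty) = 0.75.
print(expected_score(0.25, penalty=3.0))  # -2.0: blind guessing loses points
print(expected_score(0.90, penalty=3.0))  # positive: confident answers still pay
```

With no penalty, any nonzero chance of being right beats the guaranteed zero for abstaining, which is exactly the guessing incentive described above; adding a penalty makes “don’t know” the better bet below the confidence threshold.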
This isn’t the only reason LLMs may promote falsehoods, of course. Other causes the authors mention include:
- The data they’re trained on, e.g. “everything written on the internet”, itself contains some falsehoods (to say the least!)
- The question someone asks might not actually be covered by the training data, aka “out-of-distribution prompts”
- Some questions are just very, very hard to answer - see computational complexity theory.