By now, generative AI tools such as ChatGPT from OpenAI and Bard from Google have become go-to resources for marketers, used in everything from campaign ideation to content creation. But a new study reminds the marketing industry that it’s buyer beware: these tools need to be used carefully.
Throughout 2023, generative AI has exploded in popularity. But with that uptake, researchers and practitioners have warned of its pitfalls, such as biased results, copyright infringement, and inaccurate answers to queries. In fact, the large language models behind these tools are capable of fabricating information outright, a phenomenon known as hallucination.
Vectara, a start-up launched by ex-Google staff, is exploring the frequency with which chatbots deviate from factual accuracy. According to the firm’s studies, chatbots fabricate details in at least 3 percent of interactions, and this figure can reach up to 27 percent, despite measures taken to avoid such occurrences.
Vectara released its Hallucination Evaluation Model as an open-source model on Hugging Face, along with a publicly accessible leaderboard. The leaderboard serves as a quality metric for the factual accuracy of large language models, much as credit ratings or FICO scores do for financial risk.
This gives businesses and developers insight into the performance of different large language models before implementing them.
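For marketers with developer support, that also means a summary can be spot-checked before it ships. Here’s a minimal sketch of what that might look like, assuming the sentence-transformers CrossEncoder interface described on the model’s Hugging Face card; the example source text and summaries below are hypothetical, not from Vectara’s test data:

```python
# A minimal sketch of scoring summaries against their source with
# Vectara's open-source Hallucination Evaluation Model on Hugging Face.
# Assumes the sentence-transformers package is installed; the example
# texts below are hypothetical.
from sentence_transformers import CrossEncoder

model = CrossEncoder("vectara/hallucination_evaluation_model")

source = "The company reported $5M in Q3 revenue, up 12% year over year."
faithful = "Q3 revenue was $5M, a 12% increase from the prior year."
invented = "The company reported record profits after its IPO."

# Each pair is (source, summary). Scores near 1.0 indicate the summary is
# supported by the source; scores near 0.0 suggest hallucination.
scores = model.predict([[source, faithful], [source, invented]])
print(scores)  # expect a high score for the first pair, low for the second
```

The same pattern scales to batches of drafts, which is what makes a leaderboard-style comparison across models practical.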
Chatbots are so versatile that it is difficult to say definitively how often they hallucinate. Vectara researcher Simon Hughes, who led the Hallucination Evaluation Model project, told The New York Times that it would be necessary to examine all of the world’s information to make a determination. Hughes and his team tested the chatbots’ ability to summarize news articles, a straightforward task that is easy to verify. Even in this simple setting, the chatbots frequently invented information.
“We gave the system 10 to 20 facts and asked for a summary of those facts,” said Amr Awadallah, Vectara’s chief executive and a former Google executive. “It is a fundamental problem that the system can still introduce errors.”
The researchers argue that when these chatbots perform tasks beyond mere summarization, hallucination rates may be higher.
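To make the methodology concrete, here’s a sketch of the experiment in miniature: hand a chat model a handful of facts, ask for a summary, and score the result with the evaluation model above. The facts, prompt wording, and model name are illustrative assumptions, not Vectara’s actual test set, and the sketch assumes the official OpenAI Python client:

```python
# An illustrative re-creation of the summarize-then-score experiment.
# Assumes the openai (v1+) and sentence-transformers packages are
# installed and that OPENAI_API_KEY is set in the environment.
from openai import OpenAI
from sentence_transformers import CrossEncoder

facts = (
    "1. The product launched in March. "
    "2. It sold 10,000 units in its first month. "
    "3. Most buyers were first-time customers."
)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # any chat model; named only for illustration
    messages=[{"role": "user",
               "content": f"Summarize the following facts:\n{facts}"}],
)
summary = response.choices[0].message.content

# Score the (source, summary) pair with the open-source evaluation model;
# a low score flags a summary that introduced unsupported details.
scorer = CrossEncoder("vectara/hallucination_evaluation_model")
score = scorer.predict([[facts, summary]])[0]
print(f"Summary: {summary}\nFactual consistency score: {score:.2f}")
```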
Now, here’s the good news for users of ChatGPT: OpenAI’s technologies had the lowest hallucination rate, around 3 percent.
It’s easy to be alarmed by the idea of a large language model hallucinating. But then again, do you trust everything you find on the internet when you research a blog post, an idea for a marketing campaign, or anything else for that matter? Hallucination is no reason to stop using generative AI. It is a reason to be careful, just as you should be anywhere online: verify AI-generated facts against primary sources before you publish them.
At IDX, we’re on the front lines of using technology to create content more effectively, and that goes well beyond generative AI. Click here to learn about our services. Click here to learn more about the opening of our new content production studio. We’re here to help you!