Home
News
You are here

Which popular AI chatbots hallucinate the most?

Some AI chatbots hallucinate more than others, while some hallucinate less.

0comments

By Alan Friedman

Published: Apr 24, 2026, 8:25 PM

Add as a preferred source on Google

Apps

Which AI chatbot hallucinates the most? | Image by PhoneArena

AI chatbots are not perfect as they will sometimes hallucinate. To put this in layman's terms, AI Large Language Models (LLMs) are trained to use patterns to predict the next likely word in a series.

Why AI chatbots hallucinate

If the model cannot find a specific pattern to answer a question, it will still try to answer it by putting together words that make sense from a statistical basis even if it is incorrect factually. This is why AI models still need a human to verify data such as stock prices, names, and dates. The AI model isn't at fault when it hallucinates as it is still following what it was programmed to do even if the data isn't there for a response to be made.

25% of American workers are now using AI regularly, which means it is important that you know which chatbots are prone to hallucinating, and which models are often not there when you require them to be. This info was gathered by Legal Guardian Digital, an SEO company for law firms.

The chatbot that hallucinates the most is revealed in this survey

The study looked at which popular AI chatbots can be trusted the most by showing how regularly each model gave false information, which ones had higher customer satisfaction scores, and which ones stayed available for users without crashing. Each of these variables were used to calculate an index score (0-100).

Do you rely on responses from AI chatbots?

The chatbot in the ratings with the highest hallucination rate is Google Gemini. The report says that Gemini is "tripping" on 32% of its replies, which might make some inside Apple a little nervous. The tech giant is reportedly paying Google at least $1 billion annually to use a custom 1.2 trillion parameter Gemini LLM to power the Siri chatbot expected to arrive with iOS 27 later this year.

ChatGPT was the next chatbot likely to hallucinate with this happening in three out of every 10 responses. The chatbot less likely to hallucinate is Perplexity AI with mistaken answers reaching users only 13% of the time. China's DeepSeek and Elon Musk-owned Grok were the next two models less likely to come up with bogus responses with hallucination percentages of 14% and 15% respectively.

Chart shows ranking of AI chatbots by categories.

Ranking the top AI chatbots. | Image by Legal Guardian Digital

To put this in even more simple terms, giving an inaccurate response 30% of the time, ChatGPT is twice as apt to respond to your query with a reply that is incorrect than Deep Seek. Remember, DeepSeek was trained for a fraction of the cost that was spent to train ChatGPT.

Two chatbots did not go down at all during the survey

DeepSeek and ChatGPT both had the highest customer satisfaction score of 4.7 (out of 5) followed by the 4.6 score generated by Perplexity AI. The lowest score of 3.4 belonged to Meta AI, while three models were next with 4.4 scores.

When it comes down to the consistency and quality of responses from these LLMs, on a scale of 0-5, Kimi AI scored the highest in this category at 4.3. Tied with a score of 4 were ChatGPT, Copilot, and Gemini, while Meta AI was last with a score of 3.4.

Only two of the chatbots were up and running at all times and those were Perplexity AI and Grok. This is important because no matter how accurate the AI model is that you're using, if it is down, that accuracy doesn't matter. ChatGPT (up 99.98%) and Gemini (up 99.95%) were among the chatbots that were practically up all of the time.

The top ranked chatbot and the one with the lowest score

These models very rarely went down as the lowest Uptime rate, still a very reliable 99.68%, belonged to Claude.

The Index Score, which runs from 0-100, is based on all the aforementioned variables. The top ranked chatbot, with a score of 85, was Perplexity AI (iOS, Android). Grok was next (iOS, Android) with a score of 79 followed by DeepSeek (iOS, Android).

In case you are curious, with an index score of 50, ChatGPT (iOS, Android) finished in sixth place and Gemini (iOS, Android) was eighth with a score of 41. Meta AI (iOS, Android) was the lowest scoring chatbot with a score of 37.

$20 /mo

$25

$5 off (20%)

Offer Ends 6.1.2026 at 11.59pm ET. New members get $5/mo off the $25/mg Visible plan, $35/mo Visible+ plan, or $45/mo Visible+ Pro plan for the first 12 months. Promo code FRESHSTART required at checkout.

Buy at Visible

View Full Bio

Alan, an ardent smartphone enthusiast and a veteran writer at PhoneArena since 2009, has witnessed and chronicled the transformative years of mobile technology. Owning iconic phones from the original iPhone to the iPhone 15 Pro Max, he has seen smartphones evolve into a global phenomenon. Beyond smartphones, Alan has covered the emergence of tablets, smartwatches, and smart speakers.

Read the latest from Alan Friedman