AI Struggles to Accurately Evaluate Scientific and Medical Claims

New study finds AI chatbots like ChatGPT only 60% accurate in assessing truth of statements

Mar. 24, 2026 at 7:00pm

A new study from researchers at Washington State University found that AI chatbots like ChatGPT score only about 60% — barely better than random guessing — when asked to evaluate the truth or falsity of scientific and medical claims. The researchers say this low accuracy score, equivalent to a 'D' grade, highlights the limitations of current AI systems in handling complex reasoning and nuanced information.

Why it matters

As more people turn to AI chatbots for information on health, science, and other technical topics, this study raises concerns about the reliability of the advice and analysis provided by these systems. The findings underscore the need for users to approach AI-generated content with skepticism and to verify information from authoritative sources.

The details

For the study, researchers fed over 700 claims into ChatGPT and asked it to judge whether each statement was true or false. While the AI system had an initial accuracy rate of around 80%, this dropped to just 60% after accounting for the odds of random guessing. The researchers noted significant inconsistencies, with ChatGPT sometimes labeling the same claim as both true and false in response to repeated prompts.

  • The study was published on March 16, 2026 in the Rutgers Business Review.
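The drop from roughly 80% raw accuracy to a 60% score is consistent with a standard chance-correction formula, which rescales accuracy so that pure guessing scores zero. The article does not spell out the researchers' exact method, so the sketch below is an assumption for illustration only:

```python
def chance_corrected_accuracy(observed: float, chance: float = 0.5) -> float:
    """Rescale raw accuracy so guessing maps to 0.0 and perfection to 1.0.

    Uses the common correction (observed - chance) / (1 - chance).
    For a binary true/false task, chance is 0.5.
    """
    return (observed - chance) / (1.0 - chance)

# With the figures reported in the study:
score = chance_corrected_accuracy(0.80)  # raw accuracy of ~80%
print(f"{score:.0%}")  # prints "60%"
```

Under this assumed correction, an 80% raw score on a two-option task works out to exactly the 60% figure the article cites.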

The players

Mesut Cicek

A professor of marketing and international business at Washington State University and the lead researcher on the study.

ChatGPT

A prominent AI chatbot developed by OpenAI that was the focus of the study's evaluation of AI's ability to assess the accuracy of scientific and medical claims.


What they’re saying

“We're not just talking about accuracy, we're talking about inconsistency, because if you ask the same question again and again, you come up with different answers.”

— Mesut Cicek, Lead Researcher

“Current AI tools don't understand the world the way we do — they don't have a 'brain'. They just memorize, and they can give you some insight, but they don't understand what they're talking about.”

— Mesut Cicek, Lead Researcher

What’s next

The researchers say their findings reinforce the need for users to approach information from AI chatbots with caution and to verify claims against authoritative sources, as the systems currently lack the conceptual understanding to reliably evaluate complex scientific and medical information.

The takeaway

This study highlights the limitations of current AI systems when it comes to handling nuanced, technical information. While AI chatbots may provide fluent and convincing responses, their lack of true understanding means users should not blindly trust their assessments of scientific and medical claims, and should instead verify information from reliable sources.