Research Finds ChatGPT Inconsistent, Inaccurate

Study shows AI language model struggles with nuanced reasoning, despite fluent language output

Mar. 17, 2026 at 12:15am

Researchers at Washington State University repeatedly tested ChatGPT's ability to judge the accuracy of scientific hypotheses, finding that the AI tool performed only about 60% better than random chance and gave highly inconsistent answers across multiple identical prompts. The findings highlight the limitations of current generative AI models on complex reasoning tasks, despite their linguistic fluency.

Why it matters

The study underscores the need for caution and skepticism when relying on AI tools like ChatGPT for critical tasks, as their reasoning capabilities often fall short of their language generation abilities. This has implications for businesses and consumers who may be tempted to over-rely on AI without proper verification.

The details

The researchers fed more than 700 hypotheses from scientific papers into ChatGPT, asking it to determine whether each statement was true or false. ChatGPT answered correctly 76.5% of the time in 2024 and 80% of the time in 2025, but once random guessing was accounted for, its performance was only about 60% better than chance. The AI struggled most with false hypotheses, identifying them correctly just 16.4% of the time. It was also highly inconsistent, frequently returning different true/false answers when the same prompt was submitted 10 times.

  • The initial experiment was conducted in 2024 using the free version of ChatGPT-3.5.
  • The follow-up experiment was conducted in 2025 using the free, updated ChatGPT-5 mini.

The players

Mesut Cicek

An associate professor in the Department of Marketing and International Business at Washington State University's Carson College of Business, and the lead author of the new publication.

Sevincgul Ulu

A co-author from Southern Illinois University.

Can Uslay

A co-author from Rutgers University.

Kate Karniouchina

A co-author from Northeastern University.

ChatGPT

The free, commonly available generative AI tool that was the subject of the research study.


What they’re saying

“We're not just talking about accuracy, we're talking about inconsistency, because if you ask the same question again and again, you come up with different answers.”

— Mesut Cicek, Associate Professor

“Current AI tools don't understand the world the way we do - they don't have a 'brain'. They just memorize, and they can give you some insight, but they don't understand what they're talking about.”

— Mesut Cicek, Associate Professor

“Always be skeptical. I'm not against AI. I'm using it. But you need to be very careful.”

— Mesut Cicek, Associate Professor

What’s next

The researchers plan to continue testing the capabilities and limitations of ChatGPT and other generative AI tools, with the goal of helping businesses and consumers understand how to best utilize these technologies while maintaining appropriate skepticism and verification.

The takeaway

This study highlights the need for caution and critical thinking when relying on AI language models like ChatGPT, as their fluency in language does not necessarily translate to robust reasoning and conceptual understanding. Businesses and consumers should verify AI-generated outputs and not blindly trust them, especially for high-stakes or complex decisions.