Research Finds ChatGPT Inconsistent, Inaccurate
Study shows AI language model struggles with nuanced reasoning, despite fluent language output
Mar. 17, 2026 at 12:15am
Researchers at Washington State University repeatedly tested ChatGPT's ability to judge the accuracy of scientific hypotheses, finding that the AI tool performed only about 60% better than random chance and gave inconsistent answers across multiple identical prompts. The findings highlight the limitations of current generative AI models in handling complex reasoning tasks, despite their linguistic fluency.
Why it matters
The study underscores the need for caution and skepticism when relying on AI tools like ChatGPT for critical tasks, as their reasoning capabilities often fall short of their language generation abilities. This has implications for businesses and consumers who may be tempted to over-rely on AI without proper verification.
The details
The researchers fed over 700 hypotheses from scientific papers into ChatGPT, asking it to determine whether each statement was true or false. ChatGPT answered correctly 76.5% of the time in 2024 and 80% in 2025, but after accounting for random guessing, its accuracy was only about 60% better than chance. The AI struggled most to identify false hypotheses, getting those right just 16.4% of the time. ChatGPT was also highly inconsistent, giving different true/false responses across 10 identical prompts.
- The initial experiment was conducted in 2024 using the free version of ChatGPT-3.5.
- The follow-up experiment was conducted in 2025 using the free, updated ChatGPT-5 mini.
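To see how raw accuracy shrinks once guessing is accounted for, here is a minimal sketch of one standard chance correction for a binary true/false task. This is an assumption for illustration only; the study does not specify which correction the authors applied, and the function name is hypothetical.

```python
def chance_corrected(accuracy: float, chance: float = 0.5) -> float:
    """Fraction of above-chance performance retained after correcting
    for random guessing. With two answer choices (true/false), a
    guesser is right about half the time, so chance = 0.5."""
    return (accuracy - chance) / (1 - chance)

# Raw accuracies reported in the study:
print(round(chance_corrected(0.765), 2))  # 2024 run: 0.53
print(round(chance_corrected(0.80), 2))   # 2025 run: 0.6
```

Under this correction, the 2025 raw accuracy of 80% works out to exactly 60% better than chance, consistent with the figure quoted above; the 2024 run comes in lower still.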
The players
Mesut Cicek
An associate professor in the Department of Marketing and International Business at Washington State University's Carson College of Business, and the lead author of the new publication.
Sevincgul Ulu
A co-author from Southern Illinois University.
Can Uslay
A co-author from Rutgers University.
Kate Karniouchina
A co-author from Northeastern University.
ChatGPT
The free, commonly available generative AI tool that was the subject of the research study.
What they’re saying
“We're not just talking about accuracy, we're talking about inconsistency, because if you ask the same question again and again, you come up with different answers.”
— Mesut Cicek, Associate Professor
“Current AI tools don't understand the world the way we do - they don't have a 'brain'. They just memorize, and they can give you some insight, but they don't understand what they're talking about.”
— Mesut Cicek, Associate Professor
“Always be skeptical. I'm not against AI. I'm using it. But you need to be very careful.”
— Mesut Cicek, Associate Professor
What’s next
The researchers plan to continue testing the capabilities and limitations of ChatGPT and other generative AI tools, with the goal of helping businesses and consumers understand how to best utilize these technologies while maintaining appropriate skepticism and verification.
The takeaway
This study highlights the need for caution and critical thinking when relying on AI language models like ChatGPT, as their fluency in language does not necessarily translate to robust reasoning and conceptual understanding. Businesses and consumers should verify AI-generated outputs and not blindly trust them, especially for high-stakes or complex decisions.


