ChatGPT Fact-Checking: WSU Study Tests AI Accuracy in Science
The AI Reliability Crisis: Why ChatGPT Still Gets the 'D' in Scientific Reasoning
Mar. 16, 2026 at 2:54pm
A recent study from Washington State University reveals that even the latest iterations of ChatGPT struggle with basic scientific reasoning, consistently providing inaccurate and inconsistent answers when asked to verify research hypotheses. The study highlights the gap between linguistic fluency and conceptual intelligence in AI, raising concerns about the reliability of AI in critical decision-making.
Why it matters
The findings suggest that the arrival of Artificial General Intelligence (AGI) – AI that can truly think and reason like a human – is further off than many predict. This has significant implications for how much we trust AI in critical areas like scientific research and validation.
The details
The study, led by Professor Mesut Cicek, subjected ChatGPT to a rigorous test: the AI was fed over 700 hypotheses extracted from scientific papers and asked whether each hypothesis was supported by research – true or false. Each hypothesis was queried ten times to assess consistency. Although raw accuracy improved from 76.5% in 2024 to 80% in 2025, once corrected for chance the AI's performance falls to around 60% – a grade equivalent to a low 'D'. The biggest weakness was identifying false hypotheses, which ChatGPT got right only 16.4% of the time in 2025.
- The study was published in March 2026.
- The study analyzed ChatGPT's performance in 2024 and 2025.
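The chance correction reported above is consistent with the standard adjustment for a binary true/false task, where random guessing already yields 50% accuracy. The article does not specify the exact formula the researchers used, so the sketch below assumes the common correction (raw accuracy minus chance, rescaled):

```python
def chance_corrected(accuracy: float, chance: float = 0.5) -> float:
    """Rescale raw accuracy so that pure guessing scores 0.0 and
    perfect performance scores 1.0. For a true/false task, chance = 0.5."""
    return (accuracy - chance) / (1.0 - chance)

# Figures from the study: 80% raw accuracy in 2025, 76.5% in 2024.
print(round(chance_corrected(0.80), 3))   # 0.6  -> the ~60% reported
print(round(chance_corrected(0.765), 3))  # 0.53
```

Under this assumption, the 2025 figure of 80% raw accuracy maps exactly to the ~60% chance-corrected score cited in the article.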
The players
Mesut Cicek
A professor at Washington State University who led the study on ChatGPT's scientific reasoning abilities.
ChatGPT
An AI language model developed by OpenAI that was the subject of the study on its accuracy in verifying scientific hypotheses.
Rutgers Business Review
The academic journal where the study's findings were published.
What they’re saying
“We're not just talking about accuracy, we're talking about inconsistency. If you ask the same question again and again, you end up with different answers.”
— Mesut Cicek, Professor
What’s next
The study suggests that the field is shifting towards AI systems that augment human intelligence rather than replace it, with an emphasis on developing specialized AI models and Explainable AI (XAI) to increase transparency and accountability.
The takeaway
This research highlights the need to recalibrate expectations and understand the limitations of current AI systems, especially when it comes to critical tasks like scientific research and validation. While AI can be a valuable tool, it should be viewed as an aid, not a replacement, for human expertise and rigorous scientific methodology.

