ChatGPT Fact-Checking: WSU Study Tests AI Accuracy in Science
The AI Reliability Crisis: Why ChatGPT Still Gets the 'D' in Scientific Reasoning
Mar. 16, 2026 at 2:54pm
A recent study from Washington State University reveals that even the latest iterations of ChatGPT struggle with basic scientific reasoning, consistently providing inaccurate and inconsistent answers when asked to verify research hypotheses. The study highlights the gap between linguistic fluency and conceptual intelligence in AI, raising concerns about the reliability of AI in critical decision-making.
Why it matters
The findings suggest that the arrival of Artificial General Intelligence (AGI) – AI that can truly think and reason like a human – is further off than many predict. This has significant implications for how much we trust AI in critical areas like scientific research and validation.
The details
The study, led by Professor Mesut Cicek, subjected ChatGPT to a rigorous test: the AI was fed over 700 hypotheses extracted from scientific papers and asked whether each hypothesis was supported by research – true or false. Each hypothesis was queried ten times to assess consistency. Although raw accuracy improved from 76.5% in 2024 to 80% in 2025, once corrected for chance the AI's performance falls to around 60% – a grade equivalent to a low 'D'. The biggest weakness was identifying false hypotheses, which ChatGPT got right only 16.4% of the time in 2025.
- The study was published in March 2026.
- The study analyzed ChatGPT's performance in 2024 and 2025.
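The chance correction reported above is consistent with the standard adjustment for a binary true/false task, where random guessing already yields 50% accuracy. The article does not specify the exact formula the researchers used, so the sketch below assumes the common correction (raw accuracy minus chance, rescaled):

```python
def chance_corrected(accuracy: float, chance: float = 0.5) -> float:
    """Rescale raw accuracy so that pure guessing scores 0.0 and
    perfect performance scores 1.0. For a true/false task, chance = 0.5."""
    return (accuracy - chance) / (1.0 - chance)

# Figures from the study: 80% raw accuracy in 2025, 76.5% in 2024.
print(round(chance_corrected(0.80), 3))   # 0.6  -> the ~60% reported
print(round(chance_corrected(0.765), 3))  # 0.53
```

Under this assumption, the 2025 figure of 80% raw accuracy maps exactly to the ~60% chance-corrected score cited in the article.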
The players
Mesut Cicek
A professor at Washington State University who led the study on ChatGPT's scientific reasoning abilities.
ChatGPT
An AI language model developed by OpenAI that was the subject of the study on its accuracy in verifying scientific hypotheses.
Rutgers Business Review
The academic journal where the study's findings were published.
What they’re saying
“We're not just talking about accuracy, we're talking about inconsistency. If you ask the same question again and again, you end up with different answers.”
— Mesut Cicek, Professor
What’s next
The study suggests that the field is shifting towards AI systems that augment human intelligence rather than replace it, with an emphasis on developing specialized AI models and Explainable AI (XAI) to increase transparency and accountability.
The takeaway
This research highlights the need to recalibrate expectations and understand the limitations of current AI systems, especially when it comes to critical tasks like scientific research and validation. While AI can be a valuable tool, it should be viewed as an aid, not a replacement, for human expertise and rigorous scientific methodology.

