Can AI Predict Flawed Science Studies?
A DARPA-funded project aims to build a 'credit score' for research, but the challenge remains daunting.
Apr. 1, 2026 at 3:35pm
Researchers on a DARPA-funded project called SCORE built AI systems to predict whether scientific studies will hold up to scrutiny, and found that artificial intelligence is still far from reliable at the task. Along the way, however, the team uncovered important insights about the complex nature of the scientific process and about ways it could be improved, such as requiring researchers to share data and code.
Why it matters
Replicating research is crucial for validating scientific findings, but it is often neglected due to time and funding constraints. The inability to reliably predict which studies are likely to replicate highlights the need for reforms to improve transparency and accountability in science.
The details
The SCORE project, funded by DARPA, analyzed 3,900 papers across the social sciences and found that only about half of 164 replicated studies yielded the same results as the originals. The team also found that when independent groups analyzed the same data using different methods, they reached precisely the same result only one-third of the time. Problematic data and coding errors were also identified as common causes of replication failures.
- The SCORE project started in 2019 and grew to include 865 researchers.
- The team analyzed 3,900 papers published from 2009 to 2018.
The players
Adam Russell
A former program manager at DARPA who envisioned generating a 'credit score' for science to assess the robustness of research findings.
Brian Nosek
The executive director of the Center for Open Science and a leader of the SCORE project, who cautions that even with careful research, scientists will sometimes turn out to be wrong.
Clair Patterson
A geochemist at Caltech who in 1953 used a new technique to determine that Earth is 4.5 billion years old, a finding that withstood intense scrutiny from critics.
Melanie Mitchell
An artificial intelligence researcher at the Santa Fe Institute who recently attempted to replicate an AI paper, failed to match the original results, and then had her replication paper rejected on the grounds that it lacked novelty.
Jay Bhattacharya
The director of the National Institutes of Health, who is working on ways to improve replication, including releasing new tools for data and code sharing and developing a journal dedicated to publishing replication efforts.
What they’re saying
“People can say, 'Hey, this is likely to be robust, we can premise a policy on it.' But this? Nah, this might make for a book in the airport.”
— Adam Russell, Former DARPA program manager
“We're not there yet. It's picking up some kind of signal, but it would have to get a lot more accurate to use on its own.”
— Brian Nosek, Executive director, Center for Open Science
“I really hate this kind of culture.”
— Melanie Mitchell, Artificial intelligence researcher, Santa Fe Institute
“Science determines what's true based on replication. I don't feel it's working well right now.”
— Jay Bhattacharya, Director, National Institutes of Health
“I don't think anybody's cracked the code on that.”
— Jeremy Berg, Biochemist, University of Pittsburgh School of Medicine
What’s next
The National Institutes of Health plans to release new tools for sharing data and code, identify key ideas in different fields, and award grants to replicate them. The agency is also developing a journal dedicated to publishing replication efforts.
The takeaway
The SCORE project has highlighted the complex and often flawed nature of the scientific process, underscoring the need for greater transparency, accountability, and investment in replication efforts to improve the reliability of research findings. While AI may not be able to reliably predict which studies will hold up, the project has provided valuable insights that could help guide reforms to strengthen the scientific enterprise.