AI Protein Language Models Get Accuracy Test to Improve Reliability
Emory researchers develop a new method to quantify how well these models understand biological data
Apr. 2, 2026 at 4:50am
Computational biologists at Emory University have developed a simple way to test the accuracy of AI language models used to analyze complex biological data like proteins and DNA. Their new framework compares how these models 'embed' or numerically codify synthetic random proteins versus proteins found in nature, providing a reliability score to identify low-quality predictions and improve the models.
Why it matters
AI language models are revolutionizing biology by treating biological data like a language, but a critical gap has been the lack of a method to estimate the reliability of their predictions. This new framework gives researchers a way to quantify how well these models understand proteins and other biological information, allowing them to refine the models and develop better tools for unraveling the complexities of genomes and metagenomes.
The details
The key to the new testing method lies in understanding how evolution shapes proteins. Protein language models are trained on actual proteins found in nature, which carry an evolutionary signature. The researchers compared how a language model embeds this biologically meaningful information versus randomly generated synthetic proteins. They found the model grouped the random sequences separately in a 'junkyard' area of the latent space, so a natural protein whose embedding lands near that area indicates lower confidence in that embedding. This led them to develop a 'random neighbor score' that quantifies the overlap between a protein's nearest neighbors and non-biological sequences, providing an inverse measure of the model's confidence.
- The new method was published in Nature Methods in April 2026.
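The idea behind the random neighbor score can be illustrated with a minimal sketch. The article does not give the published definition, so this assumes the simplest reading: the fraction of a protein's k nearest neighbors in embedding space that are random, non-biological sequences. The function name, the Euclidean distance, the value of k, and the toy Gaussian "embeddings" are all assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def random_neighbor_score(query, natural, random_seqs, k=10):
    """Fraction of the query's k nearest neighbors that are random sequences.

    Hypothetical helper: the metric (Euclidean) and k are illustrative choices.
    """
    # Pool the reference embeddings and remember which rows are random.
    pool = np.vstack([natural, random_seqs])
    is_random = np.concatenate([
        np.zeros(len(natural), dtype=bool),
        np.ones(len(random_seqs), dtype=bool),
    ])
    # Distance from the query to every reference embedding.
    dists = np.linalg.norm(pool - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # A high score means many random neighbors, i.e. lower confidence.
    return float(is_random[nearest].mean())


# Toy data standing in for real model embeddings (assumption):
# natural proteins cluster near the origin, random ones in a far "junkyard".
rng = np.random.default_rng(0)
natural = rng.normal(loc=0.0, scale=0.5, size=(200, 8))
junkyard = rng.normal(loc=5.0, scale=0.5, size=(200, 8))

well_embedded = np.zeros(8)        # sits among natural proteins
poorly_embedded = np.full(8, 5.0)  # sits in the junkyard region

print(random_neighbor_score(well_embedded, natural, junkyard))
print(random_neighbor_score(poorly_embedded, natural, junkyard))
```

In this toy setup the first query scores low and the second scores high, matching the article's description of the score as an inverse measure of confidence. With embeddings from a real model, the choice of k and of distance metric would need to follow the published method.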
The players
Yana Bromberg
Senior author of the paper and Emory professor of biology and computer science, a pioneer in applying machine learning for protein and genomic analysis.
R. Prabakaran
First author of the study and a postdoctoral fellow in the Bromberg lab.
Emory University
The institution where the computational biologists who developed the new framework are based.
What they’re saying
“To the best of our knowledge, our framework is the first generalized method to quantify protein sequence embedding reliability.”
— Yana Bromberg, Senior author, Emory professor
“Our method is a simple, elegant solution to a complex problem. It's a foundational method with a lot of scope for a range of language models in science.”
— R. Prabakaran, First author, postdoctoral fellow
“We are shining a light into the black box of AI. Better understanding how a language model works allows you to find ways to keep improving its reliability and to develop better models.”
— Yana Bromberg, Senior author, Emory professor
What’s next
The researchers plan to apply their new method to further refine and improve AI language models used in biological research, helping to enhance the reliability and quality control of these powerful tools.
The takeaway
This new framework provides a critical missing piece for evaluating the accuracy of AI language models in biology, allowing researchers to identify weaknesses and continually sharpen these tools to better unravel the complexities of genomes, proteins, and microbial communities.