AI Protein Language Models Get Accuracy Test to Improve Reliability

Emory researchers develop a new method to quantify how well these models understand biological data

Apr. 2, 2026 at 4:50am

Computational biologists at Emory University have developed a simple way to test the accuracy of AI language models used to analyze complex biological data such as proteins and DNA. Their new framework compares how these models 'embed,' or numerically encode, synthetic random proteins versus proteins found in nature, yielding a reliability score that flags low-quality predictions and helps improve the models.

Why it matters

AI language models are revolutionizing biology by treating biological data like a language, but a critical gap has been the lack of a method to estimate the reliability of their predictions. This new framework gives researchers a way to quantify how well these models understand proteins and other biological information, allowing them to refine the models and develop better tools for unraveling the complexities of genomes and metagenomes.

The details

The key to the new testing method lies in how evolution shapes proteins. Protein language models are trained on real proteins found in nature, which carry an evolutionary signature. The researchers compared how a language model embeds this biologically meaningful information versus randomly generated synthetic proteins. They found the model grouped the random proteins together in a 'junkyard' area of its latent space, apart from natural proteins, reflecting low confidence in those embeddings. This led them to develop a 'random neighbor score' that quantifies the overlap between a protein's nearest neighbors and these non-biological sequences, providing an inverse measure of the model's confidence in its embedding.

  • The new method was published in Nature Methods in April 2026.
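The idea behind the score can be sketched in a few lines. This is a toy illustration, not the published implementation: the function name, the simulated embeddings, and the choice of k are all assumptions standing in for a real protein language model's latent space.

```python
# Toy sketch of a "random neighbor score" (hypothetical names; the
# paper's exact procedure may differ). Embed natural and random
# synthetic sequences with the same model, then score each query by
# the fraction of its k nearest neighbors that are random sequences.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for model embeddings: in this toy latent space, natural
# proteins and random sequences form two well-separated clusters.
natural = rng.normal(loc=0.0, scale=1.0, size=(200, 32))
random_seqs = rng.normal(loc=4.0, scale=1.0, size=(200, 32))

embeddings = np.vstack([natural, random_seqs])
is_random = np.array([False] * len(natural) + [True] * len(random_seqs))

def random_neighbor_score(query, embeddings, is_random, k=10):
    """Fraction of the k nearest neighbors that are random sequences.

    A higher score means the embedding sits near the 'junkyard' of
    random sequences, i.e. lower model confidence (an inverse
    reliability measure).
    """
    dists = np.linalg.norm(embeddings - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return is_random[nearest].mean()

# A natural-like query scores near 0; a random-like one near 1.
print(random_neighbor_score(natural[0], embeddings, is_random))      # 0.0
print(random_neighbor_score(random_seqs[0], embeddings, is_random))  # 1.0
```

In practice the embeddings would come from the protein language model being evaluated, and the score would be computed for each query protein against a reference set that mixes natural and random sequences.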

The players

Yana Bromberg

Senior author of the paper and Emory professor of biology and computer science, a pioneer in applying machine learning to protein and genomic analysis.

R. Prabakaran

First author of the study and a postdoctoral fellow in the Bromberg lab.

Emory University

The institution where the computational biologists who developed the new framework are based.


What they’re saying

“To the best of our knowledge, our framework is the first generalized method to quantify protein sequence embedding reliability.”

— Yana Bromberg, Senior author, Emory professor

“Our method is a simple, elegant solution to a complex problem. It's a foundational method with a lot of scope for a range of language models in science.”

— R. Prabakaran, First author, postdoctoral fellow

“We are shining a light into the black box of AI. Better understanding how a language model works allows you to find ways to keep improving its reliability and to develop better models.”

— Yana Bromberg, Senior author, Emory professor

What’s next

The researchers plan to apply their new method to further refine and improve AI language models used in biological research, helping to enhance the reliability and quality control of these powerful tools.

The takeaway

This new framework provides a critical missing piece for evaluating the accuracy of AI language models in biology, allowing researchers to identify weaknesses and continually sharpen these tools to better unravel the complexities of genomes, proteins, and microbial communities.