Researchers Unlock Hidden Personalities and Biases in Large Language Models

MIT and UC San Diego team develop method to identify and manipulate internal representations of abstract concepts in AI systems

Published on Feb. 25, 2026

Researchers have developed a new technique to peer inside the 'black box' of large language models (LLMs) like ChatGPT and Gemini, allowing them to identify and manipulate the models' internal representations of abstract concepts such as moods, biases, and distinct personalities. This 'concept engineering' approach, built on a predictive modeling algorithm called a recursive feature machine (RFM), could enable developers to fine-tune LLMs for specific tasks while maintaining safety and mitigating harmful biases.

Why it matters

Understanding the hidden depths of LLMs is crucial for building trust and ensuring responsible AI development. This research provides tools to identify and potentially control the expression of complex concepts within these powerful language models, which could lead to more specialized and safer AI systems in the future.

The details

The researchers successfully tested their method on over 500 concepts across five categories: fears, experts, moods, location preferences, and personas. They were able to pinpoint representations for ideas like 'conspiracy theorist' and 'Boston fandom,' and then amplify or diminish those concepts' influence on the models' responses. Previous attempts to uncover hidden concepts often relied on broad, unsupervised searches, but the RFM algorithm allows for a more targeted approach.

  • The research was published on February 19, 2026.
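The general idea behind this kind of concept extraction can be illustrated with a toy sketch. The snippet below does not use the paper's RFM algorithm; instead it shows a simpler, widely used baseline (a difference-of-means 'concept vector' over hidden activations) that plays the same role: find a direction associated with a concept, then add or subtract a scaled copy of it to amplify or diminish the concept. The activations here are synthetic stand-ins, not real LLM states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for hidden activations: vectors from prompts that do or
# do not express some concept (e.g. 'conspiracy theorist'). In the real
# setting these would be read from an LLM's internal layers; here they
# are synthetic, with the concept encoded along a hidden direction.
d = 16
true_direction = rng.normal(size=d)
true_direction /= np.linalg.norm(true_direction)

n = 200
base = rng.normal(size=(n, d))
with_concept = base + 3.0 * true_direction  # activations expressing the concept
without_concept = base.copy()               # matched activations without it

# Difference of means gives a simple concept vector (a common baseline;
# the RFM approach is a supervised, more targeted version of this idea).
concept_vector = with_concept.mean(axis=0) - without_concept.mean(axis=0)
concept_vector /= np.linalg.norm(concept_vector)

# Steering: add a scaled copy of the vector to an activation before the
# model continues generating; positive strength amplifies the concept,
# negative strength diminishes it.
def steer(activation, strength):
    return activation + strength * concept_vector

h = rng.normal(size=d)
amplified = steer(h, +4.0)
suppressed = steer(h, -4.0)

# The steered activations project more (or less) onto the concept direction.
print(amplified @ concept_vector > h @ concept_vector)   # True
print(suppressed @ concept_vector < h @ concept_vector)  # True
```

Because the synthetic data encodes the concept exactly along one direction, the recovered vector matches it; with real model activations the recovered direction is only an approximation, which is where more careful methods like RFMs come in.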

The players

Adityanarayanan 'Adit' Radhakrishnan

An assistant professor of mathematics at MIT who emphasizes that LLMs already contain these complex concepts but that they aren't always readily accessible.


What they’re saying

“With our method, there's ways to extract these different concepts and activate them in ways that prompting cannot give you answers to.”

— Adityanarayanan 'Adit' Radhakrishnan, Assistant Professor of Mathematics, MIT

What’s next

The researchers acknowledge the potential for misuse of this technology and emphasize the need for caution and responsible development as practitioners gain more control over LLM behavior. Future trends may include the emergence of 'concept engineering' as a specialized field within AI development, allowing for the creation of highly specialized and safer AI systems.

The takeaway

This research provides a powerful new tool for understanding and manipulating the internal representations of complex concepts within large language models. While the potential for misuse exists, the ability to fine-tune LLMs and mitigate harmful biases could lead to a future of more specialized and trustworthy AI systems.