Researchers Unlock Hidden Personalities and Biases in Large Language Models
MIT and UC San Diego team develop method to identify and manipulate internal representations of abstract concepts in AI systems
Published on Feb. 25, 2026
Researchers have developed a new technique to peer inside the 'black box' of large language models (LLMs) like ChatGPT and Gemini, allowing them to identify and manipulate the models' internal representations of abstract concepts such as moods, biases, and distinct personalities. This 'concept engineering' approach, using a predictive modeling algorithm called a recursive feature machine (RFM), enables developers to fine-tune LLMs for specific tasks while maintaining safety and mitigating harmful biases.
Why it matters
Understanding the hidden depths of LLMs is crucial for building trust and ensuring responsible AI development. This research provides tools to identify and potentially control the expression of complex concepts within these powerful language models, which could lead to more specialized and safer AI systems in the future.
The details
The researchers successfully tested their method on over 500 concepts across five categories: fears, experts, moods, location preferences, and personas. They were able to pinpoint representations for ideas like 'conspiracy theorist' and 'Boston fandom,' and then amplify or diminish those concepts' influence on the models' responses. Previous attempts to uncover hidden concepts often relied on broad, unsupervised searches, but the RFM algorithm allows for a more targeted approach.
- The research was published on February 19, 2026.
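The general idea described above can be sketched in a few lines of code. This is an illustrative toy, not the paper's method: real concept engineering probes an LLM's internal activations, and the paper's predictor is a recursive feature machine, whereas here we use synthetic vectors and a simple mean-difference estimate of the concept direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for hidden activations: the real method works on an LLM's
# internal states; here we synthesize 64-dimensional vectors in which a
# known "concept" direction is planted, so we can check the recovery step.
dim = 64
true_concept = rng.normal(size=dim)
true_concept /= np.linalg.norm(true_concept)

# Activations labeled as expressing the concept vs. not expressing it.
with_concept = rng.normal(size=(200, dim)) + 2.0 * true_concept
without_concept = rng.normal(size=(200, dim))

# Step 1 (supervised extraction): estimate the concept's internal
# representation from labeled activations. A simple baseline is the
# difference of class means (the paper uses an RFM instead).
steering_vector = with_concept.mean(axis=0) - without_concept.mean(axis=0)
steering_vector /= np.linalg.norm(steering_vector)

# Step 2 (steering): amplify the concept in a new activation by adding
# the scaled direction; a negative alpha diminishes it instead.
def steer(activation: np.ndarray, alpha: float = 3.0) -> np.ndarray:
    return activation + alpha * steering_vector

h = rng.normal(size=dim)            # a fresh activation
h_amplified = steer(h, alpha=3.0)   # concept pushed up
h_suppressed = steer(h, alpha=-3.0) # concept pushed down
```

The targeted, supervised flavor of the approach is what the passage contrasts with earlier unsupervised searches: because the direction is fit against labeled examples of a specific concept, the same two-step recipe applies to any of the 500+ concepts tested.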
The players
Adityanarayanan 'Adit' Radhakrishnan
An assistant professor of mathematics at MIT who emphasizes that LLMs already contain these complex concepts, but they aren't always readily accessible.
What they’re saying
“With our method, there's ways to extract these different concepts and activate them in ways that prompting cannot give you answers to.”
— Adityanarayanan 'Adit' Radhakrishnan, Assistant Professor of Mathematics, MIT
What’s next
The researchers acknowledge the potential for misuse of this technology and emphasize the need for caution and responsible development as we gain more control over LLM behavior. Future trends may include the emergence of 'concept engineering' as a specialized field within AI development, allowing for the creation of highly specialized and safer AI systems.
The takeaway
This research provides a powerful new tool for understanding and manipulating the internal representations of complex concepts within large language models. While the potential for misuse exists, the ability to fine-tune LLMs and mitigate harmful biases could lead to a future of more specialized and trustworthy AI systems.