MIT Scientists Uncover AI Memorization Risk to Patient Privacy

Researchers develop tests to evaluate potential data leakage from clinical AI models trained on electronic health records.

Apr. 13, 2026 at 6:12am

Image: A bold, abstract painting in soft blues, greens, and grays, featuring sweeping geometric shapes, concentric circles, and precise spirals, conceptually representing the interplay of data, algorithms, and patient privacy in the healthcare AI ecosystem. As clinical AI models become more powerful, the risk of patient data memorization and unauthorized extraction poses a growing threat to medical privacy and ethics. (Credit: Boston Today)

A new study by MIT researchers has revealed a concerning risk to patient privacy in the era of clinical AI. The researchers found that artificial intelligence models trained on de-identified electronic health records (EHRs) can memorize sensitive patient-specific information that adversaries could later extract, even when the underlying data appears to be anonymized. The team developed a series of tests to measure this risk and emphasized the need for rigorous evaluation before such models are deployed in healthcare settings.

Why it matters

As AI becomes more prevalent in clinical decision-making, ensuring patient privacy and data security is paramount. This research highlights how even de-identified data can be vulnerable to exploitation, potentially undermining the trust between patients and the medical system. Addressing these risks is crucial to upholding medical ethics and maintaining public confidence in emerging healthcare technologies.

The details

The study, presented at the 2025 NeurIPS conference, was co-authored by MIT postdoc Sana Tonekaboni and Associate Professor Marzyeh Ghassemi. They found that foundation models trained on EHRs can “memorize” individual patient records, potentially allowing adversaries to extract sensitive information. To assess this risk, the researchers developed a series of targeted tests that evaluate different levels of potential data leakage, so that even small breaches can be identified and mitigated; a simplified sketch of one such test appears after the list below.

  • The research paper was presented at the 2025 Conference on Neural Information Processing Systems (NeurIPS).
  • The study was led by researchers affiliated with MIT, the Broad Institute, and the Jameel Clinic.
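
To make that idea concrete, the sketch below shows one common building block for such evaluations: a loss-threshold membership-inference test, which checks whether a model's per-record losses give away which records it was trained on. This is a minimal illustration on synthetic numbers, not the team's actual test suite; the membership_advantage helper and all data in it are hypothetical.

```python
import numpy as np

# Synthetic stand-in data: memorized training records tend to receive
# lower model loss than unseen records drawn from the same distribution.
rng = np.random.default_rng(0)
member_losses = rng.normal(loc=0.8, scale=0.3, size=1000)     # records in the training set
nonmember_losses = rng.normal(loc=1.2, scale=0.3, size=1000)  # held-out records

def membership_advantage(member_losses, nonmember_losses):
    """Best single-threshold attack accuracy, minus the 50% chance baseline.

    Sweeps loss thresholds and guesses "member" whenever a record's loss
    falls below the threshold. An advantage near 0 suggests little
    memorization signal; an advantage approaching 0.5 suggests severe leakage.
    """
    losses = np.concatenate([member_losses, nonmember_losses])
    labels = np.concatenate([np.ones_like(member_losses),
                             np.zeros_like(nonmember_losses)])
    best_acc = 0.0
    for t in np.quantile(losses, np.linspace(0.0, 1.0, 101)):
        guesses = (losses < t).astype(float)
        best_acc = max(best_acc, float((guesses == labels).mean()))
    return best_acc - 0.5

print(f"membership advantage: {membership_advantage(member_losses, nonmember_losses):.3f}")
```

In a real evaluation, the per-record losses would come from the clinical model itself, and a large advantage on records of patients with rare conditions would flag exactly the kind of vulnerability the researchers describe.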

The players

Sana Tonekaboni

A postdoc at the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, and the first author of the paper.

Marzyeh Ghassemi

An MIT Associate Professor and principal investigator at the Abdul Latif Jameel Clinic for Machine Learning in Health, who leads the Healthy ML group focused on robust machine learning in healthcare.


What they’re saying

“Evaluating data leakage in a healthcare context is crucial to determining whether it compromises patient privacy.”

— Sana Tonekaboni, Postdoc at the Broad Institute of MIT and Harvard

“If an attacker needs to know specific details about a patient to extract information, the risk of harm is minimal. However, the study also highlights the vulnerability of patients with unique conditions, who are easier to identify and may require higher levels of protection.”

— Marzyeh Ghassemi, MIT Associate Professor
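
Ghassemi's point about uniquely identifiable patients can be illustrated with a toy k-anonymity check; this is not from the study, and the records and attributes below are invented. A record whose combination of quasi-identifiers appears only once in a dataset (k = 1) is trivial to single out, even after names and dates are stripped.

```python
from collections import Counter

# Hypothetical de-identified records: (age band, ZIP prefix, diagnosis code).
# Names and dates are gone, but rare attribute combinations still stand out.
records = [
    ("60-69", "021", "I10"),  # common combination
    ("60-69", "021", "I10"),
    ("60-69", "021", "I10"),
    ("30-39", "021", "E11"),  # common combination
    ("30-39", "021", "E11"),
    ("40-49", "024", "G35"),  # unique combination: k = 1, easy to single out
]

counts = Counter(records)
for record, k in sorted(counts.items(), key=lambda item: item[1]):
    flag = "  <- unique record, needs stronger protection" if k == 1 else ""
    print(f"k={k}  {record}{flag}")
```

The same logic scales to real EHR data, where a patient with a rare diagnosis can be the only match for their attribute combination, making memorized details about them far more harmful to leak.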

What’s next

The researchers plan to make the work more interdisciplinary, bringing in clinicians, privacy researchers, and legal experts to further address the risks of AI memorization and data leakage in healthcare.

The takeaway

This research underscores the critical need for robust testing and evaluation of clinical AI models to ensure patient privacy and data security are maintained, even when working with de-identified electronic health records. As AI becomes more integral to healthcare, upholding medical ethics and public trust must remain a top priority.