EverMind Unveils MSA Architecture for 100M-Token Long-Term Memory in LLMs

Groundbreaking memory system delivers industry-leading performance at unprecedented scale

Mar. 19, 2026 at 7:04am

EverMind, a pioneer in AI memory infrastructure, has released a landmark research paper introducing a novel architecture called MSA (Memory Sparse Attention) that enables large language models to achieve efficient, end-to-end long-term memory at the unprecedented scale of 100 million tokens. Through a combination of innovative techniques, MSA overcomes the "Impossible Triangle" of LLM long-term memory and sets a new benchmark for scalability and precision.

Why it matters

This work marks a potential milestone toward a new era of "Memory-as-a-Service", in which memory acts as an independent, pluggable service that can be freely combined with different reasoning cores (LLMs). It sketches a blueprint for an AI ecosystem in which user data and "memory assets" are no longer locked into any single model or vendor.

The details

The MSA architecture is built on four key pillars of innovation:

  • Memory Sparse Attention, a differentiable, content-based sparsification mechanism that dynamically selects the most relevant memory subsets.
  • Document-wise RoPE, which decouples the internal relative positions of a document from its absolute position in the global memory.
  • KV Cache Compression and Memory Parallel, an engineering solution that enables 100M-token inference on just two A800 GPUs.
  • Memory Interleave, a mechanism that allows the model to perform multiple rounds of "generative retrieval → context expansion" loops for complex multi-hop reasoning.

  • On March 18, 2026, EverMind released the landmark research paper on MSA.
  • The paper is published on Zenodo and open-sourced on GitHub.
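To make the first pillar concrete, here is a minimal NumPy sketch of content-based memory sparsification: score each stored memory block against the current query, keep only the top-k blocks, and run ordinary attention over that subset. All function names, the scoring rule, and the hard top-k selection are illustrative assumptions, not EverMind's implementation; the paper describes a differentiable mechanism, whereas this sketch uses a hard cutoff for simplicity.

```python
import numpy as np

def memory_sparse_attention(query, memory_blocks, k=2):
    """Illustrative sketch: attend only over the k memory blocks most
    relevant to the query, instead of the full 100M-token memory."""
    # Score each block by the dot product between the query and the
    # block's mean key vector (a stand-in for a learned relevance score).
    scores = np.array([query @ block.mean(axis=0) for block in memory_blocks])
    # Hard top-k selection of the most relevant blocks.
    top = sorted(np.argsort(scores)[-k:])
    selected = np.concatenate([memory_blocks[i] for i in top], axis=0)
    # Standard softmax attention, restricted to the selected tokens.
    logits = selected @ query / np.sqrt(query.size)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ selected, top

rng = np.random.default_rng(0)
blocks = [rng.normal(size=(4, 8)) for _ in range(10)]  # 10 blocks x 4 tokens
q = rng.normal(size=8)
out, chosen = memory_sparse_attention(q, blocks, k=2)
print(out.shape, chosen)  # attention computed over 8 of 40 stored tokens
```

The point of the sketch is the cost profile: attention is computed over only the selected subset, so compute scales with k rather than with the total memory size.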

The players

EverMind

A pioneer in AI memory infrastructure, built around a core team incubated by Shanda Group's founder, Tianqiao Chen, with the mission of conquering AI's long-term memory challenge and advancing toward AI's Self-Evolving capability.

Shanda Group

The parent company of EverMind, with a strategic focus on building a "Discoverative AI" ecosystem that aims to enable AI to assist humans in discovering new knowledge and solving fundamental problems, rather than merely imitating and recombining existing information.


The takeaway

EverMind's MSA architecture represents a breakthrough in solving the long-standing challenge of long-term memory in large language models, paving the way for a new era of AI systems with true "lifelong memory" and the ability to engage in complex, multi-hop reasoning at unprecedented scale.