Open-source Mamba-3 arrives to challenge the Transformer architecture with nearly 4% better language modeling and reduced latency

This release matters for developers building long-context applications or real-time reasoning agents, and for teams seeking to reduce GPU costs in high-volume production environments.

Mar. 17, 2026 at 11:04pm

The generative AI era began with the launch of OpenAI's ChatGPT in late 2022, but the underlying Transformer neural network architecture dates back to Google's 2017 paper. While Transformers deliver unparalleled model quality, they are computationally expensive. Researchers have now released Mamba-3, an open-source language model that improves on the Transformer architecture with nearly 4% better language modeling and reduced latency.

Why it matters

Mamba-3 represents a strategic shift in the total cost of ownership for AI deployments, doubling inference throughput for the same hardware footprint. As organizations move toward parallel, agentic workflows, the demand for low-latency generation increases, and Mamba-3 is designed to prevent GPU hardware from sitting idle during these tasks.

The details

Mamba-3 is a type of State Space Model (SSM) that maintains a compact, continuously updated internal state to process information quickly with lower memory requirements. It achieves comparable perplexity to its predecessor, Mamba-2, while using only half the state size. Mamba-3 introduces three key technological leaps:

  • Exponential-Trapezoidal Discretization
  • Complex-Valued SSMs and the "RoPE Trick"
  • Multi-Input, Multi-Output (MIMO) processing to boost arithmetic intensity

  • Mamba-3 was released in March 2026.
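To make the SSM idea concrete, here is a minimal sketch of a diagonal state-space recurrence in plain Python. This is an illustration of the general SSM principle only, not Mamba-3's actual kernels or parameterization; the function name and the toy parameters are invented for the example. The point is that the hidden state `h` stays a fixed size no matter how long the input is, which is why SSMs avoid the growing key-value cache that makes Transformer attention memory-hungry at long context.

```python
def ssm_scan(xs, a, b, c):
    """Toy diagonal state-space recurrence (illustrative only).

    Per timestep, for each state channel i:
        h[i] = a[i] * h[i] + b[i] * x     (state update)
        y    = sum(c[i] * h[i])           (readout)

    Memory is O(state size), independent of sequence length, unlike
    attention's O(sequence length) key-value cache.
    """
    n = len(a)
    h = [0.0] * n          # compact, fixed-size internal state
    ys = []
    for x in xs:
        h = [a[i] * h[i] + b[i] * x for i in range(n)]
        ys.append(sum(c[i] * h[i] for i in range(n)))
    return ys

# Example: a 4-channel state with decay 0.5 per step.
outputs = ssm_scan([1.0, 0.0], a=[0.5] * 4, b=[1.0] * 4, c=[0.25] * 4)
```

Because each step only touches the fixed-size state, generation cost per token stays constant as the context grows, which is the property behind the throughput and latency claims above.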

The players

Albert Gu

A researcher at Carnegie Mellon University and one of the leaders behind the original Mamba architecture and the latest Mamba-3 release.

Tri Dao

A researcher at Princeton University and one of the leaders behind the original Mamba architecture and the latest Mamba-3 release.


What they’re saying

“We're quite happy with the final model design! The three core methodological changes are inspired by (imo) some elegant math and methods.”

— Albert Gu

The takeaway

Mamba-3 shifts the total cost of ownership for AI deployments, doubling inference throughput on the same hardware footprint. As organizations move toward parallel, agentic workflows, its ability to keep GPUs busy during generation makes it a valuable option for developers building long-context applications and real-time reasoning agents.