Open-source Mamba-3 arrives to surpass the Transformer architecture with nearly 4% better language modeling and reduced latency
This release benefits developers building long-context applications or real-time reasoning agents, as well as those seeking to reduce GPU costs in high-volume production environments.
Mar. 17, 2026 at 11:04pm
The generative AI era began with the launch of OpenAI's ChatGPT in late 2022, but the underlying Transformer neural network architecture dates back to Google's 2017 "Attention Is All You Need" paper. While Transformers deliver unparalleled model quality, they are computationally expensive. Researchers have now released Mamba-3, an open-source language model that improves on the Transformer architecture with nearly 4% better language modeling and reduced latency.
Why it matters
Mamba-3 represents a strategic shift in the total cost of ownership for AI deployments, doubling inference throughput for the same hardware footprint. As organizations move toward parallel, agentic workflows, the demand for low-latency generation increases, and Mamba-3 is designed to prevent GPU hardware from sitting idle during these tasks.
The details
Mamba-3 is a type of state space model (SSM): instead of attending over every previous token, it maintains a compact internal state that is updated at each step, letting it process information quickly with lower memory requirements. It achieves perplexity comparable to its predecessor, Mamba-2, while using only half the state size. Mamba-3 introduces three key technological leaps: exponential-trapezoidal discretization, complex-valued SSMs via the "RoPE trick", and a multi-input, multi-output (MIMO) formulation that boosts arithmetic intensity.
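The name "exponential-trapezoidal discretization" suggests a blend of the two classical rules for turning continuous-time SSM dynamics into a discrete recurrence. The formulas below are the standard zero-order-hold (exponential) and trapezoidal (bilinear) rules from the SSM literature, shown for orientation; they are not necessarily Mamba-3's exact construction.

```latex
% Continuous-time dynamics with step size \Delta:
%   h'(t) = A\,h(t) + B\,x(t)
%
% Zero-order hold (exponential rule):
%   \bar{A} = e^{\Delta A}, \qquad
%   \bar{B} = (\Delta A)^{-1}\left(e^{\Delta A} - I\right)\,\Delta B
%
% Trapezoidal (bilinear) rule:
%   \bar{A} = \left(I - \tfrac{\Delta}{2}A\right)^{-1}\left(I + \tfrac{\Delta}{2}A\right), \qquad
%   \bar{B} = \left(I - \tfrac{\Delta}{2}A\right)^{-1}\Delta B
%
% Either way, the resulting discrete recurrence is:
%   h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t
```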
- Mamba-3 was released in March 2026.
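To see why an SSM's memory footprint stays flat where a Transformer's KV cache grows with every token, here is a toy diagonal state space recurrence in NumPy. This is an illustrative sketch only; Mamba-3's actual parameterization, discretization, and GPU kernels differ, and all names and sizes below are invented for the example.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run a diagonal linear SSM over sequence x of shape (T, d_in).

    A: (n,) diagonal state-transition coefficients
    B: (n, d_in) input projection
    C: (d_out, n) output projection
    The state h has fixed size n, so per-step memory is O(1) in
    sequence length, unlike a Transformer's growing KV cache.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A * h + B @ x_t   # state update: one step per token
        ys.append(C @ h)      # readout from the current state
    return np.stack(ys)

T, d_in, d_out, n = 8, 4, 4, 16
rng = np.random.default_rng(0)
x = rng.normal(size=(T, d_in))
A = np.full(n, 0.9)                    # decay < 1 keeps the state bounded
B = rng.normal(size=(n, d_in)) * 0.1
C = rng.normal(size=(d_out, n)) * 0.1
y = ssm_scan(x, A, B, C)
print(y.shape)  # (8, 4)
```

The loop touches each token exactly once and carries only the fixed-size vector `h` forward, which is the core property that lets SSM-family models trade the Transformer's quadratic attention cost for linear-time generation.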
The players
Albert Gu
A researcher at Carnegie Mellon University and one of the leaders behind the original Mamba architecture and the latest Mamba-3 release.
Tri Dao
A researcher at Princeton University and one of the leaders behind the original Mamba architecture and the latest Mamba-3 release.
What they’re saying
“We're quite happy with the final model design! The three core methodological changes are inspired by (imo) some elegant math and methods.”
— Albert Gu
The takeaway
By roughly doubling inference throughput on the same hardware footprint, Mamba-3 changes the cost calculus for AI deployments. As organizations move toward parallel, agentic workflows, its design keeps GPUs from sitting idle, making it a compelling option for developers building long-context applications and real-time reasoning agents.


