AWS and Cerebras Collaboration Aims to Set a New Standard for AI Inference Speed and Performance in the Cloud
Deployed in AWS data centers and accessed through Amazon Bedrock, the AWS Trainium + Cerebras CS-3 solution will accelerate AI inference
Mar. 13, 2026 at 7:05pm
Amazon Web Services (AWS) and Cerebras Systems have announced a collaboration that will deliver the fastest AI inference solutions available for generative AI applications and large language model (LLM) workloads. The solution, to be deployed in AWS data centers and accessed through Amazon Bedrock, combines AWS Trainium-powered servers, Cerebras CS-3 systems, and Elastic Fabric Adapter (EFA) networking. The integrated system is designed to deliver unmatched performance and speed by splitting the inference workload across Trainium and CS-3, with each system optimized for the stage of the computation it handles.
Why it matters
Inference speed is a critical bottleneck for demanding AI workloads such as real-time coding assistance and interactive applications. AWS and Cerebras aim to remove that bottleneck by delivering the fastest AI inference available, letting enterprises around the world run blisteringly fast inference within their existing AWS environment.
The details
The Trainium + CS-3 solution enables 'inference disaggregation,' a technique that separates AI inference into two stages: prompt processing, or 'prefill,' and output generation, or 'decode.' Trainium is optimized for the prefill stage, which is highly parallel and compute-intensive, while the Cerebras CS-3 is optimized for the decode stage, which is inherently serial and memory-bandwidth-intensive. By disaggregating inference this way, each stage runs on hardware specialized for its computational profile, resulting in significantly faster inference overall (a simplified sketch of this hand-off appears below).
- The new solution will be launched in the coming months and made available through Amazon Bedrock.
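Neither company has published an API for the joint solution, so the following Python sketch is purely illustrative: the PrefillWorker, DecodeWorker, and KVCache names and the placeholder token logic are assumptions, not AWS or Cerebras interfaces. It shows only the control flow of disaggregated inference described above: a parallel prefill stage builds the attention KV cache, which is handed off (over EFA networking in the announced design; here just an in-process call) to a serial decode stage that generates tokens one at a time.

```python
# Illustrative sketch of disaggregated inference control flow.
# All class and function names here are hypothetical placeholders,
# not part of any AWS or Cerebras API.

from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Attention key/value state produced by prefill and read by decode."""
    tokens: list[str] = field(default_factory=list)


class PrefillWorker:
    """Stands in for the prefill pool (Trainium in the announced design):
    processes the whole prompt in parallel and emits the KV cache."""

    def run(self, prompt: str) -> KVCache:
        # Placeholder: a real system would run the model over the full prompt.
        return KVCache(tokens=prompt.split())


class DecodeWorker:
    """Stands in for the decode pool (Cerebras CS-3 in the announced design):
    generates output tokens one at a time, reading the KV cache each step."""

    def run(self, cache: KVCache, max_new_tokens: int) -> list[str]:
        output = []
        for step in range(max_new_tokens):
            # Placeholder token generation; a real decoder samples from the model.
            output.append(f"<token_{step}_given_{len(cache.tokens)}_cached>")
            cache.tokens.append(output[-1])  # decode extends the cache serially
        return output


def disaggregated_generate(prompt: str, max_new_tokens: int = 4) -> str:
    cache = PrefillWorker().run(prompt)                  # parallel, compute-bound stage
    tokens = DecodeWorker().run(cache, max_new_tokens)   # serial, bandwidth-bound stage
    return " ".join(tokens)


if __name__ == "__main__":
    print(disaggregated_generate("Explain inference disaggregation in one sentence."))
```

The design choice the article describes is that each worker pool only ever runs the stage it is best suited for, with the hand-off of state between pools carried by EFA networking.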
The players
Amazon Web Services (AWS)
An Amazon.com, Inc. company that provides on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered pay-as-you-go basis.
Cerebras Systems
A company that builds the fastest AI infrastructure in the world, including the Wafer Scale Engine 3 (WSE-3), the world's largest and fastest AI processor.
David Brown
Vice President, Compute & ML Services at AWS.
Andrew Feldman
Founder and CEO of Cerebras Systems.
What they’re saying
“Inference is where AI delivers real value to customers, but speed remains a critical bottleneck for demanding workloads like real-time coding assistance and interactive applications. What we're building with Cerebras solves that: by splitting the inference workload across Trainium and CS-3, and connecting them with Amazon's Elastic Fabric Adapter, each system does what it's best at. The result will be inference that's an order of magnitude faster and higher performance than what's available today.”
— David Brown, Vice President, Compute & ML Services, AWS
“Partnering with AWS to build a disaggregated inference solution will bring the fastest inference to a global customer base. Every enterprise around the world will be able to benefit from blisteringly fast inference within their existing AWS environment.”
— Andrew Feldman, Founder and CEO of Cerebras Systems
What’s next
Later this year, AWS will also offer leading open-source LLMs and Amazon Nova using Cerebras hardware.
The takeaway
The collaboration between AWS and Cerebras aims to set a new standard for AI inference speed and performance in the cloud, addressing a critical bottleneck for demanding AI workloads and enabling enterprises worldwide to benefit from blazingly fast inference within their existing AWS environment.

