AWS and Cerebras Collaborate to Accelerate AI Inference in the Cloud
New solution combines AWS Trainium and Cerebras CS-3 to deliver fastest AI inference for generative AI and LLM workloads
Mar. 13, 2026 at 3:06pm
Amazon Web Services (AWS) and Cerebras Systems have announced a collaboration to deliver the fastest AI inference solutions available for generative AI applications and large language model (LLM) workloads. The solution, to be deployed on Amazon Bedrock in AWS data centers, combines AWS Trainium-powered servers, Cerebras CS-3 systems, and Elastic Fabric Adapter (EFA) networking. The companies say the new offering will bring blisteringly fast inference to a global customer base by splitting the inference workload across Trainium and CS-3, with each system optimized for its specific computational needs.
Why it matters
Inference is a critical bottleneck for demanding AI workloads like real-time coding assistance and interactive applications. By pairing AWS's purpose-built Trainium chip for the parallel, computationally intensive 'prefill' stage with Cerebras' CS-3 system, which is optimized for the serial, memory-bandwidth-intensive 'decode' stage, the new solution aims to deliver inference that is an order of magnitude faster and higher performing than what's available today.
The details
The Trainium + CS-3 solution enables 'inference disaggregation,' which separates AI inference into two stages: prompt processing ('prefill') and output generation ('decode'). Prefill is naturally parallel and computationally intensive, while decode is inherently serial and memory-bandwidth intensive. By running each stage on the hardware suited to it (Trainium for prefill, Cerebras CS-3 for decode) and connecting the two with high-speed EFA networking, the solution can maximize the performance of each part of the inference workflow.
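To make the two stages concrete, here is a minimal, self-contained NumPy sketch of single-head attention that shows why prefill and decode stress hardware differently. The model dimensions, random weights, and greedy sampling are illustrative assumptions for this sketch only, not the AWS/Cerebras implementation: prefill pushes the whole prompt through one parallel matrix multiply, while decode emits one token at a time against a growing KV cache.

```python
import numpy as np

# Toy single-head attention; all sizes and weights are illustrative.
D_MODEL, VOCAB = 64, 1000
rng = np.random.default_rng(0)
embed = rng.normal(size=(VOCAB, D_MODEL))
W_qkv = rng.normal(size=(D_MODEL, 3 * D_MODEL)) / np.sqrt(D_MODEL)
W_out = rng.normal(size=(D_MODEL, VOCAB)) / np.sqrt(D_MODEL)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def prefill(prompt_ids):
    """Prompt processing: every prompt token is known up front, so the
    whole sequence goes through one big parallel matmul. Compute-bound;
    the stage the article maps to Trainium."""
    x = embed[prompt_ids]                      # (seq_len, d_model)
    _, k, v = np.split(x @ W_qkv, 3, axis=-1)
    return k, v                                # KV cache handed off to decode

def decode(k_cache, v_cache, last_id, steps):
    """Output generation: one token per step, and every step re-reads the
    entire (growing) KV cache. Serial and memory-bandwidth-bound; the
    stage the article maps to the Cerebras CS-3."""
    generated = []
    for _ in range(steps):
        q, k, v = np.split(embed[last_id] @ W_qkv, 3)
        k_cache = np.vstack([k_cache, k])      # cache grows by one row
        v_cache = np.vstack([v_cache, v])
        attn = softmax(q @ k_cache.T / np.sqrt(D_MODEL))
        last_id = int(np.argmax((attn @ v_cache) @ W_out))  # greedy pick
        generated.append(last_id)
    return generated

# In a disaggregated deployment, the KV cache hand-off would cross the
# network between machines (e.g. over EFA); here it is a function return.
k, v = prefill(np.array([1, 2, 3, 4]))
print(decode(k, v, last_id=4, steps=5))
```

Because prefill's cost scales with prompt length across parallel hardware while decode advances one step at a time, splitting the stages across differently specialized systems lets each run at full utilization instead of one stage idling the other.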
- The new solution will be available in the coming months, deployed on Amazon Bedrock in AWS data centers.
The players
Amazon Web Services (AWS)
An Amazon.com, Inc. company that provides on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered pay-as-you-go basis.
Cerebras Systems
A technology company that builds the world's fastest AI infrastructure, including the Wafer Scale Engine 3 (WSE-3), the largest and fastest AI processor.
David Brown
Vice President, Compute & ML Services at AWS.
Andrew Feldman
Founder and CEO of Cerebras Systems.
What they’re saying
“Inference is where AI delivers real value to customers, but speed remains a critical bottleneck for demanding workloads like real-time coding assistance and interactive applications. What we're building with Cerebras solves that: by splitting the inference workload across Trainium and CS-3, and connecting them with Amazon's Elastic Fabric Adapter, each system does what it's best at. The result will be inference that's an order of magnitude faster and higher performance than what's available today.”
— David Brown, Vice President, Compute & ML Services, AWS
“Partnering with AWS to build a disaggregated inference solution will bring the fastest inference to a global customer base. Every enterprise around the world will be able to benefit from blisteringly fast inference within their existing AWS environment.”
— Andrew Feldman, Founder and CEO, Cerebras Systems
What’s next
Later this year, AWS will also offer leading open-source LLMs and Amazon Nova using Cerebras hardware.
The takeaway
This collaboration between AWS and Cerebras aims to set a new standard for AI inference speed and performance in the cloud, addressing a critical bottleneck for demanding generative AI and LLM workloads. By running each stage of the inference process on specialized hardware, the solution promises enterprises around the world inference that is an order of magnitude faster than what's available today.

