AWS and Cerebras Collaborate to Accelerate AI Inference in the Cloud
New solution combines AWS Trainium and Cerebras CS-3 to deliver fastest AI inference for generative AI and LLM workloads
Mar. 13, 2026 at 3:06pm
Amazon Web Services (AWS) and Cerebras Systems have announced a collaboration to deliver the fastest AI inference solutions available for generative AI applications and large language model (LLM) workloads. The solution, to be deployed on Amazon Bedrock in AWS data centers, combines AWS Trainium-powered servers, Cerebras CS-3 systems, and Elastic Fabric Adapter (EFA) networking. The companies say the new offering will bring blisteringly fast inference to a global customer base by splitting the inference workload across Trainium and CS-3, with each system optimized for its specific computational needs.
Why it matters
Inference is a critical bottleneck for demanding AI workloads like real-time coding assistance and interactive applications. By pairing AWS's purpose-built Trainium chip for the parallel, computationally intensive 'prefill' stage with Cerebras' CS-3 system, which is optimized for the serial, memory-bandwidth-intensive 'decode' stage, the new solution aims to deliver inference that is an order of magnitude faster and higher performing than what's available today.
The details
The Trainium + CS-3 solution enables 'inference disaggregation,' which separates AI inference into two stages: prompt processing ('prefill') and output generation ('decode'). Prefill is naturally parallel and computationally intensive, while decode is inherently serial and memory-bandwidth intensive. By running each stage on the hardware suited to it (Trainium for prefill, Cerebras CS-3 for decode) and connecting the two with high-speed EFA networking, the solution can maximize the performance of each part of the inference workflow.
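To make the two stages concrete, here is a minimal, self-contained NumPy sketch of single-head attention that shows why prefill and decode stress hardware differently. The model dimensions, random weights, and greedy sampling are illustrative assumptions for this sketch only, not the AWS/Cerebras implementation: prefill pushes the whole prompt through one parallel matrix multiply, while decode emits one token at a time against a growing KV cache.

```python
import numpy as np

# Toy single-head attention; all sizes and weights are illustrative.
D_MODEL, VOCAB = 64, 1000
rng = np.random.default_rng(0)
embed = rng.normal(size=(VOCAB, D_MODEL))
W_qkv = rng.normal(size=(D_MODEL, 3 * D_MODEL)) / np.sqrt(D_MODEL)
W_out = rng.normal(size=(D_MODEL, VOCAB)) / np.sqrt(D_MODEL)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def prefill(prompt_ids):
    """Prompt processing: every prompt token is known up front, so the
    whole sequence goes through one big parallel matmul. Compute-bound;
    the stage the article maps to Trainium."""
    x = embed[prompt_ids]                      # (seq_len, d_model)
    _, k, v = np.split(x @ W_qkv, 3, axis=-1)
    return k, v                                # KV cache handed off to decode

def decode(k_cache, v_cache, last_id, steps):
    """Output generation: one token per step, and every step re-reads the
    entire (growing) KV cache. Serial and memory-bandwidth-bound; the
    stage the article maps to the Cerebras CS-3."""
    generated = []
    for _ in range(steps):
        q, k, v = np.split(embed[last_id] @ W_qkv, 3)
        k_cache = np.vstack([k_cache, k])      # cache grows by one row
        v_cache = np.vstack([v_cache, v])
        attn = softmax(q @ k_cache.T / np.sqrt(D_MODEL))
        last_id = int(np.argmax((attn @ v_cache) @ W_out))  # greedy pick
        generated.append(last_id)
    return generated

# In a disaggregated deployment, the KV cache hand-off would cross the
# network between machines (e.g. over EFA); here it is a function return.
k, v = prefill(np.array([1, 2, 3, 4]))
print(decode(k, v, last_id=4, steps=5))
```

Because prefill's cost scales with prompt length across parallel hardware while decode advances one step at a time, splitting the stages across differently specialized systems lets each run at full utilization instead of one stage idling the other.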
- The new solution will be available in the coming months, deployed on Amazon Bedrock in AWS data centers.
The players
Amazon Web Services (AWS)
An Amazon.com, Inc. company that provides on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered pay-as-you-go basis.
Cerebras Systems
A technology company that builds the world's fastest AI infrastructure, including the Wafer Scale Engine 3 (WSE-3), the largest and fastest AI processor.
David Brown
Vice President, Compute & ML Services at AWS.
Andrew Feldman
Founder and CEO of Cerebras Systems.
What they’re saying
“Inference is where AI delivers real value to customers, but speed remains a critical bottleneck for demanding workloads like real-time coding assistance and interactive applications. What we're building with Cerebras solves that: by splitting the inference workload across Trainium and CS-3, and connecting them with Amazon's Elastic Fabric Adapter, each system does what it's best at. The result will be inference that's an order of magnitude faster and higher performance than what's available today.”
— David Brown, Vice President, Compute & ML Services, AWS
“Partnering with AWS to build a disaggregated inference solution will bring the fastest inference to a global customer base. Every enterprise around the world will be able to benefit from blisteringly fast inference within their existing AWS environment.”
— Andrew Feldman, Founder and CEO, Cerebras Systems
What’s next
Later this year, AWS will also offer leading open-source LLMs and Amazon Nova using Cerebras hardware.
The takeaway
This collaboration between AWS and Cerebras aims to set a new standard for AI inference speed and performance in the cloud, addressing a critical bottleneck for demanding generative AI and LLM workloads. By running each stage of the inference process on specialized hardware, the solution promises enterprises around the world inference that is an order of magnitude faster than what's available today.

