Nvidia Unveils Next-Gen Inference Chips, Shifts Production to Vera Rubin Platform

Nvidia to launch new AI inference chips at GTC 2026, reducing dependence on HBM

Mar. 30, 2026 at 7:33pm

Nvidia's new Vera Rubin AI inference chip promises to reduce dependence on scarce memory resources, reshaping the future of large-scale AI deployment. (San Jose Today)

Nvidia has officially confirmed that it will launch a new generation of AI inference chips at the upcoming GTC 2026 conference. The company also announced a major production capacity adjustment: the Hopper-architecture H200 will gradually cede capacity to the next-generation Vera Rubin platform. The shift aims to reduce Nvidia's dependence on High-Bandwidth Memory (HBM) through architectural optimization, reshaping the AI computing hardware landscape.

Why it matters

The new inference chips are expected to be the flagship product of the Vera Rubin platform, optimized for workloads such as long-context inference, multimodal model deployment, and AI agent execution. By reducing HBM dependence, Nvidia can ease pressure on its own supply chain and lower the cost of high-end compute, promoting the wider adoption of large AI models.

The details

The Vera Rubin platform adopts a six-chip collaborative design, including the Rubin GPU, the Rubin CPX inference-specific accelerator, and the Vera CPU, manufactured on TSMC's 3nm N3P process. The platform delivers 50 petaflops of FP4 inference compute, five times that of the H200, and per-token inference cost can fall to one-tenth that of the Blackwell platform. Nvidia also reports a breakthrough in reducing HBM dependence: the third-generation Transformer Engine includes hardware-level adaptive compression, paired with a hybrid memory architecture that combines LPDDR5X and HBM4.

  • The Vera Rubin platform will begin small-batch shipments in the second quarter of 2026.
  • Production will scale up fully in the third and fourth quarters of 2026.
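To see why lower-precision inference eases HBM pressure, consider the FP4 (E2M1) number format the platform's compute figures refer to: each value occupies 4 bits instead of FP16's 16, so the same memory bandwidth serves roughly four times as many parameters per second. The sketch below is purely illustrative (it is not Nvidia's implementation, and real deployments add per-block scale factors); it rounds weights to the nearest representable FP4 value and compares memory footprints.

```python
# Illustrative sketch: FP4 (E2M1) quantization and its memory savings.
# Assumption: simple round-to-nearest with no scaling, for clarity only.

# The non-negative representable values of E2M1 (1 sign, 2 exponent,
# 1 mantissa bit); the full set includes their negatives.
_POS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_VALUES = sorted(_POS + [-v for v in _POS if v != 0.0])

def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable FP4 (E2M1) value."""
    return min(FP4_VALUES, key=lambda v: abs(v - x))

def memory_bytes(n_params: int, bits: int) -> int:
    """Raw weight storage for n_params values at the given bit width."""
    return n_params * bits // 8

weights = [0.7, -2.2, 5.1, 0.05, -0.4]
print([quantize_fp4(w) for w in weights])  # [0.5, -2.0, 6.0, 0.0, -0.5]

# Back-of-envelope: a 70B-parameter model's raw weight footprint.
print(memory_bytes(70_000_000_000, 16) / 1e9, "GB at FP16")  # 140.0 GB
print(memory_bytes(70_000_000_000, 4) / 1e9, "GB at FP4")    # 35.0 GB
```

The 4x reduction in bytes moved per parameter is the mechanism behind claims that lower-precision formats reduce dependence on scarce high-bandwidth memory: inference is typically bandwidth-bound, so shrinking the weights shrinks the required memory traffic proportionally.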

The players

Nvidia

An American multinational technology company that designs graphics processing units (GPUs) for the gaming and professional markets, as well as system on a chip units (SoCs) for the mobile computing and automotive market.

Jensen Huang

The founder and CEO of Nvidia.

Colette Kress

The Chief Financial Officer of Nvidia.

TSMC

Taiwan Semiconductor Manufacturing Company, the world's largest dedicated independent semiconductor foundry.


What they’re saying

“We will release 'unprecedented' new chips, focusing on three core directions: leapfrog inference performance, energy efficiency optimization, and supply chain resilience.”

— Jensen Huang, Founder and CEO of Nvidia

“Although the H200 has obtained a small number of export licenses, it has not generated actual revenue so far, and continuing large-scale mass production is no longer commercially meaningful.”

— Colette Kress, CFO of Nvidia

What’s next

Nvidia will officially announce the detailed parameters, pricing strategy, and launch timeline of the Vera Rubin platform at the GTC 2026 conference.

The takeaway

Nvidia's move to reduce HBM dependence through architectural innovation on the Vera Rubin platform will reshape the AI computing hardware landscape, promoting the wider adoption of large AI models by lowering the cost of high-end compute.