To InfiniBand, maybe beyond?

NVIDIA's latest roadmap was teased at Computex in Taiwan last month. Whilst details were a little light on PFLOPS and TDP for either the GPU or CPU, we did get some interesting information about the next-gen products.

  • GPU: Rubin (HBM3e to HBM4 memory) - TSMC N3 process

  • CPU: Vera (NVIDIA's 2nd gen ARM processor) - TSMC N3 process

  • Interconnect: NVLink6 (2x performance to 3600 GB/sec)

  • NIC: ConnectX9 (2x speed to 1.6Tb/sec)

  • Switch: SpectrumX1600 (2x speed to support CX9 NICs)
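
To put those "2x" claims in context, here is a minimal sketch of the arithmetic. It assumes each headline figure is exactly double its predecessor and uses decimal units (1 Tb = 1000 Gb, 8 bits per byte). The function names are my own, purely for illustration.

```python
# Headline figures from the list above; each is described as a 2x step.
ROADMAP = {
    "NVLink6": (3600.0, "GB/sec"),
    "ConnectX9": (1.6, "Tb/sec"),
}


def implied_previous(value, factor=2.0):
    """Figure the previous generation would need for `value` to be a `factor`x step."""
    return value / factor


def tbps_to_gbytes_per_sec(tbps):
    """Convert terabits/sec to gigabytes/sec (decimal units, 8 bits per byte)."""
    return tbps * 1000 / 8


for name, (value, unit) in ROADMAP.items():
    print(f"{name}: {value} {unit}, implied previous gen: {implied_previous(value)} {unit}")

# NIC speeds are quoted in bits and NVLink in bytes, so convert before comparing.
cx9_gbytes = tbps_to_gbytes_per_sec(ROADMAP["ConnectX9"][0])
print(f"ConnectX9 in byte terms: {cx9_gbytes} GB/sec")
```

The conversion is the useful part: interconnect figures are quoted in bytes and NIC figures in bits, so the two lines in the list are not directly comparable as written.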

NVIDIA Rubin architecture roadmap


NVIDIA appear to have moved to a tick-tock approach to releases, something Intel famously developed before their own fabs got stuck on 14nm for 6 years (2016 to 2021).

Tick–tock was a production model adopted in 2007 by chip manufacturer Intel. Under this model, every microarchitecture change (tock) was followed by a die shrink of the process technology (tick). It was replaced by the process–architecture–optimization model, which was announced in 2016 and is like a tick–tock cycle followed by an optimization phase. As a general engineering model, tick–tock is a model that refreshes one side of a binary system each release cycle.

Every "tick" represented a shrinking of the process technology of the previous microarchitecture (sometimes introducing new instructions, as with Broadwell, released in late 2014) and every "tock" designated a new microarchitecture.[1] These occurred roughly every year to 18 months.[2] In 2014, Intel created a "tock refresh" of a tock in the form of a smaller update to the microarchitecture[3] not considered a new generation in and of itself.

In March 2016, Intel announced in a Form 10-K report that it deprecated the tick–tock cycle in favor of a three-step process–architecture–optimization model, under which three generations of processors are produced under a single manufacturing process, with the third generation out of three focusing on optimization.[4] The first optimization of the Skylake architecture was Kaby Lake. Intel then announced a second optimization, Coffee Lake,[5] making a total of four generations at 14 nm.[6]

Essentially, that means a new architecture every 2 years, with an improvement to the existing one (node reduction, memory upgrade, or both), which they are calling Ultra, squeezed into the years in between.

  • For Hopper, the H200 didn't get that nomenclature, but it would essentially be Hopper-Ultra given the memory improvements (141GB memory and 4.8 TB/sec bandwidth).

  • For Blackwell, the B200 uses 8Hi memory stacks; Blackwell-Ultra increases that to 12Hi, so expect ~50% more memory and another increase in bandwidth.

  • For Rubin, memory moves to HBM4 at 8Hi. Rubin-Ultra increases that to 12Hi, so assume a similar ~50% increase in memory capacity and another increase in bandwidth.
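
The ~50% figure is just the ratio of stack heights. A rough sketch, assuming the number of stacks and the capacity per die stay the same between the base part and the Ultra part (the 192 GB baseline below is an illustrative placeholder, not a figure from the roadmap):

```python
def scaled_capacity_gb(base_capacity_gb, base_height, new_height):
    """Scale total memory capacity linearly with HBM stack height.

    Assumes the same number of stacks and the same capacity per die,
    so only the number of dies per stack changes.
    """
    return base_capacity_gb * new_height / base_height


# Relative increase when going from 8Hi to 12Hi stacks.
increase = scaled_capacity_gb(1.0, 8, 12) - 1.0
print(f"8Hi -> 12Hi capacity increase: {increase:.0%}")

# Hypothetical baseline capacity, used only to show the scaling.
baseline_gb = 192
print(f"{baseline_gb} GB at 8Hi -> {scaled_capacity_gb(baseline_gb, 8, 12)} GB at 12Hi")
```

Bandwidth is a different story: it depends on the interface width and pin speed of each stack rather than its height, so it isn't modelled here.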

Now, whilst most (including me!) are looking at Rubin and Vera, I noticed something about the networking side of things that doesn't appear to have gotten any coverage.  Let's look at that switch and network card…



SWITCHES

Ethernet - Spectrum-X800 (2024) @ 400G with BlueField3 DPU

Ethernet - Spectrum-X800 Ultra (2025) @ 800G with ConnectX8 NIC

IB/Ethernet - Spectrum-X1600 (2026) @ 1600G with ConnectX9 NIC

That last one is noteworthy. There's no successor to Quantum-2 or BlueField3 on the roadmap.

Is NVIDIA converging their InfiniBand and Ethernet switches into one, and abandoning BlueField?

NETWORK CARDS

What happened to BlueField-3X and 4?

Another piece of the puzzle is that Jensen's presentation doesn't have a roadmap for the BlueField DPU beyond the current BlueField3, announced at GTC in 2021.

A little light research doesn't yield much on a next-gen BlueField, other than what Wikipedia expects (BlueField-4 @ 800G, see https://en.wikipedia.org/wiki/Nvidia_BlueField). However, this slide from Patrick Kennedy at ServeTheHome shows that there were plans for a BlueField-3X and 4, with the speed pegged at 400G and 'only' improvements to on-device processing.

NVIDIA DPU Roadmap 2022

https://www.servethehome.com/nvidia-shows-dpu-roadmap-combining-arm-cores-gpu-and-networking/

An updated slide from Dec 2023, apparently from NVIDIA and published on Wccftech, pushes BlueField and Quantum updates out to 2H 2024.

NVIDIA DPU Roadmap 2023

https://wccftech.com/nvidia-vera-rubin-next-gen-hpc-ai-gpu-architecture-2025/

Since we're midway through 2024, with an updated presentation sans DPU and dedicated InfiniBand switch, it's possible these have been abandoned for a Spectrum+ConnectX future...


