GTC 2024 post-conference

Upon returning from GTC24, I've been able to reflect on all the new updates across NVIDIA's platforms and below is a summary of the various announcements.


Blackwell was the star of the show, with the B100, B200 and GB200 chips announced. Note that no consumer-facing graphics cards (RTX) were announced, nor was there a successor to the L40S or the BlueField-3 DPU (though there was a new ConnectX-8 NIC).

As always though, a little bit of the devil is in the details, alongside some marketing waxing lyrical in the keynote. More on that at the end.


Let's dive in and see how next-gen (green) compares with current-gen (orange) and last-gen (red).

Blackwell upgrades to higher-speed HBM3e memory and NVLink 5, and runs on TSMC's updated 4NP lithography, not N3 like I expected.

Apple seems to have a lock on TSMC's N3 supply, as well as its N3E process, expected to debut in this year's iPhones. NVIDIA is playing it safe with a node it has experience with; the trade-off is a huge generational increase in power usage.

B100

The B100 chip is designed to be a drop-in replacement for H100 systems, complying with the same 700W TDP. I'm not sure how appealing this solution is: pulling out 1-2 year old ~$35k GPUs and replacing them with new ~$35k GPUs, with last-gen NVLink and InfiniBand limiting their capabilities.

B200

Fusing two dies together (presumably close cousins of the B100), the B200 gets a 2-3x performance increase over the H100, in part due to the 2nd-gen Transformer Engine, but with a 71% power increase and a 40% regression at FP64. A move to N3 could have minimized the power impact, as well as the literal size of the die.
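Taking those figures at face value, the efficiency gain is smaller than the headline performance number suggests. A quick back-of-the-envelope sketch (the 2-3x and 71% figures are the quoted ones; the rest is simple arithmetic):

```python
# Rough perf/watt for B200 vs H100, from the quoted
# 2-3x performance gain and 71% power increase.
power_ratio = 1.71  # B200 power relative to H100

for perf_ratio in (2.0, 3.0):
    perf_per_watt = perf_ratio / power_ratio
    print(f"{perf_ratio:.0f}x perf -> {perf_per_watt:.2f}x perf/watt")

# 2x perf -> 1.17x perf/watt
# 3x perf -> 1.75x perf/watt
```

So even at the optimistic end, perf/watt improves well under 2x, which puts the power and cooling discussion below into context.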

GB200

The next-gen Superchip pairs the current 1st-generation Grace CPU with the B200 alongside, for a breathtaking 2700W of power usage, or a 2.7x increase. I would expect next-gen Superchips (2025?) to include an updated CPU based on Arm's Neoverse V3 cores.


NVIDIA DGX

It's no surprise that given the power usage of the highest-end chips, putting these in racks in an air-cooled datacenter is a significant challenge. NVIDIA has designed reference architectures in the past, referred to as NVIDIA DGX.

During the keynote, Jensen talked about personally delivering the first DGX to OpenAI in 2016...

Elon Musk on X: "Some pics from when Jensen delivered the first @Nvidia AI system to @OpenAI"

Fast forward 8 years, and you can see how much DGX has grown...

DGX SuperPOD with DGX GB200 Systems | NVIDIA

Each liquid-cooled rack features 36 NVIDIA GB200 Grace Blackwell Superchips–36 NVIDIA Grace CPUs and 72 Blackwell GPUs–connected as one with NVIDIA NVLink. Multiple racks connect with NVIDIA Quantum InfiniBand to scale up to tens of thousands of GB200 Superchips.

On average, a standard rack in an air-cooled datacenter supports around 10kW, with peak usage around 20kW.

In a SuperPOD deployment, including InfiniBand networking, each rack needs 120kW of power! At that density, you cannot cool this with air. Period.

For reference, Jensen stated 25 degrees Celsius liquid in, 45 degrees Celsius liquid out @ 2 liters per second.
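Those coolant numbers are enough to sanity-check the rack power figure. A minimal sketch, assuming a water-like coolant at roughly 1 kg per liter (the flow rate and temperatures are the quoted ones; the coolant properties are my assumption):

```python
# Heat removed by the liquid loop: Q = m_dot * c_p * delta_T
flow_l_per_s = 2.0        # quoted flow rate, liters/second
density_kg_per_l = 1.0    # assume water-like coolant
c_p = 4186.0              # specific heat of water, J/(kg*K)
delta_t = 45.0 - 25.0     # 25 C in, 45 C out

mass_flow = flow_l_per_s * density_kg_per_l  # kg/s
heat_watts = mass_flow * c_p * delta_t
print(f"Heat removed: {heat_watts / 1000:.0f} kW")
# Heat removed: 167 kW
```

That's roughly 167kW of heat rejection capacity per loop, comfortably above the ~120kW rack figure, so the quoted flow and delta-T are at least self-consistent.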

Disney Imagineering robots on stage, super cute.


Networking, Omniverse and NIMs!

On the networking side, NVLink 5 doubles the bandwidth, NVIDIA updated their InfiniBand (Quantum) and Ethernet (Spectrum) solutions to 800Gb/s, and the NVLink Switch architecture now delivers 4x the bandwidth of Hopper, at 7.2TB/s. ConnectX-8 NICs are also new.

Quite a lot there, I'll have to cover networking in a future newsletter.

Omniverse got a material update with the announcement of Omniverse Cloud APIs on Microsoft Azure.

NVIDIA also announced a software acceleration solution called NIMs (NVIDIA Inference Microservices), which essentially democratizes access to pre-trained models and can run on ANY NVIDIA GPU, whether in your datacenter, cloud or workstation. Runs CUDA? Runs NIMs.
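Part of what makes that portability practical is that NIM containers expose an OpenAI-compatible HTTP API, so client code is minimal. A hypothetical sketch of the request a client would send to a locally running NIM; the port, route, and model name here are illustrative assumptions, not values from the announcement:

```python
import json

# Hypothetical request to a locally running NIM container.
# URL, port, and model id are illustrative assumptions.
url = "http://localhost:8000/v1/chat/completions"  # OpenAI-style route

payload = {
    "model": "meta/llama3-8b-instruct",  # example model id, assumed
    "messages": [
        {"role": "user", "content": "Summarize GTC 2024 in one sentence."}
    ],
    "max_tokens": 64,
}

# e.g. requests.post(url, json=payload) against the running container
print(json.dumps(payload, indent=2))
```

The same payload works whether the container runs on a workstation GPU or a datacenter cluster, which is the whole pitch.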


Summing it all up

Whilst I thoroughly enjoyed the conference, especially meeting new folks with some long overdue catchups with my network, what I don't love is some of the marketing fluff that finds its way into presentations.

If you were at the conference and heard an audible groan at this slide, that was me 🙋🏻‍♂️

NVIDIA presented a comparison of GB200 (Grace+Blackwell) at FP4 to B200 (Blackwell) at FP8 to H200 at FP8 and called it a 30x improvement.

It is absolutely faster. Obviously.

Faster Transformer Engine, faster memory, and faster networking, sure; but a fair comparison would be GB200 to GH200 at the same precision (FP4 or FP8), which is absolutely NOT 30x.
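To make the apples-to-oranges problem concrete: a cross-precision, cross-system comparison multiplies several unrelated gains together. The factors below are purely illustrative (not NVIDIA figures); the point is only how a headline multiple compounds:

```python
# Illustrative decomposition of a compounded speedup claim.
# None of these factors are official numbers; they only show how
# mixing precision, chip, and system changes inflates a headline.
precision_gain = 2.0  # FP8 -> FP4 doubles raw throughput (assumed)
chip_gain = 3.0       # new GPU generation (assumed)
system_gain = 5.0     # NVLink domain / networking effects (assumed)

headline = precision_gain * chip_gain * system_gain
like_for_like = chip_gain  # same precision, same system scale

print(f"headline multiple: {headline:.0f}x")       # 30x
print(f"like-for-like:     {like_for_like:.0f}x")  # 3x
```

Quote the product and it sounds revolutionary; hold precision and system scale constant and the chip-to-chip gain is far more modest.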

Whilst NVIDIA did not disclose availability to purchase any of this new kit, it's mostly expected to ship in Q4 of this year. I'm looking forward to seeing how these things perform and, most anxiously, how they will be deployed around the world given the power and heat requirements, and the sustainability implications.
