Immersion Datacenter Cooling: Future-Proofing
As chips and designs continue to push boundaries with ever higher Thermal Design Power (TDP) ratings, managing the dissipated heat becomes increasingly challenging.
Current air-cooled rack designs typically top out at around 10-20kW per rack. When you consider the TDPs of modern CPUs and GPUs, it quickly becomes apparent that the rack's power and cooling budget, rather than physical space, is what limits how far you can fill it.
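To make that concrete, here's a quick back-of-the-envelope sketch in Python. The 42U rack, 15kW cooling budget, and ~1kW-per-server draw are illustrative assumptions, not figures from any specific vendor.

```python
# Hypothetical figures: 42U rack, 15kW air-cooling budget, 2U dual-socket
# servers drawing roughly 1kW each at load.
RACK_UNITS = 42
RACK_BUDGET_W = 15_000
SERVER_U = 2
SERVER_W = 1_000

by_space = RACK_UNITS // SERVER_U     # 21 servers fit physically
by_power = RACK_BUDGET_W // SERVER_W  # 15 servers fit within the power budget
print(min(by_space, by_power))        # power, not space, caps the rack at 15
```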
A leaked Gigabyte roadmap suggests that most components are projected to double their power draw within the next 2-3 years. This increase extends beyond CPUs to networking components such as higher-bandwidth 400/800G switches, DPUs, FPGAs, and even optics.
AMD's MI300X isn't on that roadmap, but its power draw has increased 34% over the previous generation, from 560W to 750W.
While power consumption has historically been primarily a CPU concern, the rise of GPUs and their associated power requirements has accelerated the issue. Companies offering denser solutions have traditionally prioritized power efficiency over raw performance. For instance, Ampere's Altra processors, built on the ARM architecture (similar to that of smartphones and tablets), have typically operated at around 100W each; however, they too are scaling up power usage (and core count), reaching 350W per CPU with AmpereOne.
It is important to note, however, that deploying these processors for enterprise and high-performance computing workloads, which commonly run in x86 environments, often requires significant retooling. Despite their higher core counts and efficiency gains, Ampere has not yet achieved widespread market penetration.
Now, let's turn our attention to NVIDIA.
As of July 13, 2023, NVIDIA stands as a formidable player with a market value of $1.13 trillion. In a groundbreaking move, they have developed their own CPU based on ARM architecture, called Grace. Coupled with their Hopper GPU architecture, they have created a "superchip" capable of drawing up to 1000W of power.
This represents a significant leap in power consumption, and consequently in the heat generated, compared to previous designs. NVIDIA's entrance into the CPU market with Grace, and its pairing with Hopper GPUs, signals a major shift in the landscape of power draw and computational capability.
As the demand for more powerful and energy-intensive components continues to grow, datacenter operators and technology companies will face new challenges in managing heat dissipation. These developments highlight the need for innovative cooling solutions, such as immersion cooling, to effectively address the escalating power and thermal demands of advanced hardware configurations.
Let's take a look at the following examples (I've used Supermicro given my experience with them, certainly not because of their SKU naming convention!):
The latest 'standard' rack (2U dual processor)
Current A100 HGX
Latest generation H100 HGX
And then let's talk about a GH200-based solution...
NB. I have quoted rated (nameplate) power figures rather than actual real-world draws, to keep things high-level and easy to follow.
Standard Rack
PSU (90% efficiency): 1200W redundant (1+1)
Max CPU: Dual Socket, up to 350W each
GPU: 4x PCIe 5.0 slots
Rack Units: 2U
Model: SYS-221H-TNR
In this configuration, with 2x 350W CPUs, you are left with less than 400W of usable power. Consequently, the 4 available GPU slots can realistically support only one A100 80GB GPU with a total board power of 300W, leaving a theoretical ~80W for the rest of the system: additional IO, storage, RAM, fans, and so on. For reference, a BlueField DPU consumes around 75W per card.
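Here is a minimal sketch of that arithmetic using the rated figures above; the helper and the flat 90% efficiency figure are simplifications for illustration, not Supermicro's published numbers.

```python
PSU_EFFICIENCY = 0.90  # flat efficiency assumption from the spec above

def remaining_budget(psu_watts, *component_watts):
    """Usable power left after PSU losses and each component's rated draw."""
    return psu_watts * PSU_EFFICIENCY - sum(component_watts)

# SYS-221H-TNR: 1200W (1+1) PSU, 2x 350W CPUs, one 300W A100
print(remaining_budget(1200, 350, 350))       # ~380W usable for accelerators
print(remaining_budget(1200, 350, 350, 300))  # ~80W left for IO, storage, RAM, fans
```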
It is also worth noting that Supermicro offers a D2C (Direct to Chip) Cold Plate as an optional extra for this server, which is an interesting feature.
A100 HGX
PSU (90% efficiency): 6000W redundant (2+2)
Max CPU: Dual Socket, up to 280W each
GPU: 9x PCIe 5.0 slots
Rack Units: 4U
Model: 4124GO-NART
With double the rack space and roughly five times the PSU capacity (6000W vs 1200W), and AMD's EPYC chips rated at 280W each, you now have around 4500W available for 8x 300W GPUs operating at full power. This configuration also leaves ample capacity for utilizing the 9th PCIe slot, as well as other power consumers such as NVLink and NVSwitch.
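The same rated-figure arithmetic, sketched for this box (the raw numbers land slightly above the rounded ~4500W used above, with the difference acting as margin):

```python
# A100 HGX (4124GO-NART): 6000W (2+2) PSUs at 90% efficiency,
# 2x 280W EPYC CPUs, 8x 300W A100s -- rated figures only.
psu_w, eff = 6000, 0.90
cpu_w, gpu_w = 2 * 280, 8 * 300
print(psu_w * eff - cpu_w)          # ~4840W headroom for the GPU tray
print(psu_w * eff - cpu_w - gpu_w)  # ~2440W left for NVSwitch, NVLink, IO, fans
```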
H100 HGX
PSU (90% efficiency): 9000W redundant (3+3)
Max CPU: Dual Socket, up to 400W each
GPU: 9x PCIe 5.0 slots
Rack Units: 8U
Model: AS-8125GS-TNHR
Doubling the rack space again to 8U, the per-CPU power envelope increases by 42% (280W to 400W). This setup leaves approximately 7200W for 8x H100 GPUs, each drawing up to 700W, which in turn leaves a theoretical 1600W for the rest of the system, roughly 20% less headroom than the A100 unit.
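And the same sketch for the H100 box (again rated figures; the rounded 7200W/1600W above leave a little extra margin):

```python
# H100 HGX (AS-8125GS-TNHR): 9000W (3+3) PSUs at 90% efficiency,
# 2x 400W CPUs, 8x 700W H100s -- rated figures only.
psu_w, eff = 9000, 0.90
cpu_w, gpu_w = 2 * 400, 8 * 700
print(psu_w * eff - cpu_w)          # ~7300W headroom for the GPU tray
print(psu_w * eff - cpu_w - gpu_w)  # ~1700W left for the rest of the system
```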
GH200
NVIDIA's latest 'superchip,' the GH200, is not yet orderable from Supermicro, so we can use NVIDIA's HGX reference design instead.
It consists of two Grace Hopper blades housed in a 2U chassis. Each superchip, combining a 72-core ARM-based Grace CPU with an H100 GPU, can consume up to 1000W; in other words, 1U = 1000W just for the compute.
As a reminder, an air-cooled rack can give you 10-20kW, and an immersion tank can give you over 100kW!
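Putting those two budgets side by side with the GH200 figure above (compute only; switches, storage, and PSU losses are ignored, and the 20kW/100kW budgets are simply the round numbers quoted here):

```python
# Each 2U GH200 chassis draws roughly 2x 1000W for compute alone.
WATTS_PER_CHASSIS = 2 * 1_000

for label, budget_w in [("air-cooled rack", 20_000), ("immersion tank", 100_000)]:
    chassis = budget_w // WATTS_PER_CHASSIS
    print(f"{label}: {chassis} chassis ({chassis * 2}U of compute)")

# air-cooled rack: 10 chassis (20U)  -- under half a 42U rack before hitting the limit
# immersion tank: 50 chassis (100U)  -- more compute than two full 42U racks
```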
Even within this narrow focus, these examples showcase the challenges and opportunities that arise as TDPs climb. Immersion cooling offers a promising avenue for future-proofing datacenter infrastructure, enabling efficient heat management even in the face of increasingly demanding chips and designs.
Thank you, again, for joining me on this exploration of immersion cooling, and I look forward to future discussions and advancements in this exciting field.