Beyond NVIDIA: Is AMD the only GPU alternative for HPC/AI workloads?

In my last article, I discussed what a GPU is and primarily covered NVIDIA's history and product suite. While NVIDIA GPUs have long been the go-to choice for AI workloads, a host of alternative options, both GPUs and other accelerators, have been simmering away, mostly behind the scenes.

I wanted to shine a light on these, the current state of the market, and where the future might be headed.


So let's get stuck in…


GPU-based: AMD and Intel

AMD GPUs

AMD, a formidable competitor in the CPU market that arguably holds the lion's share of HPC CPU sales thanks to its core-count density and power efficiency, has also made significant strides in the GPU space with its "Radeon Instinct" (now simply "Instinct") series. These GPUs offer an appealing alternative to NVIDIA's dominance, particularly in certain niches. Key points to consider include:

  • Architecture: AMD's RDNA architecture focuses on energy efficiency and scalability, making it suitable for both gaming and professional applications.

  • Heterogeneous Compute: AMD GPUs support a variety of programming models, including OpenCL and ROCm, enabling developers to harness their power for parallel processing tasks. AMD also joined the PyTorch Foundation to further the development of PyTorch, a Python-based machine-learning framework (a minimal sketch of running PyTorch on an AMD GPU follows this list).

  • Datacenter Integration: With initiatives like the AMD CDNA architecture, AMD aims to carve out a space for itself in data centers, providing competition to NVIDIA in HPC and AI workloads.
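
To make the programming-model point concrete, here is a minimal PyTorch sketch. It assumes a ROCm build of PyTorch on a machine with an Instinct (or Radeon) GPU; on ROCm builds, AMD GPUs are exposed through the familiar torch.cuda API (HIP is mapped onto the CUDA namespace), so code written for NVIDIA hardware typically runs unchanged.

    import torch

    # On a ROCm build of PyTorch, AMD GPUs appear under the usual torch.cuda API.
    # torch.version.hip is set on ROCm builds and is None on CUDA/CPU-only builds.
    print(torch.__version__, torch.version.hip)
    print(torch.cuda.is_available())        # True if an AMD GPU is visible to ROCm
    print(torch.cuda.get_device_name(0))    # e.g. an Instinct MI-series accelerator

    device = torch.device("cuda")           # same spelling as on NVIDIA hardware

    # A toy matrix multiply, written exactly as it would be for an NVIDIA GPU.
    a = torch.randn(4096, 4096, device=device)
    b = torch.randn(4096, 4096, device=device)
    c = a @ b
    print(c.shape, c.dtype)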

The line launched in 2017 with a 150W card, the "MI6"; AMD recently announced its latest 750W, 192GB beast, the "Instinct MI300X", with production expected later this year.

Disclaimer: comparing apples to apples is very difficult and workload-dependent, and folks should always be skeptical of vendor-provided performance metrics.

See, for example: "Nvidia accused of cheating in big-data performance test by benchmark's umpires: Workloads 'tweaked' to beat rivals in TPCx-BB" (The Register).


For comparison, NVIDIA's H100 has 80GB of memory and requires NVLink to pool multiple GPUs when a larger memory footprint is needed. It will be very interesting to see what developers do with this new platform, as AMD has a significant memory bandwidth and capacity advantage, though TFLOPS/TOPS figures are yet to be announced. With the software stack rapidly improving, you will likely see more AMD GPUs in the wild later this year.


Intel GPUs

Intel, renowned for its CPUs, has also entered the GPU arena with its Intel Xe architecture. These GPUs bring a new dynamic to the market, offering several unique aspects:

  • Integration: Intel GPUs are designed to work in synergy with Intel CPUs, potentially optimizing system-level performance in HPC and AI setups.

  • OneAPI: Intel's OneAPI initiative strives to provide a unified programming model across its various hardware components, including GPUs, CPUs, and FPGAs, simplifying the development process (a minimal sketch of targeting an Intel GPU from Python follows this list).

  • Xe HPC: Intel's Xe HPC GPUs target the high-performance computing market, competing directly with NVIDIA's data-center GPUs (the former Tesla line).
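
The native OneAPI route onto these parts is DPC++/SYCL, but for AI work the simplest path from Python is Intel's extension for PyTorch, which registers an "xpu" device. Below is a minimal sketch, assuming the intel_extension_for_pytorch package and Intel's GPU driver and OneAPI runtime are installed.

    import torch
    import intel_extension_for_pytorch as ipex  # registers the "xpu" device with PyTorch

    # Check whether an Intel GPU (e.g. a Ponte Vecchio / Data Center GPU Max part) is visible.
    print(torch.xpu.is_available())
    print(torch.xpu.get_device_name(0))

    device = torch.device("xpu")

    model = torch.nn.Linear(1024, 1024).to(device).eval()
    model = ipex.optimize(model)             # apply oneDNN-backed kernel optimisations

    x = torch.randn(64, 1024, device=device)
    with torch.no_grad():
        y = model(x)
    print(y.shape)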

It's important to note that Intel is a relative newcomer to the high-end GPU market, having almost exclusively offered 'good-enough' integrated graphics in laptop and desktop parts for decades. Its first-generation discrete lineup spans the consumer "Alchemist" cards and the data-center "Xe-HPC" part, Ponte Vecchio, which was only seen in public two months back (Intel's Ponte Vecchio is Finally in The Wild | Tom's Hardware (tomshardware.com)) after years of delays.

Intel's second-gen architecture, codenamed "Battlemage", has an "Xe2-HPC" SKU that may be released as "Rialto Bridge", but according to recent reports it is likely to be based on enhanced Xe-HPC cores, not Xe2-HPC cores. In parallel, expectations for their consumer GPUs based on Battlemage have been significantly tempered (Intel rumoured to be scaling back its next-gen Battlemage GPU | PC Gamer), reportedly now targeting NVIDIA's last-gen mid-range GPU (released early 2022) with a product not due for release until mid-2024.

Intel has a lot of work in front of it, and it will likely take many years to see whether it can bridge the gap at the high end of the market or remains in the low-to-mid performance (and cost) range.

Fringe Alternatives: ASICs (TPUs) and FPGAs

Application-Specific Integrated Circuits (ASICs)

ASICs are custom-designed chips tailored to perform a specific task exceptionally efficiently. In the context of HPC and AI, ASICs can be optimized for specific workloads, yielding substantial performance benefits:

  • Efficiency: ASICs excel in power efficiency and performance for their designated tasks, making them suitable for data-centric applications.

  • Challenges: Developing ASICs requires significant time, effort, and resources. They are not easily reprogrammable, limiting their flexibility for rapidly evolving workloads.


Amazon's AI platforms (Trainium for training and Inferentia for inference) are custom ASICs designed by Annapurna Labs, the chip company Amazon acquired. Annapurna also designs Amazon's Nitro chips, which offload all kinds of host tasks such as networking, storage, and virtualization; it's reported that every AWS server that ships comes with at least one Nitro chip.

Amazon EC2 server with an Annapurna ASIC (just above the purple handle)
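
From a developer's point of view, these chips are reached through the AWS Neuron SDK rather than through CUDA-style kernels. The sketch below is a rough illustration only, assuming a Trn1 or Inf2 instance with the torch-neuronx package installed; it compiles a small model ahead of time for the NeuronCores and then runs it like any other traced module.

    import torch
    import torch_neuronx  # AWS Neuron SDK's PyTorch integration (Trainium / Inferentia2)

    # A small example model to compile for the NeuronCore ASIC.
    model = torch.nn.Sequential(
        torch.nn.Linear(128, 256),
        torch.nn.ReLU(),
        torch.nn.Linear(256, 10),
    ).eval()

    example = torch.randn(1, 128)

    # Ahead-of-time compile for Neuron, then invoke like a normal TorchScript module.
    neuron_model = torch_neuronx.trace(model, example)
    print(neuron_model(example).shape)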


Another alternative ASIC is what Google calls the TPU (Tensor Processing Unit). Designed to address the unique demands of machine-learning tasks, TPUs offer a specialized solution that is differentiated from traditional GPUs and other alternatives. Google's TPUs are trailblazers in AI acceleration, emphasizing performance, energy efficiency, and cloud-based accessibility.

A Google TPU on a PCIe card
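
TPUs are likewise not programmed with CUDA-style kernels; they are reached through XLA-backed frameworks such as JAX and TensorFlow. Here is a minimal JAX sketch, assuming a Cloud TPU VM with the TPU-enabled jax build installed.

    import jax
    import jax.numpy as jnp

    # On a Cloud TPU VM with the TPU build of JAX, the TPU cores show up as devices.
    print(jax.devices())      # e.g. [TpuDevice(id=0), TpuDevice(id=1), ...]

    @jax.jit                  # compiled through XLA onto the TPU's matrix units
    def matmul(a, b):
        return a @ b

    # bfloat16 is the TPU's native matrix-unit format.
    a = jnp.ones((4096, 4096), dtype=jnp.bfloat16)
    b = jnp.ones((4096, 4096), dtype=jnp.bfloat16)
    c = matmul(a, b)
    print(c.shape, c.dtype)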


Field-Programmable Gate Arrays (FPGAs)

FPGAs are reconfigurable hardware components that can be programmed to perform various tasks, offering a balance between flexibility and performance:

  • Customizability: FPGAs can be reprogrammed for different workloads, making them adaptable to changing requirements.

  • Parallelism: FPGAs excel at parallel processing, which is highly beneficial for certain AI and HPC tasks.

  • Learning Curve: Working with FPGAs often requires specialized expertise in hardware design and programming, potentially lengthening the development cycle.

FPGAs tend to be used in local, embedded solutions programmed with OpenCL, and don't tend to be used for HPC-centric workloads. Think self-driving cars, medical imaging, and machine vision. Intel actually makes an FPGA, the Stratix 10 GX (released in 2018), which achieves 143 INT8 TOPS at up to 225W, or around half that of a last-gen AMD Instinct MI250X.

An Intel Stratix 10 GX FPGA Development Kit
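
To give a feel for the OpenCL programming model mentioned above, here is a minimal host-program sketch using the pyopencl package. It runs on whatever OpenCL device is available (a CPU or GPU works for illustration); with a vendor FPGA toolchain such as Intel's FPGA SDK for OpenCL, the host code looks much the same, but the kernel is compiled offline into a bitstream rather than built at runtime.

    import numpy as np
    import pyopencl as cl

    # Standard OpenCL host code: the same structure applies whether the device is
    # a CPU, a GPU, or an FPGA board exposed through a vendor OpenCL runtime.
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    a = np.random.rand(1_000_000).astype(np.float32)
    b = np.random.rand(1_000_000).astype(np.float32)

    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    # On an FPGA this kernel would normally be compiled offline into a bitstream;
    # here it is built at runtime for whichever device the context picked up.
    program = cl.Program(ctx, """
    __kernel void vadd(__global const float *a,
                       __global const float *b,
                       __global float *out) {
        int gid = get_global_id(0);
        out[gid] = a[gid] + b[gid];
    }
    """).build()

    program.vadd(queue, a.shape, None, a_buf, b_buf, out_buf)

    result = np.empty_like(a)
    cl.enqueue_copy(queue, result, out_buf)
    print(np.allclose(result, a + b))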

In summary…

The HPC and AI landscape is evolving, and whilst the obvious choice for hardware accelerators has overwhelmingly been NVIDIA GPUs, AMD in particular is gaining traction with its GPUs, offering a competitive alternative. Intel is very early in its entry, and more fringe options like ASICs and FPGAs bring unique advantages but also challenges related to customization, programming complexity, and development time.

As the demand for computational power continues to grow, understanding and exploring these alternatives will be crucial for making informed decisions in optimizing HPC and AI workloads.
