Smart Factory

Here come the Inferencing ASICs

The tidal wave of Generative AI (GenAI) has so far mostly consisted of training large language models (LLMs), like GPT-4, and the huge amount of compute needed to process their enormous training datasets — GPT-4, for example, is reported to have 1.76 trillion parameters.

This compute has mainly looked like NVIDIA's GPUs, but you also need...

  1. power
  2. networking
  3. capital, AND
  4. a nice cool place to host them (data center)

The looooooong tail of AI Inferencing will dictate that compute be installed closer to where it's needed for latency-sensitive use cases, and that it be more cost-effective and more energy-efficient.

GPUs are great at both Training and Inferencing workloads, but the demand for these chips has meant NVIDIA is able to price them high (an H100 is approximately $30k), with typical installations (tens to hundreds of GPUs) running into the millions of dollars.
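To make that "millions of dollars" concrete, here is a minimal back-of-envelope sketch. The ~$30k per-H100 figure comes from the article; the cluster sizes, the `cluster_gpu_cost` function name, and the exclusion of power, networking, and hosting costs are illustrative assumptions.

```python
# Back-of-envelope GPU hardware cost for a cluster.
# Unit price per the article; real quotes vary by volume and vendor.
H100_UNIT_PRICE = 30_000  # USD, approximate per-GPU price

def cluster_gpu_cost(num_gpus: int, unit_price: int = H100_UNIT_PRICE) -> int:
    """Cost of the GPUs alone -- excludes power, networking, and hosting."""
    return num_gpus * unit_price

for n in (16, 64, 256):
    print(f"{n:>4} GPUs -> ${cluster_gpu_cost(n):,}")
```

Even a modest 64-GPU installation lands around $1.9M before a single watt of power or metre of cabling is paid for, which is the cost pressure driving the custom-silicon efforts below.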

Given the high cost, it's no surprise that tech giants like Amazon, Google, Intel, Microsoft, Meta, and Tesla are developing their own silicon (based on ASICs) to enhance performance, efficiency, and scalability in AI applications.

Before exploring the specific ASIC innovations, it is helpful to understand how they compare to Graphics Processing Units (GPUs), which are commonly used for AI tasks:

  • Design Purpose: ASICs are custom-built for specific tasks, such as AI inferencing, making them highly efficient for those operations. In contrast, GPUs are more versatile and are designed to handle a variety of computational tasks, including graphics rendering and scientific computations.
