
Hyperscalers’ AI chip capacity is heavily underutilized

Most of the AI capacity at major cloud providers remains unused. As a result, AWS, Microsoft and Google are missing out on billions of dollars in revenue, TechInsights analyst Owen Rogers asserts. Full utilization of AI hardware, however, is tough to achieve.

TechInsights estimates that AWS generated roughly $5.8 billion in revenue from about seven million GPU hours through 2023. Had the AI hardware been fully utilized, Rogers argues, it could have brought in around $40 billion, half of AWS’ 2023 revenue. This assumes that each AWS accelerator sits in a cluster of 20,000 units per region at 100 percent usage.
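Taken together, those figures imply a rough utilization rate. A back-of-the-envelope sketch (the $5.8 billion and $40 billion figures come from TechInsights; the calculation itself is only illustrative):

```python
# Back-of-the-envelope utilization estimate from the TechInsights figures.
actual_revenue = 5.8e9         # estimated AWS GPU revenue through 2023 (USD)
full_capacity_revenue = 40e9   # revenue at 100 percent utilization, per Rogers

utilization = actual_revenue / full_capacity_revenue
print(f"Implied utilization: {utilization:.1%}")  # → 14.5%
```

In other words, on these numbers roughly six out of every seven GPU hours would go unsold.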

According to Rogers, this hardware needs to be utilized far more to be worthwhile. A small portion of the GPUs is used internally by AWS, Microsoft and Google themselves, but that doesn’t come close to accounting for the shortfall.

GPU consumption works differently than CPU consumption

According to Rogers, the reason this theoretical capacity hasn’t been reached has to do with how cloud resources are consumed. Customers tend to use AI hardware massively and simultaneously, which forces hyperscalers to provision capacity for peak usage.

There’s more to it than that, however: GPUs can’t be consumed as a resource in the same way vCPUs can. They are typically attached to a VM, as The Register notes, which means they often sit idle rather than running constantly the way other dedicated hardware might.

AI workloads also differ greatly from one another. Hyperscalers want to deliver state-of-the-art performance for enterprise customers that may wish to quickly train their own AI models on enterprise data. This process tends to require thousands of GPUs for dozens of days, but ideally it is not too common an occurrence. Once training is complete, the far less demanding fine-tuning and inference workloads follow.

That last step, inferencing, is by a wide margin the workload that AI applications will most commonly run. While a chip like the H100 is much faster than the alternatives in this regard, availability is a stumbling block. Because of the peaks and valleys in AI usage, the most powerful chips are simply not always available, which is why AWS and Google Cloud, for example, have launched scheduling services to provide access. Rogers points to this to emphasize that provisioning for peak load on AI hardware necessarily creates over-capacity.

According to Rogers, there is a way to actually put all these AI resources to use: adopt abstraction layers for AI development, such as Amazon SageMaker. Developers then no longer work with the GPUs themselves; building LLMs becomes software-mediated. A cloud provider like AWS can then determine which hardware is available for the task, allowing otherwise unused resources to be deployed.
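The idea can be sketched as a simple scheduler that assigns a job to whichever accelerator pool currently has the most free capacity. This is a hypothetical illustration of software-mediated hardware selection, not how SageMaker actually works internally; the pool names and numbers are made up:

```python
# Hypothetical sketch: the developer submits a job; the platform picks an
# accelerator pool with idle capacity instead of the developer requesting
# a specific GPU type.

# Free accelerators per pool (made-up numbers).
pools = {"h100": 0, "a100": 12, "trainium": 48}

def schedule(job_name: str) -> str:
    """Assign the job to the pool with the most idle accelerators."""
    pool, free = max(pools.items(), key=lambda kv: kv[1])
    if free == 0:
        raise RuntimeError("no capacity available; job queued")
    pools[pool] -= 1
    return pool

print(schedule("fine-tune-llm"))  # → trainium (most idle capacity)
```

The point of the abstraction is that idle capacity on any pool gets used, even when the fastest chips are fully booked.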

Where a single H100 GPU costs $98.32 per hour in AWS UltraClusters, its slower predecessor, the A100, goes for $40.96 per hour per GPU at the same cloud provider. That sounds attractive, but for inferencing tasks the H100 is as much as 30 times faster than the A100. In other words, it’s clear why providers charge a premium for the fastest offering.
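The premium becomes obvious once price is normalized by throughput. A quick sketch using the hourly prices quoted above and the claimed 30x inference speedup (the speedup is the article's figure, not a benchmark):

```python
# Cost per unit of inference work, using the quoted hourly prices
# and the claimed ~30x H100-over-A100 inference speedup.
h100_price = 98.32   # USD per GPU-hour
a100_price = 40.96   # USD per GPU-hour
speedup = 30         # H100 inference throughput relative to the A100

# Cost to complete the same amount of inference work on each chip.
h100_cost_per_unit = h100_price / speedup
a100_cost_per_unit = a100_price / 1

print(f"H100: ${h100_cost_per_unit:.2f} per work unit")   # → $3.28
print(f"A100: ${a100_cost_per_unit:.2f} per work unit")   # → $40.96
print(f"H100 is {a100_cost_per_unit / h100_cost_per_unit:.1f}x cheaper")
```

So despite the higher sticker price, the H100 works out roughly an order of magnitude cheaper per unit of inference, which is exactly why demand for it peaks.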
