
Hardware manufacturer Nvidia is, as it stands, the big winner of the AI revolution. As the maker of the GPUs that make everything from ChatGPT to GitHub Copilot and Google Gemini possible, its market value has soared above $2 trillion. Nvidia isn’t sitting still, but Groq is positioning itself as a future competitor. What does the future of AI hardware look like?

Anyone who wants to run AI at scale is reliant on Nvidia. This is not only because the company has the hardware required, but also because its software platform (CUDA) has deep roots in the developer community. With a virtual monopoly in the data center world, Nvidia was perfectly prepared for the sudden advance of generative AI (GenAI) in late 2022. Those who see the company’s sudden position of AI strength as a stroke of luck forget that it spotted exactly such an opportunity ages ago and, having seized the day, now refuses to let go.

Prediction seems to be coming true

Indeed, Nvidia has been betting since its founding in 1993 on a future of accelerated computing. This means, in short, that every workload requires optimal, purpose-built hardware, rather than a single CPU design that performs every conceivable computation. That prediction seems to be coming true with AI, currently powered by the world’s fastest GPUs.

Either way, the tech industry wants to keep moving AI forward. It means to do this not just by getting hold of hundreds of thousands of GPUs of an existing design, as Meta has, but by sourcing hardware with better performance and creating more powerful models to run on it. Dell CEO Jeff Clarke recently stated that Nvidia will answer this demand with the so-called B200, a single card of which may consume up to 1,000 watts at full utilization. Since AI training can require hundreds of these GPUs, sustainability appears to have been thrown out the window. Everything has to give way to the fastest AI performance, aiming to deliver the lowest latency, the shortest development time and the best-functioning chatbots.

Change coming

We highlighted earlier that Nvidia has no real rivals to speak of. Its performance lead over competitors like Intel and AMD is so great that customers are willing to wait a long time for Nvidia products. Only in inferencing, the day-to-day running of an AI model that has already been trained, are there real alternatives. After all, the bulk of the compute has already been spent by that point.

The dependence on Nvidia for inferencing will therefore gradually decrease. Microsoft recently struck a deal with Intel to produce its own ARM-based chips, possibly for AI deployment in data centers. It is obvious that Microsoft’s orders of Nvidia GPUs would eventually decline with such a move. Since Microsoft is also focused on shrinking AI models to reduce the computing power they require, it looks like the dependence on Nvidia will not last forever.

Nvidia is ubiquitous

Similar moves are taking place at other tech companies: both AWS and Google Cloud, for example, have proprietary chips capable of running AI in the cloud. The latter has been particularly forward-looking here with its Tensor Processing Unit, which Google has been using internally since 2015. Its key advantage over Nvidia hardware is that every architectural choice is geared toward AI workloads. However, as mentioned, Nvidia is ubiquitous in the data center world and for that reason was the more obvious initial home for GenAI as we know it today.

Actually, one would not expect AI workloads to be running on GPUs at all. GPUs do, however, excel at parallel processing: completing enormous numbers of calculations simultaneously across thousands of cores. This is in contrast to CPUs, which top out at just over 100 cores and therefore cannot churn through the countless AI calculations at a desirable rate compared to the 18,432 cores found in an Nvidia H100 GPU. However, video cards have traditionally been designed for graphics applications, meaning they carry a lot of overhead that does nothing for AI workloads. This complexity is not necessary for every individual type of workload, but it is what makes a GPU a “GPGPU”: a general-purpose graphics processing unit. If you only want to run AI, all these added features are simply extra baggage.
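Why parallelism matters here can be sketched in a few lines. The core operation behind AI models is matrix multiplication, and every output element is an independent dot product: on a CPU these run largely one after another, while a GPU can hand each one to a separate core. The following is a minimal illustrative sketch in plain Python (not real GPU code), just to show how the work decomposes into independent tasks:

```python
# Illustrative sketch: matrix multiplication decomposes into independent
# dot products -- exactly the kind of work a GPU spreads over thousands
# of cores, while a CPU works through them mostly sequentially.

def dot(row, col):
    # One output element = one independent task a GPU core could handle.
    return sum(a * b for a, b in zip(row, col))

def matmul(A, B):
    # Every (i, j) entry is computed independently of all the others.
    cols = list(zip(*B))  # transpose B to iterate over its columns
    return [[dot(row, col) for col in cols] for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # → [[19, 22], [43, 50]]
```

With a 70-billion-parameter model, billions of such independent multiply-adds arise per generated token, which is why core count dominates the comparison.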

Groq: the answer to GPUs

Meanwhile, the creator of Google’s Tensor Processing Unit (TPU) is now working elsewhere. At Groq, founded in 2016, CEO Jonathan Ross is trying to reinvent the wheel, and in this case Nvidia may eventually have cause for concern. Indeed, a week ago, Ross made a big splash with his new invention: the Language Processing Unit (LPU), which lets chatbots generate responses at lightning speed. In a speed test against ChatGPT, the difference was huge. It should be noted that the Groq bot was running Meta’s Llama 2 70B model, which is many times smaller than GPT-4 Turbo, the LLM behind ChatGPT.

The aforementioned advantage of TPUs rears its head once again with Groq’s chip. Powered by Tensor Stream Processors (TSPs), the LPU can perform the necessary AI calculations directly, without the overhead described above. It could simplify the hardware requirements for large AI models, should Groq get beyond the public demo it recently released.

The answer is not 1000W per GPU

The fact that Nvidia is now targeting 1,000W per GPU to run the most powerful AI should worry many. Bear in mind that Nvidia chips operate collectively in data centers, so the total wattage soon reaches astronomical proportions. The current H100 GPU consumes at most 700W in certain configurations. Although the actual consumption of Nvidia hardware has time and again come in below what pre-launch rumours suggested, this time the prediction may well be accurate. Nvidia has more incentive than ever to push AI performance to the highest possible level: it sells every GPU it can supply and doesn’t have to pay the power bill afterwards.
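The scale of that jump is easy to put into numbers. Using only the per-card figures mentioned above (700W for the H100, 1,000W for the B200) and a hypothetical cluster size chosen purely for illustration, a rough back-of-envelope calculation looks like this:

```python
# Back-of-envelope sketch using the per-card figures from the article:
# 700 W per H100 versus 1,000 W per B200. The 500-GPU cluster size is a
# hypothetical example; real training clusters vary widely.

GPU_WATTS = {"H100": 700, "B200": 1000}  # peak draw per card

def cluster_draw_kw(gpu, count):
    """Total peak draw in kilowatts for `count` GPUs of one type."""
    return GPU_WATTS[gpu] * count / 1000

for gpu in ("H100", "B200"):
    print(f"{gpu}: {cluster_draw_kw(gpu, 500):.0f} kW peak")
# → H100: 350 kW peak
# → B200: 500 kW peak
```

And this counts only the GPUs themselves, before cooling, networking and the rest of the data center are added on top.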

Pushing virtually endless amounts of power and water through data centers is simply not sustainable for the global economy. For the longer term, hopes should be pinned on alternatives like Groq to run AI workloads. Nvidia, after all, has other interests.

Besides hardware, CUDA, Nvidia’s software platform for developing AI, is crucial to its lone leadership position. Even if LPUs from Groq or accelerators from other parties break through, CUDA will continue to play an important role. Alternative GPUs from Intel and AMD, also aimed at the data center world, rely on translation layers to CUDA to be attractive. But now that Nvidia has the AI industry in its grip, it refuses to let go: the company has updated the CUDA terms of use so that translation layers are no longer allowed.

We should therefore not expect a cooperative attitude from Nvidia when it comes to democratizing AI hardware. It is up to parties like Groq to come up with a real alternative. Groq’s focus on pure performance is a smart move: efficiency alone is apparently not a convincing enough argument to move away from Nvidia.