Oracle announces first zettascale supercomputer during CloudWorld. The maximum configuration uses no fewer than 131,072 Nvidia Blackwell GPUs.
It has long been known that the cloud offers significant scaling opportunities. Today, however, Oracle is giving this feature a new dimension in Oracle Cloud Infrastructure (OCI). Starting today, customers can place orders for OCI Superclusters with a huge maximum capacity. In the largest configuration, the more than 130,000 GPUs will deliver a peak performance of 2.4 zettaflops.
Let’s put the large number of GPUs available in OCI Supercluster into perspective. It is more than three times as many GPUs as in the Frontier supercomputer, which currently ranks first in the Top 500 list of supercomputers. That system contains “only” about 38,000 AMD Instinct GPUs. As such, it is an exascale supercomputer, not a zettascale one: its maximum performance peaks at just over 1.2 exaflops (1,200 petaflops). With this new offering, Oracle exceeds that by a wide margin. Oracle does not comment on performance per watt in its official press release, a figure we always find interesting.
Different variants of OCI Supercluster
OCI Supercluster consists of OCI Compute Bare Metal. The various components talk to each other via RoCEv2 or Nvidia Quantum-2 InfiniBand and, of course, there is also storage suitable for High Performance Computing (HPC).
Customers can order OCI Supercluster in several variants. It is available with Nvidia H100 or H200 Tensor Core GPUs, but also with the latest Blackwell GPUs. If a customer goes for H100s, OCI Supercluster scales up to 16,384 GPUs and 65 exaflops. With H200s, the upper limit is 65,536 GPUs and 260 exaflops. Nvidia’s latest hardware will be available in OCI Superclusters through the Nvidia GB200 NVL72, a rack-scale system built on the GB200 “superchip,” which combines a Grace CPU with Blackwell GPUs. These are liquid-cooled bare-metal instances, which communicate with each other via NVLink and NVLink Switch, as the NVL in the name implies. Within a single NVLink domain, up to 72 Blackwell GPUs can communicate with each other in this way. The combined bandwidth is then just under 130 TB/s.
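To make the quoted maximums easier to compare, the peak figures can be divided by the GPU counts. The sketch below is only back-of-the-envelope arithmetic on the numbers given in this article; the names configs and per_gpu_petaflops are our own, and since the numeric precision behind each peak figure is not stated here, the results should be read as indicative rather than as like-for-like benchmarks.

```python
# Back-of-the-envelope arithmetic on the figures quoted in this article.
# All names here are illustrative; the inputs are peak numbers, and the
# numeric precision behind them is an unknown.
PETA = 10**15
EXA = 10**18

# variant -> (maximum GPU count, quoted peak performance in flops)
configs = {
    "H100": (16_384, 65 * EXA),
    "H200": (65_536, 260 * EXA),
    "Blackwell": (131_072, 2_400 * EXA),  # 2.4 zettaflops
}


def per_gpu_petaflops(gpus: int, peak_flops: int) -> float:
    """Return the implied peak per GPU, in petaflops."""
    return peak_flops / gpus / PETA


for name, (gpus, peak) in configs.items():
    print(f"{name}: {per_gpu_petaflops(gpus, peak):.2f} petaflops per GPU")
```

On these inputs, the H100 and H200 maximums imply the same peak of roughly 4 petaflops per GPU, while the Blackwell maximum implies roughly 18 petaflops per GPU.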
The Nvidia Blackwell GPUs are not available right now. Nvidia reportedly hit the pause button recently due to a design flaw (although the company has since disputed this). They are expected in the first half of 2025 for OCI Supercluster. It is also not clear whether Oracle will take orders for the maximum number of 131,072 GPUs from the start. It would not surprise us at all if an OCI Supercluster with this number of GPUs mainly looks pretty on the spec sheet for the time being and Oracle does not deliver it yet. We have asked Oracle for comment and will update this article as soon as we have more information. Furthermore, it is not clear where Oracle is building or will build these huge new OCI Superclusters.