AMD is catching up to Intel on many fronts, whether it’s chips intended for laptops, desktops or data centers. What is the secret behind the AMD advance, with the Epyc line for servers in particular? And how does it fit into the company’s larger portfolio? We discussed it with Alexander Troshin, Product Marketing Manager – EMEA at AMD’s Enterprise & HPC Server Business Unit.
Troshin is clear: For just about every use case, AMD has a processor at the ready. From the embedded world on factory floors to offices and hyperscale data centers, an AMD Epyc or Ryzen can be placed anywhere. The secret behind this wide applicability is Zen, the chip architecture that serves a wide variety of domains.
Chiplets for everyone
AMD’s resurgence has been years in the making, with Zen being the main driver. Since 2017, the company has offered the Ryzen and Epyc series with this common architecture. Ryzen is intended for desktops and laptops, while Epyc is traditionally the server line.
Each Zen-based processor consists of chiplets, known as core complex dies (CCDs) in AMD vernacular. The main advantage of chiplets is that they make processors enormously scalable. AMD combines several CCDs as desired, so that a Ryzen chip can run 4, 6, 8, 12 or 16 cores, while the server-focused Epyc series scales further to 32, 64, 96 or even 128 cores on a single processor. In a way, it is reminiscent of Lego, except that the building blocks are manufactured at nanometer scale, on lithography machines from the likes of ASML.
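The arithmetic behind this scaling can be sketched in a few lines. The per-CCD figures below are assumptions for illustration (a standard Zen 4 CCD carries 8 cores, while the denser Zen 4c CCDs behind the 128-core parts carry 16), not an official AMD specification:

```python
def total_cores(ccd_count, cores_per_ccd=8):
    """Core count of a package built from identical CCDs (illustrative)."""
    return ccd_count * cores_per_ccd

# A 2-CCD desktop Ryzen yields 16 cores; a 12-CCD Epyc yields 96;
# eight dense 16-core CCDs reach the 128-core mark.
print(total_cores(2), total_cores(12), total_cores(8, cores_per_ccd=16))
```

The point of the model is that the same building block serves every segment; only the number (and density) of CCDs on the package changes.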
An Epyc chip needn’t be that large either. May’s Epyc 4004 series launch was notable for featuring an Epyc that doesn’t look like a server chip at all: an Epyc 4004 processor uses the exact same form factor and AM5 socket as Ryzen desktop processors. Still, these chips offer more functionality than Ryzen and have had to pass far more validation steps. Features include support for software RAID, which mirrors data across drives to prevent data loss, and PCIe hot-swapping, which allows SSDs or other peripherals to be replaced without a reboot. In addition, Epyc 4004 officially supports ECC (Error-Correcting Code) DRAM: system memory that detects and corrects bit errors, making it more reliable than consumer DRAM. The series scales up to 16 cores and 32 threads, but also includes an entry-level chip with only 4 cores and 8 threads. Epyc 4004 is intended for organizations with relatively humble (but still meaningful) IT requirements. Think of SMBs with a few retail branches, or organizations that want to run an e-mail server or CRM system and nothing more.
Troshin explains that these types of customers don’t need all the features of Epyc at all. “Sometimes people just want to place ads on store screens.” So deploying Epyc can be that simple, but even then it is important to offer the most reliable product possible, for example, with ECC memory. Still, the fact is that when you think of Epyc, you tend to think of large, roaring servers with dozens of cores simultaneously sifting through hefty HPC workloads. According to Troshin, the deployability on both large and small scales precisely reflects the strength of AMD’s offerings. “We designed Zen from the beginning to work with many different applications,” he said. The success of that approach has long since been demonstrated.
Support from the market
That AMD has been gaining traction since the shift to Zen is also clear. On desktops, its global market share is now nearly 24 percent, while one in five laptops today runs an AMD chip. That used to be quite different: Intel is estimated to have once laid claim to 90 percent of both markets. Intel chips are still the norm in client PCs, but AMD’s impressive benchmarks and generally attractive prices have certainly helped it find an audience. Whether the company can ever truly get hot on Intel’s heels again, as it did around 2005, remains to be seen.
In the server market, the proportions are somewhat different. There, too, AMD’s market share is almost 25 percent, but the company today takes in a third of the total revenue from the server market. This is significant: customers are evidently willing to pay more for an AMD Epyc than for an Intel Xeon. Industry support is essential to Epyc’s success, so the company’s market position depends in large part on good relationships with the OEMs that market its data center products. OVHcloud, for example, has offered the Epyc 4004 since its launch, as have Lenovo and Supermicro. More and more of these OEMs are offering AMD products. It’s a success story that has yet to be repeated in laptops in particular, where Intel products dominate the online stores of this world.
The bottom line, as Troshin sums it up, is this: “From the perspective of consumers, partners and spread across the ecosystem, we own one-third of the global market.” There is certainly still room for growth, but AMD escaped the troubles of about a decade ago by developing Zen and executing a continuous rollout of new products. Meanwhile, Intel dawdled considerably, with sparse annual improvements in both client and server products. AMD took advantage and overtook Intel on the stock market as well. Today, the companies’ valuations are hardly comparable: $251 billion for AMD versus $92 billion for Intel, at the time of writing.
Open standards
No tech discussion in 2024 can avoid AI. In terms of hardware, there is only one player the average IT professional will immediately think of: Nvidia. For a while, the company’s stock market value exceeded that of any other, driven by the total dominance it has when it comes to GPUs in data centers. These chips run the heaviest AI workloads such as training and fine-tuning models.
AMD offers a counterpart to Nvidia’s unsurpassed H100 GPUs. The Instinct MI300 series is not the focus of our discussion with Troshin, but within the larger AMD story these chips are indispensable. CEO Lisa Su expressed high hopes late last year about the market impact AMD can make with the MI300. AMD’s positive run on the stock market owes as much to the division building data center GPUs as to the Epyc and Ryzen teams.
Troshin again emphasizes AMD’s scalability, including in AI. Recently, Ryzen chips with an onboard NPU have also become available for end users. These NPUs are intended to make the much-discussed leap to the AI PC and to meet Microsoft’s requirements for a Copilot+ PC. Like Intel, AMD firmly believes in the opportunities AI offers desktop and laptop users, for everything from personalized AI chatbots to AI graphics and AI-assisted daily scheduling. Client processors that increasingly ship with an NPU will play an important role in the AI infrastructure that must eventually emerge. Not everything will run in the cloud, as that is simply not affordable, cost-effective or, for privacy reasons, desirable.
One factor should not be underplayed in this AI story, Troshin argues, and that is networking. Networking is essential to supply GPUs with (enterprise) data and to connect different processors together. Training the largest AI models, for example, requires thousands of GPUs. These, in turn, require a hefty, fast networking infrastructure.
Open or closed?
Nvidia focuses on its own interpretation of networking: NVLink and NVSwitch. These technologies allow numerous GPUs to function as if they were a single processor. Virtually all other players on the market back an alternative that will be familiar to many: Ethernet. AMD, Arista, Broadcom, Cisco, Eviden, Intel, Meta, Microsoft and Oracle are all members of the Ultra Ethernet Consortium (UEC), the organization in which these parties champion the flexibility, scalability and power of the ubiquitous Ethernet standard. Various working groups focus on reducing latency, supporting compliance and more.
Troshin speaks of a balanced approach. The UEC is about more than just Ethernet cables and their distribution in data centers. There is, for example, a storage working group building mechanisms that let HPC workloads interact with storage faster. “A fast CPU or GPU won’t help if your storage is too slow and vice versa,” he explains. A mature, global AI infrastructure requires faster and harmoniously interacting CPUs, SSDs, networking and GPUs; anything can be a bottleneck depending on the scenario. Ethernet is an important piece of the puzzle, because without a well-thought-out networking medium there is no scalability, but ultimately it is only one component among many.
Optimizations
Optimizations are still necessary, as AMD well knows. For AI, the ROCm stack is key. Like Nvidia’s CUDA, it provides the optimizations needed to run AI libraries as fast as possible on AMD hardware. Every Radeon GPU works with it, from giant data center GPUs to the onboard Radeon graphics in thin-and-light laptops. Developers can therefore scale their AI workloads from small to large. In fact, the support is as imposing as Nvidia’s, with deep links to Hugging Face, PyTorch and TensorFlow, among others. “We want to make sure the out-of-the-box experience, the deployment experience and the speed of it is seamless,” Troshin said.
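That seamlessness shows up in practice because ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` API that Nvidia users already know, so device-selection code stays vendor-neutral. A minimal sketch, written defensively so it also runs where PyTorch is absent:

```python
import importlib.util

def pick_device():
    """Pick a compute device in a vendor-neutral way.

    On ROCm builds of PyTorch, AMD GPUs answer to the familiar
    torch.cuda calls, so one code path covers Nvidia and AMD alike.
    """
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch not installed; fall back to CPU
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"

print(pick_device())
```

A model or tensor can then simply be moved with `.to(pick_device())`, regardless of whose silicon sits underneath.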
AMD cannot complete that mission alone, as it showed again recently: it acquired European player Silo AI for 665 million euros. Silo AI describes itself as “Europe’s largest private AI lab,” helping companies realize AI products across all kinds of industries. The acquisition is a step for AMD toward further expanding its enterprise AI portfolio, allowing it to work more closely with industry players on smart devices, advanced cars and Industry 4.0 applications. It already does that, but wants to do it better. One practical example of work AMD has already done is a deal with Subaru: the Japanese car brand chose AMD chips to power EyeSight, its own AI-powered driving aid. AMD chips, then, power much more than just the desktops, laptops and servers we would normally associate with them, and it looks like we can expect more such developments.
Also read: AMD launches AI tool for image generation on your PC