AMD introduces Instinct MI350P for drop-in enterprise AI hardware

AMD is introducing the Instinct MI350P PCIe card. It is designed to make it easier for organizations to run AI workloads locally in existing data centers without requiring major changes to power supply, cooling, or rack infrastructure.

According to AMD, the new hardware targets organizations that need additional AI computing power but do not want to immediately invest in specialized GPU platforms, which often require modifications to data centers. The manufacturer positions the MI350P PCIe card as an intermediate step between traditional server hardware and large-scale AI infrastructure, aimed at companies that want to run AI closer to their own infrastructure for reasons of cost control, compliance, or data privacy.

The MI350P PCIe card is designed for standard air-cooled servers and existing racks. According to AMD, systems can be equipped with up to eight cards for inference workloads and retrieval-augmented generation (RAG) applications using small, medium, and large AI models.

AMD claims performance of up to 2,299 TFLOPS, with a peak of up to 4,600 TFLOPS when using the MXFP4 precision format. The company also states that the card features 144 GB of HBM3E memory with a memory bandwidth of up to 4 TB/s. The hardware supports multiple AI precision formats, including FP8, MXFP8, MXFP4, INT8, and BF16, and AMD uses sparsity support to execute certain AI operations more efficiently and increase workload throughput.
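To give a rough sense of what the 144 GB memory figure means for model sizing, the sketch below estimates the largest dense model whose weights alone would fit on one card at each precision. This is a back-of-the-envelope illustration, not an AMD sizing tool: the bytes-per-parameter values are assumptions (BF16 at 2 bytes, FP8/INT8 at 1 byte, and MXFP4 at roughly 4.25 bits once the shared per-block scale of the OCP Microscaling format is amortized), and it ignores KV cache, activations, and runtime overhead.

```python
# Hypothetical sizing sketch: largest dense model whose weights fit
# in the MI350P's stated 144 GB of HBM3E, per precision format.
# Bytes-per-parameter values are assumptions, not AMD figures.

HBM_GB = 144  # AMD's stated memory capacity

FORMATS = {
    "BF16": 2.0,        # bytes per parameter
    "FP8": 1.0,
    "INT8": 1.0,
    "MXFP4": 4.25 / 8,  # ~0.53 bytes: 4-bit elements plus an 8-bit
                        # scale shared by each 32-element block
}

def max_params_billions(memory_gb: float, bytes_per_param: float) -> float:
    """Largest parameter count (in billions) whose weights fit in memory_gb.

    Ignores KV cache, activations, and framework overhead.
    """
    return memory_gb * 1e9 / bytes_per_param / 1e9

for name, bpp in FORMATS.items():
    print(f"{name:>6}: ~{max_params_billions(HBM_GB, bpp):.0f}B parameters")
```

Under these assumptions, a single card could hold the weights of a model roughly twice as large at FP8 as at BF16, and nearly four times as large at MXFP4, which is one reason low-precision formats matter for local inference.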

According to AMD, the cards should also help keep power consumption and cooling requirements manageable within existing data centers. The energy demands of AI infrastructure are precisely what currently hold many organizations back from expanding their AI applications further.

In the announcement, AMD emphasizes software and interoperability. The cards support, among other things, the Kubernetes GPU Operator, AMD Inference Microservices, and AI frameworks such as PyTorch. According to AMD, this should enable organizations to migrate inference workloads with minimal code changes.

Lower operational costs

In addition, AMD is making an open-source enterprise AI reference stack available to partners at no licensing cost. According to the company, this approach should help reduce operational costs and make organizations less dependent on closed software environments or recurring licensing fees. AMD says this will allow companies to deploy AI systems on-premises more quickly without ongoing per-token costs.

A key aspect of the MI350P PCIe card’s positioning is its support for different precision levels within AI workloads. According to AMD, lower-precision formats such as MXFP4 and MXFP6 are primarily intended to deliver maximum performance during inference, while higher-precision formats such as INT8 and BF16 can leverage sparsity acceleration to use memory and computing power more efficiently.
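The microscaling (MX) formats mentioned above reduce storage cost by letting a block of values share one scale factor instead of each value carrying its own exponent. The sketch below computes the effective bits per value under that scheme. The block size of 32 and the 8-bit shared scale follow the OCP Microscaling Formats specification; treating these as the MI350P's exact on-chip behavior is an assumption on our part, not an AMD statement.

```python
# Hedged sketch of microscaling (MX) storage cost: each value stores
# its own low-bit element, plus an amortized share of the block's
# common 8-bit scale factor (OCP MX spec defaults assumed).

def mx_bits_per_element(element_bits: int, block_size: int = 32,
                        scale_bits: int = 8) -> float:
    """Effective bits per stored value: element bits plus the
    per-value share of the block's shared scale."""
    return element_bits + scale_bits / block_size

print(f"MXFP4: {mx_bits_per_element(4):.2f} bits/element")
print(f"MXFP6: {mx_bits_per_element(6):.2f} bits/element")
print(f"MXFP8: {mx_bits_per_element(8):.2f} bits/element")
print("BF16 : 16.00 bits/element (no shared scale)")
```

The amortized overhead of the shared scale is small (0.25 bits at a block size of 32), which is why MXFP4 data occupies barely more than a quarter of the space of BF16 while still carrying per-block dynamic range.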