AMD launches AI-targeted PCIe cards for current servers

Credit: Network World

AMD has launched the latest in its Instinct enterprise GPU accelerators, the MI350, which are designed to fit the data center infrastructure customers already own.

Targeted at agentic AI, Instinct MI350P PCIe cards are dual-slot, drop-in cards for standard air-cooled servers. They are built for deploying inference on premises within a customer's existing data center power, cooling, and rack infrastructure.

The MI350P is AMD’s first PCIe-based Instinct accelerator in four years; the company has traditionally sold Instinct GPUs as server-mounted OAM modules in bundles of eight. The MI350P is a full-height, full-length PCIe card that fits any 2U or larger chassis, letting an enterprise customer experiment with AI gradually, starting with a single card rather than the eight-GPU bundles AMD typically offers.

Instinct MI350P PCIe cards are available in air-cooled systems with up to eight accelerator cards, which makes them well suited to inference and RAG pipelines on small, medium, and large AI models. Each card carries 144GB of HBM3E high-bandwidth memory running at up to 4TB/s.

Performance is estimated at 2,299 teraflops (TFLOPS), rising to a peak of 4,600 TFLOPS at MXFP4 precision, which AMD says is the highest performance currently available in an enterprise PCIe card. The card natively supports the lower-precision MXFP6 and MXFP4 formats for high throughput, and accelerates most mainstream 8- and 16-bit precisions through sparsity support.
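The MX formats referenced here are block-scaled: a group of values shares one scaling factor, and each value is stored in only a few bits. The NumPy sketch below illustrates that idea; it is not AMD's implementation, and the block size of 32 and the FP4 (E2M1) value grid are assumptions drawn from the OCP Microscaling specification rather than from this announcement.

import numpy as np

# Magnitudes representable by an FP4 (E2M1) element, per the OCP MX spec.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(block):
    """Quantize one block of values to a shared scale plus FP4 magnitudes."""
    max_abs = np.abs(block).max()
    # Shared power-of-two scale chosen so the largest value fits in the FP4 range.
    scale = 2.0 ** np.ceil(np.log2(max_abs / FP4_GRID[-1])) if max_abs > 0 else 1.0
    scaled = block / scale
    # Snap each scaled value to the nearest representable FP4 magnitude.
    idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return scale, np.sign(scaled) * FP4_GRID[idx]   # dequantize as scale * codes

block = np.random.randn(32).astype(np.float32)       # assumed block size of 32
scale, codes = quantize_mxfp4_block(block)
print("max abs quantization error:", np.abs(block - scale * codes).max())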

The MI350P card supports sparsity, a technique in which zero values in data sets and matrices are skipped, reducing processing time. Thanks to sparsity support, higher-precision formats such as INT8 and BF16 also deliver efficient performance, according to AMD.
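As a toy illustration of the idea, the dot product below multiplies and accumulates only the non-zero weights, so the work scales with the number of non-zeros rather than the vector length; hardware sparsity support applies the same principle at much finer granularity, typically on structured patterns of zeros, and this sketch is not how the GPU implements it.

def sparse_dot(weights, activations):
    """Dot product that skips zero weights entirely."""
    total = 0.0
    for w, a in zip(weights, activations):
        if w != 0.0:            # zero entries contribute nothing, so skip them
            total += w * a
    return total

weights = [0.0, 1.5, 0.0, -2.0, 0.0, 0.0, 0.5, 0.0]    # 5 of 8 entries are zero
activations = [0.3, 1.0, 0.7, 0.2, 0.9, 0.4, 1.1, 0.6]
print(sparse_dot(weights, activations))                # only 3 multiply-adds run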

AMD says the Instinct MI350P can handle large language models of roughly 200 to 250 billion parameters per GPU, and with support for up to eight GPUs per node it can cover SLM, MLM, and LLM inference and RAG workloads. It also supports the common ROCm open-source software stack AMD offers with its other Instinct and Radeon products.
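A rough back-of-the-envelope check, using only the figures in this article, shows why 200 to 250 billion parameters is a plausible per-GPU ceiling: at 4-bit (MXFP4) precision the weights alone occupy roughly 100 to 125GB of the card's 144GB, leaving the remainder for KV cache and activations. The arithmetic below is illustrative, not AMD's sizing guidance.

HBM_GB = 144                    # memory per MI350P card, per the article
BYTES_PER_PARAM = 0.5           # 4-bit (MXFP4) weights

for params_billion in (200, 250):
    weights_gb = params_billion * 1e9 * BYTES_PER_PARAM / 1e9
    print(f"{params_billion}B params at 4-bit: ~{weights_gb:.0f}GB weights, "
          f"~{HBM_GB - weights_gb:.0f}GB left for KV cache and activations")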

AMD did not give a launch date or a price for the MI350P.
