AMD is launching three new graphics accelerators today as part of the Radeon Instinct lineup. These include the Vega 10 based Radeon Instinct MI25, the Fiji XT based Radeon Instinct MI8 and the Polaris 10 based Radeon Instinct MI6. The “MI” in the Instinct family branding stands for “Machine Intelligence” while the number denotes the card’s approximate peak half precision (FP16) compute output in TFLOPs.

Through our Radeon Instinct server accelerator products and open ecosystem approach, we’re able to offer our customers cost-effective machine and deep learning training, edge-training and inference solutions, where workloads can take the most advantage of the GPU’s highly parallel computing capabilities.

We’ve also designed the three initial Radeon Instinct accelerators to address a wide range of machine intelligence applications, including data-centric HPC-class systems in academia, government labs, energy, life science, financial, automotive and other industries. via AMD

The AMD Radeon Instinct MI25 accelerator is the fastest of the Instinct lineup. It features the Vega 10 graphics core with 4096 stream processors clocked at 1500 MHz. At these clock rates, the card delivers 24.6 TFLOPs of FP16, 12.3 TFLOPs of FP32 and 768 GFLOPs of FP64 compute aimed at deep learning tasks. The card also packs 16 GB of ECC HBM2 memory which delivers a total of 484 GB/s of bandwidth.

It should be noted that the card is slightly lower clocked compared to the Vega Frontier Edition, which packs a 1600 MHz clock rate and delivers 13 TFLOPs of FP32 and 25 TFLOPs of FP16 compute. AMD has said that the card delivers up to 82 GFLOPs/Watt FP16 and 41 GFLOPs/Watt FP32 peak GPU compute performance.
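These figures fall out of a simple peak-throughput calculation. The sketch below assumes GCN's usual 2 FLOPs (one fused multiply-add) per stream processor per clock, Vega's packed math doubling FP16 throughput, and the 1/16 FP64 rate the article cites; it is a sanity check, not AMD's methodology:

```python
# Back-of-the-envelope check of the MI25's quoted peak figures.
# Assumptions: 2 FLOPs per stream processor per clock (one FMA),
# packed math doubling FP16, FP64 at 1/16 the FP32 rate.

def peak_tflops(shaders, clock_mhz, flops_per_clock=2):
    """Peak throughput in TFLOPs: shaders * FLOPs/clock * clock."""
    return shaders * flops_per_clock * clock_mhz * 1e6 / 1e12

fp32 = peak_tflops(4096, 1500)   # 12.288 -> quoted as 12.3 TFLOPs
fp16 = fp32 * 2                  # 24.576 -> quoted as 24.6 TFLOPs
fp64_gflops = fp32 / 16 * 1000   # 768 GFLOPs at the 1/16 rate

print(f"FP16/W at 300W: {fp16 * 1000 / 300:.0f} GFLOPs/Watt")  # ~82
print(f"FP32/W at 300W: {fp32 * 1000 / 300:.0f} GFLOPs/Watt")  # ~41
```

Dividing by the card's 300W TDP reproduces AMD's 82 and 41 GFLOPs/Watt efficiency claims almost exactly.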

Highlights:

Industry Leading Performance for Deep Learning

Next-Gen “Vega” Architecture

Advanced Memory Engine

Large BAR Support for Multi-GPU Peer to Peer

ROCm Open Software Platform for Rack Scale

Optimized MIOpen Libraries for Deep Learning

MxGPU Hardware Virtualization

The Radeon Instinct MI25 accelerator, based on the new “Vega” GPU architecture with a 14nm FinFET process, will be the world’s ultimate training accelerator for large-scale machine intelligence and deep learning datacenter applications. The MI25 will deliver superior FP16 and FP32 performance in a passively-cooled single GPU server card with 24.6 TFLOPS of FP16 or 12.3 TFLOPS of FP32 peak performance through its 64 compute units (4,096 stream processors). With 16GB of ultra–high bandwidth HBM2 ECC GPU memory and up to 484 GB/s of memory bandwidth, the Radeon Instinct MI25’s design is optimized for massively parallel applications with large datasets for Machine Intelligence and HPC-class systems. via AMD

In addition to the specifications, the card comes in a dual slot, full height form factor. It requires dual 8-pin connectors for power and the TDP is rated at 300W. The card is passively cooled, relying on airflow inside large server racks. The card ships with a three year limited warranty.

AMD is also launching the Radeon Instinct MI8 accelerator, which is designed as an inference card. The Instinct MI8 packs the Fiji XT GPU, which is built on the 28nm process. The GPU houses the same number of cores as the Instinct MI25, 4096 in total, but they are based on an older GCN revision and clocked much slower.

Highlights:

8.2 TFLOPS FP16 or FP32 Performance

Up To 47 GFLOPS Per Watt FP16 or FP32 Performance

4GB HBM1 on 512-bit Memory Interface

Passively Cooled Server Accelerator

Large BAR Support for Multi GPU Peer to Peer

ROCm Open Platform for HPC-Class Rack Scale

Optimized MIOpen Libraries for Deep Learning

MxGPU SR-IOV Hardware Virtualization

The Radeon Instinct MI8 accelerator, harnessing the high-performance, energy-efficiency of the “Fiji” GPU architecture, is a small form factor HPC and inference accelerator with 8.2 TFLOPS of peak FP16|FP32 performance at less than 175W board power and 4GB of High-Bandwidth Memory (HBM) on a 512-bit memory interface. The MI8 is well suited for machine learning inference and HPC applications. via AMD

In terms of specifications, the card features 4096 stream processors clocked at 1000 MHz. This delivers a rated compute output of 8.2 TFLOPs (FP16 / FP32) and 512 GFLOPs of FP64 compute at 1/16 rate. The card also features 4 GB of HBM1 memory which delivers 512 GB/s of bandwidth. That is slightly more bandwidth than the Vega based Instinct MI25 offers, but it takes four HBM1 stacks (versus two HBM2 stacks) and more power to get there. AMD rates the compute output of this card at 47 GFLOPs/Watt of FP16 and FP32 compute, while FP64 compute is rated at 2.9 GFLOPs/Watt.
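The MI8's numbers can be verified the same way. Fiji has no packed FP16 math, so FP16 and FP32 rates are identical; the figures below assume the article's 4096 shaders at 1000 MHz and a 175W board power:

```python
# Cross-checking the MI8's quoted figures (assumed specs from the article).
shaders, clock_hz, tdp_w = 4096, 1000e6, 175

fp32_gflops = shaders * 2 * clock_hz / 1e9  # 8192 -> quoted as 8.2 TFLOPs
fp64_gflops = fp32_gflops / 16              # 512 GFLOPs at the 1/16 rate

print(f"{fp32_gflops / tdp_w:.1f} GFLOPs/Watt FP16|FP32")  # ~46.8, "up to 47"
print(f"{fp64_gflops / tdp_w:.1f} GFLOPs/Watt FP64")       # ~2.9
```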

The card comes in the same small, dual slot package as the Radeon R9 Nano. It has a rated TDP of 175W and power is provided through a single 8-pin connector. The card lacks active cooling since it’s aimed at servers.

Lastly, we have the AMD Radeon Instinct MI6 graphics accelerator. This card packs the Polaris 10 core and is aimed at both deep learning and inferencing workloads. In terms of specifications, the chip packs the full configuration of 2304 stream processors, all clocked at 1237 MHz. At the rated clock speeds, the chip delivers 5.7 TFLOPs (FP16 / FP32) of compute and 358 GFLOPs of double precision compute performance.

AMD has rated the single and half precision throughput of this card at 38 GFLOPs/Watt, while the double precision compute throughput is rated at 2.4 GFLOPs/Watt.
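Running the same arithmetic on the MI6's listed specs reproduces the 38 GFLOPs/Watt figure from the highlights below:

```python
# MI6 peak figures from the listed 2304 stream processors at 1237 MHz.
shaders, clock_hz, tdp_w = 2304, 1237e6, 150

fp32_gflops = shaders * 2 * clock_hz / 1e9  # ~5700 -> quoted as 5.7 TFLOPs
print(f"{fp32_gflops / tdp_w:.0f} GFLOPs/Watt FP16|FP32")  # 38
# A 1/16-rate FP64 works out to ~356 GFLOPs; AMD's quoted 358 GFLOPs
# suggests a marginally different reference clock, so treat as approximate.
print(f"{fp32_gflops / 16:.0f} GFLOPs FP64")
```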

Highlights:

5.7 TFLOPS FP16 or FP32 Performance

Up To 38 GFLOPS Per Watt Peak FP16 or FP32 Performance

16GB Ultra-Fast GDDR5 Memory on 256-bit Memory Interface

Passively Cooled Server Accelerator

Large BAR Support for Multi-GPU Peer to Peer

ROCm Open Platform for HPC-Class Scale Out

Optimized MIOpen Libraries for Deep Learning

MxGPU SR-IOV Hardware Virtualization

The Radeon Instinct MI6 accelerator, based on the acclaimed “Polaris” GPU architecture, is a passively cooled inference accelerator with 5.7 TFLOPS of peak FP16|FP32 performance at 150W board power and 16GB of ultra-fast GDDR5 GPU memory on a 256-bit memory interface. The MI6 is a versatile accelerator ideal for HPC and machine learning inference and edge-training deployments. via AMD

The card also comes with 16 GB of GDDR5 memory clocked at 7000 MHz effective along a 256-bit wide bus interface. This delivers up to 224 GB/s of bandwidth. The card comes in a single slot, full length form factor and is passively cooled, relying on airflow from the server chassis fans. TDP on the card is set at 150W, so power is provided by a single 6-pin connector.
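The bandwidth figures for all three memory types follow from the same formula, bus width in bytes times per-pin transfer rate. A quick sketch (transfer rates inferred from the quoted clocks: GDDR5 is quad-pumped, HBM runs double data rate):

```python
def bandwidth_gbs(bus_bits, transfer_rate_gts):
    """Peak bandwidth in GB/s: bus width in bytes * per-pin rate in GT/s."""
    return bus_bits / 8 * transfer_rate_gts

print(bandwidth_gbs(256, 7.0))    # MI6:  224.0 GB/s (GDDR5 at 7 GT/s)
print(bandwidth_gbs(4096, 1.0))   # MI8:  512.0 GB/s (HBM1, 500 MHz DDR)
print(bandwidth_gbs(2048, 1.89))  # MI25: ~484 GB/s (HBM2 at 1.89 GT/s)
```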

AMD Radeon Instinct Accelerators:

| Accelerator Name | AMD Radeon Instinct MI6 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI25 |
|---|---|---|---|
| GPU Architecture | Polaris 10 | Fiji XT | Vega 10 |
| GPU Process Node | 14nm FinFET | 28nm | 14nm FinFET |
| GPU Cores | 2304 | 4096 | 4096 |
| GPU Clock Speed | 1237 MHz | 1000 MHz | 1500 MHz |
| FP16 Compute | 5.7 TFLOPs | 8.2 TFLOPs | 24.6 TFLOPs |
| FP32 Compute | 5.7 TFLOPs | 8.2 TFLOPs | 12.3 TFLOPs |
| FP64 Compute | 358 GFLOPs | 512 GFLOPs | 768 GFLOPs |
| VRAM | 16 GB GDDR5 | 4 GB HBM1 | 16 GB HBM2 |
| Memory Clock | 1750 MHz | 500 MHz | 945 MHz |
| Memory Bus | 256-bit | 4096-bit | 2048-bit |
| Memory Bandwidth | 224 GB/s | 512 GB/s | 484 GB/s |
| Form Factor | Single Slot, Full Length | Dual Slot, Half Length | Dual Slot, Full Length |
| Cooling | Passive | Passive | Passive |
| TDP | 150W | 175W | 300W |

Planned for a June 29th rollout, the ROCm 1.6 software platform brings performance improvements and support for MIOpen 1.0. It is scalable and fully open source, providing a flexible, powerful heterogeneous compute solution for a new class of hybrid Hyperscale and HPC-class systems.

Comprising an open-source Linux driver optimized for scalable multi-GPU computing, the ROCm software platform provides multiple programming models, the HIP CUDA conversion tool, and support for GPU acceleration using the Heterogeneous Computing Compiler (HCC). AMD also showcased several server racks from its partners that utilize the new EPYC 7000 series processors and Instinct MI25 accelerators.