NVIDIA NVSwitch : A Look Inside DGX-2, the “World’s Largest GPU”

Nvidia shared more details about its NVSwitch technology at Hot Chips 2018. NVSwitch is one of the key features of the DGX-2 server, which Nvidia CEO Jensen Huang introduced as the “World’s Largest GPU” at GTC 2018. The DGX-2 integrates 16 Tesla V100 GPUs into one large logical GPU using a new GPU interconnect.

The specifications of this DGX-2 server are crazy, and with its price tag of $400,000, crazy is what you should expect. Though the price is steep in the eyes of an average user, it is actually quite competitive compared to other server solutions from different companies in this segment.

AI is one of the hottest fields right now, with new models and techniques appearing almost every week. Most of them are far larger and more complex than their predecessors and demand ever more memory capacity and computational power.

Nvidia made the DGX-2 possible by tying 16 Tesla V100 GPUs together behind a unified memory interface backed by a massive 512 GB of HBM2 memory. The DGX-2 packs 81,920 CUDA cores and an additional 10,240 Tensor cores for AI workloads. All of this is fed by a 10 kW power supply.
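Those headline totals fall straight out of the published per-GPU V100 specs. A quick sanity check of the arithmetic, with constant names chosen here for illustration:

```python
# Back-of-envelope totals for the DGX-2's 16 Tesla V100 GPUs,
# using the published per-GPU figures for the 32 GB V100.
NUM_GPUS = 16
CUDA_CORES_PER_GPU = 5120
TENSOR_CORES_PER_GPU = 640
HBM2_GB_PER_GPU = 32

total_cuda = NUM_GPUS * CUDA_CORES_PER_GPU      # 81,920 CUDA cores
total_tensor = NUM_GPUS * TENSOR_CORES_PER_GPU  # 10,240 Tensor cores
total_hbm2 = NUM_GPUS * HBM2_GB_PER_GPU         # 512 GB of HBM2

print(total_cuda, total_tensor, total_hbm2)
```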

The Tesla V100 used in the DGX-2 is, in itself, a very powerful unit. It packs 21 billion transistors on an 815 mm² die and carries 32 GB of HBM2 memory. It runs at 350 W, 50 W higher than the previous version; Nvidia says the extra 50 W goes toward boosting the GPU’s clock rate.

The DGX-2 has two different communication topologies. Each Tesla V100 GPU uses two mezzanine connectors to interface with the motherboard. One connector carries PCIe traffic to the passive backplane at the front of the server; the other carries NVLink traffic to the rear backplane. These backplanes link the top and bottom system boards, each of which holds eight Tesla V100 GPUs. The PCIe topology uses four switches to connect the CPUs, RDMA-capable networking, and up to 30 TB of NVMe SSDs to the GPUs.

This design demands a high-performance switching technology for the NVLink traffic, and no ready-made solution could match the company’s bandwidth and latency requirements. With nothing off the shelf to fit the bill, Nvidia designed its own switch.

This led to the development of NVSwitch. It is produced on TSMC’s 12FFN process, features 2 billion transistors, and has 18 NVLink ports plus one PCIe link for device management. Each NVLink port delivers 25 GB/s of bandwidth, translating to 450 GB/s of aggregate bandwidth per switch. The twelve switches together provide up to 2.4 TB/s of bisection bandwidth between the GPUs.

The NVSwitch die measures 106 mm² with an unusually elongated aspect ratio, and roughly half of the die is devoted to I/O controllers rather than logic. Small port-logic blocks perform packet transforms that make the whole system appear as a single GPU.

The NVSwitch is quite simple compared to other networking switches, mainly because the DGX-2 does not need forward error correction. Nvidia instead uses a standard CRC for internal consistency checks. The switch has internal SRAM buffers, but the external lanes carry no buffering at all, and the DGX-2 uses no repeaters or re-drivers on its NVLink pathways.

The NVSwitches are arranged in a dual-crossbar topology, which means access from GPUs on the top board to GPUs on the bottom board incurs slightly higher latency. AI workloads can tolerate this small latency difference since they are primarily bandwidth-bound. The NVSwitches sit at the rear of the chassis. Nvidia has not yet revealed their power draw, but it claims they consume less power than a standard networking switch.
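The latency asymmetry can be pictured as a hop count. This is a hedged sketch under the assumption (not stated in the article) that same-board traffic crosses one switch while cross-board traffic crosses a switch on each board via the backplane; the function name is illustrative only:

```python
# Illustrative hop model for the dual-crossbar layout.
# Assumption: same-board GPU pairs traverse one NVSwitch;
# cross-board pairs traverse one switch per board, i.e. two hops.
def switch_hops(src_board: int, dst_board: int) -> int:
    """Return the assumed number of NVSwitch hops between two GPUs."""
    return 1 if src_board == dst_board else 2

print(switch_hops(0, 0))  # top board to top board
print(switch_hops(0, 1))  # top board to bottom board: slightly slower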

Since the system draws 10 kW, the DGX-2 uses a 48 V power-distribution subsystem, which reduces the amount of current needed to deliver that power. Copper bus bars carry the current from the power supplies to the system boards.
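The reason for the 48 V rail is plain Ohm’s-law arithmetic: at a fixed power, higher voltage means proportionally less current, and thus thinner conductors and lower resistive losses. A quick comparison against a conventional 12 V rail (the 12 V baseline is an assumption for illustration):

```python
# Current needed to deliver 10 kW at different distribution voltages.
# I = P / V; the 12 V figure is only a point of comparison.
POWER_W = 10_000

amps_at_48v = POWER_W / 48   # roughly 208 A
amps_at_12v = POWER_W / 12   # roughly 833 A

print(round(amps_at_48v), round(amps_at_12v))
```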

This mammoth of a system is cooled with 1,000 linear feet per minute of airflow, but the air preheated by the GPU heatsinks reduces cooling effectiveness toward the rear of the chassis. The NVSwitches, sitting at that rear end, therefore require full-height heatsinks.

Nvidia also shared benchmark results showing near-linear bandwidth scaling between remote GPUs on different system boards, highlighting the efficiency of the NVSwitches. Other benchmarks shown were all-reduce and cuFFT results, showcasing the advantages of the DGX-2’s topology over the previous-generation DGX-1.
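For readers unfamiliar with the all-reduce benchmark: it measures the collective operation where every GPU ends up holding the combined (typically summed) contents of all GPUs’ buffers, which is the core communication step in multi-GPU training. This is a hedged, purely semantic sketch, not Nvidia’s NCCL implementation, and all names are illustrative:

```python
# Minimal illustration of all-reduce semantics: after the operation,
# every participant holds the elementwise sum of all input buffers.
def all_reduce_sum(buffers):
    """Return one summed copy of the buffers per participant."""
    total = [sum(vals) for vals in zip(*buffers)]
    return [list(total) for _ in buffers]

gpu_buffers = [[1, 2], [3, 4], [5, 6]]  # gradients on 3 hypothetical GPUs
print(all_reduce_sum(gpu_buffers))      # each GPU now holds [9, 12]
```

The faster this exchange completes, the less time GPUs spend waiting on each other, which is why the NVSwitch fabric’s bisection bandwidth matters so much for training throughput.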

Nvidia answered most of the open questions about NVSwitch at its Hot Chips presentation, though it still gave no figure for the switch’s power draw. Nvidia has also ruled out a next-gen AI system based on its Turing architecture for the near future, insisting that Volta remains its current platform for AI systems. So a Turing-based DGX-3 is still quite far from existence.
