NVLink and NVSwitch

A Need for Faster, More Scalable Interconnects

More developers across industries are relying on parallel computing for applications like AI, driving a need for multi-GPU systems. But while multi-processor configurations with PCIe are standard for solving large, complex problems, PCIe bandwidth often creates a bottleneck. What’s needed is a faster, more scalable multiprocessor interconnect.

How NVLink and NVSwitch Work Together

Introducing NVIDIA® NVLink™ and NVIDIA NVSwitch™. NVLink is a high-speed, direct GPU-to-GPU interconnect. NVSwitch takes interconnectivity to the next level by incorporating multiple NVLinks to provide all-to-all GPU communication within a single node like NVIDIA HGX-2™. The combination of NVLink and NVSwitch enabled NVIDIA to win MLPerf, AI’s first industry-wide benchmark.

NVLink: Tesla V100 with NVLink GPU-to-GPU connections

NVSwitch: All-to-all communication between 16 GPUs

NVLink

NVLink Maximizes System Throughput

NVIDIA NVLink technology addresses interconnect issues by providing higher bandwidth, more links, and improved scalability for multi-GPU system configurations. A single NVIDIA Tesla® V100 GPU supports up to six NVLink connections for a total bandwidth of 300 gigabytes per second (GB/s)—10X the bandwidth of PCIe Gen 3. Servers like the NVIDIA DGX-1™ and DGX-2 take advantage of this technology to give you greater scalability for ultrafast deep learning training.
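
As a rough illustration of how this capability surfaces to software, the sketch below queries CUDA peer-to-peer access between two GPUs and times a direct device-to-device copy using PyTorch. It assumes a machine with at least two CUDA GPUs and PyTorch installed, and it measures whatever path the driver selects (NVLink or PCIe), so it is an illustrative check rather than a bandwidth benchmark.

# Minimal sketch: check GPU peer access and time a direct GPU-to-GPU copy.
# Assumes >= 2 CUDA GPUs and PyTorch; the measured path depends on the
# system topology (NVLink or PCIe), this script does not choose it.
import torch

assert torch.cuda.device_count() >= 2, "needs at least two GPUs"

# Can GPU 0 read/write GPU 1's memory directly (peer-to-peer)?
print("peer access 0 -> 1:", torch.cuda.can_device_access_peer(0, 1))

src = torch.empty(256 * 1024 * 1024, dtype=torch.uint8, device="cuda:0")  # 256 MB
start = torch.cuda.Event(enable_timing=True)
stop = torch.cuda.Event(enable_timing=True)

torch.cuda.synchronize(0)
start.record()
dst = src.to("cuda:1")                       # device-to-device copy
stop.record()
torch.cuda.synchronize(0)
torch.cuda.synchronize(1)

elapsed_s = start.elapsed_time(stop) / 1e3   # elapsed_time() returns milliseconds
print(f"~{src.numel() / elapsed_s / 1e9:.1f} GB/s effective copy bandwidth")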

Highest Levels of GPU-to-GPU Acceleration

First introduced with the NVIDIA Pascal™ architecture, NVLink on Tesla V100 increases the per-link signaling rate from 20 to 25 GB/s in each direction. This direct communication link between GPUs improves the accuracy and convergence of high-performance computing (HPC) and AI workloads and achieves speeds more than an order of magnitude faster than PCIe.
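
The aggregate figure quoted above follows from this per-link rate; a quick back-of-the-envelope check (an illustrative calculation, not an official specification) is:

# Back-of-the-envelope check of the quoted V100 NVLink aggregate bandwidth.
links = 6                                  # NVLink connections per Tesla V100
per_direction_gbs = 25                     # GB/s per link, each direction
print(links * per_direction_gbs * 2)       # 300 GB/s total, counting both directions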

New Levels of Performance

NVLink can bring up to 70 percent more performance to an otherwise identically configured server. Its dramatically higher bandwidth and reduced latency enable even larger deep learning workloads to scale in performance as they grow.

NVSwitch

NVSwitch: The Fully Connected NVLink

The rapid adoption of deep learning has driven the need for a faster, more scalable interconnect, as PCIe bandwidth often creates a bottleneck at the multi-GPU system level.

NVIDIA NVSwitch builds on the advanced communication capability of NVLink to solve this problem. It takes deep learning performance to the next level with a GPU fabric that enables more GPUs in a single server and full-bandwidth connectivity between them.
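
In practice, applications usually exercise this fabric through collective communication libraries rather than programming the switch directly. Below is a minimal sketch of a multi-GPU all-reduce with PyTorch's distributed package and the NCCL backend, which routes traffic over NVLink/NVSwitch when the topology provides it; the master address and port are arbitrary placeholders for a single-node run.

# Minimal sketch: an all-reduce across all local GPUs via torch.distributed + NCCL.
# NCCL uses NVLink/NVSwitch for the traffic when the hardware provides it.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"   # placeholder for a single-node run
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    x = torch.ones(1 << 20, device="cuda") * rank   # each GPU contributes its rank
    dist.all_reduce(x, op=dist.ReduceOp.SUM)        # sum across all GPUs
    print(f"rank {rank}: {x[0].item()}")            # expect 0 + 1 + ... + (world_size - 1)

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)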

Full Connection for Unparalleled Performance

NVSwitch is the first on-node switch architecture to support 16 fully connected GPUs in a single server node and drive simultaneous communication between all eight GPU pairs at an incredible 300 GB/s each. These 16 GPUs can be used as a single large-scale accelerator with 0.5 terabytes of unified memory space and 2 petaFLOPS of deep learning compute power. A single HGX-2 or DGX-2 system with NVSwitch delivers up to 2.7X more application performance than two HGX-1 or DGX-1 systems connected with InfiniBand.
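
Those headline figures can be reconstructed from per-GPU numbers, assuming the 32 GB Tesla V100 variant at roughly 125 TFLOPS of Tensor Core throughput each (a rough sanity check, not an official specification):

# Rough reconstruction of the 16-GPU headline figures (assumed V100 32 GB parts).
gpus = 16
memory_gb_per_gpu = 32                 # Tesla V100 32 GB variant
tensor_tflops_per_gpu = 125            # approximate V100 Tensor Core peak
print(gpus * memory_gb_per_gpu)        # 512 GB, i.e. about 0.5 TB of combined GPU memory
print(gpus * tensor_tflops_per_gpu)    # 2,000 TFLOPS = 2 petaFLOPS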

ECMWF’s IFS: The Integrated Forecasting System (IFS) is a global numerical weather prediction model developed by the European Centre for Medium-Range Weather Forecasts (ECMWF), based in Reading, United Kingdom. ECMWF is an independent intergovernmental organization supported by most of the nations of Europe, and it operates one of the largest supercomputer centers in Europe for frequent updates of global weather forecasts. The IFS mini-app benchmark focuses its work on a spherical harmonics transformation that represents a significant computational load of the full model. The benchmark speedups shown here are better than those for the full IFS model, since the benchmark amplifies the transform stages of the algorithm by design. Even so, it demonstrates that ECMWF’s extremely effective and proven methods for providing world-leading predictions remain valid on NVSwitch-equipped servers such as NVIDIA’s DGX-2, which are a strong match to the problem.
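
For a sense of the kind of kernel the mini-app stresses, the snippet below evaluates a few spherical harmonic basis functions with SciPy. It is purely illustrative of the underlying math and is unrelated to ECMWF’s actual implementation.

# Illustrative only: evaluate spherical harmonics Y_n^m on a small lat-lon grid.
# This is not ECMWF code; it only shows the basis functions the transform uses.
import numpy as np
from scipy.special import sph_harm

theta = np.linspace(0, 2 * np.pi, 8)   # azimuthal angle (longitude)
phi = np.linspace(0, np.pi, 4)         # polar angle (colatitude)
T, P = np.meshgrid(theta, phi)

Y = sph_harm(2, 5, T, P)               # order m = 2, degree n = 5
print(Y.shape, Y.dtype)                # (4, 8) grid of complex values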

Mixture of Experts (MoE): Based on a network published by Google in the Tensor2Tensor GitHub repository, using the Transformer model with MoE layers. The MoE layers each consist of 128 experts, each of which is a smaller feed-forward deep neural network (DNN). Each expert specializes in a different domain of knowledge, and the experts are distributed across different GPUs, creating significant all-to-all traffic due to communication between the Transformer network layers and the MoE layers. The training dataset is Google’s “1 billion word benchmark for language modeling.” Training uses Volta Tensor Cores and runs for 45,000 steps to reach a perplexity of 34. This workload uses a batch size of 8,192 per GPU.
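
To make that structure concrete, here is a heavily simplified, single-GPU sketch of an MoE layer with top-1 gating written in PyTorch. The layer sizes and gating scheme are illustrative placeholders; the actual Tensor2Tensor model uses 128 experts sharded across GPUs, which is what generates the all-to-all traffic described above.

# Heavily simplified single-GPU sketch of a mixture-of-experts layer (top-1 gating).
# Sizes are illustrative; the real model shards its 128 experts across GPUs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=512, d_hidden=1024, num_experts=8):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)   # router: scores each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                             # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)
        weight, choice = scores.max(dim=-1)           # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i                        # tokens routed to expert i
            if mask.any():
                out[mask] = weight[mask].unsqueeze(1) * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(TinyMoE()(tokens).shape)                        # torch.Size([16, 512])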
