NVIDIA Volta GPUs and IBM Power9 CPUs To Deliver Up To 300 PetaFlops of Performance in 2017 With Summit and Sierra Supercomputers

On October 29, 2012, Oak Ridge National Laboratory brought its Titan supercomputer online. Built for scientific research, it housed 18,688 Tesla K20X GPUs and 18,688 AMD Opteron 6274 CPUs. So impressive was its compute capability that NVIDIA’s next big graphics card was named after it. Two years have passed since the supercomputer became functional, and work is underway on building the next big thing after Titan.

NVIDIA Volta GPUs To Power Next Big Supercomputers in 2017

In 2017, we will be looking forward to two new supercomputers: Summit from Oak Ridge National Laboratory and Sierra from Lawrence Livermore National Laboratory. Both of these supercomputers have one thing in common: they will feature several next generation IBM POWER9 CPUs and several NVIDIA Volta GPUs. But before we go into details, there’s one interesting thing I’d like to point out. We all know Maxwell is here to stay for at least the first half of 2016 before NVIDIA transitions to its next architecture, code named Pascal, in the second half of 2016. Pascal is expected to serve its term for two years until 2018, when NVIDIA will replace it with the next generation Volta architecture. However, according to the data from NVIDIA, Volta will make its way to the professional market one year early, in 2017, which means NVIDIA is on track with the GPU roadmap that was displayed at GTC 2014.

That’s good news for all. Next up, we will detail some of the performance and specifications of the Summit supercomputer, a computing beast that eclipses everything before it. Rated at a peak performance of 150-300 PFLOPS, Summit will be based on more than 3,400 compute nodes, with each node consisting of several next generation IBM POWER9 CPUs and NVIDIA Volta architecture based Tesla GPUs. Each node will deliver around 40 TFLOPS of compute performance, which is said to be enough to outperform an entire rack of top of the line Haswell based x86 CPU servers.
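As a rough sanity check on those numbers, the per-node and system-level figures can be multiplied out. The sketch below uses only the approximate values quoted above ("more than 3,400" nodes, "around 40 TFLOPS" each); the function names are illustrative, and the result is a back-of-the-envelope estimate rather than an official specification.

```python
# Figures quoted in the article; both are approximate, not final specs.
NODES = 3400            # "more than 3,400 compute nodes" (lower bound)
TFLOPS_PER_NODE = 40    # "around 40 TFLOPS" per node


def system_pflops(nodes: int, tflops_per_node: float) -> float:
    """Aggregate peak in PFLOPS (1 PFLOPS = 1000 TFLOPS)."""
    return nodes * tflops_per_node / 1000


def nodes_needed(target_pflops: float, tflops_per_node: float) -> float:
    """Node count required to reach a target peak at a given per-node rate."""
    return target_pflops * 1000 / tflops_per_node


print(system_pflops(NODES, TFLOPS_PER_NODE))  # 136.0
print(nodes_needed(150, TFLOPS_PER_NODE))     # 3750.0
```

At exactly 3,400 nodes and 40 TFLOPS each, the total comes to about 136 PFLOPS, a little under the 150 PFLOPS floor. That is consistent with the "more than" and "around" wording: either a few hundred more nodes or a somewhat higher per-node figure closes the gap.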

With Summit come a few other enabling technologies such as NVLink, which is expected to debut around 2016. The Pascal GPU will be the first to introduce NVLink, the next generation Unified Virtual Memory link with Gen 2.0 cache coherency features and 5-12 times the bandwidth of a regular PCIe connection. This should ease many of the bandwidth issues that high performance GPUs currently face.
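To put the 5-12 times figure in absolute terms, here is a minimal sketch that assumes the baseline is a PCIe Gen3 x16 link (8 GT/s per lane with 128b/130b encoding). It computes the theoretical one-direction bandwidth before protocol overhead and scales it by the quoted multipliers. The resulting range is only what the multiplier implies, not a published NVLink specification.

```python
# PCIe Gen3 link parameters (assumed baseline for the "5-12x" comparison).
GT_PER_S_PER_LANE = 8.0   # signalling rate per lane, in GT/s
ENCODING = 128 / 130      # 128b/130b line encoding efficiency
LANES = 16                # x16 slot


def pcie_gen3_gbytes_per_s(lanes: int = LANES) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S_PER_LANE * ENCODING * lanes / 8  # bits -> bytes


base = pcie_gen3_gbytes_per_s()
low, high = 5 * base, 12 * base

print(f"PCIe Gen3 x16: {base:.2f} GB/s")           # 15.75 GB/s
print(f"5-12x range:   {low:.0f}-{high:.0f} GB/s")  # 79-189 GB/s
```

Under these assumptions the quoted multiplier works out to roughly 80 to 190 GB/s between processors on a node.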

First technology we’ll announce today is an important invention called NVLink. It’s a chip-to-chip communication channel. The programming model is PCI Express but enables unified memory and moves 5-12 times faster than PCIe. “This is a big leap in solving this bottleneck,” Jen-Hsun says. NVIDIA

On Summit, each compute node will feature 512 GB of memory through dense server DDR4 memory and high bandwidth HBM stacked designs which will work in complete coherency. Extending the impressive amount of memory is an additional 800 GB of NVRAM per node, which can be configured either as a burst buffer or as extended memory. The system is interconnected with dual-rail Mellanox EDR InfiniBand, using the full, non-blocking fat-tree design. Summit will also include the GPFS parallel file system with 1 TB/s of I/O bandwidth and 120 PB of disk capacity.
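Scaling the per-node memory figures up to the whole machine gives a feel for why a burst buffer is useful. The sketch below assumes the lower-bound node count of 3,400 mentioned earlier and decimal units (1 PB = 1,000,000 GB). The variable names are illustrative, and the timings ignore all real-world overheads.

```python
NODES = 3400              # lower bound on node count quoted in the article
COHERENT_GB = 512         # DDR4 + HBM per node
NVRAM_GB = 800            # NVRAM per node
FS_CAPACITY_PB = 120      # GPFS disk capacity
FS_BANDWIDTH_TB_S = 1.0   # GPFS I/O bandwidth

# System-wide totals, converting GB -> PB with decimal units.
coherent_pb = NODES * COHERENT_GB / 1e6
nvram_pb = NODES * NVRAM_GB / 1e6

# Time to write every node's coherent memory to the file system once,
# e.g. for a full-machine checkpoint, at the quoted peak bandwidth.
checkpoint_minutes = coherent_pb * 1000 / FS_BANDWIDTH_TB_S / 60

# Time to fill the whole file system at the quoted peak bandwidth.
fill_hours = FS_CAPACITY_PB * 1000 / FS_BANDWIDTH_TB_S / 3600

print(f"Coherent memory: {coherent_pb:.2f} PB")
print(f"NVRAM:           {nvram_pb:.2f} PB")
print(f"Full checkpoint: {checkpoint_minutes:.0f} min")
print(f"Fill file system: {fill_hours:.1f} h")
```

With these inputs the machine holds about 1.74 PB of coherent memory and 2.72 PB of NVRAM, and a full memory dump to disk would take roughly half an hour even at peak bandwidth. Node-local NVRAM used as a burst buffer is one way to absorb that kind of write burst.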

NVLink is an energy-efficient, high-bandwidth communications channel that uses up to three times less energy to move data on the node at speeds 5-12 times conventional PCIe Gen3 x16. First available in the NVIDIA Pascal GPU architecture, NVLink enables fast communication between the CPU and the GPU, or between multiple GPUs.

Figure 3: NVLink is a key building block in the compute node of Summit and Sierra supercomputers.

NVLink is a key technology in Summit’s and Sierra’s server node architecture, enabling IBM POWER CPUs and NVIDIA GPUs to access each other’s memory fast and seamlessly. From a programmer’s perspective, NVLink erases the visible distinctions of data separately attached to the CPU and the GPU by “merging” the memory systems of the CPU and the GPU with a high-speed interconnect. Because both CPU and GPU have their own memory controllers, the underlying memory systems can be optimized differently (the GPU’s for bandwidth, the CPU’s for latency) while still presenting as a unified memory system to both processors. NVLink offers two distinct benefits for HPC customers. First, it delivers improved application performance, simply by virtue of greatly increased bandwidth between elements of the node. Second, NVLink with Unified Memory technology allows developers to write code much more seamlessly and still achieve high performance. via NVIDIA News

The most impressive feature of Summit is that while it will consume only 10% more power than the Titan supercomputer, it is expected to deliver 5 times more application performance than Titan. While Titan was rated at 25-30 PFLOPS, Sierra will be rated above 100 PFLOPS and Summit at 150-300 PFLOPS.
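The efficiency implication of those two ratios can be worked out directly. This is a minimal sketch using only the figures quoted above (10% more power, 5 times the application performance, and the peak ratings for Titan and Summit); the names are illustrative.

```python
POWER_RATIO = 1.10        # Summit power draw relative to Titan ("10% more")
PERFORMANCE_RATIO = 5.0   # application performance relative to Titan

TITAN_PFLOPS = (25, 30)     # rating range quoted for Titan
SUMMIT_PFLOPS = (150, 300)  # rating range quoted for Summit


def perf_per_watt_gain(performance_ratio: float, power_ratio: float) -> float:
    """Relative improvement in performance per unit of power."""
    return performance_ratio / power_ratio


gain = perf_per_watt_gain(PERFORMANCE_RATIO, POWER_RATIO)

# Most conservative and most optimistic peak ratios from the quoted ranges.
min_peak_ratio = SUMMIT_PFLOPS[0] / TITAN_PFLOPS[1]
max_peak_ratio = SUMMIT_PFLOPS[1] / TITAN_PFLOPS[0]

print(f"Performance per watt: {gain:.2f}x Titan")                          # 4.55x
print(f"Peak rating ratio:    {min_peak_ratio:.0f}x-{max_peak_ratio:.0f}x")  # 5x-12x
```

In other words, the quoted figures imply roughly 4.5 times the performance per watt of Titan, and the 5 times claim lines up with the most conservative reading of the two peak rating ranges.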