Nvidia has confirmed that its “Pascal” GPU architecture will launch in 2016 and that its “Volta” GPU architecture will succeed it in 2017. Volta-powered supercomputers are expected to be operational by the middle of 2017. Volta is Nvidia’s sixth generation of general-purpose GPU architecture since the introduction of the company’s first unified shader graphics architecture, code-named Tesla, which debuted with the company’s highly successful GeForce 8 (8000) series back in 2006.

Volta was originally intended to succeed Nvidia’s 900 series Maxwell GPU architecture in 2016 and to be the company’s first generation to feature stacked memory. However, Volta was designed with HMC, the Hybrid Memory Cube, in mind, and HMC hadn’t matured as quickly as Nvidia had hoped. So a replacement was put in place that makes use of the other major stacked memory standard available on the market, High Bandwidth Memory, or HBM for short. And thus Pascal was born.

| GPU Architecture | NVIDIA Fermi | NVIDIA Kepler | NVIDIA Maxwell | NVIDIA Pascal |
| --- | --- | --- | --- | --- |
| GPU Process | 40nm | 28nm | 28nm | 16nm (TSMC FinFET) |
| Flagship Chip | GF110 | GK210 | GM200 | GP100 |
| GPU Design | SM (Streaming Multiprocessor) | SMX (Streaming Multiprocessor) | SMM (Streaming Multiprocessor Maxwell) | SMP (Streaming Multiprocessor Pascal) |
| Maximum Transistors | 3.00 Billion | 7.08 Billion | 8.00 Billion | 15.3 Billion |
| Maximum Die Size | 520 mm² | 561 mm² | 601 mm² | 610 mm² |
| Stream Processors Per Compute Unit | 32 SPs | 192 SPs | 128 SPs | 64 SPs |
| Maximum CUDA Cores | 512 CCs (16 CUs) | 2880 CCs (15 CUs) | 3072 CCs (24 CUs) | 3840 CCs (60 CUs) |
| FP32 Compute | 1.33 TFLOPs (Tesla) | 5.10 TFLOPs (Tesla) | 6.10 TFLOPs (Tesla) | ~12 TFLOPs (Tesla) |
| FP64 Compute | 0.66 TFLOPs (Tesla) | 1.43 TFLOPs (Tesla) | 0.20 TFLOPs (Tesla) | 5.5 TFLOPs (Tesla) |
| Maximum VRAM | 1.5 GB GDDR5 | 6 GB GDDR5 | 12 GB GDDR5 | 16 / 32 GB HBM2 |
| Maximum Bandwidth | 192 GB/s | 336 GB/s | 336 GB/s | 1 TB/s |
| Maximum TDP | 244W | 250W | 250W | 300W |
| Launch Year | 2010 (GTX 580) | 2014 (GTX Titan Black) | 2015 (GTX Titan X) | 2016 |
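The "Maximum CUDA Cores" row in the comparison above is just the product of the stream processors per compute unit and the number of compute units. A minimal Python sketch of that arithmetic follows; the dictionary layout and function names are illustrative choices, and the numbers are copied from the table rather than from any Nvidia tool.

```python
# Figures copied from the comparison table:
# (stream processors per compute unit, compute units, listed CUDA cores)
TABLE = {
    "Fermi (GF110)": (32, 16, 512),
    "Kepler (GK210)": (192, 15, 2880),
    "Maxwell (GM200)": (128, 24, 3072),
    "Pascal (GP100)": (64, 60, 3840),
}


def cuda_cores(sps_per_cu: int, cu_count: int) -> int:
    """Total CUDA cores = stream processors per compute unit * compute units."""
    return sps_per_cu * cu_count


def check_table(table: dict) -> dict:
    """Return, per architecture, whether the listed core count matches the product."""
    return {
        name: cuda_cores(sps, cus) == listed
        for name, (sps, cus, listed) in table.items()
    }


for name, consistent in check_table(TABLE).items():
    sps, cus, listed = TABLE[name]
    print(f"{name}: {sps} x {cus} = {cuda_cores(sps, cus)} (listed {listed}, match={consistent})")
```

The same check makes the design trend visible: Pascal uses far more, far smaller compute units than Kepler or Maxwell to reach its core count.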

Nvidia Volta GPUs To Feature GDDR6 Memory & HBM2

One of the major architectural overhauls in Volta will be to its memory subsystem. Gaming GeForce GTX 1100 series Volta graphics cards will feature sixth-generation graphics DDR memory. GDDR6 will run at per-pin data rates of 14-16 Gbps, which is double that of GDDR5 and a good chunk ahead of GDDR5X.

Professional Volta SKUs under the Tesla brand will continue to use High Bandwidth Memory, although notably they will be upgraded to the second generation of the technology. HBM2 uses less power and runs at double the data rate of HBM1, which translates to double the memory bandwidth at lower power. Apart from an updated SM design, Volta will still be manufactured on TSMC’s 16nm FinFET process, although it will debut on a much more mature flavor of the process with even higher frequencies and better power efficiency. All of that will be of paramount importance in delivering more performance and better efficiency in the gaming graphics market as well as the high performance computing and server markets.
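To put the memory figures in this section into bandwidth terms, peak bandwidth is the per-pin data rate multiplied by the bus width. The Python sketch below is a rough illustration only. The 256-bit GDDR6 bus is a hypothetical example (the article gives no bus width for Volta cards), and the HBM numbers assume four 1024-bit stacks at 1 Gbps per pin for HBM1 and 2 Gbps per pin for HBM2, which are not stated in the article.

```python
def peak_bandwidth_gb_s(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s = per-pin rate (Gbit/s) * bus width (bits) / 8."""
    return pin_rate_gbps * bus_width_bits / 8


# GDDR6 at the article's 14-16 Gbps, on a hypothetical 256-bit bus (assumption).
GDDR6_BUS_BITS = 256
gddr6_low = peak_bandwidth_gb_s(14, GDDR6_BUS_BITS)
gddr6_high = peak_bandwidth_gb_s(16, GDDR6_BUS_BITS)

# HBM: assumed 1024-bit interface per stack and four stacks per GPU.
HBM_BUS_BITS = 4 * 1024
hbm1 = peak_bandwidth_gb_s(1.0, HBM_BUS_BITS)  # assumed 1 Gbps per pin
hbm2 = peak_bandwidth_gb_s(2.0, HBM_BUS_BITS)  # assumed 2 Gbps per pin

print(f"GDDR6, 256-bit: {gddr6_low:.0f}-{gddr6_high:.0f} GB/s")
print(f"HBM1, 4 stacks: {hbm1:.0f} GB/s")
print(f"HBM2, 4 stacks: {hbm2:.0f} GB/s")
```

Under these assumptions the HBM2 result lands at about 1 TB/s, which lines up with the maximum bandwidth listed for Pascal in the comparison table, and doubling the data rate doubles the bandwidth as the paragraph above describes.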

It also supports half-precision (FP16) compute at twice the rate of full-precision (FP32) compute.

Nvidia Confirms Volta Coming In 2017

Admittedly, HMC has progressed much more slowly than HBM, which is already being used in AMD’s latest GPU, code-named Fiji. Still, HMC offers some substantial benefits for the server and HPC markets, and that’s where Volta is set to shine.
Nvidia plans to introduce Volta in a range of consumer graphics cards by 2018 and to use Volta GPUs to power some highly power-efficient next-generation supercomputers in 2017.

The Summit supercomputer at Oak Ridge National Laboratory and the Sierra supercomputer at Lawrence Livermore National Laboratory will be major headliners in 2017. Both have one thing in common: they will be powered by next-generation IBM POWER9 CPUs and NVIDIA Volta GPUs.

Summit is rated at a peak single-precision floating-point performance of 150-300 PFLOPS, which will be delivered by more than 3,400 compute nodes, each powered by several next-generation IBM POWER9 CPUs and NVIDIA Volta-based Tesla accelerators. Each node will deliver around 40 TFLOPS of compute and is touted as a more performant solution than an entire rack of flagship Haswell-based server chips.
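As a rough sanity check on the Summit figures, the system total is simply the node count multiplied by the per-node throughput. The Python sketch below uses the article's approximate numbers ("more than 3400" nodes, "around 40" TFLOPS each), so the results should be read as ballpark estimates; the function names are illustrative.

```python
import math

TFLOPS_PER_PFLOPS = 1000


def system_pflops(node_count: int, tflops_per_node: float) -> float:
    """Aggregate peak throughput in PFLOPS for identical nodes."""
    return node_count * tflops_per_node / TFLOPS_PER_PFLOPS


def nodes_needed(target_pflops: float, tflops_per_node: float) -> int:
    """Smallest node count that reaches the target at the given per-node rate."""
    return math.ceil(target_pflops * TFLOPS_PER_PFLOPS / tflops_per_node)


# Article figures: more than 3400 nodes at around 40 TFLOPS each.
floor_estimate = system_pflops(3400, 40.0)
print(f"3400 nodes x 40 TFLOPS = {floor_estimate:.0f} PFLOPS")

# Node counts that would hit the quoted 150-300 PFLOPS range at 40 TFLOPS per node.
print(f"Nodes for 150 PFLOPS: {nodes_needed(150, 40.0)}")
print(f"Nodes for 300 PFLOPS: {nodes_needed(300, 40.0)}")
```

At exactly 3400 nodes and 40 TFLOPS per node the total comes to 136 PFLOPS, so reaching the quoted 150-300 PFLOPS range implies more nodes, more throughput per node, or both.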

There’s one technology that will be pivotal to delivering on the promise of Volta GPGPUs in servers and supercomputers, and that’s NVLink. This technology is aimed at GPU-accelerated servers and supercomputers, where inter-chip communication is extremely bandwidth-limited and a major system bottleneck. Nvidia states that NVLink will be 5 to 12 times faster than traditional PCIe 3.0, making it a major step forward in platform atomics. Earlier this year, Nvidia announced that IBM will be integrating this new interconnect into its upcoming POWER server CPUs.
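To give the "5 to 12 times" claim some scale, the Python sketch below derives the theoretical bandwidth of a PCIe 3.0 x16 link and multiplies it out. It assumes the standard PCIe 3.0 signaling parameters (8 GT/s per lane with 128b/130b encoding) and counts one direction only; the resulting NVLink numbers are an extrapolation from the quoted multiplier, not figures published in this article.

```python
def pcie3_bandwidth_gb_s(lanes: int = 16) -> float:
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s.

    Assumes 8 GT/s per lane and 128b/130b line encoding.
    """
    transfers_per_s = 8e9            # 8 GT/s per lane
    encoding_efficiency = 128 / 130  # 128 payload bits per 130 bits on the wire
    bits_per_s = transfers_per_s * encoding_efficiency * lanes
    return bits_per_s / 8 / 1e9      # bits -> bytes -> GB


def scaled_range(base_gb_s: float, low_factor: float, high_factor: float) -> tuple:
    """Bandwidth range implied by a 'low to high times faster' claim."""
    return (base_gb_s * low_factor, base_gb_s * high_factor)


pcie_x16 = pcie3_bandwidth_gb_s(16)
nvlink_low, nvlink_high = scaled_range(pcie_x16, 5, 12)

print(f"PCIe 3.0 x16: {pcie_x16:.2f} GB/s per direction")
print(f"5-12x that:   {nvlink_low:.0f}-{nvlink_high:.0f} GB/s")
```

Under these assumptions a PCIe 3.0 x16 link tops out a little under 16 GB/s in each direction, so the quoted multiplier corresponds to somewhere between roughly 80 and 190 GB/s.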

Nvidia NVLink Technology

NVLink is an energy-efficient, high-bandwidth communications channel that uses up to three times less energy to move data on the node at speeds 5-12 times conventional PCIe Gen3 x16. First available in the NVIDIA Pascal GPU architecture, NVLink enables fast communication between the CPU and the GPU, or between multiple GPUs.

Figure 3: NVLink is a key building block in the compute node of Summit and Sierra supercomputers.

NVLink is a key technology in Summit’s and Sierra’s server node architecture, enabling IBM POWER CPUs and NVIDIA GPUs to access each other’s memory fast and seamlessly. From a programmer’s perspective, NVLink erases the visible distinctions of data separately attached to the CPU and the GPU by “merging” the memory systems of the CPU and the GPU with a high-speed interconnect. Because both CPU and GPU have their own memory controllers, the underlying memory systems can be optimized differently (the GPU’s for bandwidth, the CPU’s for latency) while still presenting as a unified memory system to both processors. NVLink offers two distinct benefits for HPC customers. First, it delivers improved application performance, simply by virtue of greatly increased bandwidth between elements of the node. Second, NVLink with Unified Memory technology allows developers to write code much more seamlessly and still achieve high performance.

via NVIDIA News

NVLink will debut with Nvidia’s Pascal in 2016 before it makes its way to Volta in 2017. Unlike with Maxwell, Nvidia has placed a major focus on compute and GPGPU acceleration with Pascal. The slew of features and new technologies that Nvidia will debut with Pascal emphasizes this focus, including next-generation stacked High Bandwidth Memory, the high-speed NVLink GPU interconnect, and support for mixed precision to accelerate mobile applications and improve mobile performance per watt. We expect that Volta will carry all of these forward.

Back to the Summit supercomputer: perhaps the most impressive thing about it is that it will consume only 10% more power than the Titan supercomputer while delivering up to 10 times the computational performance. While Titan is rated at 25-30 PFLOPS, Sierra will deliver more than 100 PFLOPS of compute and Summit will deliver an even more impressive 150-300 PFLOPS.
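The efficiency claim can be made concrete with a couple of lines of arithmetic. The Python sketch below uses only the figures quoted above (10% more power, up to 10 times the performance, and the PFLOPS ranges for Titan and Summit); the function names are illustrative, and the results inherit the roughness of the quoted numbers.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Improvement in performance per watt given performance and power ratios."""
    return perf_ratio / power_ratio


def speedup_range(old_range: tuple, new_range: tuple) -> tuple:
    """Most conservative and most optimistic speedup between two (low, high) ranges."""
    old_low, old_high = old_range
    new_low, new_high = new_range
    return (new_low / old_high, new_high / old_low)


# Up to 10x the performance for 10% more power.
gain = perf_per_watt_gain(perf_ratio=10.0, power_ratio=1.10)
print(f"Performance per watt: about {gain:.1f}x Titan")

# Quoted peak ranges in PFLOPS.
TITAN_PFLOPS = (25.0, 30.0)
SUMMIT_PFLOPS = (150.0, 300.0)
low, high = speedup_range(TITAN_PFLOPS, SUMMIT_PFLOPS)
print(f"Summit vs Titan speedup: {low:.0f}x to {high:.0f}x")
```

On these numbers Summit would deliver roughly nine times Titan's performance per watt, and the quoted PFLOPS ranges bracket the "up to 10 times" claim at 5x to 12x.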