Nvidia Tesla V100 throws 21 billion transistors at GPU computing

Machine learning is one of the most demanding applications for GPUs today, and Nvidia has been riding that wave with huge graphics chips dedicated to compute tasks. The Tesla P100 was the crown jewel of the Pascal architecture for general-purpose GPU computing, and today, Nvidia took the wraps off its first Volta GPU to continue that mission. Say hello to the GV100 GPU aboard the Tesla V100 accelerator.

Every spec of the V100 is eye-popping. Nvidia CEO Jensen Huang says the 815 mm² chip is made at the reticle limit of TSMC's 12-nm FFN process. Its 21 billion transistors make up 5120 stream processors running at a boost clock of 1455 MHz, good for 7.5 TFLOPS of FP64 throughput or 15 TFLOPS of FP32. Nvidia also provisioned this chip with 20 MB of register files for its streaming multiprocessor (SM) units, 16 MB of cache, and 16 GB of HBM2 memory good for 900 GB/s of theoretical memory bandwidth. The chip talks to other components in a system over a 300 GB/s second-generation NVLink interconnect.
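For readers who want to check the math, the quoted throughput figures follow from the core count and boost clock. The Python sketch below is a back-of-the-envelope estimate, not Nvidia's own method. It assumes each stream processor retires one fused multiply-add (counted as two floating-point operations) per clock, and that FP64 runs at half the FP32 rate, which is what the 7.5-versus-15 ratio above suggests. The function and variable names are made up for illustration.

```python
def peak_tflops(cores: int, clock_mhz: float, flops_per_clock: float = 2.0) -> float:
    """Theoretical peak in TFLOPS.

    Assumption: every core issues one fused multiply-add per clock,
    which counts as two floating-point operations.
    """
    return cores * clock_mhz * 1e6 * flops_per_clock / 1e12


FP32_CORES = 5120   # stream processors, from the article
BOOST_MHZ = 1455    # boost clock, from the article

fp32_tflops = peak_tflops(FP32_CORES, BOOST_MHZ)
# Assumption: FP64 runs at half the FP32 rate, matching the quoted 7.5 vs. 15.
fp64_tflops = fp32_tflops / 2

print(f"FP32 peak: {fp32_tflops:.2f} TFLOPS")
print(f"FP64 peak: {fp64_tflops:.2f} TFLOPS")
```

Under those assumptions, the result lands at about 14.9 TFLOPS for FP32 and about 7.45 TFLOPS for FP64, in line with the rounded 15 and 7.5 TFLOPS figures.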

The Tesla V100 also includes dedicated hardware for the tensor operations critical to deep learning tasks. Nvidia claims the chip can process tensor tasks at 120 TFLOPS. That figure counts mixed-precision floating-point math (FP16 multiplies with FP32 accumulation), so it isn't directly comparable to the integer TOPS figures quoted for some other accelerators. For reference, Google's TPU ASIC can deliver a claimed 92 TOPS.

The full GV100 GPU comprises 84 SMs, 5376 FP32 SPs, 5376 INT32 SPs, 2688 FP64 SPs, 672 "tensor cores," and 336 texture units. Almost certainly because of the chip's enormous size and the associated yield challenges, the Tesla V100 doesn't have all of these resources enabled.
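The full-chip numbers make it possible to estimate what the Tesla V100 actually ships with. The Python sketch below divides each full-chip resource by the SM count to get a per-SM figure, then scales by the number of SMs implied by the card's 5120 FP32 stream processors. It assumes resources are spread evenly across SMs and that Nvidia disables whole SMs at a time; the dictionary keys and variable names are illustrative only.

```python
# Full GV100 resource counts, as listed in the article.
FULL_CHIP = {
    "FP32 SPs": 5376,
    "INT32 SPs": 5376,
    "FP64 SPs": 2688,
    "tensor cores": 672,
    "texture units": 336,
}
FULL_SMS = 84
V100_FP32_SPS = 5120  # stream processors enabled on the Tesla V100

# Assumption: every SM carries an identical share of each resource.
per_sm = {name: count // FULL_SMS for name, count in FULL_CHIP.items()}

# Assumption: units are disabled one whole SM at a time.
enabled_sms = V100_FP32_SPS // per_sm["FP32 SPs"]
enabled = {name: n * enabled_sms for name, n in per_sm.items()}

print(f"Enabled SMs: {enabled_sms} of {FULL_SMS}")
for name, count in enabled.items():
    print(f"{name}: {count} ({per_sm[name]} per SM)")
```

If those assumptions hold, the Tesla V100 has 80 of its 84 SMs active, which works out to 5120 INT32 SPs, 2560 FP64 SPs, 640 tensor cores, and 320 texture units alongside the 5120 FP32 SPs.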