Nvidia Unveils Next Generation Kepler GPU Compute Engine

Nvidia launched its latest line of Tesla GPU compute engines at the company’s GPU Technology Conference in San Jose today. One model shipping immediately is based on the existing GK104 chip used in the recently released GTX 680. Dubbed the Tesla K10, the board delivers as much as 4.6 teraflops of single-precision floating-point performance, roughly three times that of the older, Fermi-based Tesla. The card also offers an aggregate memory bandwidth of 320GB per second. This board is targeted at oil exploration, signal processing and seismic processing applications.

The more intriguing announcement is the Tesla K20. Built on a monster chip with 7.1 billion transistors, the K20 isn’t slated for release until Q4. Nvidia’s CEO, Jen-Hsun Huang, noted that the K20 was the largest, most complex semiconductor chip ever built. It will likely use the same 28nm manufacturing process used for the GTX 680. The K20 is designed for computationally intensive HPC environments, particularly Finite Element Analysis (FEA), finance and physics applications. It offers three times the double-precision floating-point performance of previous-generation Tesla products. In addition to the huge transistor count, the K20 will sport a 384-bit memory interface.

New Features

In addition to improved compute performance, the K20 will support several key features to keep the chip busy when being fed compute chores. Hyper-Q increases the number of work queues from a single queue in the previous generation Fermi chip to 32 work queues. This improves GPU utilization, keeping more of the compute cores humming when running parallel compute applications.

Dynamic Parallelism behaves like a kind of parallel branch predictor. When fed tasks, the K20 can keep track of dependent tasks and spawn new compute kernels to complete those tasks, rather than having to request more work from the CPU.
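To make the idea concrete, here is a toy model in Python. It is not CUDA code and does not use any Nvidia API; the `Task` class, the two runner functions and the small task tree are all hypothetical, chosen only to illustrate the difference in CPU round trips. It assumes the simplest possible host-driven scheme, in which the CPU launches one batch of work per level of dependent tasks.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A unit of work that may uncover dependent child tasks when it runs."""
    name: str
    children: list = field(default_factory=list)


def run_host_driven(root):
    """Host-driven model: the CPU launches every newly discovered batch of work."""
    host_launches = 0
    completed = []
    frontier = [root]
    while frontier:
        host_launches += 1  # one CPU round trip per level of dependent tasks
        next_frontier = []
        for task in frontier:
            completed.append(task.name)
            next_frontier.extend(task.children)
        frontier = next_frontier
    return host_launches, completed


def run_device_driven(root):
    """Dynamic-parallelism model: the CPU launches once, tasks spawn their own children."""
    completed = []

    def device_run(task):
        completed.append(task.name)
        for child in task.children:
            device_run(child)  # stands in for a kernel launching a child kernel

    device_run(root)
    return 1, completed


# Hypothetical task tree: root -> (a -> a1), b
tree = Task("root", [Task("a", [Task("a1")]), Task("b")])

host_launches, host_done = run_host_driven(tree)
device_launches, device_done = run_device_driven(tree)

print("host-driven CPU launches:  ", host_launches)
print("device-driven CPU launches:", device_launches)
```

Both runners complete the same four tasks, but the host-driven version returns to the CPU once per level of the tree, while the device-driven version needs only the initial launch. Real workloads and real launch overheads are far more involved than this model suggests.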

Huang demonstrated a simulation of particles colliding, first starting with the last generation Fermi chip. That GPU could handle 20,000 bodies colliding in real time at high frame rates. Then he went on to demonstrate real-time modeling of the Andromeda and Milky Way galaxies colliding – not something we need to worry about for the time being, since it won’t happen for 3.8 billion years. That simulation ran on a Kepler-based Tesla, showing over 208,000 bodies colliding.
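For a rough sense of what that jump in body count means, the short Python sketch below counts pairwise interactions for both demos. It assumes a brute-force, all-pairs simulation in which every body interacts with every other body on each step. Nvidia did not say which algorithm the demos used, so this is a back-of-the-envelope estimate, not a measurement, and `pair_count` is an illustrative helper.

```python
def pair_count(n: int) -> int:
    """Number of unordered body pairs in one brute-force all-pairs step."""
    return n * (n - 1) // 2


fermi_bodies = 20_000    # Fermi demo, as reported
kepler_bodies = 208_000  # Kepler demo, reported as "over 208,000"

fermi_pairs = pair_count(fermi_bodies)
kepler_pairs = pair_count(kepler_bodies)

body_ratio = kepler_bodies / fermi_bodies
pair_ratio = kepler_pairs / fermi_pairs

print(f"Fermi pairs per step:  {fermi_pairs:,}")
print(f"Kepler pairs per step: {kepler_pairs:,}")
print(f"Body count ratio: {body_ratio:.1f}x")
print(f"Pair count ratio: {pair_ratio:.1f}x")
```

Under that assumption, roughly ten times as many bodies means roughly a hundred times as many pair interactions per step. That figure says nothing about relative hardware speed on its own, since frame rates and algorithms may have differed between the two demos.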

The GPU in the K20, code-named GK110, is expected to be used in the new Titan supercomputer being built at the Oak Ridge National Laboratory and the Blue Waters system at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.