Why China's New Supercomputer Is Only Technically the World's Fastest

Peak performance doesn’t equal sustained performance, and the NVIDIA GPUs in the Tianhe 1A are especially bad at the latter.

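What’s in a number? If you’re one of the engineers behind China’s Tianhe 1A, the number 4.7 means a lot: it’s the number of petaflops (quadrillions of floating point operations per second) that the “world’s fastest” supercomputer can theoretically chew through at its peak. On the benchmark used for the official rankings, the machine actually sustained about 2.57 petaflops, which was still enough to claim the top spot.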

Image: An NVIDIA Tesla cluster of the kind used in the new Chinese supercomputer (photo: ChrisDag, CC)

The key phrase here is “peak performance.” The Linpack benchmark used to officially rank the world’s fastest supercomputers rewards raw number-crunching on a single, highly regular problem, but in the real world of scientific computing, what often matters most is a machine’s ability to sustain high performance on messier workloads over long runs.

In other words, the Tianhe 1A comes on strong, but American supercomputers can last all night, or sometimes for many days, depending on the scale of the problem they’re tackling.

“It’s very difficult to achieve anywhere near peak performance on GPUs,” says Thom Dunning, director of the National Center for Supercomputing Applications. GPUs are the NVIDIA-built graphics processing units that provide the bulk of the computing power in the Tianhe 1A, whose hybrid design also includes traditional CPUs.

“The system uses 7,168 NVIDIA Tesla M2050 GPUs and 14,336 CPUs; it would require more than 50,000 CPUs and twice as much floor space to deliver the same performance using CPUs alone,” says the press release coinciding with the announcement.
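The 4.7-petaflop figure is a theoretical peak: the sum of what every processor could do if it completed floating-point work on every clock cycle. The back-of-the-envelope sketch below reproduces that arithmetic. The device counts come from the press release; the per-device double-precision peaks (515 gigaflops per Tesla M2050, and a six-core 2.93 GHz Xeon performing four double-precision operations per core per cycle) are assumptions taken from vendor specifications, not from this article.

```python
# Device counts are from the press release; per-device peaks are ASSUMED vendor figures.
GPU_COUNT = 7_168                 # Tesla M2050 GPUs
CPU_COUNT = 14_336                # CPUs
GPU_PEAK_GFLOPS = 515.0           # assumed: Tesla M2050 double-precision peak
CPU_PEAK_GFLOPS = 2.93 * 6 * 4    # assumed: 2.93 GHz x 6 cores x 4 DP flops/cycle/core


def system_peak_pflops(gpus, gpu_peak, cpus, cpu_peak):
    """Theoretical peak: sum of per-device peaks, converted from gigaflops to petaflops."""
    total_gflops = gpus * gpu_peak + cpus * cpu_peak
    return total_gflops / 1e6  # 1 petaflop = 1,000,000 gigaflops


gpu_share = GPU_COUNT * GPU_PEAK_GFLOPS / 1e6
total = system_peak_pflops(GPU_COUNT, GPU_PEAK_GFLOPS, CPU_COUNT, CPU_PEAK_GFLOPS)
print(f"GPU peak: {gpu_share:.2f} PF, system peak: {total:.2f} PF")
# prints "GPU peak: 3.69 PF, system peak: 4.70 PF"
```

Under these assumptions the GPUs supply roughly 3.7 of the 4.7 petaflops, which is why their behavior largely determines how the machine performs in practice.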

The problem with GPUs, says Dunning, is that they are so “compute hungry” that they “tend to sit idle for a large percentage of the time.” The bottleneck is the GPUs’ onboard memory: it’s fast, but not fast enough to keep the processors fed with data.

“There’s a significant mismatch between memory speed and GPU speed,” he adds.
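One common way to reason about this mismatch is the roofline model, in which a processor’s attainable speed is the smaller of its peak compute rate and its memory bandwidth multiplied by the kernel’s arithmetic intensity (floating-point operations per byte moved to or from memory). The sketch below assumes a Tesla M2050 with a 515-gigaflop double-precision peak and 148 GB/s of memory bandwidth; both figures are assumptions, not from the article. It contrasts a memory-bound vector update (DAXPY) with an idealized dense matrix multiply, the kind of operation that dominates the Linpack benchmark.

```python
# Roofline sketch. Both hardware figures below are ASSUMED Tesla M2050 specs.
PEAK_GFLOPS = 515.0     # assumed double-precision peak
BANDWIDTH_GBS = 148.0   # assumed memory bandwidth, GB/s


def attainable_gflops(intensity, peak=PEAK_GFLOPS, bandwidth=BANDWIDTH_GBS):
    """Upper bound on speed for a kernel doing `intensity` flops per byte of memory traffic."""
    return min(peak, bandwidth * intensity)


def daxpy_intensity():
    # y[i] = a*x[i] + y[i]: 2 flops per element; read x[i], read y[i], write y[i] = 24 bytes
    return 2 / 24


def dgemm_intensity(n):
    # C = A @ B on n x n doubles: 2*n^3 flops. Idealized: A, B and C each cross the bus once.
    return (2 * n**3) / (3 * n * n * 8)


# Intensity at which the kernel stops being limited by memory and becomes compute-bound
ridge = PEAK_GFLOPS / BANDWIDTH_GBS

for name, ai in [("DAXPY", daxpy_intensity()), ("DGEMM n=1024", dgemm_intensity(1024))]:
    perf = attainable_gflops(ai)
    print(f"{name}: {ai:.3f} flop/byte -> {perf:.1f} GFLOPS ({100 * perf / PEAK_GFLOPS:.0f}% of peak)")
print(f"ridge point: {ridge:.2f} flop/byte")
```

With these numbers, the vector update can use only about 2% of the GPU’s peak, while the dense multiply is limited by compute rather than memory. The matrix-multiply figure is an idealization that real GPU kernels approach through tiling; the point is the contrast, not the exact values.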

Even if China’s supercomputing software engineers are able to create useful scientific software that gets close to the machine’s peak performance by minimizing trips to memory, it’s not clear that the Linpack benchmark, which pegs the machine as the world’s fastest, is a useful indicator of its performance in real-world applications.

“The Linpack benchmark is one of those interesting phenomena – almost anyone who knows about it will deride its utility,” says Dunning. “They understand its limitations but it has mindshare because it’s the one number we’ve all bought into over the years.”

The system’s reliance on GPUs also means that the overwhelming majority of existing supercomputing software would have to be entirely rewritten to run on it. That’s a programming challenge that has so far eluded engineers in the West: it’s “more art than science” at this point, says Dunning. That doesn’t mean it’s impossible, or that the Chinese won’t soon have a fleet of supercomputers in the TOP500 list of the world’s fastest machines. Their real performance and utility, however, have yet to be determined.