"No one else can do some of the things we offer," says Sverre Brandsberg-Dahl, head geophysicist at Petroleum Geo-Services (PGS), a global oilfield services and seismic exploration company. His Houston installation is the second-largest commercial entry on the list and #16 overall.

The list is called the Top500, and it ranks high-performance computers (HPCs) by raw speed. Jack Dongarra, now a professor at the Center for Information Technology Research at the University of Tennessee, explains that the list started by accident, when he wrote a benchmark (called Linpack) in 1979 based on the time required to solve matrix problems.
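To make the idea of timing a matrix problem concrete, here is a minimal sketch in pure Python. It is not the Linpack code itself: the function names are hypothetical, the solver is textbook Gaussian elimination with partial pivoting, and the operation count of (2/3)n^3 + 2n^2 is the conventional figure for a dense solve, assumed here for illustration.

```python
import random
import time


def solve_dense(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest entry in column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def measure_flops(n, seed=0):
    """Time one n-by-n solve; return (flops per second, largest residual)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    # Guard against a zero reading from a very coarse timer.
    elapsed = max(time.perf_counter() - start, 1e-9)
    # Conventional operation count for a dense solve (an assumption here).
    operations = (2.0 / 3.0) * n**3 + 2.0 * n**2
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return operations / elapsed, residual


if __name__ == "__main__":
    rate, residual = measure_flops(100)
    print(f"{rate / 1e6:.1f} megaflops, residual {residual:.2e}")
```

The residual check matters as much as the timing: a fast answer counts only if it actually solves the system.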

A friend started gathering the benchmark results and the first full list was published in 1993. It is not a full census of supercomputers, as the list contains only machines whose results have been submitted for inclusion.

"I have no idea on how many don't show up," Dongarra notes. "Obviously NSA [National Security Agency] machines are not counted -- and I'm told that the NSA has some big machines."

In 1993 the largest supercomputer was 60 gigaflops (billion floating point operations per second), and now it's 93 petaflops (quadrillion floating point operations per second), Dongarra notes. "Giga to tera to peta -- each step is three orders of magnitude, so we have gone up six orders of magnitude," he explains. "So the machines are a million times faster than in 1993, for about the same price."
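Dongarra's "a million times faster" is a round number. A quick check using only the two figures he cites (60 gigaflops and 93 petaflops) puts the ratio somewhat above that; the function name below is hypothetical.

```python
import math

GIGA = 1e9   # billion
PETA = 1e15  # quadrillion


def speedup_and_orders(old_flops, new_flops):
    """Return the speed ratio and the number of orders of magnitude it spans."""
    ratio = new_flops / old_flops
    return ratio, math.log10(ratio)


ratio, orders = speedup_and_orders(60 * GIGA, 93 * PETA)
print(f"{ratio:,.0f} times faster, {orders:.2f} orders of magnitude")
# prints "1,550,000 times faster, 6.19 orders of magnitude"
```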

Count the ways

Dongarra says the Top500 machines fall into three architecture styles: those that use only commodity processors (i.e., mass-produced x86 CPUs), at around 70% of the total; those (about 20%) that use commodity processors with accelerators such as Nvidia GPUs; and those (about 7%) that use "lightweight" or reduced instruction set processors that lack such enhancements as pre-fetch functions and large memory caches.

The commodity processors are x86 devices from either Intel or AMD. Even vendors like Cray that previously had special-purpose CPUs now use commodity processors, as they now have 64-bit width, can do fast floating point calculations and are mass-produced relatively cheaply, Dongarra says.

"There is no rule of thumb for what each architecture is best used for, and we see all three in the top ten," Dongarra says, adding that each of the top ten probably costs more than $100 million.

Beyond that, each node (i.e., the cores of one CPU) typically has a shared address space, and it communicates with the other nodes and accesses their address spaces via the Message Passing Interface (MPI) protocol. The nodes typically run Linux or Unix, sometimes in a special lightweight version. The machines are usually operated in batch mode and are kept running all the time, he adds.

The biggest machines draw 15 megawatts of power, not counting cooling, which could add another 20%. A megawatt, he notes, costs about $1 million yearly.
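Taken together, those figures imply a yearly power bill of around $18 million for the largest machines. The sketch below works through the arithmetic; it assumes cooling adds the full 20% and that cost scales linearly with megawatts, and the function name is hypothetical.

```python
def yearly_power_cost(compute_mw, cooling_fraction, dollars_per_mw_year):
    """Return (total megawatts, yearly cost in dollars) including cooling."""
    total_mw = compute_mw * (1.0 + cooling_fraction)
    return total_mw, total_mw * dollars_per_mw_year


total_mw, cost = yearly_power_cost(15.0, 0.20, 1_000_000)
print(f"{total_mw:.0f} MW, about ${cost / 1e6:.0f} million per year")
# prints "18 MW, about $18 million per year"
```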

Sources agree that HPCs outwardly resemble conventional server farms, with components in racks, but the density of the components is several times higher and the interconnects are more sophisticated.

Real-world uses

Of the latest 500 (from June 2016), 95 have performance of a petaflop or better, and about half of the 500 are used in industry, Dongarra notes.

"We broke away from the embarrassingly parallel regime of PC clusters and are no longer limited to what you can do with 256 gigabytes of RAM, since that was typically what people put in a single node," explains Brandsberg-Dahl. "We can now use the memory of any node and spread the problem over the whole machine -- 600 terabytes. Seismic surveys today can reach that size.

"The platform lets us implement new things quicker, and go from concept to running at scale a lot faster, with less computer engineering," he adds.

The other half of the most recent list is largely academic and research institutions, such as the National Energy Research Scientific Computing Center installation at the Lawrence Berkeley National Laboratory near Berkeley, Calif. Called Edison, it is #49 on the list, delivering 1.6 petaflops using 133,824 Xeon 2.4GHz processor cores.

John Shalf, head of computer science there, says the machine serves about 600 users with about 7,000 different projects. He says he sees three types of users. First are the physicists who write their own software, often in Fortran or C++ with MPI. Second are complex programs written by large committees, such as the climate-modeling community. Third are those using turnkey third-party software, such as chemistry or Gaussian packages.

The use of MPI is particularly important as it lets apps treat the machine as one huge memory space, Shalf notes. "Any processor can read and write anywhere in the machine -- it's useful for data mining and irregular applications," he says. Since the machine is not used for classified work, users can have remote access, he adds, and treat it like a cloud computer. The lab is currently building a second machine, called Cori, that should offer tens of petaflops and attain the #2 or #3 spot on the list, Shalf adds.

Customers "used to say they wanted a hundred teraflops, but now it's common to hear half a petaflop, with the bigger data centers having 10 to 15 petaflops," Mannel says.

Sumit Gupta, IBM's vice president of HPC and data analytics, says supercomputers are often the unseen market advantage of many corporations. "Before they put crash dummies in real cars they have run thousands of crash simulations on a computer model," says Gupta. "Cell phone makers run simulations to find the best place to put antenna, and simulate the thermals of the phone so it doesn't get too hot."

Gupta estimates that a car crash simulation might require a hundred servers, defining a server as two CPUs with ten cores each, plus GPU accelerators. Drug interaction studies or atomic simulations might take thousands, sometimes tens of thousands of servers, he adds. The servers rely on Linux not because it's free, but because it offers good performance, he notes.

As for software complexity, Mannel says four different software stacks are common: One for developers, with compilers, debuggers, and parallel code libraries; the user stack, often in the form of a portal; a load leveler with schedulers and balancers; and the administrator view for cluster managing and service provisioning.

As for the price, "For peak performance of a petaflop, the rule of thumb is that it will cost $5 million," says Mannel.
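Applied to the system sizes Mannel mentions, that rule of thumb gives rough price tags. The sketch below assumes the $5 million figure scales linearly with peak petaflops, which the rule itself does not promise; the names are hypothetical.

```python
DOLLARS_PER_PEAK_PETAFLOP = 5_000_000  # Mannel's rule of thumb


def estimated_cost(peak_petaflops):
    """Estimate system cost in dollars, assuming linear scaling (an assumption)."""
    return peak_petaflops * DOLLARS_PER_PEAK_PETAFLOP


for petaflops in (0.5, 10, 15):
    cost_millions = estimated_cost(petaflops) / 1e6
    print(f"{petaflops} petaflops: about ${cost_millions:.1f} million")
# prints:
# 0.5 petaflops: about $2.5 million
# 10 petaflops: about $50.0 million
# 15 petaflops: about $75.0 million
```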

For those who want something closer to a turnkey system, Barry Bolding, senior vice president at Cray, points to the Cray Urika GX, which can have 1,700 cores in a rack, draw 30 to 40 kilowatts, and cost at least $200,000. Such a unit might have more than 20 terabytes of DRAM, 125 terabytes of SSD memory, and 190 terabytes of disk space, he adds. It comes with its own graphics software, Hadoop, and other packages.

The importance of HPC was recognized by the White House last year, with the launch of the National Strategic Computing Initiative. Described as a collaborative effort between government, industry and academia, its announced aims are to promote the creation of exaflop systems that can handle exabyte data; keep the U.S. at the forefront of the HPC field; promote the development of the necessary hardware; improve HPC programmer productivity; and promote access to HPCs.

China syndrome

"In 2001 they had zero on the list," notes Dongarra. "Now they have 167, while the U.S. has 165, and third is Japan with 29. That is the lowest point for the U.S. since 1993 [when the list started]. China has made a concentrated effort to invest in HPC. They see HPC as providing some sort of strategic advantage. I would say that all the [Chinese] ones on the list are production machines, not stunt machines."

"China is moving more aggressively than any other entity," notes industry analyst Chirag Dekate at Gartner. "They believe that to dominate the economies of tomorrow they need to control the supply chains of emerging technologies."

On February 12, 2015, the U.S. Department of Commerce embargoed exports to four Chinese organizations: The National University of Defense Technology (which builds computers), and the National Supercomputing Centers in Changsha, Hunan (#125 on the list), Guangzhou (#2) and Tianjin (#32), because of their involvement in activities "contrary to the national security or foreign policy interests of the United States." Specifically, the supercomputers are "believed to be used in nuclear explosive activities."

"The embargo was on places that already had or were about to upgrade machines with Intel parts," explains Dongarra.