Federal Funds Plant Seeds for ‘Exascale’ Computers

Some researchers have a seemingly unquenchable thirst for computing speed. But a time-tested way of building faster systems is hitting fundamental barriers, prompting some new government grants aimed at tackling the problems.

The Department of Energy, under a program called FastForward, is disbursing cash awards to companies that include Intel, Advanced Micro Devices, Nvidia and IBM, which disclosed the grants this week.

Since the early 1990s, manufacturers have primarily built faster supercomputers from hundreds or thousands of chips that aren’t much different from those that provide calculating power to PCs. As those chips have gotten faster, so have supercomputers, which continue to add more and more of them.

But it’s not practical to simply assemble an exascale system, one capable of a quintillion calculations a second, by buying more chips. The reason: such a machine would require more electricity than a sizeable city.

Estimates for how much power an exascale system would draw using today’s chip technology vary widely, from less than 100 megawatts to a gigawatt. But either would be way too much, researchers agree.

The DOE, which oversees Lawrence Livermore National Laboratory, has set a goal for exascale systems that draw 20 megawatts. By comparison, the Livermore supercomputer that last month was ranked the world’s most powerful uses 8.6 megawatts, says Mark Seager, a former researcher at the lab who now is chief technology officer for Intel’s high-performance computing initiatives.
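To put the 20-megawatt goal in perspective, the short calculation below is a rough sketch of the arithmetic, not a figure from the DOE. It assumes "exascale" means the standard 10^18 operations per second; the function names are hypothetical and exist only for this illustration.

```python
# Rough arithmetic behind the exascale power target.
# Assumption: "exascale" means 1e18 operations per second (the standard
# definition of the term, not a number quoted in the article).
EXA_OPS_PER_SECOND = 1e18


def required_efficiency(power_watts: float) -> float:
    """Operations per second per watt needed to reach exascale at this power."""
    return EXA_OPS_PER_SECOND / power_watts


def overshoot(estimated_watts: float, target_watts: float = 20e6) -> float:
    """How many times an estimated power draw exceeds the target."""
    return estimated_watts / target_watts


if __name__ == "__main__":
    # The 20-megawatt goal, expressed as billions of operations per watt.
    print(required_efficiency(20e6) / 1e9, "billion operations per second per watt")
    # The two ends of the range of estimates cited for today's technology.
    for label, watts in [("100 megawatts", 100e6), ("1 gigawatt", 1e9)]:
        print(label, "is", overshoot(watts), "times the 20-megawatt target")
```

Under that assumption, the goal works out to 50 billion operations per second for every watt, and the cited estimates for today's technology overshoot it by a factor of 5 to 50.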

The ordinary practice of shrinking circuitry on chips–a process often called Moore’s Law, after Intel’s co-founder–tends to reduce power consumption, but not nearly enough.

To reach the 20-megawatt target “we have to do two times better than what Moore’s Law would get us,” Seager says. “That is a daunting challenge.”

So chip makers are trying new designs to meet the power goals. Intel, which is receiving about $16 million for microprocessor research under the DOE program, is working to extend variants of a new chip called the Xeon Phi that is expected to debut with more than 50 processors.

Intel is also receiving $3 million to help develop memory technology, which must also achieve higher bandwidth at lower power consumption to meet exascale targets. One approach there, Seager says, involves stacking memory chips and linking them with shorter, faster connections than are found in most computers today.

AMD and Nvidia, meanwhile, are pursuing paths based partly on exploiting graphics technology that is best known for rendering lifelike images in videogames. Each has been awarded more than $12 million under the FastForward program.

AMD, whose Opteron microprocessors are used in some very large supercomputers, says part of its work under the DOE program will involve chips that combine such general-purpose circuitry with graphics chips–which AMD calls APUs, for accelerated processing units.

Nvidia has been particularly aggressive about pushing its graphics chips into scientific computing, including a new family dubbed Kepler that has special features for use in such applications. But the company’s chief scientist, Bill Dally, says the 150-watt power draw of Kepler would have to be cut by a fifth to even start achieving exascale goals.

After energy, the second-toughest challenge in planning exascale systems is what researchers call resiliency, Seager says. In many chips, energy emissions from other components in a system–or cosmic rays–can cause mathematical errors in a tiny proportion of a chip’s operations. But once there are millions of electronic brains in a system, the odds of errors grow to unacceptable levels.

“It’s a problem unique to building things on this scale,” agrees Dally of Nvidia.
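The scale effect the researchers describe can be sketched with basic probability. The example below is an illustration only: it assumes each component fails independently, and the per-component error probability is a made-up number, not a measurement from any real chip. The function name is hypothetical.

```python
import math


def prob_any_error(p_single: float, n_components: int) -> float:
    """Chance that at least one of n independent components makes an error.

    Computes 1 - (1 - p)^n using log1p/expm1, which stays accurate when
    p_single is tiny. Independence between components is an assumption.
    """
    return -math.expm1(n_components * math.log1p(-p_single))


if __name__ == "__main__":
    # Hypothetical figure: a one-in-a-million chance that a given component
    # produces an error during some fixed stretch of time.
    p = 1e-6
    for n in (1, 1_000, 1_000_000):
        print(f"{n:>9} components: {prob_any_error(p, n):.6f}")
```

With these illustrative numbers, an error that is vanishingly rare on one component becomes more likely than not once a million components run side by side.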

While many recipients of FastForward awards are established companies, some smaller players also are getting in on the act. One is Whamcloud, a company in Danville, Calif., formed by supercomputer experts to commercialize open-source software called Lustre that has become a popular way to manage data storage on supercomputers. The startup has disclosed it is participating in a FastForward subcontract to develop a new “object storage paradigm” for exascale computing, working with companies that include supercomputer maker Cray and storage giant EMC.

The FastForward program, which offers two-year grants, is designed to spur development of components of exascale supercomputers–akin to onramps to a bridge, Seager says–not complete systems. Building complete systems will be a more costly effort, which must still be funded by Congress, he adds.