US funds exascale computing journey

Thanks to $7.4m in government funding, a pair of national labs hope to throw their big brains at the most pressing problems facing supercomputer designers.

Sandia and Oak Ridge national laboratories this week touted their new Institute for Advanced Architectures (IAA), which will explore what it takes to create "Exascale" machines. The researchers will tackle issues such as power, many-core processors, multi-threaded code and communications between the components in the largest of supercomputers. Breakthroughs in any or all of these areas should benefit the National Nuclear Security Administration and the Department of Energy’s Office of Science, which will support the work.

Just this week, the Texas Advanced Computing Center (TACC) did some ribbon cutting on a supercomputer said to mark a "new era for petascale science." That claim to fame comes from the "Ranger" machine's ability to hit 504 teraflops - or half a petaflop - of peak performance. Exascale computers would take things to the next level once again, with an exaflop coming in 1,000 times faster than a petaflop. So, exaflop computers could crank through a million trillion calculations per second.
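For those keeping score on the prefixes, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the constant and function names are our own, and the only figure taken from the story is Ranger's quoted 504 teraflop peak.

```python
# Standard SI prefixes expressed as floating-point operations per second.
TERA = 10**12
PETA = 10**15
EXA = 10**18

# Ranger's quoted peak performance (from the article), in flops.
ranger_peak = 504 * TERA


def in_petaflops(flops):
    """Convert a raw flops figure to petaflops."""
    return flops / PETA


print(in_petaflops(ranger_peak))  # 0.504, i.e. about half a petaflop
print(EXA // PETA)                # 1000 petaflops in one exaflop
print(EXA // ranger_peak)         # 1984 Ranger-sized peaks per exaflop
```

Put another way, it would take close to 2,000 machines with Ranger's peak rating to add up to a single exaflop, at least on paper.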

The IAA will concentrate on closing the gap between peak and sustained performance for exascale supercomputers. Part of that mission will revolve around making sure that all processors in a supercomputer stay active working on problems. And that's a particularly hairy issue when you consider that today's top supercomputers run on tens of thousands and even hundreds of thousands of cores - figures that will increase in coming years due to the rise of multi-core processors.

"In an exascale computer, data might be tens of thousands of processors away from the processor that wants it," says Sandia computer architect Doug Doerfler. "But until that processor gets its data, it has nothing useful to do. One key to scalability is to make sure all processors have something to work on at all times."
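A crude way to picture the gap between peak and sustained performance is to assume a processor does nothing at all while it waits on remote data. The sketch below uses that simplifying assumption. The function names and the sample timings are hypothetical, not figures from Sandia or Oak Ridge.

```python
def sustained_fraction(compute_time, wait_time):
    """Fraction of peak achieved if a processor sits idle while waiting for data.

    Simplifying assumption: no overlap between computation and communication.
    """
    return compute_time / (compute_time + wait_time)


def sustained_flops(peak_flops, compute_time, wait_time):
    """Scale a peak rating by the fraction of time spent doing useful work."""
    return peak_flops * sustained_fraction(compute_time, wait_time)


# Hypothetical workload: one unit of computing for every three units of waiting.
print(sustained_fraction(1.0, 3.0))  # 0.25
```

Under this toy model, a machine that spends three quarters of its time waiting on data delivers only a quarter of its peak rating, which is why keeping every processor fed matters so much at scale.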

"In order to continue to make progress in running scientific applications at these [very large] scales," says Jeff Nichols, who heads the Oak Ridge branch of the institute, "we need to address our ability to maintain the balance between the hardware and the software. There are huge software and programming challenges and our goal is to do the critical R&D to close some of the gaps."

The labs will also tackle the nagging issue of power consumption for large machines. Similar work is under way at IBM, Lawrence Berkeley National Lab and a variety of other research institutes.

Berkeley researcher Dave Patterson - he of RISC and RAID fame - is also spearheading research into novel programming techniques that could benefit supercomputer-class machines as well as more standard boxes running on multi-core chips. Patterson's Parallel Computing Lab recently took in $10m from Microsoft and Intel.

Berkeley's win caught the eye of Linux kernel writer Linus Torvalds, who started complaining about the parallel computing research on a message board. Patterson fought back, although mustering any rebuttal seemed a rather hopeless task since Torvalds failed to grasp the concepts of research and effort. If you want to hear more about Patterson's vision of the future, we have the show for you. ®