The fastest computers are going hybrid

During the past decade, the biannual list of the world's fastest supercomputers has become increasingly dominated by systems that use a mix of processors, including commodity processors produced by Intel and Advanced Micro Devices.

Automobiles aren’t the only machines taking a hybrid
approach. Judging by the recent SC08 conference in Austin, Texas,
the future of supercomputer design seems to be heading toward using
multiple types of processors in a single system. That approach is a
significant change in the supercomputing field, and like any major
shift in technology, it comes with hidden problems.

In the past decade, systems that use commodity processors
produced by Intel and Advanced Micro Devices have increasingly
dominated the biannual Top500 list of the world’s fastest
supercomputers compiled by laboratories at the Energy Department
and a group of universities.

Although not as powerful as vector processors built specifically
for the high-performance computer market, those chips are much less
expensive and offer more processing power per dollar when bought in
bulk.

Recently, however, developers began augmenting commodity
processor-based supercomputers with specialty processors, such as
floating-point accelerators, field-programmable gate arrays,
repurposed graphics processing units (GPUs) and even IBM’s
Cell Broadband Engine (Cell/BE) processors, which were designed for
video game consoles.

For example, developers of the top computer on the most recent
Top500 list, Los Alamos National Laboratory’s Roadrunner
(a 1.1-petaflop IBM machine), augmented its AMD Opterons with
IBM PowerXCell processors. And on the Green500 list, which is the
Top500 reordered by power efficiency, the top seven computers all
ran on IBM Cell/BE-based BladeCenter QS22 servers.

Why the shift? Better power efficiency.

“Power performance has become a very important metric as
of late — some feel even more important than [simply]
performance,” said Kaushik Datta, a graduate student in
computer science at the University of California, Berkeley. At the
SC08 conference, Datta presented the results of a study he led on
the best ways to design multicore systems.

Although the Top500 list ranks machines by how many
floating-point operations per second (flops) a machine executes, the
Green500 ranks them by how many flops per watt a machine executes.
In that realm, specialized processors rule. One industry expert at
the conference estimated that the Cell/BE can produce about 14
flops for about 97 watts of power, and a GPU can produce about 2
flops per watt. Meanwhile, a generic x86 processor can produce only
about 1 flops at that wattage.
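
To make the difference between the two rankings concrete, here is a minimal Python sketch that sorts the same machines once by raw flops and once by flops per watt. The system names and every number in it are hypothetical placeholders chosen only to show how the order can flip. They are not measurements of any real machine, and the sketch is not the actual Top500 or Green500 methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    flops: float  # sustained floating-point operations per second
    watts: float  # power drawn while sustaining that rate

    @property
    def flops_per_watt(self) -> float:
        # Efficiency metric: work done per unit of power.
        return self.flops / self.watts


def rank_by_speed(systems):
    """Order systems by raw flops, fastest first (Top500-style)."""
    return sorted(systems, key=lambda s: s.flops, reverse=True)


def rank_by_efficiency(systems):
    """Order systems by flops per watt, most efficient first (Green500-style)."""
    return sorted(systems, key=lambda s: s.flops_per_watt, reverse=True)


# Hypothetical systems; all figures are invented for illustration.
SYSTEMS = [
    System("big_commodity", flops=1.0e15, watts=4.0e6),
    System("mid_hybrid", flops=5.0e14, watts=1.0e6),
    System("small_accelerated", flops=1.0e14, watts=1.5e5),
]

if __name__ == "__main__":
    print("By speed:     ", [s.name for s in rank_by_speed(SYSTEMS)])
    print("By efficiency:", [s.name for s in rank_by_efficiency(SYSTEMS)])
```

With these made-up inputs, the largest machine leads the speed ranking but falls to last place once the list is reordered by flops per watt.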

“As you specialize the chip, you’re able to be much
more efficient with what you are doing with the flops,”
Timothy Mattson, a senior research scientist at Intel, said during
a talk on the company’s experimental 80-core Tera-scale
processor.

Of course, new architectures require developers to rework their
code. The Cell/BE, which is still in its infancy, reportedly has
an especially steep learning curve for programmers.

“Are you willing to put in the time to program” for
these environments? Datta asked rhetorically. That is the question
system builders and developers will have to ask themselves while
hungrily eyeing performance gains.

About the Author

Joab Jackson is the senior technology editor for Government Computer News.