Uncle Sam buys 20 petaflops BlueGene super

Packed with 1.6 million Power cores

The American government's appetite for ever-bigger gobs of supercomputing power has been a blessing for IBM and Cray, and this morning it was Big Blue's turn to brag about two big deals it has scored with Uncle Sam.

The first and smaller of the two machines, nicknamed Dawn, will be installed in the second half of this year at Lawrence Livermore National Laboratory, one of the Department of Energy's supercomputing centers. LLNL and Sandia National Laboratories do a lot of research and drive supercomputing technology, but their systems are predominantly dedicated to simulating nuclear weapons and their detonation - what IBM referred to as "the nation's aging nuclear deterrent".

Dawn is based on technology that is being commercialized as the BlueGene/P, a kicker to the BlueGene/L system that was tested first at LLNL and then commercialized by IBM. The BlueGene/L machine was based on 32-bit PowerPC embedded processors, with two of them on a processor card. BlueGene/P machines plunk four 850 MHz PowerPC 450 processors onto a single processor card, linked by symmetric multiprocessing, so they can share main memory, which in this case is 2 GB of DDR2 main memory per node. BlueGene machines have never been based on IBM's commercial 64-bit Power processors - Power4, Power5, and Power6 - and part of the reason is heat.

While these chips have offered a lot of integer and floating point performance, the BlueGene designs have always made it up in volume by putting zillions of chips in the box and keeping the overall thermals of the system low. The Dawn system will have 150,000 cores, and the 36 racks that house these processing elements will eat up a mere 1,311 square feet of data center floor space.

The Dawn machine will be rated at 501 teraflops of number crunching power, will burn 1.13 megawatts of juice, and will deliver 443 megaflops per watt. The current BlueGene/L machine at LLNL, which is ranked at number four on the Top 500 supercomputers list, has 212,992 PowerPC 440 processors running at 700 MHz; the resulting machine is rated at 478 teraflops.

So the Dawn system is obviously not giving LLNL much more oomph. But the earlier BlueGene/L system consumes 2.33 megawatts of electricity, yielding a paltry 205 megaflops per watt in terms of power efficiency. The handful of BlueGene/P machines that IBM has sold to research labs to date have been rated at between 357 and 372 megaflops per watt, and it is unclear how Dawn is getting more crunch per watt than they do.
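For those who want to check the efficiency math, here is a minimal Python sketch that redoes the division using only the teraflops and megawatt figures quoted above. The function name `megaflops_per_watt` is made up for illustration and is not anything IBM or LLNL publishes.

```python
def megaflops_per_watt(teraflops: float, megawatts: float) -> float:
    """Convert a rating in teraflops and a power draw in megawatts
    into megaflops per watt."""
    megaflops = teraflops * 1_000_000  # 1 teraflops = 1,000,000 megaflops
    watts = megawatts * 1_000_000      # 1 megawatt = 1,000,000 watts
    return megaflops / watts


# Figures as quoted in the article
dawn = megaflops_per_watt(501, 1.13)
bluegene_l = megaflops_per_watt(478, 2.33)

print(f"Dawn:       {dawn:.0f} megaflops per watt")
print(f"BlueGene/L: {bluegene_l:.0f} megaflops per watt")
print(f"Dawn is {dawn / bluegene_l:.2f}x as efficient")
```

The two results round to the 443 and 205 megaflops per watt cited above, which puts Dawn at a bit more than twice the efficiency of the BlueGene/L box it sits beside.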

The big bad box that IBM sold to LLNL and is talking about today is nicknamed Sequoia, and it will be based on a "future Power technology" that IBM is not being specific about at this point. IBM is expected to deliver Sequoia in 2011, and it will pack 1.6 million processor cores, more than ten times as many as the Dawn machine, and deliver a fortyfold increase in performance to 20.13 petaflops.

The Sequoia machine, which will have 1.6 PB of main memory (the IBM press release said 1.6 TB originally, which was perplexing), will be housed in a mere 96 racks and take up 3,422 square feet of floor space - roughly 2.6 times the size of the Dawn system. The Sequoia massively parallel machine will consume approximately 6.6 megawatts of juice, which is a lot for a single machine, but given that it will deliver 3,050 megaflops per watt - a nearly sevenfold increase in power efficiency over the Dawn system - the tradeoff seems to be worth it. Provided you have a data center that can handle 6.6 megawatts.
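The Sequoia-versus-Dawn comparisons can be sanity-checked the same way. The sketch below uses only the numbers quoted in this story (cores, teraflops, megawatts, racks, and square feet); the dictionary layout and the `efficiency` helper are illustrative assumptions, not an official spec sheet.

```python
# Figures as quoted in the article; 20.13 petaflops = 20,130 teraflops
dawn = {"cores": 150_000, "teraflops": 501, "megawatts": 1.13,
        "racks": 36, "sq_ft": 1_311}
sequoia = {"cores": 1_600_000, "teraflops": 20_130, "megawatts": 6.6,
           "racks": 96, "sq_ft": 3_422}


def efficiency(machine: dict) -> float:
    """Megaflops per watt. Teraflops per megawatt is the same number,
    because both units scale by a factor of one million."""
    return machine["teraflops"] / machine["megawatts"]


def ratio(key: str) -> float:
    """How many times larger Sequoia's figure is than Dawn's."""
    return sequoia[key] / dawn[key]


print(f"Cores:       {ratio('cores'):.1f}x")
print(f"Performance: {ratio('teraflops'):.1f}x")
print(f"Power draw:  {ratio('megawatts'):.1f}x")
print(f"Floor space: {ratio('sq_ft'):.1f}x")
print(f"Sequoia efficiency: {efficiency(sequoia):.0f} megaflops per watt")
print(f"Efficiency gain:    {efficiency(sequoia) / efficiency(dawn):.1f}x")
```

On these figures, performance grows about four times faster than the core count, which is why the per-core and per-watt numbers are the ones setting tongues wagging.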

Luckily, the Department of Energy has big budgets to build big data centers.

That big jump in efficiency will set tongues wagging about whether IBM is going to shift from 32-bit PowerPC embedded processors to commercial 64-bit Power processors with the Sequoia machine. IBM sources say not to jump to that conclusion, but that doesn't mean it isn't a possibility. It is also possible that this future BlueGene box will be based on a kicker to the Cell PowerPC chip, which is a 64-bit Power core wrapped with eight vector math units. The Cell chips provide most of the oomph in the "Roadrunner" Opteron-Cell hybrid running at Los Alamos National Laboratory (also a DOE box), and IBM could be using a hybrid PowerPC-Cell architecture.

It is far more likely, however, that IBM is going to do a massive shrink on PowerPC embedded processors, cram lots of them on system boards, and make it possible for LLNL to run its existing code unchanged on Sequoia.

All BlueGene machines run Linux, but in theory they could also support AIX if IBM and its customers were so inclined. So far, no one wants AIX on these massively parallel supers, and those who do want AIX (like some of the DOE labs) tend to use Power 575 nodes. The Dawn and Sequoia machines will be built in IBM's Rochester, Minnesota, factory, where it also makes commercial Power Systems servers. Pricing for the Dawn and Sequoia contracts was not announced, as usual. ®