Cray beats six to win Argonne deal

Supercomputer maker Cray has high hopes for its XE6 parallel supercomputers, not just in terms of number-crunching performance for its customers, but also financial performance for itself. While a deal announced to put a fairly large XE6 machine into Argonne National Laboratory will not help Cray make its 2010 financial goals, it lays the foundations for 2011.

Argonne is a big user of IBM's BlueGene Power-Linux parallel machines, so this is a big win for Cray. The lab, which is owned by the US Department of Energy and administered by the University of Chicago, has nicknamed its new super "Beagle," after the ship that took Charles Darwin around the world on a voyage that began in 1831; that trip gave Darwin the raw data that resulted in his theory of evolution. The Beagle super will have 18,000 Opteron cores and an aggregate peak theoretical performance of 150 teraflops. Beagle will reside at the Computation Institute, a joint venture between the University of Chicago and Argonne, and will be housed in a new facility called the Theory and Computing Sciences lab; the money to pay for Beagle came from the National Institutes of Health's National Center for Research Resources.

While Beagle will only rank in the top fifty or so supers in the world when it is up and running next year on February 12 - that's Darwin's birthday, and if he were alive then he would be 202 years old and no doubt surprised - it will be, according to Argonne and the university, one of the largest supers in the world dedicated solely to life sciences number-crunching. (That says a lot about the people of planet Earth, considering how many petaflops are dedicated to nuclear weapons design or management or nuclear physics.)

Beagle will be making use of the "Gemini" interconnect that Cray launched in May, which can link eight-socket blade servers in the XT6 and XE6 supers with two Gemini interconnect chips into a 3D torus topology that can scale to more than 1 million processor cores. The XT6 and XE6 blade servers have two four-socket blades and are rated to use the six-core "Istanbul" Opteron 8400 or twelve-core "Magny-Cours" Opteron 6100 processors.

Ian Foster, director of the Computation Institute, said that the XE6 system was chosen after putting the products of five different (and unnamed) HPC vendors plus two hybrid systems through the bidding process. While Cray was hinting that the Beagle machine could be scaled up to sustained petaflops-class performance, the University of Chicago and Argonne have not made any commitment to do so. Based on the prevailing price of a petaflops of XE6 computing power (gleaned from the few deals where pricing information has been available), the Beagle machine should have cost around $6.8m.
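For readers who want to check the sums, the $6.8m estimate can be run backwards to see what price per petaflops it implies. The sketch below is only that: it assumes price scales linearly with peak performance, which the article's estimate suggests but no contract confirms, and the function names are made up for illustration.

```python
# Figures quoted in the article for the Beagle machine.
BEAGLE_PEAK_TERAFLOPS = 150.0
BEAGLE_EST_COST_USD = 6.8e6
BEAGLE_CORES = 18_000


def implied_price_per_petaflops(cost_usd, peak_teraflops):
    """Scale a system's cost up to one peak petaflops (1,000 teraflops).

    Assumption: price is directly proportional to peak performance.
    """
    return cost_usd / (peak_teraflops / 1000.0)


def implied_price_per_core(cost_usd, cores):
    """Spread the estimated cost evenly across every core."""
    return cost_usd / cores


price_per_pf = implied_price_per_petaflops(BEAGLE_EST_COST_USD, BEAGLE_PEAK_TERAFLOPS)
price_per_core = implied_price_per_core(BEAGLE_EST_COST_USD, BEAGLE_CORES)

print(f"Implied price per peak petaflops: ${price_per_pf / 1e6:.1f}m")
print(f"Implied price per core: ${price_per_core:.0f}")
```

On those assumptions the estimate works out to a little over $45m per peak petaflops, or just under $380 per core.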

More than 100 life sciences researchers from UChicago as well as from Cornell University, the University of Maryland, and a dozen other NIH-funded biomedical research teams will be sharing Beagle. The Computation Institute manages the TeraGrid network of supercomputers that is funded by the US National Science Foundation and that as of last year had over 2 petaflops of oomph and 60 petabytes of storage across its eleven facilities.

Not all of the machines at Argonne are in TeraGrid, of course, and Cray is going to have to sell a lot more iron to Argonne to surpass the "Mira" super that the lab is having Big Blue build for it. Argonne's Leadership Computing Facility lab currently has a BlueGene/P super, installed in late 2007, that is rated at 557.1 teraflops peak (458.6 teraflops sustained) on the Linpack benchmark test. The BlueGene/P super uses modestly oomphish 850 MHz PowerPC 450 cores, and to hit that performance level takes 163,840 cores. While this Argonne machine, called "Intrepid," is fast, the lab is looking for a factor of 20 improvement in performance with the Mira BlueGene/Q super slated to be installed in 2012. The Mira super is being funded by the DOE's Office of Science and researchers from all over the country will be able to bid on time on this 10 petaflops monster.
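The Intrepid numbers hang together if each PowerPC 450 core retires four floating point operations per clock. That per-cycle figure is not in the article and is treated as an assumption in the sketch below, as is reading Mira's 10 petaflops as a peak rating.

```python
# Figures quoted in the article for the "Intrepid" BlueGene/P machine.
INTREPID_CORES = 163_840
INTREPID_CLOCK_GHZ = 0.85
INTREPID_PEAK_TERAFLOPS = 557.1
INTREPID_SUSTAINED_TERAFLOPS = 458.6
MIRA_PETAFLOPS = 10.0  # assumption: treated as a peak rating

# Assumption, not from the article: four flops per core per clock cycle.
FLOPS_PER_CYCLE = 4


def peak_teraflops(cores, clock_ghz, flops_per_cycle):
    """Peak = cores x clock x flops per cycle, converted from gigaflops."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


derived_peak = peak_teraflops(INTREPID_CORES, INTREPID_CLOCK_GHZ, FLOPS_PER_CYCLE)
linpack_efficiency = INTREPID_SUSTAINED_TERAFLOPS / INTREPID_PEAK_TERAFLOPS
speedup_vs_peak = MIRA_PETAFLOPS * 1000.0 / INTREPID_PEAK_TERAFLOPS
speedup_vs_sustained = MIRA_PETAFLOPS * 1000.0 / INTREPID_SUSTAINED_TERAFLOPS

print(f"Derived peak: {derived_peak:.1f} teraflops")
print(f"Linpack efficiency: {linpack_efficiency:.1%}")
print(f"Mira vs Intrepid peak: {speedup_vs_peak:.1f}x")
print(f"Mira vs Intrepid sustained: {speedup_vs_sustained:.1f}x")
```

The derived peak lands on the quoted 557.1 teraflops, Linpack efficiency comes out at about 82 per cent, and the "factor of 20" sits between roughly 18 times (against Intrepid's peak) and 22 times (against its sustained number).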

IBM has been pretty vague about the design of the BlueGene/Q super, but did announce in February 2009 that Lawrence Livermore National Laboratory, one of Argonne's DOE peers, was buying a BlueGene/Q machine that would pack 1.6 million cores, 1.6 PB of main memory, and 20.13 petaflops of aggregate peak performance. The interesting thing about this future BlueGene/Q design is not its raw number-crunching power, but the increased efficiency.

The BlueGene/P super that Argonne is currently using can deliver around 365 megaflops per watt, which is a lot better than the original BlueGene/L machines, which were rated at 205 megaflops per watt. The Sequoia machine at LLNL - and hence the future Mira box at Argonne, too - can deliver 3,050 megaflops per watt. IBM has not said how this will be accomplished, but it is a fair guess that IBM is going to do a massive shrink of the PowerPC cores and put lots of them on a processor card while resisting the temptation to boost clock speed.
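To put those efficiency ratings in perspective, here is a rough sketch of the generation-on-generation gains and of the power a 10 petaflops machine would draw at each rating. It assumes the megaflops-per-watt figures and the 10 petaflops number are measured on the same basis, which the article does not say, so treat the megawatt figures as ballpark only.

```python
# Efficiency ratings quoted in the article, in megaflops per watt.
MEGAFLOPS_PER_WATT = {
    "BlueGene/L": 205.0,
    "BlueGene/P": 365.0,
    "BlueGene/Q": 3050.0,
}


def efficiency_gain(new, old):
    """How many times more flops per watt the newer design delivers."""
    return new / old


def megawatts_needed(petaflops, megaflops_per_watt):
    """Power draw for a given performance level at a given efficiency.

    Assumption: the performance figure and the efficiency rating are
    measured on the same basis (both peak or both sustained).
    """
    megaflops = petaflops * 1e9  # 1 petaflops = 1e9 megaflops
    watts = megaflops / megaflops_per_watt
    return watts / 1e6


p_over_l = efficiency_gain(MEGAFLOPS_PER_WATT["BlueGene/P"], MEGAFLOPS_PER_WATT["BlueGene/L"])
q_over_p = efficiency_gain(MEGAFLOPS_PER_WATT["BlueGene/Q"], MEGAFLOPS_PER_WATT["BlueGene/P"])
mira_mw_at_q = megawatts_needed(10.0, MEGAFLOPS_PER_WATT["BlueGene/Q"])
mira_mw_at_p = megawatts_needed(10.0, MEGAFLOPS_PER_WATT["BlueGene/P"])

print(f"BlueGene/P over BlueGene/L: {p_over_l:.2f}x")
print(f"BlueGene/Q over BlueGene/P: {q_over_p:.2f}x")
print(f"10 petaflops at BlueGene/Q efficiency: {mira_mw_at_q:.1f} MW")
print(f"10 petaflops at BlueGene/P efficiency: {mira_mw_at_p:.1f} MW")
```

On those assumptions, BlueGene/Q is a bit more than eight times as efficient as BlueGene/P, and a 10 petaflops box would need a little over 3 megawatts rather than the 27 megawatts or so it would take at BlueGene/P efficiency.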

The Mira machine will have over 750,000 processor cores, according to Argonne, which already has sixteen different research projects across a variety of disciplines, including the design of electric car batteries, nuclear reactor design, boosting the efficiency of combustion engines, and cosmology. ®