Cray vs. China: The race to 100 Petaflops

Hot on the heels of the 20 Petaflops Titan supercomputer at Oak Ridge National Laboratory, the recently crowned world’s fastest supercomputer, Cray has announced the XC30 architecture, which will allow for the creation of supercomputers that break 100 Petaflops – 100 quadrillion floating-point operations per second.

Not to be outdone, China has also announced its own 100 Petaflops contender, Tianhe-2, which is expected to be deployed by 2015. It is the successor to Tianhe-1A – a supercomputer that briefly held the title of the world’s fastest back in 2010 (a first for China).

Cray XC30

Cray’s new supercomputer architecture, as you can imagine, is beyond beastly. For starters, the XC30 marks an interesting shift away from AMD Opteron CPUs to the Intel Xeon – the E5-2600 (Sandy Bridge) family of chips, to be exact. With Cray being one of the world’s biggest supercomputer companies, this is a significant loss for the already-ailing AMD.

Now for the numbers. Each XC30 blade will contain four compute nodes, each of which contains two Xeon CPUs, for eight CPUs per blade. There are 16 blades in an XC30 chassis, and three chassis per cabinet, totalling 384 CPUs per cabinet.

Each cabinet will initially be capable of around 66 Teraflops – and Cray says the XC30 architecture is capable of scaling to 100 Petaflops, which works out to roughly 1,500 cabinets. At 1,500 cabinets, we’re talking a sum total of 576,000 CPUs and, with eight-core chips, over 4.5 million individual cores (9 million hardware threads, counting Hyper-Threading). The volume of memory is equally monstrous: Each node can have up to 128GB of RAM, which equates to 24 Terabytes of RAM per cabinet – or somewhere in the region of 35 Petabytes of RAM for a 1,500-cabinet system.
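For anyone who wants to check the sums, here is a minimal Python sketch of the arithmetic above. The per-blade, per-chassis, and per-cabinet figures are the ones quoted in this article; the eight-cores-per-CPU value is an assumption (the top of the E5-2600 range), and the function names are purely illustrative.

```python
# Figures quoted in the article.
NODES_PER_BLADE = 4
CPUS_PER_NODE = 2
BLADES_PER_CHASSIS = 16
CHASSIS_PER_CABINET = 3
TFLOPS_PER_CABINET = 66
RAM_GB_PER_NODE = 128
CABINETS = 1_500

# Assumption (not stated in the article): eight cores per CPU,
# the top of the Xeon E5-2600 range.
CORES_PER_CPU = 8


def cabinet_totals():
    """Blades, nodes, CPUs and RAM (binary terabytes) in one cabinet."""
    blades = BLADES_PER_CHASSIS * CHASSIS_PER_CABINET
    nodes = blades * NODES_PER_BLADE
    cpus = nodes * CPUS_PER_NODE
    ram_tb = nodes * RAM_GB_PER_NODE / 1024
    return {"blades": blades, "nodes": nodes, "cpus": cpus, "ram_tb": ram_tb}


def system_totals(cabinets=CABINETS):
    """Scale the per-cabinet figures up to a full system."""
    cab = cabinet_totals()
    cpus = cab["cpus"] * cabinets
    return {
        "petaflops": TFLOPS_PER_CABINET * cabinets / 1000,
        "cpus": cpus,
        "cores": cpus * CORES_PER_CPU,
        "ram_pb": cab["ram_tb"] * cabinets / 1024,
    }


if __name__ == "__main__":
    print(cabinet_totals())
    print(system_totals())
```

Note that 1,500 cabinets at 66 Teraflops each lands just shy of the 100 Petaflops mark, which is why the cabinet count is only a rough figure, and that the memory totals use binary units (1,024GB to the Terabyte).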

Tianhe-2

At this point in time, we know very little about Tianhe-2 other than that China is aiming to deploy a 100 Petaflops system by 2015 – and then an Exaflops (1,000 Petaflops) system by 2018. China would like to lessen its reliance on US tech by building Tianhe-2 out of homegrown CPUs, such as the ShenWei SW-3 1600, but in reality that homegrown tech isn’t quite there yet. Tianhe-2 will probably use Xeon or Opteron processors, perhaps in concert with Nvidia Tesla GPUs – but maybe Tianhe-3 will use homegrown CPUs.

Cray says that the first XC30 systems are being deployed right now, and that they’ll be widely available in the first quarter of 2013. Cray lists a bunch of clients that will be installing XC30-based supercomputers, but as yet it isn’t clear who – if anyone – will actually be building a 100-petaflops system.

In all likelihood, it will be one of the US national laboratories. With ORNL’s Titan already at the top of the table, and LLNL and Argonne using IBM BlueGene computers, the DOE’s National Energy Research Scientific Computing Center (NERSC) in Berkeley, California is probably the prime contender. Given that the XC30 is already shipping, and Tianhe-2 is still in the planning stages, it would seem safe to assume that Cray will beat China to 100 Petaflops.

Finally, one last tidbit: For now, the Cray XC series only supports Intel Xeon CPUs – but the company says that future flavours will support both Xeon Phi and Kepler-based Nvidia Tesla coprocessors. Just a couple of days ago, Intel released the first 60-core Xeon Phi, and like the Tesla K20 it churns out the same kind of computational power as an entire 8-CPU XC30 blade. 100 Petaflops – 100,000,000,000,000,000 calculations per second – is beginning to sound like child’s play for these next-generation supercomputers.
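To put a rough number on that blade comparison, the sketch below divides the 66 Teraflops per cabinet quoted earlier by the 48 blades in a cabinet, then sets the result against each coprocessor. The coprocessor figures are approximate vendor peak double-precision numbers and are an assumption, not something taken from Cray’s announcement; the names in the code are illustrative.

```python
# From the article: 66 Teraflops per cabinet, 16 blades x 3 chassis per cabinet.
TFLOPS_PER_CABINET = 66
BLADES_PER_CABINET = 16 * 3

# Assumption: approximate vendor peak double-precision figures (Teraflops),
# not taken from the article.
COPROCESSOR_PEAK_TFLOPS = {
    "Xeon Phi (60-core)": 1.0,
    "Tesla K20": 1.17,
}


def blade_tflops():
    """Teraflops delivered by one 8-CPU blade, per the article's figures."""
    return TFLOPS_PER_CABINET / BLADES_PER_CABINET


def relative_to_blade():
    """Each coprocessor's assumed peak as a fraction of one blade."""
    blade = blade_tflops()
    return {name: peak / blade for name, peak in COPROCESSOR_PEAK_TFLOPS.items()}


if __name__ == "__main__":
    print(f"One blade: {blade_tflops():.3f} Teraflops")
    for name, ratio in relative_to_blade().items():
        print(f"{name}: {ratio:.0%} of a blade")
```

On those assumed figures a single coprocessor card falls somewhat short of a full blade, but it is in the same ballpark.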