Japan takes Sparc to 8 Petaflops

SAN JOSE, Calif. – A Japanese supercomputer has been clocked at more than eight petaflops on the Linpack benchmark, making it by far the fastest system in the world, according to the latest version of the Top 500 list. The computer is powered by 68,544 Sparc-based processors from Fujitsu, specially tuned to the needs of giant clusters.

The so-called K Computer at the Riken Advanced Institute for Computational Science in Kobe is a high water mark for Japan, Fujitsu and Oracle, owner of the Sparc architecture. It marks the first time Japan has ranked number one in supercomputers since November 2004. It also marks the first time a Fujitsu processor and a Sparc-based architecture have powered the world's fastest computer.

The K system leapfrogged China's Tianhe-1A in Tianjin, which took the number one slot last November at 2.6 petaflops. Indeed, Japan's new system is more powerful than the next five computers on the latest list combined.

The K Computer is also the fourth most energy-efficient system on the list at 825 Mflops/watt. It also consumes the most raw power—a whopping 9.89 megawatts—and is one of 29 Top 500 systems now drawing more than a megawatt. The IBM BlueGene/Q Prototype, ranked at number 110, is the most energy efficient on the current list at 2,097 Mflops/watt.
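As a sanity check, the efficiency figure follows directly from the Linpack score and the power draw quoted above. The arithmetic below is mine, not from the article; the ~8.162-petaflop Linpack score is the figure reported on the June 2011 list behind the article's "more than eight petaflops":

```python
# Back-of-the-envelope check of the K Computer's efficiency figure.
# Assumes the reported Linpack score of ~8.162 petaflops and the
# 9.89 MW power draw quoted in the article.
linpack_flops = 8.162e15        # ~8.162 petaflops sustained on Linpack
power_watts = 9.89e6            # 9.89 megawatts
mflops_per_watt = linpack_flops / power_watts / 1e6
print(round(mflops_per_watt))   # -> 825, matching the list's 825 Mflops/watt
```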

Power consumption and efficiency are both on the rise among Top 500 systems. On average they use 543 KW, up from 447 KW six months ago and 397 KW one year ago. Their average power efficiency is 248 Mflops/watt, up from 219 Mflops/watt six months ago and 195 Mflops/watt one year ago.

One of the trends helping make systems more power efficient is the use of graphics co-processors. A total of 19 Top 500 systems use GPUs as accelerators, up from 17 systems six months ago.

Twelve of the systems using GPUs employ Nvidia chips, five use IBM Cell processors and two use AMD Radeon chips. "Software has a lot to do with [Nvidia's dominance] because there is a lot of software written in [Nvidia's] CUDA environment," said Jack Dongarra, a professor at the University of Tennessee who is one of the authors of the Top 500 list.

Each of the K Computer's Fujitsu Sparc chips sports eight cores, giving the machine a total of 548,352 cores, almost twice as many as any other system in the Top 500. The average Top 500 system uses 15,550 cores, up from 13,071 six months ago and 10,267 one year ago.

The 58W Fujitsu chips deliver 128 GFlops at their peak 2 GHz clock rate. Key to their performance is a set of extensions geared for high-performance clusters. The extensions let applications manage the chips' 6-Mbyte shared L2 cache. They also provide support for SIMD, 256 floating-point registers per core and an inter-core hardware synchronization capability.
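The per-chip and system-wide figures are internally consistent. A quick check (my arithmetic; the eight double-precision flops per cycle per core is inferred from the quoted numbers, not stated in the article):

```python
# Check the per-chip peak and total core count quoted above.
# flops_per_cycle = 8 is inferred from 128 GFlops / (8 cores * 2 GHz),
# consistent with the chip's SIMD floating-point units.
cores_per_chip = 8
clock_hz = 2e9                  # 2 GHz peak clock
flops_per_cycle = 8             # inferred, not stated in the article
peak_gflops = cores_per_chip * clock_hz * flops_per_cycle / 1e9
print(peak_gflops)              # -> 128.0, matching the quoted 128 GFlops

chips = 68_544
total_cores = chips * cores_per_chip
print(total_cores)              # -> 548352, matching the quoted total
```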

Another factor in the K Computer's success is that "the processor and interconnect were designed together and not put together, they are matched," said Dongarra.

The interconnect, called Tofu, is a 6-D mesh/torus with 5 GBytes/s of bandwidth that needs no external switch. When built out to its maximum of 100,000 linked nodes, the resulting system should deliver performance of 10 petaflops.
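To illustrate what a torus topology buys you, here is a minimal sketch of nearest-neighbor addressing in an N-dimensional torus with wraparound. This is illustrative only: the dimension sizes below are invented for the example and are not Tofu's actual topology parameters.

```python
def torus_neighbors(coord, dims):
    """Return the coordinates of all nearest neighbors of `coord` in a
    torus with per-dimension sizes `dims`. Wraparound links mean no node
    sits at an "edge", which is part of why no external switch is needed."""
    neighbors = []
    for axis in range(len(dims)):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]  # wrap around the torus
            neighbors.append(tuple(n))
    return neighbors

# Hypothetical 6-D torus; sizes are made up, not Tofu's real dimensions.
dims = (4, 4, 4, 3, 3, 3)
node = (0, 0, 0, 0, 0, 0)
print(len(torus_neighbors(node, dims)))  # -> 12: two links per dimension
```

In a 6-D torus every node has two links per dimension, so messages have many short alternative routes, which helps both bandwidth and fault tolerance.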

Another admirable aspect of the design is that a node uses just a single CPU. That means the system supports a relatively simple memory hierarchy and higher memory bandwidth.

"If you want to build the fastest computer system, it needs to be balanced and well integrated--the K system is that," said Dongarra.

I wonder how many of those systems use FPGAs for algorithm acceleration, as Xilinx and Altera claim massive performance gains.
Imagine a supercomputer with Xilinx Virtex-7-2000s: it could run cool, but it would also be a programming challenge.

We do have cost-effective supercomputing .... it is likely sitting on your desk in front of you!
Supercomputing is always relative to its time.
A Cray-1 peaked at 160 Mips, 250 MFlops .... a small fraction of what the processor and GPU in your laptop or desktop are capable of.
So yes, this level of performance will arrive at $100K-$500K, then $5K, and maybe even $500 ... it will just take time.

Quite impressive, I would say, and that too without using GPUs for number crunching. That they used Sparc makes it all the more interesting, given its legacy architecture.
I guess nearly 30% of that power is used by the system itself, apart from the cores. Still, that is a huge power draw and could nearly dominate the expenditure of any institution using this level of processing.
Of course, this mega processing would only be deployed in government R&D, weather stations, and drug development by big pharma companies.
Though these numbers are pleasing, it would be more valuable to see some really affordable supercomputing within the budget of a medium-sized company, say $100K-$500K.