DDN claims burst buffer bashes 'past 1TB/sec bandwidth'

Japanese superdupercomputer kept fed with data by burst buffer beast

This is a we-can-pee-up-the-wall-higher-than-anyone-else story, one we'd normally give a miss, only the numbers are past 1TB/sec: head-scratchingly high.

The Oakforest-PACS T2K massively parallel 25 petaflop superdupercomputer is number 6 in the TOP500 list at SC16 in Salt Lake City. It's operated by the Joint Center for Advanced High Performance Computing (JCAHPC), which is run collaboratively by the Information Technology Center at the University of Tokyo and the Center for Computational Sciences at the University of Tsukuba.

The Japanese installation's file system cache is a monster, comprising 25 x DDN IME14KX systems, which take up two-and-a-half data centre racks or more. Each IME14KX has 48 x 800GB NVMe SSDs, and eight Intel Omni-Path ports. That makes 1,200 SSDs in total.

The 25 systems provide a cache of 960TB (25 x 48 x 800GB) and a logical bandwidth of 1.5TB per second, meaning 1,250MB/sec per drive. There are several faster NVMe drives out there, but it may be that they would saturate the IME14KX IO system.
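The capacity and per-drive arithmetic above checks out; a quick back-of-the-envelope sketch, using only the figures quoted in the text:

```python
# Sanity-check of the article's figures (all input values come from the text).
systems = 25
ssds_per_system = 48
ssd_capacity_gb = 800

total_ssds = systems * ssds_per_system              # 1,200 drives in total
cache_tb = total_ssds * ssd_capacity_gb / 1000      # 960 TB of cache

bandwidth_tb_per_sec = 1.5
per_drive_mb_per_sec = bandwidth_tb_per_sec * 1e6 / total_ssds  # spread evenly

print(total_ssds, cache_tb, per_drive_mb_per_sec)   # 1200 960.0 1250.0
```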

DDN claims other methods of getting past 1TB/sec with file systems need 25 to 250 racks, which is 10 to 100 times more racks.

DDN tells us that it measured the machine's I/O performance by using the Livermore Computing Center's IOR benchmark, which looks at two IO access patterns:

FPP - File Per Process, where each parallel process performs I/O to its own separate file

SSF - Single Shared File, where all parallel processes perform I/O to a single shared file
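The difference between the two patterns can be sketched with ordinary files standing in for real MPI ranks; the process count, block size, and file names below are invented for illustration, not taken from the JCAHPC runs:

```python
# Toy illustration of the two IOR access patterns, with a loop over "ranks"
# standing in for genuinely parallel MPI processes.
import os
import tempfile

NUM_PROCS = 4   # stand-in for the number of parallel processes
BLOCK = 1024    # bytes written per "process"

tmpdir = tempfile.mkdtemp()

# FPP - File Per Process: each rank writes to its own file.
for rank in range(NUM_PROCS):
    with open(os.path.join(tmpdir, f"fpp.{rank}"), "wb") as f:
        f.write(bytes([rank]) * BLOCK)

# SSF - Single Shared File: every rank writes to its own offset in one file,
# which is where a conventional parallel file system hits lock contention.
shared = os.path.join(tmpdir, "ssf.dat")
with open(shared, "wb") as f:
    for rank in range(NUM_PROCS):
        f.seek(rank * BLOCK)
        f.write(bytes([rank]) * BLOCK)

print(len(os.listdir(tmpdir)))                        # 5: four FPP files plus one SSF file
print(os.path.getsize(shared) == NUM_PROCS * BLOCK)   # True
```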

The benchmark results were:

FPP write and read at 1.14TB/sec and 1.2TB/sec respectively

SSF write and read at 1.18TB/sec and 1.25TB/sec respectively

That's not just peeing up the wall; it's doing a reverse Niagara with aggregated fire hoses.

DDN claims "SSF is an access method that cannot realise sufficient performance with a conventional parallel file system, but is considered an effective access method for the next generation of Exascale supercomputers."