TSUBAME3.0 gears up for AI supercomputing with 2160 Tesla P100s

Nvidia just announced a new partnership with the Tokyo Institute of Technology to create what it calls Japan's fastest AI supercomputer. The machine will be known as TSUBAME3.0, and as the name suggests, it is the third iteration of the TSUBAME cluster design. The 3.0 version will pair Broadwell-EP Xeons with Nvidia Tesla P100 accelerators to deliver an expected 12.2 PFLOPS of double-precision throughput. Nvidia says the new cluster will operate alongside the existing TSUBAME2.5 machine (which uses over four thousand Tesla K20X cards), giving the two systems a combined 64.3 PFLOPS for AI work.

TSUBAME is actually an acronym. According to the project's website, it stands for "Tokyo-tech Supercomputer and UBiquitously Accessible Mass-storage Environment." It's also the Japanese word for the swallow.

Next Platform reports that TSUBAME3.0 will use 540 blades designed by Hewlett Packard Enterprise, each equipped with four Tesla P100 accelerators and two Xeon E5-2680 v4 chips. Each node will have 2TB of NVMe storage (1.08 PB across all 540 nodes) and 256GB of main memory, plus 64GB of HBM2 split among the four Tesla chips.
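For a sense of scale, the per-node figures above can be multiplied out across the whole machine. The snippet below is just a back-of-the-envelope sketch using the numbers reported in this article; the constant and function names are made up for illustration, and it assumes all 540 nodes are configured identically.

```python
# Per-node figures as reported in the article (assumed identical on every node).
NODES = 540
GPUS_PER_NODE = 4        # Tesla P100
CPUS_PER_NODE = 2        # Xeon E5-2680 v4
DRAM_GB_PER_NODE = 256   # main memory
HBM2_GB_PER_NODE = 64    # shared across the four P100s


def cluster_totals(nodes=NODES):
    """Scale the per-node figures up to the whole cluster (hypothetical helper)."""
    return {
        "gpus": nodes * GPUS_PER_NODE,
        "cpus": nodes * CPUS_PER_NODE,
        "dram_gb": nodes * DRAM_GB_PER_NODE,
        "hbm2_gb": nodes * HBM2_GB_PER_NODE,
        "hbm2_gb_per_gpu": HBM2_GB_PER_NODE / GPUS_PER_NODE,
    }


totals = cluster_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```

The GPU count works out to the 2160 Tesla P100s in the headline, with 16GB of HBM2 per card.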

If it reaches its performance targets, TSUBAME3.0 will end up in the top 10 of the Top500 list. The existing TSUBAME2.5 machine sits at 40th place in the current ranking. Way back in 2008, the original TSUBAME machine was one of the first to combine x86 CPUs with Nvidia Tesla compute accelerators to achieve massive number-crunching throughput. At that time, it reached 29th place in the Top500 list with floating-point throughput of 77.48 TFLOPS. Times sure have changed.