There were a few surprises at this month's Supercomputing 17 conference, including a very good showing for the PEZY-SC2 accelerator and Intel's change in direction for its Xeon Phi line. We also saw the first systems built on Intel's Xeon Scalable Processor (Skylake) and the first to feature Nvidia's Volta accelerator.

But perhaps most notable is the news that Chinese systems now account for 202 of the 500 supercomputers on the newest Top 500 list, compared with just 143 from the US. American systems have dominated the list since its inception 25 years ago, and just a few months ago, the US had 169 systems to China's 160. China now leads in aggregate performance as well, with its systems together accounting for 35.4 percent of the total performance of the Top 500 systems.

The fastest computers in the world continue to be the two massive Chinese machines that have topped the list for several years now: Sunway TaihuLight, from China's National Supercomputing Center in Wuxi, with sustained Linpack performance of more than 93 petaflops (93 thousand trillion floating point operations per second), and the Tianhe-2, from China's National Super Computer Center in Guangzhou, with sustained performance of more than 33.8 petaflops. These remain the fastest machines, and by a huge margin. The Piz Daint system from the Swiss National Supercomputing Centre, a Cray system that uses Intel Xeons and Nvidia Tesla P100s, held third place with Linpack sustained performance of 19.6 petaflops.

The biggest change at the top is a new system in fourth place: an upgraded version of the Gyoukou supercomputer, a ZettaScaler-2.2 system deployed at Japan's Agency for Marine-Earth Science and Technology. This machine uses PEZY-SC2 accelerators, a second-generation 2048-core chip that provides a peak performance of 4.096 teraflops in double precision, as well as conventional Intel Xeon processors, for a total of 19,860,000 cores. (An earlier ZettaScaler machine with the PEZY-SC2 made the June edition of the list at a lower position.) That gives it the highest number of cores used together (also known as the highest level of concurrency) seen to date, surpassing the TaihuLight, which has 10.6 million cores. The Gyoukou machine achieved a Linpack sustained performance of 19.14 petaflops, but what's interesting is that it uses 1.35 megawatts of power, compared to 2.27 megawatts for Piz Daint, 17.8 megawatts for Tianhe-2, and 15.4 megawatts for TaihuLight. That's still a lot of power, but it's a big reduction compared to the other systems and a strong indication that power concerns are important, even for the fastest machines in the world. It also shows how new architectures can reduce power draw dramatically.
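To put those power figures in perspective, here is a quick back-of-the-envelope sketch in Python that divides each machine's Linpack result by its power draw. It uses only the rounded numbers quoted in this article (the TaihuLight and Tianhe-2 scores are given as "more than" 93 and 33.8 petaflops, so treat those two results as slightly low), and the names `SYSTEMS`, `gigaflops_per_watt`, and `efficiency_table` are my own, purely for illustration.

```python
# Linpack performance (petaflops) and power draw (megawatts), as quoted in
# this article. The first two scores are rounded down ("more than ...").
SYSTEMS = {
    "Sunway TaihuLight": (93.0, 15.4),
    "Tianhe-2": (33.8, 17.8),
    "Piz Daint": (19.6, 2.27),
    "Gyoukou": (19.14, 1.35),
}


def gigaflops_per_watt(petaflops, megawatts):
    """Convert petaflops and megawatts into gigaflops per watt."""
    flops = petaflops * 1e15   # 1 petaflop = 10^15 flops
    watts = megawatts * 1e6    # 1 megawatt = 10^6 watts
    return flops / watts / 1e9


def efficiency_table(systems=SYSTEMS):
    """Return {system name: gigaflops/watt rounded to one decimal}."""
    return {
        name: round(gigaflops_per_watt(pf, mw), 1)
        for name, (pf, mw) in systems.items()
    }


if __name__ == "__main__":
    ranked = sorted(efficiency_table().items(), key=lambda kv: kv[1], reverse=True)
    for name, gfw in ranked:
        print(f"{name:18s} {gfw:5.1f} gigaflops/watt")
```

With these inputs, Gyoukou works out to roughly 14.2 gigaflops/watt, several times the figure for either of the two Chinese machines at the top of the list.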

The top US system remains the Titan supercomputer at the Oak Ridge National Laboratory, a five-year-old system that uses Nvidia K20x GPU accelerators and delivers 17.59 petaflops, which is now in fifth place on the list.

In the latest Green 500 list of the most energy-efficient supercomputers, four of the top five slots, including the top three, went to newly installed Japanese systems, all based on the ZettaScaler-2.2 architecture and the PEZY-SC2 accelerator. The most efficient of these is the Shoubu System B, which is installed at RIKEN's Advanced Center for Computing and Communication. The Shoubu System B achieved 17.0 gigaflops/watt; it and the next two systems, which achieved 16.8 and 16.7 gigaflops/watt respectively, are all in the bottom half of the Top 500 list. The fifth system on the Green ranking is the Gyoukou system mentioned above, the number four system on the Top 500 list, at 14.2 gigaflops/watt.

These are big breakthroughs for the PEZY-SC2 accelerator, and may point to future directions for supercomputing architectures.

The fourth greenest supercomputer is Nvidia's internal DGX SaturnV Volta system, which achieved 15.1 gigaflops/watt and comes in at number 149 on the Top 500 list. This system has 22,440 Volta cores (which seem to be counted in a somewhat different way on the list than some of the other cores). Nvidia has been having a very good year for its accelerators, and has high hopes for more machines using the Volta GPU architecture.

As usual, the major vendors were crowing about their successes on the list, with Intel noting that its CPUs were in six of the top ten systems and a record high of 471 out of 500 systems. Intel also noted that its new Xeon Scalable Processors were in 18 supercomputers with over 25 petaflops of performance. But what may be more notable is that Intel said it is cancelling Knights Hill, the planned 10nm successor to the 14nm Knights Landing Xeon Phi processor. The company said it is now planning a new platform for exascale systems (1,000 petaflops) by 2021, but didn't divulge any details.

Nvidia emphasized that it had 34 new systems with its accelerators on the list, bringing the company's total to 87. Nvidia and partner IBM were also talking up the possibility that, by the time the next list is due in June, the Summit machine at Oak Ridge National Laboratory (ORNL) should be among the machines at the top of the list. This machine features 4,600 nodes, each with two IBM Power 9 CPUs and six Nvidia Volta accelerators, with a projected performance of about 200 petaflops. This differs from Nvidia's internal solution, in that the CPUs and GPUs all communicate over NVLink 2.0 in a cache-coherent manner using OpenCAPI, so the GPUs can directly access main system RAM. Summit will be followed by the Sierra machine at Lawrence Livermore National Laboratory, and by the AI Bridging Cloud Infrastructure (ABCI) machine in Japan.
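For a sense of scale, the node counts above translate into chip totals with some simple multiplication. The Python sketch below uses only the figures quoted in this article; the function name `summit_totals` and its keys are my own invention, and the per-node number is just the projected total divided evenly across nodes, not a vendor specification.

```python
# Summit figures as quoted in this article (projected, not measured).
NODES = 4600
CPUS_PER_NODE = 2
GPUS_PER_NODE = 6
PROJECTED_PETAFLOPS = 200.0


def summit_totals(nodes=NODES, cpus=CPUS_PER_NODE, gpus=GPUS_PER_NODE,
                  petaflops=PROJECTED_PETAFLOPS):
    """Return total chip counts and the average projected teraflops per node."""
    return {
        "cpus": nodes * cpus,
        "gpus": nodes * gpus,
        # 1 petaflop = 1,000 teraflops; an even split across nodes is assumed.
        "teraflops_per_node": petaflops * 1000 / nodes,
    }


if __name__ == "__main__":
    totals = summit_totals()
    print(f"CPUs: {totals['cpus']:,}")
    print(f"GPUs: {totals['gpus']:,}")
    print(f"Average per node: {totals['teraflops_per_node']:.1f} teraflops")
```

That comes to 9,200 CPUs and 27,600 GPUs, or about 43.5 teraflops of projected performance per node.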

I was interested to hear Cray announce a "production-ready" supercomputer based on the Cavium ThunderX2 processor, which uses the 64-bit Armv8-A architecture, now available as part of its XC50 supercomputer line. ARM-based machines are being tested at the Barcelona Supercomputing Center (whose MareNostrum machine now ranks at number 16 on the Top 500 list), as well as in the "Post-K" supercomputer in Japan and the Isambard supercomputer in the UK. Cavium had some early benchmarks on the ThunderX2, which showed the 14nm chip performing better on multi-threaded or memory-bound applications than Intel's Skylake Xeons, though Intel remains the leader in single-threaded performance and in raw teraflops. Note that in addition to the Cavium design, Qualcomm has also announced an ARM-based server chip called Centriq.

In other processor news, AMD announced it had ramped production of its Epyc processors, though these aren't yet in any Top 500 systems, while NEC announced versions of its Vector Engine PCIe accelerator card for its new SX-Aurora TSUBASA supercomputer series, which offer particularly high memory bandwidth.

On the interconnects side, Mellanox said that 77 percent of new systems on the Top 500 list use InfiniBand, while Intel touted recent successes with its Omni-Path Architecture, which is mostly used in its Xeon Scalable Processor (Skylake) systems. Meanwhile, a number of vendors are looking to Gen-Z, designed to be a lower-latency, memory-centric approach for very high-speed connections between compute and memory/storage devices.

Between the strong first impression from PEZY-SC2, Intel's decision to dump Knights Hill for a new architecture, Nvidia's Volta, and new competition from AMD, ARM vendors, and NEC, now is an exciting time in the world of supercomputing. Next year's lists should be quite interesting, as we see which architectures really perform, and which are most efficient, as many of the vendors and the supercomputer sites try to position themselves in the race to produce an exascale (1,000-petaflop) computer with a sub-20 megawatt power draw.
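That exascale goal implies a specific efficiency target, which is easy to work out. The Python sketch below assumes the full 1,000 petaflops is delivered at exactly 20 megawatts, and compares the result against the Green 500 figures quoted in this article; the function names are mine and purely illustrative.

```python
# Exascale target as described in this article.
EXASCALE_PETAFLOPS = 1000.0
POWER_BUDGET_MW = 20.0

# Green 500 efficiency figures quoted in this article (gigaflops/watt).
CURRENT_GFLOPS_PER_WATT = {
    "Shoubu System B": 17.0,
    "DGX SaturnV Volta": 15.1,
    "Gyoukou": 14.2,
}


def required_gigaflops_per_watt(petaflops=EXASCALE_PETAFLOPS,
                                megawatts=POWER_BUDGET_MW):
    """Efficiency needed to hit a performance target within a power budget."""
    return (petaflops * 1e15) / (megawatts * 1e6) / 1e9


def improvement_needed(current_gfw, target_gfw):
    """How many times more efficient a system must become to reach the target."""
    return target_gfw / current_gfw


if __name__ == "__main__":
    target = required_gigaflops_per_watt()
    print(f"Target: {target:.0f} gigaflops/watt")
    for name, gfw in CURRENT_GFLOPS_PER_WATT.items():
        print(f"{name:18s} needs {improvement_needed(gfw, target):.1f}x improvement")
```

The target comes out to 50 gigaflops/watt, so even the most efficient system on the current Green 500 list would need to improve by roughly a factor of three.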
