Latest Benchmark Results on NEC Super Highlights SX-9 Performance

By John Boyd

November 19, 2008

Researchers at Tohoku University in Sendai, north-eastern Japan, announced on Wednesday that they had broken a batch of performance records on their NEC SX-9 supercomputer, as measured on the HPC Challenge Benchmark test. Hiroaki Kobayashi, director of the university’s Cyberscience Center, said the SX-9 had achieved the highest marks ever in 19 of 28 areas the test evaluates in computer processing, memory bandwidth and networking bandwidth. The scores were matched against those previously achieved on the same independent benchmark test by other leading supercomputers, including IBM’s Blue Gene/L, Cray’s XT3/4 and SGI’s Altix ICE, with the SX-9 coming out on top 64 percent of the time.

The news comes at a good time for NEC. The Tokyo-based manufacturer of vector-based supercomputers is battling in a market that has been moving away from its expensive high-performance vector processing models to systems that use more modestly priced commodity-type superscalar CPUs. These cheaper chips can be coupled tightly together or used in clusters of computers to achieve results similar to or better than those of vector competitors, at least in some areas of supercomputing.

At Tohoku University, however, a stronghold of vector computing since it installed its first SX-1 in 1985, Director Kobayashi argues that vector computing is essential for certain types of applications and will only increase in importance as advances are made in parallel processing.

“In the future, data parallel processing will become more important in high performance computing,” says Kobayashi. “And vector processing provides a very efficient model for it.” This is why, he adds, Intel, which has long provided short vector SIMD code extensions for its x86 architecture, is employing wider vector operations in its upcoming Larrabee graphics processing chip. “Regarding parallel processing, at the instruction-set level, vector instruction sets are the key to future processors, no matter what kind of micro-architecture is used,” says Kobayashi.

In addition, he emphasizes that for the kind of programs that the 1,500 paying supercomputer users of the University’s Cyberscience Center want to run, vector is still king. Most of these users are involved in government and academic research programs in areas like aerospace, environmental simulations, structural analysis and nanotechnology. “They want to conduct very large simulations, so are looking for an efficient handling mechanism to process extremely large amounts of data in a single operation,” says Kobayashi. “Vector processing is best suited to this kind of application.”

The SX-9 employs a single-chip vector processor capable of reaching 102 GFLOPS. Up to 16 CPUs sharing 1 TB of memory can be incorporated on a single node, combining to produce 1.6 TFLOPS of peak performance. The Tohoku University SX-9 set-up, which began operations this April, consists of 16 nodes, each of 16 CPUs, producing an overall peak performance of 26 TFLOPS. On a sustained performance basis, the Cyberscience Center’s test results show that a single SX-9 CPU outperforms a CPU of the previous SX-8R by four to eight times, depending on the application.
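The peak figures above follow from simple multiplication. A minimal sketch, assuming peak performance is just the CPU count times the 102 GFLOPS per-CPU figure quoted here (the function name `peak_tflops` is illustrative, not an NEC term):

```python
GFLOPS_PER_CPU = 102  # per-CPU peak quoted in the article


def peak_tflops(nodes, cpus_per_node, gflops_per_cpu=GFLOPS_PER_CPU):
    """Theoretical peak in TFLOPS, assuming every CPU runs at its quoted peak."""
    return nodes * cpus_per_node * gflops_per_cpu / 1000


single_node = peak_tflops(nodes=1, cpus_per_node=16)
tohoku = peak_tflops(nodes=16, cpus_per_node=16)

print(f"single node:   {single_node:.1f} TFLOPS")  # 1.6
print(f"Tohoku system: {tohoku:.1f} TFLOPS")       # 26.1
```

These are theoretical ceilings only; the sustained figures the Center reports depend on the application.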

Much of the new CPU’s improved performance can be accounted for by the addition of an arithmetic unit and an increase in the number of vector pipelines, all integrated on a single chip that is the first to surpass 100 GFLOPS.

But Kobayashi notes that a new feature of the SX-9, the inclusion of an assignable data buffer or ADB, has also helped boost performance significantly. “ADB is software-controllable cache memory,” he explains. “It lets the user assign the data to be cached, which prevents it from being evicted.”
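The idea Kobayashi describes, letting the user mark data that must not be evicted, can be illustrated with a toy model. The sketch below is not NEC's ADB hardware or its programming interface; it is a hypothetical, fully associative LRU cache in which the names `count_hits` and `pinned` are invented for illustration. The workload is also made up: a small table reused in every block, interleaved with streaming values that are never touched again.

```python
from collections import OrderedDict


def count_hits(accesses, capacity, pinned=frozenset()):
    """Count cache hits in a toy fully associative LRU cache.

    Keys listed in `pinned` are never chosen for eviction, which is the
    only ADB-like behaviour modelled here.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)
            continue
        if len(cache) >= capacity:
            # Evict the least recently used key that is not pinned.
            victim = next((k for k in cache if k not in pinned), None)
            if victim is None:
                continue  # cache is full of pinned data: bypass it
            del cache[victim]
        cache[key] = None
    return hits


# Hypothetical workload: a 4-entry table reused in every block,
# interleaved with 8 streaming values that are never touched again.
table = [("table", i) for i in range(4)]
accesses = []
for block in range(50):
    accesses += table
    accesses += [("stream", block, j) for j in range(8)]

plain_hits = count_hits(accesses, capacity=8)
pinned_hits = count_hits(accesses, capacity=8, pinned=frozenset(table))
print(plain_hits, pinned_hits)  # 0 196
```

In this contrived case the streaming data flushes the reused table out of an ordinary LRU cache before it is needed again, while pinning keeps the table resident. Real gains depend on the application's access pattern.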

In a simulation used to detect the presence of land mines with electromagnetic waves, for instance, performance increased by 20 percent when ADB was used. In another simulation, which tracked the movement of tectonic plates (the cause of earthquakes), the use of ADB improved performance by 75 percent, while a simulation involving the physics of plasma under certain conditions saw performance double when employing ADB.

Despite such gains, Kobayashi has a gripe with the current ADB design: the cache space is limited to just 256 kilobytes. This means users cannot place all the target data in the cache; rather, they must select only the portion that they judge will work most effectively in ADB. To determine the optimum amount of cache memory, the Cyberscience Center, which is developing a software simulator based on the SX-9 architecture to design future supercomputer models, ran simulations using real application code. To achieve the highest performance, the researchers found that a minimum of 8 MB of ADB memory is necessary. NEC has been so advised.

Regarding the HPC Challenge Benchmark results, it was no surprise that the SX-9, whose architecture is designed specifically for efficient processing of large amounts of data, came out on top in memory performance and did well in networking bandwidth. But Kobayashi was also keen to point out that when it came to computing performance, despite the relatively small size of the Center’s SX-9 set-up, it still competed well against much larger systems.

“In the case of global-FFT testing, for instance, we still made second place to Cray’s XT3, which is a huge system, with maybe 100 times more processors,” says Kobayashi. “And while the XT3’s peak performance was five times higher (than our system) its global-FFT result was only 20 percent higher. So if we could add even just one more lane (consisting of four nodes) we would expect to do much better.”
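The ratios in Kobayashi's quote can be worked through directly. A rough sketch, using only the two ratios he cites plus one loudly labelled assumption: that global-FFT throughput would scale linearly with node count, which is optimistic for a communication-heavy benchmark.

```python
peak_ratio = 5.0  # XT3 peak / SX-9 peak ("five times higher")
fft_ratio = 1.2   # XT3 G-FFT / SX-9 G-FFT ("20 percent higher")

# G-FFT delivered per unit of peak performance, SX-9 relative to XT3.
relative_efficiency = peak_ratio / fft_ratio
print(f"SX-9 G-FFT per peak FLOPS vs XT3: {relative_efficiency:.2f}x")  # 4.17x

# ASSUMPTION (not from the article): G-FFT scales linearly with nodes.
nodes_now = 16
nodes_with_extra_lane = nodes_now + 4  # one more lane of four nodes
scaled_fft = nodes_with_extra_lane / nodes_now
print(f"SX-9 G-FFT with one more lane: {scaled_fft:.2f}x today's result")  # 1.25x
print("would pass the XT3 result:", scaled_fft > fft_ratio)  # True
```

Under that assumption a 20-node system would edge past the XT3's result, which fits the expectation Kobayashi voices; actual scaling would have to be measured.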

In recent years NEC has had to relinquish its No. 1 position in the TOP500 list of best performing supercomputers to scalar-based systems from Cray, IBM and other competitors when it comes to sheer peak speeds. As a result, it has turned to emphasizing efficient sustained performance and productivity. But now there is a belief within the company that, given a large enough SX-9 installation, NEC could once again challenge for the top performance spot, which it held from 2002 to 2004 with its SX-6 generation.

“Next March JAMSTEC (Japan Agency for Marine-Earth Science and Technology) will begin operations of its Earth Simulator II,” notes Rie Toh, manager of NEC’s HPC marketing promotion division. The system, used to forecast global climate changes, typhoons and other extreme weather conditions, as well as predict earthquakes, volcano activity and the like, will use NEC supercomputer technology, as did the previous Earth Simulator I. The new system will incorporate 160 SX-9 nodes, each containing eight CPUs, making a total of 1,280 CPUs. NEC says this would produce a peak performance of 131 TFLOPS. “Given that Cray’s XT3 holds the HPC Challenge Benchmark’s highest score for G-FFT system performance with 124.4 TFLOPS,” says Toh, “we are eager to see what the SX-9-based Earth Simulator II will achieve when it’s up and running.”
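NEC's 131 TFLOPS figure is consistent with the per-CPU peak quoted earlier in the article. A quick check, assuming the same 102 GFLOPS per CPU applies to the Earth Simulator II configuration (variable names are illustrative):

```python
nodes = 160
cpus_per_node = 8
gflops_per_cpu = 102  # per-CPU peak quoted earlier in the article

total_cpus = nodes * cpus_per_node
es2_peak_tflops = total_cpus * gflops_per_cpu / 1000

print(total_cpus)                            # 1280
print(f"{es2_peak_tflops:.0f} TFLOPS peak")  # 131 TFLOPS peak
```

That is five times the CPU count of the 256-CPU Tohoku installation, and so five times its theoretical peak.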

But NEC’s window of opportunity to win speed-king bragging rights may not be open for long. In the endless game of breaking supercomputer performance records, Cray has just announced it plans to ship its next-generation XT5 model at about the time the Earth Simulator II is to begin operations.
