South African Lengau System Leaps Towards Petaflops

There are plenty of people in industry, academia, and government who believe there is a direct correlation between investment in supercomputing technologies and the health and vibrancy of a regional or national economy. So getting a big bump up in performance, as South Africa's Centre for High Performance Computing (CHPC) has just done this week, is a big deal.

Up until now, CHPC has had fairly modest sized systems, but thanks to Moore's Law advances that have radically brought down the cost of compute, plus a more aggressive plan to invest in HPC within South Africa, CHPC is breaking into the petaflops club with its new "Lengau" machine, which was fired up today and whose name, appropriately enough, means "cheetah" in the Setswana language. The Lengau system will allow more complex simulations and models to be run, and more of them, too, since big supers are used as much for capability-class workloads that try to grab a big chunk of the system as for capacity-class workloads that run many jobs concurrently.

When CHPC was set up in Cape Town by the Council for Scientific and Industrial Research (CSIR) in 2007, its first system was "iQudu," built by IBM on its e1350 server platform and named after a species of antelope. Each node had two dual-core Opteron processors running at 2.6 GHz, and eight of the nodes had ClearSpeed floating point accelerators (remember those?). Linked with 10 Gb/sec InfiniBand, the iQudu cluster had 16 TB of main memory across its 160 nodes and 640 cores, and was rated at a peak performance of 2.5 teraflops. A year later, IBM donated a 14 teraflops BlueGene/P system with 4,096 cores and 2 TB of main memory across its 1,024 nodes; this box was not given a cool nickname. CHPC moved on to hardware from Sun Microsystems in 2009 with the "Tsessebe" cluster, named after another species of African antelope, which comprised several generations of Sun Xeon server nodes rated initially at 24.9 teraflops and eventually upgraded to 64.4 teraflops with additional systems from Sun and Dell. The Tsessebe cluster used 40 Gb/sec InfiniBand to lash the nodes together, and it is still in operation today, as are the iQudu and BlueGene/P machines.

The new Lengau machine is a substantially larger system than anything CHPC has deployed before, and it is the most powerful machine the African continent has seen to date. (Well, at least until Amazon Web Services, Google, or Microsoft plunks down a datacenter there.) The main compute part of the system is based on Dell's PowerEdge C6320 machines, which we profiled last summer when they were rolled out with "Haswell" Xeon E5 v3 processors from Intel. The Lengau system has 1,008 nodes, with four nodes in each C6320 enclosure, and uses twelve-core Xeon E5-2690 v3 processors in each of the two-socket nodes.

Do the math, and that works out to 24,192 cores in total for a peak theoretical performance of 890 teraflops, with an estimated LINPACK matrix math benchmark performance of 774.5 teraflops. The nodes are linked to each other by 56 Gb/sec FDR InfiniBand networks and have access to a 4 PB Lustre parallel file system that is based on Dell's PowerVault MD3460 arrays and front-ended by two dozen PowerEdge R630 nodes that act as management controllers. For memory-intensive workloads, the Lengau cluster has five PowerEdge R930 nodes with a total of 280 cores and 5 TB of main memory that can link to the other Xeon nodes and to the Lustre file system. The whole shebang is managed by Bright Cluster Manager.
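For those who want to do that math themselves, the core count and peak rating can be reproduced with a few lines of arithmetic. The 16 double-precision flops per cycle per core reflects Haswell's two AVX2 fused multiply-add units; the 2.3 GHz effective clock used here is our assumption, chosen because it reproduces the quoted 890 teraflops (peak ratings are often computed at an effective AVX clock rather than the chip's 2.6 GHz nominal clock):

```python
# Back-of-the-envelope check of the Lengau compute numbers.
NODES = 1008              # PowerEdge C6320 nodes
SOCKETS_PER_NODE = 2      # two-socket nodes
CORES_PER_SOCKET = 12     # Xeon E5-2690 v3

cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
print(cores)                   # 24192, as quoted

# Haswell: 2 AVX2 FMA units x 4 DP lanes x 2 ops (mul + add) = 16 flops/cycle.
FLOPS_PER_CYCLE = 16
CLOCK_HZ = 2.3e9               # assumed effective clock, not the 2.6 GHz nominal

peak_tflops = cores * FLOPS_PER_CYCLE * CLOCK_HZ / 1e12
print(round(peak_tflops, 1))   # ~890.3 teraflops

# Computational efficiency implied by the 774.5 teraflops LINPACK estimate:
print(round(774.5 / peak_tflops * 100, 1))   # ~87 percent
```

That implied 87 percent LINPACK efficiency is typical for a CPU-only FDR InfiniBand cluster of this class.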

With Intel launching the "Broadwell" Xeon E5 v4 processors at the end of March and Mellanox shipping 100 Gb/sec EDR InfiniBand since the fall of 2014, it is reasonable to ask why CHPC wouldn't wait for the shiny new processors or adopt the faster InfiniBand interconnect. The answer is simple: speed costs money, and time. Ed Turkel, HPC strategist at Dell, tells The Next Platform that CHPC wanted to get the machine onto the Top 500 supercomputer rankings in the June list that comes out during the International Supercomputing 2016 conference in two weeks, and the cutoff date for submissions to the list was June 3. Getting Broadwell Xeons was going to be cutting it a little too close, and perhaps for no substantial benefit other than a modest increase in compute density. As for 100 Gb/sec InfiniBand, we don't know CHPC's thinking, but it is not at all unusual to hang back a generation on network interconnects, where the price/performance of the prior generation is good.

The important thing, perhaps, is that CHPC specifically and South Africa in general now have a resource that offers 13.8X the performance of its prior system. CHPC is one of the computational centers involved with signal processing for the Square Kilometer Array, so this performance bump was necessary; the facility also does a lot of work in genomics and computational fluid dynamics, according to Turkel. It is involved in teaching HPC skills to students in Africa and does a lot of outreach to local industry, too. So having more oomph means it can have a greater impact on all fronts.
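That 13.8X figure follows directly from the peak ratings quoted above, assuming the comparison is against the Tsessebe cluster after its final upgrade:

```python
# Speedup of Lengau over the prior flagship, as a ratio of peak ratings.
lengau_tflops = 890.0      # Lengau peak
tsessebe_tflops = 64.4     # Tsessebe after its upgrade to 64.4 teraflops
print(round(lengau_tflops / tsessebe_tflops, 1))   # 13.8
```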

Pricing on the Lengau cluster was not announced, but there is no question that this era in supercomputing is different from three or four decades ago when only the largest countries on earth could afford to have such high-end computing capability.

“CHPC is a poster child for the democratization of HPC,” says Turkel. “Here is an example of an organization that could not have afforded a petaflops-class system even a few years ago. With current technology and current pricing, they can. A thousand nodes is not an enormous system by any stretch of the imagination, so we are talking about the convergence of the performance of individual systems getting better and the ability to deliver something at this kind of scale becoming easier to do.”

And, we might add, CHPC is doing so without resorting to accelerators, as it did in its first cluster and as it has experimented with in adjunct systems. That said, Turkel says CHPC has some test systems employing Nvidia Tesla GPU accelerators, and he fully expects that the facility will get some test units of servers running Intel's "Knights Landing" Xeon Phi processors when they become generally available later this year. Sticking with CPU-only clusters was probably just a first step for CHPC, and a big one at that. Turkel says he is unaware of any projects under way in Africa to build a bigger system, and he expects Lengau to be the largest system on the continent for some time.

We will hook up with Happy Sithole, director of CHPC, at the ISC 2016 conference to get some more insight into how this machine will be deployed and what his plans are for future systems.