Windows Server pushed to the super limit

SGI puts Microsoft on 256 cores

By Timothy Prickett Morgan, 28 Mar 2011

Server maker Silicon Graphics doesn't think it can take on the entire Windows server market, but the company – which is best known for supercomputers and hyperscale rack servers – does think it can chase and win deals in the evolving HPC market for Windows.

That is why it has certified Microsoft's Windows Server 2008 R2 on its Altix UV 1000 supercomputers. These machines are based on SGI's homegrown NUMAlink 5 interconnect, formerly known as "UltraViolet," that hooks into the QuickPath Interconnect on Intel's Xeon 7500 processors and their Boxboro chipset to create a single system image that has global shared memory accessible to 128 blades. Those blades use the eight-core "Nehalem-EX" Xeon 7500s, and have two sockets on them, for a total of 256 sockets and 2,048 cores - all sharing up to 16 TB of main memory in a cache-coherent fashion.
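For readers who want to check the sums, here is a minimal sketch of the arithmetic behind those totals. The function and constant names are invented for illustration, and the two-threads-per-core figure assumes HyperThreading is switched on.

```python
# Figures quoted above for a fully configured Altix UV 1000.
BLADES = 128
SOCKETS_PER_BLADE = 2
CORES_PER_SOCKET = 8   # eight-core "Nehalem-EX" Xeon 7500
THREADS_PER_CORE = 2   # assumption: HyperThreading switched on


def machine_totals(blades, sockets_per_blade, cores_per_socket, threads_per_core):
    """Return (sockets, cores, threads) for one single-system-image machine."""
    sockets = blades * sockets_per_blade
    cores = sockets * cores_per_socket
    threads = cores * threads_per_core
    return sockets, cores, threads


sockets, cores, threads = machine_totals(
    BLADES, SOCKETS_PER_BLADE, CORES_PER_SOCKET, THREADS_PER_CORE
)
print(sockets, cores, threads)  # 256 2048 4096
```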

As El Reg previously reported, the Pittsburgh Supercomputing Center got a $2.8m National Science Foundation grant last fall to install two Altix UV 1000s in a cluster, which is nicknamed "Blacklight." The Agency for Science, Technology, and Research (A*Star) in Fusionopolis, Singapore, also acquired an Altix UV 1000 system with 2,112 cores and 12.3 TB of shared memory last fall. Neither of these machines is running Windows Server as far as we know, but they could.

But because of limitations in Windows Server 2008 R2, the Microsoft operating system cannot scale across the full Altix UV 1000 machine. Windows Server 2008 R2 Datacenter Edition, which is the one that SGI has certified on its big bad blade box, currently tops out at 256 logical processors (which can be cores or threads, if you turn on Intel's HyperThreading feature) and 2 TB of main memory.

Two weeks ago, when SGI first delivered Windows Server on its Altix UV 1000 machines, it could scale up to 128 cores or threads and up to 1 TB of main memory. But as of last Friday, it doubled that up to 256 cores or threads and 2 TB of main memory. That is as far as Windows goes at the moment, and to support 256 cores, you have to turn HyperThreading off. (Many HPC customers do that anyway because HyperThreading can hurt rather than help performance on CPU-bound jobs.)
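To see why HyperThreading has to be off to reach 256 cores, consider this rough sketch. It assumes only what is stated above: Windows counts every core or thread as one logical processor and stops at 256. The helper name is hypothetical.

```python
# Limit quoted above for Windows Server 2008 R2 Datacenter Edition.
WINDOWS_MAX_LOGICAL_PROCESSORS = 256


def usable_cores(logical_limit, hyperthreading):
    """Physical cores that fit under a logical-processor cap.

    With HyperThreading on, each core presents two logical processors,
    so only half as many cores fit under the same cap.
    """
    threads_per_core = 2 if hyperthreading else 1
    return logical_limit // threads_per_core


print(usable_cores(WINDOWS_MAX_LOGICAL_PROCESSORS, hyperthreading=True))   # 128
print(usable_cores(WINDOWS_MAX_LOGICAL_PROCESSORS, hyperthreading=False))  # 256
```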

Speaking to El Reg, Mark Barrenechea, SGI's president and chief executive, said that having worked with Microsoft to push Windows Server 2008 R2 to its limits, SGI was now working with key HPC application providers in the Windows space to make sure that their code scales well on the Altix UV 1000s. The company will be doing a variety of benchmarks to demonstrate scalability on finite element analysis and computational fluid dynamics workloads running on Windows, and is also working on some TPC-C transaction processing and data warehousing benchmark tests, according to Barrenechea.

"The hardware has clearly outpaced the software," Barrenechea explained, as it often does in the systems space. "But we've brought Windows Server to its design point." And equally importantly, SGI will be able to deliver iron on day one whenever Microsoft boosts the scalability limits of Datacenter Edition.

Hewlett-Packard's ProLiant DL980 G7, announced back in June last year, has eight Xeon 7500 sockets and tops out at 64 cores. Oracle's Sun Fire X4800 and Fujitsu's Primergy RX900 S1 are also eight-socket Xeon 7500 boxes. Dell doesn't have an eight-socket machine using this Intel processor, and NEC's Express5800/A1160 MX beast can glom together four four-socket motherboards to create a 16-socket, 128-core beastie. IBM's System x3850 X5 tops out at four sockets (but the EX5 chipset can, in theory, scale further), and the old System x3950 M2 using the Xeon 7400 processors scaled up to 16 sockets and 96 cores.

IBM has quietly announced the System x3950 X5, which is a two-node configuration based on the x3850 X5 that offers eight sockets of Xeon 7500 oomph, for a total of 64 cores. The EX5 chipset from IBM should allow for up to four boards to be linked together for 16 sockets, 128 cores, and 256 threads, which would max out Windows Server 2008 R2. But IBM has not delivered such a beast, much less talked about it.

None of the iron above except the real NEC box and the hypothetical IBM one really stresses out Windows Server 2008 R2 in terms of threads and cores. Right now, Microsoft could raise the supported core count by a factor of eight and the thread count by a factor of sixteen and the existing Altix UV 1000s could handle it, as is. So SGI is not even close to being stressed by whatever Microsoft could throw at its iron.
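The factors of eight and sixteen fall out of dividing what a full Altix UV 1000 offers by the current Windows ceiling. A back-of-the-envelope sketch, with names made up for illustration and the thread count assuming two HyperThreads per core:

```python
# Full Altix UV 1000: 256 sockets of eight-core Xeon 7500s.
UV1000_CORES = 256 * 8
UV1000_THREADS = UV1000_CORES * 2  # assumption: HyperThreading on

# Current Windows Server 2008 R2 Datacenter ceiling on logical processors.
WINDOWS_LIMIT = 256


def headroom(hardware_count, os_limit):
    """How many times the OS limit could grow before the hardware runs out."""
    return hardware_count // os_limit


print(headroom(UV1000_CORES, WINDOWS_LIMIT))    # 8
print(headroom(UV1000_THREADS, WINDOWS_LIMIT))  # 16
```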

And when SGI swaps out the Xeon 7500s for the future 10-core, 20-thread "Westmere-EX" Xeon E7 processors due later this summer or fall, SGI will be able to put 2,560 cores and 5,120 threads all under a single chunk of memory, which I am guessing will double up to 32 TB.

"We are very eager for the Westmere-EX launch," Barrenechea tells El Reg. "We obviously will see a big benefit from adding more cores to the Altix UVs."

If SGI can demonstrate that there is a need for such scalability for the Windows platform, then Microsoft and SGI will no doubt do the work to get the hardware and the software into synch down the road.

This has already happened with Linux running on the Altix UV 1000 machines. SUSE Linux Enterprise Server 11 scales up to 4,096 logical CPUs on either Itanium or x64 processors. On Itanium, SLES 11 was designed to theoretically support up to 1 PB of main memory, but was certified at 8 TB. On x64 machines, SLES 11 is designed for up to 64 TB, but is only certified to 16 TB. So SLES 11 can do twice the cores and all of the memory of the current SGI Altix UV machines. SGI and Novell worked very hard on that scalability over years, by the way.

Red Hat was not interested so much in the HPC space, and RHEL 5 topped out at 1 TB (theoretical) and 256 GB (certified) of main memory in a shared memory system, and on the processor front, RHEL 5 scaled up to 255 processors (theoretical) and 64 processors (certified). This was not enough to even support the old Altix 4700s and their Itanium chips, much less the beefier Altix UV 1000s. But with RHEL 6, Red Hat has done a much better job, supporting up to 4,096 theoretical processors and 64 TB of theoretical maximum main memory. (The certified levels have not been announced, but there is a skinnier kernel extension that maxes out at 128 cores/threads and 2 TB of memory.)

Just because these two Linuxes can span all those cores and memory sticks doesn't mean performance scales anything near linearly. It will be interesting to see how Linux and Windows scale on the same iron running the same workloads, should SGI do the right thing and present benchmarks running on both platforms. But comparisons are odious, and it is just as likely that SGI runs a set of workloads that show off Windows and another set that show off Linux. ®