Virtualization and HPC - Will they ever marry?

SC08 Server virtualization has spent the past several decades moving out from the mainframe to Unix boxes and then out into the wild racks of x64 servers running Windows, Linux, and a smattering of other operating systems in the corporate data center. The one place where virtualization hasn't taken off is in high performance computing (HPC) clusters.

And for good reason. But as hardware costs continue to plummet, making hundreds of teraflops of raw computing power in a parallel x64 server cluster available to even medium-sized businesses, startups, academic institutions, research facilities, and other places where HPC clusters end up - and at a relatively modest price - the system administration demands on HPC labs and the desire for more flexibility may possibly - and I mean possibly - see the adoption of server virtualization technologies in this subsegment of the server space.

Roughly speaking, HPC clusters account for about a fifth of the shipments of x64 server boxes each quarter. And according to IDC, in 2007, HPC boxes of all types - including vector, cluster, and other types of gear - accounted for $10.1bn in sales (revised downward from an initial $11.6bn estimate that came out in March of this year). That gives HPC an 18.6 per cent take of the $54.4bn in server sales in 2007, again about a fifth of the pie.
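For anyone who wants to check that share, here is a minimal sketch of the arithmetic in Python. The two dollar figures are the ones quoted above; the variable and function names are made up for illustration.

```python
# Figures quoted in the article (IDC, 2007), in billions of US dollars.
hpc_sales_bn = 10.1
total_server_sales_bn = 54.4


def share_percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


hpc_share = share_percent(hpc_sales_bn, total_server_sales_bn)
print(f"HPC share of 2007 server sales: {hpc_share:.1f} per cent")
```

Rounded to one decimal place, this gives the 18.6 per cent cited above.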

But the interesting bit is that if you take HPC machines out of the picture, general-purpose sales would be nearly flat for 2007. And equally importantly, if you remove the HPC boxes from the mix, then the adoption rate on new server sales for virtualization would be a little bit higher than the broader market stats cited by Gartner and IDC.

HPC customers, as a rule, do not use server virtualization because of the overhead this software imposes. The benchmark tests that server virtualization vendors such as VMware are beginning to use - I am thinking here of VMmark, but also the two-year-old SPEC virtualization benchmark effort that has yet to bear fruit - do not show the overhead their hypervisors impose.

When the x64 platform got virtualization hypervisors a number of years ago, the performance penalty was as high as 50 per cent on some workloads, and even now that hardware features to support virtualization have been added to x64 chips from Intel and Advanced Micro Devices, the overhead is widely believed to be in the range of 10, 15, or 20 per cent. But since no independent test results are available, customers really have to do their own benchmarks. And by the way, the terms of the ESX Server licensing agreement from VMware apparently do not allow people to publish the results of benchmark tests.
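To see why even the lower end of that range stings in a cluster, consider a rough back-of-the-envelope sketch. It assumes, purely for illustration, that throughput scales linearly with node count and that the hypervisor takes the same fixed fraction out of every node; real HPC codes will not behave this neatly, and the function name is hypothetical.

```python
def extra_nodes_percent(overhead):
    """Extra nodes, in per cent, needed to win back a fractional overhead.

    Simplifying assumption: throughput scales linearly with node count,
    and every node loses the same fraction `overhead` to the hypervisor.
    """
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be a fraction in [0, 1)")
    return 100.0 * (1.0 / (1.0 - overhead) - 1.0)


# The overhead levels the article says are widely believed.
for overhead in (0.10, 0.15, 0.20):
    print(f"{overhead:.0%} overhead -> "
          f"{extra_nodes_percent(overhead):.1f} per cent more nodes")
```

Under those assumptions, a 20 per cent overhead means buying a quarter more iron just to stand still, which is the kind of sum an HPC lab would want to confirm with its own benchmarks.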

No time for the virtual

HPC workloads are more driven by memory bandwidth, I/O bandwidth, and clock cycles than the typical infrastructure workloads out there in the data center and are therefore not as readily virtualizable. To put it bluntly, HPC labs have enough worries about wringing performance out of their machines and about getting more parallelism into their codes to better take advantage of the increasing number of cores they have in a cluster. They can't deal with virtualization, too.

Virtualization has been a boon to infrastructure servers that were underutilized. A typical x64 server running Web, print, file, and other workloads might run with maybe 5, 10, or 15 per cent of its CPU cycles being used on average. (There are always peaks that spike above that.) Hypervisors allow four or five server instances to be crammed onto one single physical server, with the added bonus these days of faster server provisioning and disaster recovery to boot. But in HPC clusters, CPUs are running at near their peaks all the time they are doing work.
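The consolidation math can be sketched in a few lines of Python. This is a simplification that ignores hypervisor overhead and those utilization peaks; the 75 per cent ceiling and the 95 per cent figure for a busy HPC node are hypothetical values chosen for illustration, not numbers from any vendor.

```python
def consolidated_utilization(avg_percent, instances):
    """Average CPU utilization after stacking `instances` identical
    workloads on one box, capped at 100 per cent."""
    return min(100, avg_percent * instances)


def instances_that_fit(avg_percent, ceiling_percent=75):
    """How many identical workloads fit under a utilization ceiling.

    The 75 per cent default ceiling is an assumed planning target.
    """
    return ceiling_percent // avg_percent


# Infrastructure servers at the utilization levels cited above.
for avg in (5, 10, 15):
    print(f"{avg} per cent each, five instances -> "
          f"{consolidated_utilization(avg, 5)} per cent")

# A cluster node assumed to run at 95 per cent leaves no room to share.
print(instances_that_fit(95, ceiling_percent=100))
```

Five lightly loaded instances still leave headroom on one machine, while a node that is already near its peak has room for exactly one workload: its own.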

But, having said that, system administration is an issue for clusters, just like it is for other servers, and people cost more money than software and iron. Setting up and configuring nodes in the cluster is a pain, and virtualization can help. Think about Amazon's Elastic Compute Cloud utility computing setup, which runs atop a tweaked version of the open source Xen hypervisor.

While this EC2 capacity is available on the cheap, it runs in a virtualized mode, and you could argue that one of the reasons it is so cheap is because it is virtualized and hence flexible. It is possible that HPC shops wanting to run distinct applications on different flavors of Linux or a mix of Linux and Windows will use hypervisors to make this configuring and reconfiguring easier. But plenty of people are skeptical of the idea.

"The biggest use of virtualization is to allow multiple applications to run protected," explains David Scott, petascale product line architect in Intel's HPC platform unit. "This is potentially an area. Customers are thinking about it, but no one has done it yet."

And the reason why virtualization has not been used in HPC shops is the same one that made server virtualization in data centers take off slowly: server hugging. "The idea of giving a piece of a processor to someone else is completely alien to HPC people," says Scott.

And that is why, for now, server virtualization and HPC will probably remain oil and water - at least as long as there are graduate students and scientists to man the clusters for free or nearly so. Then again, if you shake up things enough, you can get oil and water to make a suspension. Maybe HPC's salad dressing days are ahead. ®