from VMware's performance team

Monthly Archives: February 2013

In our prior VMworld sessions and performance white papers, we have presented user experience performance results based on VMware View® Planner, a tool that generates workloads representative of many user-initiated operations in VDI environments. While we have briefly discussed this tool on prior occasions, there have been many requests for its architectural details and inner workings. To provide a deeper dive into the technical details of View Planner, we recently published an article in the latest issue of the VMware Technical Journal (VMTJ Winter 2012), which can be found here: VMware View Planner: Measuring True Virtual Desktop at Scale.

View Planner supports typical VDI user operations as well as administrator management operations, both of which can be configured so that VDI evaluators can more accurately represent their particular environment. In this paper, we describe the challenges in building such a workload generator and the platform around it, as well as the View Planner architecture and use cases. We also explain how we used View Planner to perform platform characterization and consolidation studies, to find potential performance optimizations, and to support several other use cases.

Each new generation of servers brings advances in hardware components. For IT professionals purchasing or managing new generations of hardware, it’s vital to understand how these incremental hardware improvements translate into real-world gains in the datacenter. Using the VMware VMmark 2.5 virtualization benchmark, we compared performance and energy efficiency of two different generations of servers in four-node clusters.

VMmark 2.5 is a multi-host virtualization benchmark that uses varied application workloads as well as common datacenter operations to model the demands of the datacenter. VMs running diverse application workloads are grouped into units of load called tiles. For more details, see the VMmark 2.5 overview.

Testing Methodology
All tests were conducted on two four-node clusters running VMware vSphere 5.1. We compared performance and energy efficiency between a cluster of previous-generation Dell R310 servers and a cluster of current-generation Dell R620 servers. For simplicity, we refer to these as the ‘old cluster’ and ‘new cluster,’ respectively. Among other hardware differences, the old cluster servers contained four-core Intel Nehalem processors while the new cluster servers contained eight-core Intel Sandy Bridge EP processors. Memory in the newer servers was scaled up to match their increased processing power and reflects common current server configurations. Software and storage configurations were identical between clusters.

Results
To determine the maximum VMmark load the old cluster could support, we increased the number of VMmark tiles until the cluster reached saturation, which is defined as the largest number of tiles that still meets Quality of Service (QoS) requirements. We then tested the new cluster at the same number of tiles. All data points are the mean of four tests in each configuration, and VMmark scores are normalized to the old cluster’s performance.
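
The procedure described above can be sketched roughly as follows. This is only an illustration, not VMmark's actual harness: the function names, the `meets_qos` callback, and the sample scores are hypothetical, and the sketch assumes that once a tile count fails QoS, larger counts fail as well.

```python
from statistics import mean
from typing import Callable, Sequence


def find_saturation_tiles(meets_qos: Callable[[int], bool], max_tiles: int) -> int:
    """Return the largest tile count that still meets QoS (0 if none does).

    meets_qos is a hypothetical callback that runs the benchmark at the
    given tile count and reports whether QoS requirements were met.
    """
    saturation = 0
    for tiles in range(1, max_tiles + 1):
        if not meets_qos(tiles):
            break  # assumption: higher tile counts would also fail
        saturation = tiles
    return saturation


def normalized_score(runs: Sequence[float], baseline_runs: Sequence[float]) -> float:
    """Mean score of a configuration divided by the mean baseline score."""
    return mean(runs) / mean(baseline_runs)


if __name__ == "__main__":
    # Hypothetical QoS check: pretend the cluster passes up to four tiles.
    print(find_saturation_tiles(lambda tiles: tiles <= 4, max_tiles=10))

    # Hypothetical raw scores from four runs per configuration.
    old_runs = [10.0, 10.1, 9.9, 10.0]
    new_runs = [13.1, 13.3, 13.2, 13.2]
    print(round(normalized_score(new_runs, old_runs), 2))
```

Normalizing to the old cluster means its own score is 1.0 by construction, so every other result reads directly as a multiple of the old cluster's performance.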

The new cluster had a 32% higher VMmark score along with 41% lower CPU utilization. The new cluster also showed a 24% increase in energy efficiency over the old cluster, which we’ll discuss further below. At four tiles, the old cluster was bottlenecked on CPU, resulting in decreased workload throughput, while the new cluster was not. With CPU resources to spare, the new cluster met the requested load at lower latencies, which increased its total throughput and score. Mean I/O latencies remained low for both clusters: 1.2ms reads and 1.1ms writes for the old cluster, and 1.0ms reads and 0.9ms writes for the new cluster.

We next determined the maximum VMmark load the new cluster could support. While the old cluster was saturated at four tiles, the new cluster accommodated more than twice the load at nine tiles and produced a score 120% higher than the old cluster. Mean I/O latencies remained low at 1.0ms.

The performance advantages of the R620 over the R310 were largely due to the R620’s increased memory and to the generational improvements of its eight-core E5-2665 processor over the R310’s four-core X3460 processor, which include improved bus speeds and a larger L3 cache.

These performance results suggest that it would be possible to replace four Dell R310 servers with two Dell R620 servers and expect better than equivalent performance. We put this to the test by removing two nodes from the new cluster and found that the two remaining nodes did support four tiles at 93% utilization, with an 11% higher VMmark score and 74% greater energy efficiency than the four-host old cluster.

Beyond their raw performance capability, we also compared the two server generations on their energy efficiency. The Performance per Kilowatt metric, which is new to VMmark 2.5, models energy efficiency as VMmark score per kilowatt of power consumed. Below, we’ve plotted energy efficiency against the normalized VMmark score. Both clusters were run with their servers’ power management set to “maximum performance.”
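
As a rough illustration of the metric, the sketch below computes score per kilowatt and then inverts that relationship to estimate relative power draw from the relative figures reported in this post. The function names are hypothetical, and the sketch assumes that every percentage is measured against the same four-host old cluster baseline; the implied power values are back-of-the-envelope estimates, not measured results.

```python
def performance_per_kilowatt(score: float, avg_power_watts: float) -> float:
    """VMmark score divided by average power draw expressed in kilowatts."""
    return score / (avg_power_watts / 1000.0)


def implied_relative_power(relative_score: float, relative_efficiency: float) -> float:
    """Since efficiency = score / power, relative power = score / efficiency."""
    return relative_score / relative_efficiency


if __name__ == "__main__":
    # Four-host new cluster at four tiles: 32% higher score, 24% higher efficiency.
    four_host = implied_relative_power(1.32, 1.24)
    # Two-host new cluster at four tiles: 11% higher score, 74% higher efficiency.
    two_host = implied_relative_power(1.11, 1.74)
    print(f"four-host new cluster: {four_host:.2f}x the old cluster's power")
    print(f"two-host new cluster:  {two_host:.2f}x the old cluster's power")
```

Under these assumptions, the four-host new cluster would have drawn about 1.06 times the old cluster's power at four tiles, while the two-host configuration would have drawn about 0.64 times as much.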

Two trends emerge from this figure. First, at four tiles, the four-host new cluster accomplishes more work at higher energy efficiency than the old cluster; in fact, the new cluster is more energy efficient than the old cluster at every load level. Second, within the four-host new cluster, greater energy efficiency is correlated with an increase in VMmark score. As the CPUs become busier, performance increases at a faster rate than the required power. This can be understood by noting that an idle server still consumes power, but with no performance to show for it. A highly utilized server is typically the most energy efficient per request completed, which is confirmed by the two-host new cluster that achieved high efficiency at 93% utilization. Higher energy efficiency reduces both energy consumption and cooling costs.
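
The idle-power argument can be made concrete with a toy model. The sketch below is not derived from the measured data: it assumes power grows linearly from an idle floor to a peak value, that throughput is proportional to utilization, and it uses hypothetical wattages (100 W idle, 300 W peak) chosen only for illustration.

```python
def power_watts(utilization: float, idle_watts: float, peak_watts: float) -> float:
    """Assumed linear power model: idle draw plus a load-proportional part."""
    return idle_watts + (peak_watts - idle_watts) * utilization


def work_per_kilowatt(
    utilization: float,
    idle_watts: float = 100.0,   # hypothetical idle draw
    peak_watts: float = 300.0,   # hypothetical draw at full load
    peak_throughput: float = 1.0,
) -> float:
    """Throughput per kilowatt, assuming throughput scales with utilization."""
    throughput = peak_throughput * utilization
    return throughput / (power_watts(utilization, idle_watts, peak_watts) / 1000.0)


if __name__ == "__main__":
    for utilization in (0.25, 0.50, 0.93):
        efficiency = work_per_kilowatt(utilization)
        print(f"utilization {utilization:.0%}: {efficiency:.2f} work units per kW")
```

In this model, efficiency rises with utilization whenever the idle draw is greater than zero, because the fixed idle power is spread over more completed work.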

Our investigation shows that, while running vSphere 5.1, two newer Dell R620 servers are capable of supporting a greater load than four older Dell R310 servers. Because the Dell R620 performance is more than double that of the Dell R310, a four-node Dell R620 cluster reached a 120% higher maximum score than the Dell R310 cluster. In addition, at each load level the Dell R620 cluster ran with greater energy efficiency, showing that the Dell R620 offers not only superior performance but also greater energy efficiency than the Dell R310.