Westmere Performance

SPECpower_ssj2008

After our last experience with SPECpower_ssj2008, it has become a staple of our reviews. For a compliant submission, SPECpower requires a separate controller system that drives the system under test (SUT) and interfaces with a high-precision, qualified power meter. We instead ran the controller software on the SUT itself – which makes little difference in terms of performance, but is nonetheless not a valid configuration. Additionally, we opted for the eminently affordable Watts Up Pro (which retails for around $100), while qualified meters start at around a thousand dollars. Since our power meter is not sufficiently accurate for a compliant run, we also used shorter run times (60 seconds at each performance level, instead of 240 seconds).

SPECpower_ssj2008 measures performance for server-side Java, much like SPECjbb2005, but the two workloads are not comparable: the scoring works differently, and SPECpower reports power consumption to boot, so it largely obviates our need for SPECjbb2005. The software tuning is much the same as for SPECjbb – it is a huge knob and heavily dependent on the JVM. Note that for SPECpower_ssj2008, we had hardware prefetching enabled in the BIOS; this typically conflicts with the software prefetching and can reduce performance by around 10-15%. The tuning options we used were largely the same for Westmere and Nehalem, except for the number of garbage collection threads, which was set to match the number of cores.
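To make the core-count rule concrete, a HotSpot invocation for this class of workload might look like the sketch below. The heap sizes, collector choice, and jar name are illustrative assumptions, not the actual flags used in this review; only the idea of matching `-XX:ParallelGCThreads` to the core count comes from the text.

```shell
# Illustrative only – these flag values are assumptions, not the
# configuration used in this review.
# A fixed heap (-Xms == -Xmx) avoids resizing pauses, and the parallel
# throughput collector is a common choice for ssj-style workloads.
# ParallelGCThreads is matched to the core count: e.g. 8 on a two-socket
# Nehalem box, 12 on a two-socket Westmere box (counts assumed here).
java -server -Xms3g -Xmx3g \
     -XX:+UseParallelGC -XX:ParallelGCThreads=12 \
     -jar ssj.jar
```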

One of the particularly attractive features of SPECpower is that unlike SPECjbb, it targets specific utilization levels to measure power. We chose to use the standard set of 11 utilization levels – active idle (where the system can accept transactions, but none are being sent by the client/controller) and every 10%, up to full utilization. To score SPECpower, the average ssj_ops over all 11 levels is divided by the average power for all 11 levels – the resulting ratio is the performance to power ratio. We only took a single SPECpower_ssj2008 measurement, but the benchmark was run several times to fine tune the different parameters. The performance results were steady enough that we felt additional runs were not necessary.
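The scoring rule described above can be sketched in a few lines. The numbers below are made up purely to exercise the formula; only the calculation itself – average ssj_ops over all 11 levels divided by average power over all 11 levels – follows the text.

```python
def specpower_score(levels):
    """levels: list of (ssj_ops, watts) pairs, one per target load
    level (active idle plus every 10% up to full utilization).
    The overall metric is average throughput divided by average power,
    which reduces to sum(ops) / sum(watts)."""
    avg_ops = sum(ops for ops, _ in levels) / len(levels)
    avg_watts = sum(w for _, w in levels) / len(levels)
    return avg_ops / avg_watts

# Made-up numbers purely to illustrate the calculation (idle first):
levels = [(0, 150), (50_000, 170), (100_000, 185), (150_000, 200),
          (200_000, 215), (250_000, 230), (300_000, 245), (350_000, 260),
          (400_000, 275), (450_000, 290), (500_000, 305)]
print(round(specpower_score(levels)))  # overall ssj_ops per watt
```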

Figure 14 – SPECpower_ssj2008 Performance vs. Power

The figure above shows performance (in ssj_ops) on the X-axis and power consumption on the Y-axis, so the best solutions lie toward the lower right-hand corner, and the slope of each system's curve shows the price (in power) of additional performance. It also conveys absolute performance quite clearly – something the charts in SPECpower reports are not as good at. While efficiency is certainly a huge part of the equation for IT staff, absolute performance is just as important. It is easy to improve efficiency by using a processor with lower voltage, frequency and power…but if some workloads then require two systems instead of one, that is not exactly a gain in efficiency.

Comparing the two trend lines, the difference between the two generations is clear. At each activity level (e.g. 10%), the power consumption is the same, but Westmere delivers roughly 30% more performance, as expected. At full utilization, Westmere yields 708K ssj_ops versus 543K for Nehalem. A more extensively tuned system could probably deliver 15-20% more performance.

Conversely, for a given level of performance Westmere uses less power, but not by a constant factor. At lower performance levels (e.g. around 210K-280K ssj_ops), Westmere decreases power usage by around 13-19W, while at higher performance levels (e.g. 490K-560K) the power difference is about 40W. Those power savings are ~6-8% and ~15%, respectively.
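As a back-of-the-envelope check on the numbers above: the peak throughput figures and the wattage/percentage pairs are taken from the text, but the implied total power draws computed below are our inferences from those pairs, not measured values.

```python
# Peak throughput from the text: 708K ssj_ops (Westmere) vs 543K (Nehalem).
speedup = 708 / 543
print(f"peak speedup: {speedup:.2f}x")  # roughly 1.3x, i.e. ~30% more

# Power savings quoted at matched performance: 13-19W is ~6-8% at low
# load, 40W is ~15% at high load.  The implied total draw follows from
# savings / fraction; these totals are inferences, not measurements.
for watts, frac in [(13, 0.06), (19, 0.08), (40, 0.15)]:
    print(f"{watts}W at {frac:.0%} savings implies ~{watts/frac:.0f}W total")
```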

Figure 15 – SPECpower_ssj2008 Performance vs. Power Efficiency

This chart is a variation on the prior one – instead of showing power on the Y-axis, it shows the performance to power ratio that is the primary metric for SPECpower. Again, it clearly shows both the trade-offs of running a given system at various utilization levels, and the advantages and disadvantages of different systems. What is interesting here is that at low levels of performance, Westmere's power efficiency (ssj_ops/watt) is about the same as Nehalem's. It isn't until about 300K ssj_ops (which is roughly 40% utilization for Westmere and 50% for Nehalem) that a difference appears. That difference isn't particularly substantial until closer to 500K ssj_ops, when Nehalem is close to peak performance while Westmere is only moderately utilized.