SOLORI’s Take: The most interesting aspect of the EX benchmark is its clock-adjusted scaling factor: between 70% and 91% versus a 2P/8-core Nehalem-EP reference (Cisco UCS, B200 M1, 25.06@17 tiles). The unpredictable nature of Intel’s “turbo” feature – varying with thermal loads and per-core conditions – makes an exact clock-for-clock comparison difficult. However, if the scaling factor is 90%, the EX blows away our previous expectations about the platform’s scalability. Where did we go wrong when we predicted a conservative 44@39 tiles? We’re looking at three things: (1) a bad assumption about the effectiveness of “turbo” in the EP VMmark case (setting Ref_EP_Clock to 3.33 GHz), (2) underestimating EX’s scaling efficiency (assumed 70%), and (3) assuming a 2.26GHz clock for EX.

Correcting for the as-tested clock/turbo numbers, using AMD’s 2P-to-4P VMmark scaling efficiency of 83%, and shifting to the new UCS baseline (with a newer ESX version), the Nehalem-EX prediction factors to:

Clearly, this approach either overestimates the scaling efficiency or underestimates the “turbo” mode. IBM claims that a 2.93 GHz “turbo” setting is viable where Intel suggests 2.67 GHz is the maximum, so there is a source of potential bias. Looking at the tiles-per-core ratio of the VMmark results, Nehalem drops from 2.13 tiles per core on EP/2P platforms to 1.5 tiles per core on EX/4P platforms – about a 30% drop in per-core loading efficiency. That indicator matches well with our initial 75% scaling efficiency moving from 2P to 4P – something that AMD demonstrated with Istanbul last August. Given the high TDP of EX and IBM’s 2.93 GHz “turbo” specification, it’s possible that “turbo” is adding clock cycles (and power consumption) and compensating for a lower scaling efficiency than we’ve assumed. Repeating the same estimation with a 2.93GHz “clock” and 71% efficiency (1.5/2.13), the numbers fall in line with VMmark:

This gives us a good basis for evaluating 2P vs. 4P Nehalem systems: a scaling factor of 71% and the ability to push the clock towards the 3GHz mark within the thermal envelope. Both of these conclusions fit typical 2P-to-4P norms and Intel’s process history.
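The 71% figure can be checked from the tiles-per-core numbers quoted above. The following is a minimal sketch, not the original spreadsheet: the function and constant names are ours, the EP/2P value is derived from the UCS reference (17 tiles on 8 cores), and the EX/4P value of 1.5 tiles per core is taken as stated in the text.

```python
# Figures quoted in the text; names are illustrative only.
EP_TILES = 17             # Cisco UCS B200 M1 reference: 25.06@17 tiles
EP_CORES = 8              # 2P/8-core Nehalem-EP
EX_TILES_PER_CORE = 1.5   # EX/4P tiles per core, as stated in the text


def tiles_per_core(tiles, cores):
    """Average number of VMmark tiles carried by each physical core."""
    return tiles / cores


def relative_efficiency(tpc_new, tpc_ref):
    """Per-core loading of the new platform relative to the reference."""
    return tpc_new / tpc_ref


ep_tpc = tiles_per_core(EP_TILES, EP_CORES)
eff = relative_efficiency(EX_TILES_PER_CORE, ep_tpc)

print(f"EP/2P tiles per core: {ep_tpc:.3f}")
print(f"EX/4P per-core efficiency vs. EP/2P: {eff:.0%}")
print(f"Per-core drop: {1 - eff:.0%}")
```

Using the unrounded 2.125 tiles per core instead of 2.13 still lands on roughly 71%, so the rounding in the text does not change the conclusion.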

That’s nowhere near good enough to top the current 8P, 48-core Istanbul VMmark at 53.73@35 tiles, so we’ll likely have to wait for faster 6100 parts to see any new AMD records. However, assuming AMD’s proposition is still “value 4P”, about 200 VM’s at under $18K/server gets you around $90/VM or less.

We’ve been challenged to back up our comparison of Nehalem-EP systems to Opteron Shanghai in price-performance based on the prevailing VMmark scores available on VMware’s site. In earlier posts, our analysis predicted “comparable” price-performance results between Shanghai and Nehalem-EP systems based on the economics of today’s memory and processor availability:

So what we’ve done here is take the on-line configurations of some of the benchmark competitors. To make things very simple, we’ve just configured memory and CPU as tested – no HBA or 10GE cards to skew the results. The only exception – as pointed out by our challenger – is that we’ve taken the option of using “street price” memory where “street price” is better than the server manufacturer’s memory price.

Here’s our line-up:

| System | Processor | Qty. | Speed (GHz) | Speed (GHz, Opt) | Memory Configuration | Street Price |
|---|---|---|---|---|---|---|
| Inspur NF5280 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $18,668.58 |
| Dell PowerEdge R710 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $16,893.00 |
| IBM System x 3650M2 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $21,546.00 |
| Dell PowerEdge M610 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $21,561.00 |
| HP ProLiant DL370 G6 | W5580 | 2 | 3.2 | 3.2 | 96GB (12x8GB) DDR3 1066 | $18,636.00 |
| Dell PowerEdge R710 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $16,893.00 |
| Dell PowerEdge R805 | 2384 | 2 | 2.7 | 2.7 | 64GB (8x8GB) DDR2 533 | $6,955.00 |
| Dell PowerEdge R905 | 8384 | 4 | 2.7 | 2.7 | 128GB (16x8GB) DDR2 667 | $11,385.00 |

Here we see Dell offering very aggressive DDR3/1066 pricing [for the R710], allowing us to go with on-line configurations, and HP offering overly expensive DDR2/667 memory prices (by a factor of 2), forcing us to go with 3rd party memory. In fact, IBM’s on-line configuration tool did not allow us to build the as-tested memory configuration [for the 3650M2], and neither did Dell’s [for the M610], so we had to apply street memory prices. So here’s how they rank with respect to VMmark:

As you can easily see, the cost-per-tile (analogous to $/VM) favors the Shanghai systems. In fact, the one system that we’ve taken criticism for including in our previous comparisons – the Supermicro 6026T-NTR+ with 72GB of DDR3/1066 (running at DDR3/800) – actually leads the pack in Nehalem-EP $/tile, but we’ve excluded it from our tables since it has been argued to be a “sub-optimal” configuration and an outlier. Again, the sweet spot for price-performance for Nehalem, Shanghai and Istanbul is in the 48GB to 80GB range with inexpensive memory: simple economics.

Please note that not one of the 2P VMmark scores listed on VMware’s official VMmark results tally carries the Opteron 2393SE version of the processor (3.1GHz) or HT3-enabled motherboards. It is likely that we’ll see neither HT3-enabled scores nor 2P ESX 4.0 scores until Istanbul’s release in the coming month. Again, if Shanghai’s $/tile is competitive with Nehalem’s today (in the 48GB to 80GB configurations), Istanbul – with the same memory and system costs – will be even more so.

Update: AMD’s Margaret Lewis has a similar take with comparison prices for AMD using DDR2/533 configurations. Her numbers, like those in our previous posts, resolve to $/VM; however, she provides some good “street prices” for more “mainstream” configurations of Intel Nehalem-EP and AMD Shanghai systems. See her results and conclusions on AMD’s blog.

Let’s look at some more real-world applications of what we’ve learned from the VMmark results for Nehalem and what it means in a practical comparison. We’ll award Nehalem-EP’s SMT a 25% bonus in our comparisons when vCPU/core count is part of the measurement. In a 6:1 consolidation, this means 60 vCPU’s for 2P Nehalem and 48 vCPU’s for Shanghai. Using this bias, the following cost characteristics are revealed for VM’s with average memory footprints of 1.5GB, for the Nehalem-EP 3.2GHz system:

| Nehalem-EP Configuration | Street Price | VM’s (1536MB, 1 vCPU) | Max vCPU’s (6/core) | Cost/VM |
|---|---|---|---|---|
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 24GB DDR3/1333 | $7,017.69 | 13 | 60 | $539.82 |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 48GB DDR3/1066 | $7,755.99 | 28 | 60 | $277.00 |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 72GB DDR3/800 | $8,708.19 | 42 | 60 | $207.34 |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 96GB DDR3/1066 | $21,969.99 | 57 | 60 | $385.44 |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 144GB DDR3/800 | $30,029.19 | 60 | 60 | $500.49 |
| 2 x 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 144GB DDR3/800 | $60,058.38 | 120 | 120 | $500.49 |
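The arithmetic behind the vCPU ceilings and the Cost/VM column can be sketched as follows. This is an illustration under stated assumptions rather than the worksheet used for the post: the function names are ours, the per-configuration VM counts are copied from the table as given (the memory-overhead model that produces them is not stated here), and each VM is assumed to use one vCPU.

```python
def max_vcpus(cores, vcpus_per_core=6, smt_bonus=0.0):
    """vCPU ceiling: cores x consolidation ratio, plus an optional SMT bonus."""
    return round(cores * vcpus_per_core * (1.0 + smt_bonus))


def cost_per_vm(street_price, vms_by_memory, vcpu_cap):
    """Street price divided by the VM count, capped by the vCPU ceiling."""
    vms = min(vms_by_memory, vcpu_cap)
    return round(street_price / vms, 2)


# 2P/8-core hosts at 6:1; Nehalem-EP gets the 25% SMT bonus from the text.
nehalem_cap = max_vcpus(8, smt_bonus=0.25)   # 8 * 6 * 1.25 = 60
shanghai_cap = max_vcpus(8)                  # 8 * 6 = 48

# (memory configuration, street price, VM count) copied from the table.
configs = [
    ("24GB DDR3/1333", 7017.69, 13),
    ("48GB DDR3/1066", 7755.99, 28),
    ("72GB DDR3/800", 8708.19, 42),
    ("96GB DDR3/1066", 21969.99, 57),
    ("144GB DDR3/800", 30029.19, 60),
]

for name, price, vms in configs:
    print(f"{name:>15}: ${cost_per_vm(price, vms, nehalem_cap):,.2f}/VM")
```

Run against the table's prices and VM counts, this reproduces the Cost/VM column; the low point at 72GB reflects the jump in price for the denser memory used at 96GB and above.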

We’ll compare this to a Shanghai 2P system at 3.1GHz vs. the Nehalem-EP system:

