Posts Tagged ‘Intel Nehalem-EP’

We’ve been challenged to back up our comparison of Nehalem-EP systems to Opteron Shanghai in price-performance based on the prevailing VMmark scores available on VMware’s site. In earlier posts, our analysis predicted “comparable” price-performance results between Shanghai and Nehalem-EP systems based on the economics of today’s memory and processor availability.

So what we’ve done here is taken the on-line configurations of some of the benchmark competitors. To make things very simple, we’ve just configured memory and CPU as tested – no HBA or 10GE cards to skew the results. The only exception – as pointed out by our challenger – is that we’ve taken the option of using “street price” memory where “street price” is better than the server manufacturer’s memory price.

Here’s our line-up:

| System | Processor | Qty. | Speed (GHz) | Speed (GHz, Opt) | Memory Configuration | Street Price |
|---|---|---|---|---|---|---|
| Inspur NF5280 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $18,668.58 |
| Dell PowerEdge R710 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $16,893.00 |
| IBM System x 3650M2 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $21,546.00 |
| Dell PowerEdge M610 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $21,561.00 |
| HP ProLiant DL370 G6 | W5580 | 2 | 3.2 | 3.2 | 96GB (12x8GB) DDR3 1066 | $18,636.00 |
| Dell PowerEdge R710 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $16,893.00 |
| Dell PowerEdge R805 | 2384 | 2 | 2.7 | 2.7 | 64GB (8x8GB) DDR2 533 | $6,955.00 |
| Dell PowerEdge R905 | 8384 | 4 | 2.7 | 2.7 | 128GB (16x8GB) DDR2 667 | $11,385.00 |

Here we see Dell offering very aggressive DDR3/1066 pricing [for the R710], allowing us to go with on-line configurations, and HP offering overly expensive DDR2/667 memory prices (by a factor of 2), forcing us to go with 3rd party memory. In fact, IBM did not allow us to configure the as-tested memory [for the 3650M2] with its on-line configuration tool [neither did Dell for the M610], so we had to apply street memory prices. [Note: the

So here’s how they rank with respect to VMmark:

As you can easily see, the cost-per-tile (analogous to $/VM) favors the Shanghai systems. In fact, the one system that we’ve taken criticism for including in our previous comparisons – the Supermicro 6026T-NTR+ with 72GB of DDR3/1066 (running at DDR3/800) – actually leads the pack in Nehalem-EP $/tile, but we’ve excluded it from our tables since it has been argued to be a “sub-optimal” configuration and an outlier. Again, the sweet spot for price-performance for Nehalem, Shanghai and Istanbul is in the 48GB to 80GB range with inexpensive memory: simple economics.
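For readers who want to reproduce the metric, cost-per-tile is simply the street price divided by the number of VMmark tiles the system sustained. The sketch below is illustrative only: the function name is ours, and it assumes the street prices from the line-up above can be paired with the 17-tile (Dell M610) and 16-tile (HP DL370 G6) VMmark results published for those systems.

```python
def cost_per_tile(street_price, tiles):
    """Street price in dollars divided by VMmark tile count (hypothetical helper)."""
    if tiles <= 0:
        raise ValueError("tile count must be positive")
    return round(street_price / tiles, 2)


# Assumed pairings of line-up street prices with published tile counts.
m610 = cost_per_tile(21561.00, 17)   # Dell PowerEdge M610
dl370 = cost_per_tile(18636.00, 16)  # HP ProLiant DL370 G6
print(f"M610: ${m610}/tile, DL370 G6: ${dl370}/tile")
```

The same function applies to the Shanghai systems once their tile counts are taken from VMware’s results page.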

Please note that not one of the 2P VMmark scores listed on VMware’s official VMmark results tally carries the Opteron 2393SE version of the processor (3.1GHz) or HT3-enabled motherboards. It is likely that we’ll not see HT3-enabled scores nor 2P ESX 4.0 scores until Istanbul’s release in the coming month. If Shanghai’s $/tile is competitive with Nehalem’s today (again, in the 48GB to 80GB configurations), Istanbul – with the same memory and system costs – will be even more so.

Update: AMD’s Margaret Lewis has a similar take, with comparison prices for AMD using DDR2/533 configurations. Her numbers – like our previous posts – resolve to $/VM; however, she provides some good “street prices” for more “mainstream” configurations of Intel Nehalem-EP and AMD Shanghai systems. See her results and conclusions on AMD’s blog.

Let’s look at some more real-world applications of what we’ve learned from the VMmark results for Nehalem and what it means in a practical comparison. We’ll award Nehalem-EP’s SMT a 25% bonus in our comparisons when vCPU/core count is taken into the measurement. In a 6:1 consolidation, this means 60 vCPU’s for 2P Nehalem and 48 vCPU’s for Shanghai. Using this bias, the following cost characteristics are revealed for VM’s with average memory footprints of 1.5GB, for the Nehalem-EP 3.2GHz system:

| Nehalem-EP Configuration | Street $ | 1536MB VM’s, 1 vCPU’s | Max vCPU’s (6/c) | Cost/VM |
|---|---|---|---|---|
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 24GB DDR3/1333 | $7,017.69 | 13 | 60 | $539.82 |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 48GB DDR3/1066 | $7,755.99 | 28 | 60 | $277.00 |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 72GB DDR3/800 | $8,708.19 | 42 | 60 | $207.34 |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 96GB DDR3/1066 | $21,969.99 | 57 | 60 | $385.44 |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 144GB DDR3/800 | $30,029.19 | 60 | 60 | $500.49 |
| 2 x 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 144GB DDR3/800 | $60,058.38 | 120 | 120 | $500.49 |
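The arithmetic behind the table can be sketched in a few lines of Python. This is our reading of the columns rather than a published formula: the vCPU ceiling is cores times the 6:1 consolidation ratio times the SMT bonus, and Cost/VM is the street price divided by whichever is smaller, the memory-limited VM count or the vCPU ceiling. The function names are hypothetical, and the memory-limited VM counts are taken from the table as given (they include an overhead allowance we do not attempt to model).

```python
import math


def max_vcpus(cores, vcpus_per_core=6, smt_bonus=0.0):
    """vCPU ceiling: cores x consolidation ratio, scaled by an assumed SMT bonus."""
    return math.floor(cores * vcpus_per_core * (1 + smt_bonus))


def cost_per_vm(street_price, vms_by_memory, vcpu_limit):
    """Street price divided by the VM count, capped by the vCPU ceiling."""
    vms = min(vms_by_memory, vcpu_limit)
    return round(street_price / vms, 2)


nehalem_limit = max_vcpus(8, smt_bonus=0.25)  # 2P/8C with the 25% SMT bonus
shanghai_limit = max_vcpus(8)                 # 2P/8C, no SMT
print(nehalem_limit, shanghai_limit)
print(cost_per_vm(8708.19, 42, nehalem_limit))  # 72GB row of the table
```

Once a configuration has enough memory to exceed the vCPU ceiling, as in the 144GB rows, additional memory raises the price without adding VMs, which is why Cost/VM climbs again at the high end.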

We’ll compare this to a Shanghai 2P system at 3.1GHz vs. the Nehalem-EP system:

When I see benchmarks like these quoted by AnandTech I start to wonder why they consider the results “analytical…” In any case, there are significant ramifications to larger memory pools and higher clock speeds in VMmark, and these results show that fact. Additionally, the results also seem to indicate:

VMware vSphere (ESX v4.0) takes serious advantage of the new hyperthreading in Nehalem-EP

Nehalem-EP’s TurboBoost appears to tilt the value proposition in favor of the X5570 over the W5580, all things considered

Judging from the Supermicro VMmark score, the Nehalem-EP (adjusted for differences in processor speed) turns in about a 6% performance advantage over the Shanghai with comparable memory footprints. Had the Opteron been given additional memory, perhaps the tile and benchmark scores would have better illustrated this conclusion. It is unclear whether or not vSphere is significantly more efficient at resource scheduling, but the results seem to indicate that – at least with Nehalem’s new hyperthreading – it is more efficient.

| Platform | Memory | VMware Version | VMmark Score | Rating (Raw / Clock Adj.) | Per Tile |
|---|---|---|---|---|---|
| HP ProLiant 385G5p (2xOpteron 2384, 2.7GHz) | 64GB DDR2/533 | ESX v3.5.0 Update 3 | 11.28 @ 8 tiles | 100% / 100% | 100% |
| Supermicro 6026T-NTR+ (2xX5570, 2.93GHz w/3.2GHz TurboBoost) | 72GB DDR3/1066 | ESX v3.5.0 Update 4 BETA | 14.22 @ 10 tiles | 126% / 106% | 101% |
| Dell PowerEdge M610 (2xX5570, 2.93GHz w/3.3GHz TurboBoost) | 96GB DDR3/1066 | ESX v4.0 | 23.90 @ 17 tiles | 212% / 174% | 100% |
| HP ProLiant DL370 G6 (2xW5580, 3.2GHz w/3.3GHz TurboBoost) | 96GB DDR3/1066 | ESX v4.0 | 23.96 @ 16 tiles | 213% / 172% | 106% |
| HP ProLiant DL585 G5 (4x8386SE, 2.8GHz) | 128GB DDR2/667 | ESX v3.5.0 Update 3 | 20.43 @ 14 tiles | 181% / 174% | 104% |
| HP ProLiant DL585 G5 (4x8393SE, 3.1GHz) | 128GB DDR2/667 | ESX v4.0 | 22.11 @ 15 tiles | 196% / 171% | 105% |
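The rating columns appear to be normalized against the 2.7GHz Opteron 2384 result (11.28 @ 8 tiles). The Python sketch below shows one way to derive them; the normalization is inferred from the table rather than stated outright, the function name is ours, and a few of the published cells differ from this calculation by about a point, presumably due to rounding.

```python
def vmmark_ratings(score, tiles, clock_ghz,
                   base_score=11.28, base_tiles=8, base_clock_ghz=2.7):
    """Return (raw %, clock-adjusted %, per-tile %) relative to the baseline.

    Assumed method: raw is the score ratio, clock-adjusted scales the raw
    ratio by baseline clock over system clock, and per-tile compares
    score-per-tile with the baseline's score-per-tile.
    """
    raw = score / base_score
    clock_adj = raw * base_clock_ghz / clock_ghz
    per_tile = (score / tiles) / (base_score / base_tiles)
    return round(raw * 100), round(clock_adj * 100), round(per_tile * 100)


# Supermicro X5570 system, using the 3.2GHz TurboBoost clock for adjustment.
print(vmmark_ratings(14.22, 10, 3.2))
# HP DL585 G5 with 4x 8393SE at 3.1GHz.
print(vmmark_ratings(22.11, 15, 3.1))
```

The clock-adjusted figure for the Supermicro system is where the "about 6%" advantage quoted above comes from.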

One thing is clear from these VMmark examples: Nehalem-EP is a huge step in the right direction for Intel, and it potentially blurs the line between 2P and 4P systems. AMD will not have much breathing room with Istanbul in the 2P space against Nehalem-EP for system refreshes unless it can show similar gains and scalability. Where Istanbul will shine is in its drop-in capability in existing 2P, 4P and 8P platforms.

SOLORI’s take: These are exciting times for those just getting into virtualization. VMmark would seem to indicate that consolidation factors unlocked by Nehalem-EP come close to rivaling 4P platforms at about 75% of the cost. If I were buying a new system today, I would be hard-pressed to ignore Nehalem as a basis for my Eco-system. However, the socket-F Opteron systems still have about 8-12 months of competitive life in them, at which point they become just another workhorse. Nehalem-EP still does not provide enough incentive to shatter an established Eco-system.

SOLORI’s 2nd take: AMD has a lot of ground to cover with Istanbul and Magny-Cours in the few short months that remain in 2009. The “hearts and minds” of system refresh and new entrants into virtualization are at stake and Nehalem-EP offers some conclusive value to those entering the market.

With entrenched customers, AMD needs to avoid making them feel “left behind” before the market shifts definitively. AMD could do worse than getting some SR5690-based Istanbul platforms out on the VMmark circuit – especially with its HP and Supermicro partners. We’d also like to see some Magny-Cours VMmarks prior to the general availability of the G34 systems.

Virtualization now reaches an I/O barrier where consolidated applications must vie for increasingly limited I/O resources. Early virtualization techniques – both software and hardware assisted – concentrated on process isolation and gross context switching to accelerate the “bulk” of the virtualization process: running multiple virtual machines without significant processing degradation.

As consolidation potentials are greatly enhanced by new processors with many more execution contexts (threads and cores) the limitations imposed on I/O – software translation and emulation of device communication – begin to degrade performance. This degradation further limits consolidation, especially where significant network traffic (over 3Gbps of non-storage VM traffic per virtual server) or specialized device access comes into play.

I/O Virtualization – The Next Step-Up

Intrinsic to AMD-V in revision “F” Opterons and newer AM2 processors is I/O virtualization enabling hardware assisted memory management in the form of a Graphics Aperture Remapping Table (GART) and the Device Exclusion Vector (DEV). These two facilities provide address translation of I/O device access to a limited range of the system physical address space and provide limited I/O device classification and memory protection.

Combined with specialized software, GART and DEV provided primitive I/O virtualization but were limited to the confines of the memory map. Direct interaction with devices and virtualization of device contexts in hardware are not efficiently possible in this approach, as VMs need to rely on hypervisor control of device access. AMD defined its I/O virtualization strategy as AMD IOMMU in 2006 (now AMD-Vi) and has continued to improve it through 2009.
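To make the two facilities a little more concrete, here is a toy Python model. It is a conceptual sketch only, not a description of the actual register layout or table format: the class names, the 4 KiB page size and the addresses are all assumptions chosen for illustration. The remap table redirects device addresses that fall inside a fixed aperture, and the exclusion vector holds one flag per physical page that blocks device access when set.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class ToyGart:
    """Toy aperture remapper: only addresses inside the aperture are translated."""

    def __init__(self, aperture_base, num_pages):
        self.base = aperture_base
        self.table = [None] * num_pages  # aperture page index -> physical page

    def map_page(self, index, phys_page):
        self.table[index] = phys_page

    def translate(self, device_addr):
        offset = device_addr - self.base
        if not 0 <= offset < len(self.table) * PAGE_SIZE:
            return device_addr  # outside the aperture: passed through as-is
        phys_page = self.table[offset // PAGE_SIZE]
        if phys_page is None:
            raise LookupError("no mapping for this aperture page")
        return phys_page * PAGE_SIZE + offset % PAGE_SIZE


class ToyDev:
    """Toy exclusion vector: a set flag blocks device access to that page."""

    def __init__(self, num_phys_pages):
        self.excluded = [False] * num_phys_pages

    def protect(self, phys_page):
        self.excluded[phys_page] = True

    def allows(self, phys_addr):
        return not self.excluded[phys_addr // PAGE_SIZE]


gart = ToyGart(aperture_base=0x10000000, num_pages=4)
gart.map_page(1, 7)
dev = ToyDev(num_phys_pages=16)
dev.protect(7)
target = gart.translate(0x10000000 + PAGE_SIZE + 0x10)
print(hex(target), dev.allows(target))
```

Even in this simplified form the limitation is visible: translation covers only one window of the address space, and protection is a per-page yes or no, so anything finer-grained falls back to the hypervisor.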

With the release of new motherboard chipsets (AMD SR5690) in 2009, significant performance gains in I/O will be brought to the platform with end-to-end I/O virtualization. Motherboard refreshes based on the SR5690 should enable Shanghai and Istanbul processors to take advantage of the full AMD IOMMU specification (now AMD-Vi).

Similarly, Intel’s VT-d approach combines chipset and CPU features to solve the problem in much the same way. Because the memory controller was architecturally separate from the CPU, earlier processors not only had to carry the additional instruction enhancements but also had to be coupled with northbridge chipsets that contained support. This feature was initially available in the Intel Q35 desktop chipset in Q3/2007.
