AMD’s John Fruehe outlines AMD’s market approach for the new chipsets in his “AMD at Work” blog today. Based on the same basic logic/silicon, the SR5690, SR5670 and SR5650 all deliver PCI-E 2.0 and HT3.0, but with differing levels of power consumption and differing numbers of PCI Express lanes for their respective platforms. Paired with the appropriate “power and speed” Opteron variant, these platforms offer system designers, virtualization architects and HPC vendors greater control over the price-performance and power-performance constraints that drive their respective environments.

Using the same 8-processor HP ProLiant DL785 G6 platform as in the previous run – complete with 2.8GHz AMD Opteron 8439 SE 6-core chips and 256GB DDR2/667 – the new score comes with significant performance bumps in the javaserver, mailserver and database results achieved by the same system configuration as the previous attempt – including the same ESX 4.0 version (164009). So what changed to add an additional 5 tiles to the team’s run? It would appear that someone was unsatisfied with the storage configuration on the mailserver run.

Given that the tile ratio of the previous run ran about 6% higher than its 24-core counterpart, there may have been a small indication that untapped capacity was available. According to the run notes, the only reported change to the test configuration – aside from the addition of the 5 LUNs and 5 clients needed to support the 5 additional tiles – was a notation indicating that the “data drive and backup drive for all mailserver VMs” were repartitioned using AutoPart v1.6.

The change in performance numbers effectively reduces the virtualization cost of the system by 15% to about $257/VM – closing-in on its 24-core sibling to within $10/VM and stretching-out its lead over “Dunnington” rivals to about $85/VM. While virtualization is not the primary application for 8P systems, this demonstrates that 48-core virtualization is definitely viable.
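The per-VM arithmetic can be sketched in a few lines of Python. This is a rough illustration, not the method behind the figures quoted: it assumes the six VMs per tile of VMmark 1.x, a run that grew from 30 to 35 tiles (inferred from the 5 added tiles and the 48-core tile ratio), and a hypothetical fixed system price of $54,000 chosen only so the 35-tile result lands near $257/VM.

```python
VMS_PER_TILE = 6  # assumption: VMmark 1.x tile definition, not stated in the post


def cost_per_vm(system_price: float, tiles: int) -> float:
    """Acquisition cost divided by the number of VMs the run sustained."""
    return system_price / (tiles * VMS_PER_TILE)


# Hypothetical price; the tile counts are inferred, not quoted.
SYSTEM_PRICE = 54_000.0
before = cost_per_vm(SYSTEM_PRICE, tiles=30)
after = cost_per_vm(SYSTEM_PRICE, tiles=35)
reduction = 1 - after / before

print(f"before: ${before:.0f}/VM, after: ${after:.0f}/VM, reduction: {reduction:.1%}")
```

With a fixed price the reduction depends only on the tile counts and comes out a little over 14%, in the neighborhood of the 15% quoted; the published figure presumably reflects actual configuration pricing.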

SOLORI’s Take: HP’s performance team has done a great job tuning its flagship AMD platform, demonstrating that platform performance is not just related to hertz or core-count but requires balanced tuning and performance all around. This improvement in system tuning demonstrates an 18% increase in incremental scalability – approaching within 3% of the 12-core to 24-core scaling factor, making it actually a viable consideration in the virtualization use case.

In recent discussions with AMD about the SR5690 chipset applications for Socket-F, AMD re-iterated that the mainstream focus for SR5690 has been Magny-Cours and the Q1/2010 launch. Given the close relationship between Istanbul and Magny-Cours – detailed nicely by Charlie Demerjian at Semi-Accurate – the bar is clearly fixed for 2P and 4P virtualization systems designed around these chips. Extrapolating from the similarities and improvements to I/O and memory bandwidth, we expect to see 2P VMmarks besting 32@23 and 4P scores over 54@39 from HP, AMD and Magny-Cours.

SOLORI’s 2nd Take: Intel has been plugging away with its Nehalem-EX for 8-way systems and – delivering 128-threads – promises to deliver some insane VMmarks. Assuming Intel’s EX scales as efficiently as AMD’s new Opterons have, extrapolations indicate performance for the 4P, 64-thread Nehalem-EX should fall between 41@29 and 44@31 given the current crop of speed and performance bins. Using the same methods, our calculus predicts an 8P, 128-thread EX system should deliver scores between 64@45 and 74@52.

With EX expected to clock at 2.66GHz with 140W TDP and AMD’s MCM-based Magny-Cours doing well to hit 130W ACP in the same speed bins, CIOs balancing power and performance considerations will need to break out the spreadsheets to determine the winners here. With both systems running 4-channel DDR3, there will be no power or price advantage given on either side to memory differences: relative price-performance and power consumption of the CPUs will be major factors. Assuming our extrapolations are correct, we’re looking at a slight edge to AMD in performance-per-watt in the 2P segment, and a significant advantage in the 4P segment.

SOLORI’s Take: While the September timing of the release might imply a G6 with AMD’s SR5690 and IOMMU, we’re doubtful that the timing is anything but a coincidence: even though such a pairing would enable PCIe 2.0 and highly effective 10Gbps solutions. The modular design of the DL785 series – with its ability to scale from 4P to 8P in the same system – mitigates the economic realities of the dwindling 8P segment, and HP has delivered the pinnacle of performance for this technology.

We are also impressed with HP’s performance team and their ability to scale Shanghai to Istanbul with relative efficiency. Moving from DL785 G5 quad-core to DL785 G6 six-core was an almost perfect linear increase in capacity (95% of theoretical increase from 32-core to 48-core) while performance-per-tile increased by 6%. This further demonstrates the “home run” AMD has hit with Istanbul and underscores the excellent value proposition of Socket-F systems over the last several years.

Unfortunately, while they demonstrate a 91% scaling efficiency from 12-core to 24-core, HP and Istanbul have only achieved a 75% incremental scaling efficiency from 24-cores to 48-cores. When looking at tile-per-core scaling using the 8-core, 2P system as a baseline (1:1 tile-to-core ratio), 2P, 4P and 8P Istanbul deliver 91%, 83% and 62.5% efficiencies overall, respectively. However, compared to the 58% and 50% tile-to-core efficiencies of Dunnington 4P and 8P, respectively, Istanbul clearly dominates the 4P and 8P performance and price-performance landscape in 2009.
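For readers who want to check the efficiency figures, here is a minimal Python sketch of the two ratios used above. The tile counts are assumptions back-calculated from the quoted percentages (11, 20 and 30 tiles for 12-, 24- and 48-core Istanbul; 14 and 24 tiles for 24- and 48-core Dunnington), and the function names are illustrative only.

```python
def tile_to_core_efficiency(tiles: int, cores: int) -> float:
    """Tiles sustained per core, relative to the 1:1 tile-to-core baseline."""
    return tiles / cores


def incremental_scaling(tiles_small: int, cores_small: int,
                        tiles_large: int, cores_large: int) -> float:
    """Share of ideal linear scaling kept when moving to the larger system."""
    ideal_tiles = tiles_small * (cores_large / cores_small)
    return tiles_large / ideal_tiles


# Assumed tile counts, inferred from the percentages quoted in the post.
istanbul = {12: 11, 24: 20, 48: 30}      # cores -> tiles
dunnington = {24: 14, 48: 24}            # cores -> tiles

for name, runs in (("Istanbul", istanbul), ("Dunnington", dunnington)):
    for cores, tiles in runs.items():
        eff = tile_to_core_efficiency(tiles, cores)
        print(f"{name} {cores}-core: {eff:.1%} tile-to-core")

print(f"12 -> 24 cores: {incremental_scaling(11, 12, 20, 24):.1%} of linear")
print(f"24 -> 48 cores: {incremental_scaling(20, 24, 30, 48):.1%} of linear")
```

Under these assumed tile counts the ratios land on, or within a point of, the percentages quoted above; the 24-to-48 step works out to exactly 75% of linear.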

In today’s age of virtualization-driven scale-out, SOLORI’s calculus indicates that multi-socket solutions that deliver a tile-to-core ratio of less than 75% will not succeed (economically) in the virtualization use case in 2010, regardless of socket count. That said – even at a 2:3 tile-to-core ratio – the 8P, 48-core Istanbul will likely reign supreme as the VMmark heavy-weight champion of 2009.

SOLORI’s 2nd Take: HP and AMD’s achievements with this Istanbul system should be recognized before we usher-in the next wave of technology like Magny-Cours and Socket G34. While the DL785 G6 is not a game changer, its footnote in computing history may well be as a preview of what we can expect to see out of Magny-Cours in 2H/2010. If 12-core, 4P system price shrinks with the socket count we could be looking at a $150/VM price-point for a 4P system: now that would be a serious game changer.

In the current server-class arms race, Intel and AMD have secured separate quarters: Intel’s rival QPI architecture coupled to a 3-channel DDR3 memory bus and functional hyper-threading cores (top bin parts) holds the pure performance sector; while AMD’s improved Istanbul cores can be delivered 6 at a time and paired with inexpensive DDR2 memory to achieve better price-performance (acquisition). Both solutions deliver about the same economies in power consumption under virtualized loads.

All in all, the Twin2 with Xeon L5520 CPUs is the best platform for those seeking an affordable server with an excellent performance/watt ratio. On the other hand, if performance/price is the most important criterion followed by performance/watt, we would probably opt for the six-core Opteron version of the Twin2. Supermicro has “a blade killer” available with the Twin², especially for those people who like to keep the hardware costs low.

At the same time, DDR3 prices continue to inch up, by 5% in July, while DDR2 prices have appeared to bottom-out. This trend in DDR3 pricing is consistent across all speed ratings (1066/1333/1600) and, despite artificial downward price pressure from Samsung, has managed to drift upward 20% since May, 2009.

DDR3 Price Trend, May to August, 2009

Because low-end, lower-priced 2GB DDR3/1066 ($60/stick) memory shows little advantage over 2GB DDR2/800 ($35/stick), the 70% price premium keeps DDR2 in demand. With the added economic pressures of the world economy and the cautious growth outlook of the manufacturing sector, the cross-over from DDR2 to DDR3 will come at a significant cost: either to the consumer or the supplier.

Until the cross-over, DDR2-based systems will continue to be a favorite in price sensitive applications (i.e. where total system cost plays a significant role in purchasing decisions.) As an example of this economic inequality, let’s take the HP DL380 G6 and DL385 G6 as a comparison. Adding 16GB to the DL380 adds about $760 to the price tag (4x4GB DDR3-1066), while adding the same amount of memory to the DL385 adds only $410 (4x4GB DDR2-800). This comparison demonstrates an 85% price premium of DDR3 versus DDR2, a bit higher (percentage wise) than the desktop norm of 70%.
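The premium figures follow directly from the prices quoted above. A short Python sketch of the calculation, with an illustrative function name:

```python
def price_premium(new_price: float, base_price: float) -> float:
    """Extra cost of the newer part as a fraction of the older part's price."""
    return (new_price - base_price) / base_price


# Prices as quoted in the post.
desktop = price_premium(new_price=60, base_price=35)     # 2GB DDR3/1066 vs 2GB DDR2/800
server = price_premium(new_price=760, base_price=410)    # 16GB for DL380 G6 vs DL385 G6

print(f"desktop premium: {desktop:.0%}")
print(f"server premium:  {server:.0%}")
```

The desktop case rounds to roughly 70% and the server case to roughly 85%, matching the figures in the text.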

SOLORI’s Take: While the cost of memory in desktop systems typically represents a small portion of the overall system cost, the same cannot be said for virtualization systems, where entry configurations weigh in at 16GB and often run from 48GB to 72GB in “fully loaded” systems. This, as our calculus has shown, is where the sweet-spot of $/VM is delivered.

SOLORI’s 2nd Take: We’re hoping to see Tyan and Supermicro release SR5690 chipset-based systems – promised in Q3/2009 – to take advantage of this pricing trend and round-out the Istanbul offering before Q1/2010 ushers-in the next wave of multi-core systems. With 10G prices on the decline, we think today’s virtualization applications make Istanbul+IOMMU a good price-performance and price-feature fit in the 32-64GB memory footprint space, leaving Nehalem-EP with only the performance niche to its credit. The only question is: where is SR5690?

Thanks to a tweet from @ErikBussink and the quick thinking of Charlie Demerjian at SemiAccurate we’ve been treated to a picture of the upcoming Tyan S8212 (2-way) based on AMD’s new line-up of motherboard chip sets. While we see a x16 and 3 x8 PCIe slots, 6 SATA and 8 SAS ports, there is (conspicuously) no 10GE LOM – just 1GE.

June 1, 2009 – Today, AMD is announcing the general availability of its new single-die, 6-core Opteron processor code named “Istanbul.” We have weighed-in on the promised benefits of Istanbul based on pre-release material that was not under non-disclosure protections. Now, we’re able to disclose the rest of the story.

First, we got a chance to talk to Mike Goddard, AMD Server Products CTO, to discuss Istanbul and how G34/C32 platforms are shaping-up. According to Goddard, “things went really well with Istanbul; it’s no big secret that the silicon we’re using in Istanbul is the same silicon we’re using in Magny-Cours.” Needless to say, there are many more forward-thinking capabilities in Istanbul than can be supported in Socket-F’s legacy chipsets.

“We had always been planning a refresh to Socket-F with 5690,” says Goddard, “but Istanbul got pulled-in beyond our ability to pull-in the chipset.” Consequently, while there could be Socket-F platforms based on the next-generation 5690/5100 chipset, Goddard suggests that “most OEM’s will realign their platform development around [G34/C32, Q1/2010].”

In common parlance, Istanbul is a “genie in a bottle,” and we won’t see its true potential until it resurfaces in its Magny-Cours/G34 configuration. However, a few of these next-generation tweaks will trickle-down to Socket-F systems:

AMD PowerCap Manager (via BIOS extensions)

Enhanced AMD PowerNow! Technology

AMD CoolCore Technology extended to L3 cache

HT Assist (aka probe filter) for increased memory bandwidth

HT 3.0 with increase to 4.8GT/sec and IMC improvements

5 new part SKUs

Better 2P Performance Parity with Nehalem-EP

That’s in addition to 50% more cores in the same power envelope: not an insignificant improvement. In side-by-side comparisons to “Shanghai” quad-core at the same clock frequency, Istanbul delivers 2W lower idle power and 34% better SPECpower_ssj2008 (1,297 overall) results using identical systems with just a processor swap. In fact, the only time Istanbul exceeded Shanghai’s average power envelope was at 80% actual load and beyond – remaining within 5% of the Shanghai even at 100% load.

In Medio Stat Veritas

SOLORI's Take and Quick Take posts express my personal opinion unless explicitly attributed to other sources. Where possible, supporting facts are presented to properly frame and ground these opinions, however they are presented "AS-IS" without regard to warranty or promise: expressed or implied.