Posts Tagged ‘6-core’

Taking one last look at memory prices before the year is out, we find our “benchmark” Kingston DDR3 server DIMMs on the decline. While the quad-rank 8G DDR3/1066 DIMMs are now below the $565 target price we predicted back in August (at $514), the dual-rank equivalent (on our benchmark list) is still hovering around $670 each. Meanwhile, although retail prices on the 8G DDR2/667 parts continue to rise, inventory and promotional pricing have managed to keep them flat at $433 each, giving large-footprint DDR2 systems a $2,000 price advantage (based on 64GB systems).
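The $2,000 figure follows directly from the per-DIMM prices above. Here is a minimal sketch, assuming a 64GB system is populated with eight 8G DIMMs. That DIMM count is our assumption, since the post does not state it.

```python
# Per-DIMM street prices quoted in the post (8G parts, USD)
DDR3_1066_DUAL_RANK = 670
DDR3_1066_QUAD_RANK = 514
DDR2_667 = 433

def system_memory_cost(dimm_price: int, total_gb: int = 64, dimm_gb: int = 8) -> int:
    """Cost of populating total_gb with identical dimm_gb sticks (assumption: 8x 8G for 64GB)."""
    return (total_gb // dimm_gb) * dimm_price

ddr2 = system_memory_cost(DDR2_667)
for label, price in (("dual-rank DDR3", DDR3_1066_DUAL_RANK),
                     ("quad-rank DDR3", DDR3_1066_QUAD_RANK)):
    ddr3 = system_memory_cost(price)
    print(f"{label}: ${ddr3:,} vs DDR2 ${ddr2:,} -> DDR2 saves ${ddr3 - ddr2:,}")
```

The roughly $2,000 advantage corresponds to the dual-rank comparison (about $1,900). Against the cheaper quad-rank parts, the gap narrows to about $650 per 64GB system.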

As the year ends, OEMs are expected to “pull up inventory,” according to DRAMeXchange, in advance of a predicted market shortfall sometime in Q2/2010. Demand for greater memory capacity is being driven by Windows 7 and 64-bit processors, with 4GB well established as the minimum system footprint at the end of 2009. With Server 2008 systems demanding 6GB+ and an increasing shift toward large-memory virtualization servers and blades, the market price for DDR3 (just turning the corner versus DDR2 in Q1/2010) will likely flatten as demand grows.

In the battle to “feed” the virtualization servers of 2H/2010, the 4-channel “behemoth” Magny-Cours system could have a serious memory/price advantage. With 8 (2-DPC) or 12 (3-DPC) DIMMs per socket, it reaches 64GB (2.6GB/thread) or 96GB (3.9GB/thread) of DDR3/1066 using only 4GB sticks (assuming a 2P configuration). Similar GB/thread loads on Nehalem-EP6 “Gulftown” (6-core/12-thread) could be had with 72GB DDR3/800 (18x 4GB, 3-DPC) or 96GB DDR3/1066 (12x 8GB, 2-DPC). That leaves the solution architect to choose between a performance (memory bandwidth) crunch and a price (about $2,900 more) crunch. This means Magny-Cours could show a $2-3K price advantage per system versus Nehalem-EP6 in $/VM-optimized VDI implementations.

Where the rubber really meets the road, in a virtualization context, is with the (unannounced) Nehalem-EP8 (8-core/16-thread). It would need 96GB (12x 8GB, 2-DPC) to maintain 2.6GB/thread parity with Magny-Cours. This creates a memory-based price differential of about $3K per system/blade in the 2P space, in Magny-Cours’ favor. At the high end (3.9GB/thread), the EP8 system would need a full 144GB (running at DDR3/800 timing) to maintain GB/thread parity with 2P Magny-Cours. That creates a $5,700 system price differential, and possibly a good reason why we won’t actually see an 8-core/16-thread variant of Nehalem-EP in 2010.
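The capacity figures above come from simple channel arithmetic: sockets × channels per socket × DIMMs per channel × DIMM size. The sketch below assumes 2P systems, 4 channels per socket for Magny-Cours, 3 for Nehalem-EP, and 4GB or 8GB DIMMs. `smallest_config` is a hypothetical helper that picks the smallest Nehalem-EP population meeting a GB/thread target. Straight division gives 2.67 and 4.00 GB/thread for the two Magny-Cours configurations, slightly above the 2.6 and 3.9 targets used in the text.

```python
from itertools import product

def capacity_gb(sockets: int, channels: int, dpc: int, dimm_gb: int) -> int:
    """Total memory: sockets x channels/socket x DIMMs/channel x GB per DIMM."""
    return sockets * channels * dpc * dimm_gb

def smallest_config(threads: int, target_gb_per_thread: float, sockets: int = 2,
                    channels: int = 3, dpc_options=(1, 2, 3), dimm_options=(4, 8)):
    """Smallest (capacity_gb, dpc, dimm_gb) meeting the GB/thread target, or None.

    Tuples compare by capacity first, then DPC, so equal-capacity ties go to the
    lower DIMMs-per-channel setting (which typically permits faster memory timing).
    """
    needed = threads * target_gb_per_thread
    candidates = [
        (capacity_gb(sockets, channels, dpc, dimm), dpc, dimm)
        for dpc, dimm in product(dpc_options, dimm_options)
        if capacity_gb(sockets, channels, dpc, dimm) >= needed
    ]
    return min(candidates, default=None)

# Magny-Cours 2P: 4 channels/socket, 4GB sticks, 24 threads
for dpc in (2, 3):
    cap = capacity_gb(sockets=2, channels=4, dpc=dpc, dimm_gb=4)
    print(f"Magny-Cours {dpc}-DPC: {cap}GB = {cap / 24:.2f} GB/thread")

# Nehalem-EP 2P (3 channels/socket): EP6 has 24 threads, EP8 would have 32
for name, threads in (("EP6", 24), ("EP8", 32)):
    for target in (2.6, 3.9):
        cap, dpc, dimm = smallest_config(threads, target)
        print(f"{name} @ {target} GB/thread: {cap}GB ({cap // dimm}x {dimm}GB, {dpc}-DPC)")
```

The selected populations match the configurations discussed above. For EP6, these are 72GB as 18x 4GB at 3-DPC and 96GB as 12x 8GB at 2-DPC. For EP8, they are 96GB at 2-DPC and 144GB at 3-DPC.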

Assuming that EP8 has 30% greater thread capacity than Magny-Cours (32 threads versus 24 threads, 2P system), a $5,700 difference in system price would require a 2P Magny-Cours system to cost about $19,000 just to make it an even value proposition. We’d be shocked to see an MC processor priced above $2,600/socket, which puts the target system price in the $8-9K range (24-core, 2P, 96GB DDR3/1066). That said, with VDI growth on the move, a 4GB/thread baseline is not unrealistic given current best practices (4 VMs/thread, 1GB per virtual desktop). If our numbers are conservative, that’s a $100 equipment cost per virtual desktop, about 20% less than today’s 2P equivalents in the VDI space. Seen in this light, VMware’s decision to license VDI per concurrent user and NOT per socket looks very forward-thinking!
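One way to reproduce the break-even and per-desktop figures is with a little algebra. Suppose EP8 offers `thread_ratio` times the threads of Magny-Cours and costs `premium` more. Equal cost per thread then requires the Magny-Cours price P to satisfy thread_ratio × P = P + premium, so P = premium / (thread_ratio − 1). The function names below are illustrative only.

```python
def break_even_price(premium: float, thread_ratio: float) -> float:
    """Magny-Cours system price at which EP8's extra threads exactly offset its premium.

    Solves thread_ratio * P = P + premium for P.
    """
    return premium / (thread_ratio - 1)

def cost_per_desktop(system_price: float, threads: int, vms_per_thread: int) -> float:
    """Equipment cost per virtual desktop for one host."""
    return system_price / (threads * vms_per_thread)

print(f"30% ratio:   ${break_even_price(5700, 1.30):,.0f}")
print(f"32/24 ratio: ${break_even_price(5700, 32 / 24):,.0f}")
for price in (8000, 9000):
    # 2P Magny-Cours: 24 threads, 4 VMs per thread at 1GB each (96GB total)
    print(f"${price:,} system: ${cost_per_desktop(price, 24, 4):.2f} per desktop")
```

The 30% assumption gives exactly $19,000. Using the exact 32/24 thread ratio (about 33%) lowers the break-even to roughly $17,100, which is still about double the $8-9K target. At $8-9K and 96 desktops per host, the cost is roughly $83-94 per desktop, consistent with the ~$100 estimate.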

Most important for virtualization systems architects is how vCPU scheduling affects “measured” performance. The telling piece is the difference in comparison results when vCPU scheduling is equalized:

AnandTech's Quad Sockets v. Dual Sockets Comparison. Oct 6, 2009.

Comparing the results, De Gelas hits on the I/O factor that chiefly separates VMmark from vAPUS:

The result is that VMmark with its huge number of VMs per server (up to 102 VMs!) places a lot of stress on the I/O systems. The reason for the Intel Xeon X5570’s crushing VMmark results cannot be explained by the processor architecture alone. One possible explanation may be that the VMDq (multiple queues and offloading of the virtual switch to the hardware) implementation of the Intel NICs is better than the Broadcom NICs that are typically found in the AMD based servers.

This is yet another issue that VMware architects struggle with in complex deployments. The front-side bus latency of “Dunnington” is a huge contributor to its downfall, and the reason the Penryn architecture was a dead end. With 8 additional threads in the 2P form factor, Nehalem delivers twice as many hardware execution contexts as Shanghai. This results in significant efficiencies for Nehalem where small working data sets are involved.

When larger data sets are used, as in vAPUS, Istanbul’s additional cores allow it to close the gap to within Nehalem’s clock speed advantage (about 12%). In contrast to VMmark, which implies a 3:2 advantage for Nehalem, the vAPUS results suggest a closer performance gap in more aggressive virtualization use cases.

What does this mean for AMD and the only 6-core shipping today? Since Intel is still projecting Q2/2010 for its 6-core server part, AMD has a decent opportunity to grow market share for Istanbul. Intel’s biggest rival will be itself. It faces a wildly growing number of SKUs across its i-line, spanning the i5, i7, i8 and i9 “families,” each with multiple speed and feature variants. Clearly, the non-HT version of Gulftown would stand as a direct competitor to Istanbul’s native 6-core SKUs. On the other hand, Istanbul may be no match for the 6-core Nehalem with its HT and “turbo core” feature set.

However, with an 8-core “Beckton” Nehalem variant on the horizon, it might be hard to understand just where Gulftown fits in Intel’s picture. Intel faces a serious production issue: it must fill fab capacity with 4-core, 6-core and 8-core processors, each with speed, power, socket and HT variants. From these it must supply high-speed, high-power SKUs and lower-speed, low-power SKUs for 1P, 2P and 4P+ destinations. Doing the simple math with 3 SKUs per part, Intel would be offering the market a minimum of 18 base parts under its current marketing strategy: 9 with HT/turbo and 9 without. For socket LGA-1366, this could easily mean 40+ SKUs once 1xQPI and 2xQPI variants are included (up from 23).
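The “simple math” above is a product of independent options. The sketch below assumes 3 core counts, 3 speed/power SKUs per part, and an HT/turbo on/off split, all of which are the post’s own assumptions. The speed-bin labels are hypothetical placeholders. The QPI doubling is our illustrative extension, not an Intel figure.

```python
from itertools import product

core_counts = (4, 6, 8)
speed_bins = ("high", "mid", "low")   # hypothetical labels for "3 SKUs per part"
ht_turbo = (True, False)

base_parts = list(product(core_counts, speed_bins, ht_turbo))
with_ht = sum(1 for _, _, ht in base_parts if ht)
print(len(base_parts), with_ht, len(base_parts) - with_ht)  # 18 9 9

# Offering every base part in both 1xQPI and 2xQPI versions doubles the count
qpi_variants = (1, 2)
print(len(list(product(base_parts, qpi_variants))))  # 36
```

Doubling for QPI variants gives 36 as a floor. Any speed or power bins beyond the minimum three per part would push the total toward the 40+ estimate.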

SOLORI’s take: Intel will have to create some interesting “crippling or pricing tricks” to keep Gulftown from cannibalizing the Gainestown market. If they follow their “normal” playbook, we predict the next 10 months will play out like this:

Initially there will be no 8-core product for 1P and 2P systems (LGA-1366). This will allow for artificially high margins on the 8-core EX chip (LGA-1567), slow the inevitable cannibalization of the 4-core/2P market, and ease production burdens;

Gulftown will remain high-power (90-130W TDP) and be positioned against AMD’s G34 systems and Magny-Cours, pitting 12 threads against 12 cores;

Intel will create a “socket refresh” (LGA-1566?) to enable “inexpensive” 2P-4P platforms from its Gulftown/Beckton line-up in 2H/2010 (ostensibly to maintain parity with G34) without hurting EX;

Revised, lower-power variants of Gainestown will be positioned against AMD’s C32 target market;

Intel will cut SKUs in favor of higher margins, increasing speed and features for “same dollar” cost;

Non-HT parts will begin to disappear from 4-core configurations entirely;

Intel’s AES enhancements in Gulftown will allow it to further differentiate itself in storage and security markets.

It would be a mistake for Intel to keep growing its SKU count or to allow too much overlap between 4-core HT and 6-core non-HT offerings. If purchasing trends soften in 4Q/09 and remain (relatively) flat through 2Q/10, Intel will benefit from a leaner, well-differentiated line-up. AMD has already announced a “leaner” plan for G34/C32. If all goes well at the fabs, 1H/2010 will be a good old-fashioned street fight between blue and green.

HP has simultaneously achieved two nearly identical VMmark scores with its ProLiant DL585 G6 rack server and ProLiant BL685c G6 blade, claiming the summit from the reigning 24-core champion. The Intel “Dunnington” 6-core processor (FSB architecture) first established the 24-core VMmark tier in September 2008 and has gone unchallenged ever since. Now, with the Opteron 8439SE raising the performance bar and the Opteron 8435 making a clear price-performance case, Dunnington’s vacation is over.

Today’s Istanbul-based achievements, established in the same memory footprint as the top Dunnington, render the venerable processor all but obsolete. Istanbul bests the champ by 4 tiles (24 more virtual machines), with a score-to-tile ratio of 1.5 for the rack system and 1.46 for the blade (the same ratio as the Dunnington at 14 tiles). Using the HP and IBM online configuration tools, we established the retail (online) price for each system, down to the Fibre Channel HBAs, and compared them for $/VM value. Here are the results:

The results indicate a 21-38% savings per VM for Istanbul over Dunnington in the 4P/24-core virtualization space. This is bread-and-butter territory for VDI implementations and SQL virtualization, and Intel’s last remaining marketplace for the Dunnington processor. At the same price point, the top-bin Istanbul weighs in with 3% better performance, 18% less power consumption and 30% more capacity than Dunnington. Intel’s 4P gambit is played out, and Nehalem-EX cannot arrive soon enough for Intel.

It is worth asking: does the HP ProLiant 4P/24-core offer the best value? The answer depends on the value proposition. From a straight $/VM vantage point, the HP DL385 G6 comparison demonstrated a more economical $182/VM, which is $40/VM lower than the BL685c G6. The 2P rack system therefore still comes out on top for the absolute bottom-line conscious. However, for applications like SQL consolidation, the additional licensing savings on 4P platforms versus 2P platforms dwarf this differential.
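VMmark (1.x) measures capacity in tiles of six VMs each, so the tile and $/VM comparisons above reduce to a few lines. The sketch below uses only figures quoted in this post: 18 tiles for Istanbul (Dunnington’s 14 plus 4), $182/VM for the DL385 G6, and $40/VM more for the BL685c G6. The configured system prices behind the results are not reproduced here, so `per_vm` only illustrates the method.

```python
VMS_PER_TILE = 6  # a VMmark 1.x tile bundles six workload VMs

def vms(tiles: int) -> int:
    """Number of VMs hosted at a given VMmark tile count."""
    return tiles * VMS_PER_TILE

def per_vm(system_price: float, tiles: int) -> float:
    """Configured system price divided by the VMs it hosts at its VMmark tile count."""
    return system_price / vms(tiles)

def savings(baseline_per_vm: float, candidate_per_vm: float) -> float:
    """Fractional $/VM savings of the candidate relative to the baseline."""
    return (baseline_per_vm - candidate_per_vm) / baseline_per_vm

istanbul, dunnington = 18, 14
print(vms(istanbul) - vms(dunnington))               # extra VMs at the higher tile count
print(f"{vms(istanbul) / vms(dunnington) - 1:.0%}")  # capacity gain over Dunnington
print(f"{savings(182 + 40, 182):.0%}")               # DL385 G6 vs BL685c G6 on $/VM
```

The 24 extra VMs and the roughly 29% capacity gain line up with the “4 tiles” and “30% more capacity” figures cited above. The `savings` function is one straightforward way to express a range like the 21-38% per-VM savings.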

What is clear: AMD’s Istanbul solution will remain unchallenged in the 4P space, in both raw performance and price-performance, until Nehalem-EX is delivered. That means if Nehalem-EX does not arrive in Q3/2009, the market will likely wait until Q1/2010 to make any long-term purchasing decisions, in anticipation of the new platforms slated to break in the new year.

June 1, 2009 – Today, AMD is announcing the general availability of its new single-die, 6-core Opteron processor code-named “Istanbul.” We have already weighed in on the promised benefits of Istanbul based on pre-release material that was not under non-disclosure. Now we’re able to disclose the rest of the story.

First, we got a chance to talk with Mike Goddard, AMD Server Products CTO, about Istanbul and how the G34/C32 platforms are shaping up. According to Goddard, “things went really well with Istanbul; it’s no big secret that the silicon we’re using in Istanbul is the same silicon we’re using in Magny-Cours.” Needless to say, Istanbul has many more forward-thinking capabilities than Socket-F’s legacy chipsets can support.

“We had always been planning a refresh to Socket-F with 5690,” says Goddard, “but Istanbul got pulled-in beyond our ability to pull-in the chipset.” Consequently, while there could be Socket-F platforms based on the next-generation 5690/5100 chipset, Goddard suggests that “most OEM’s will realign their platform development around [G34/C32, Q1/2010].”

In common parlance, Istanbul is a “genie in a bottle,” and we won’t see its true potential until it resurfaces in its Magny-Cours/G34 configuration. However, a few of these next-generation tweaks will trickle down to Socket-F systems:

AMD PowerCap Manager (via BIOS extensions)

Enhanced AMD PowerNow! Technology

AMD CoolCore Technology extended to L3 cache

HT Assist (aka probe filter) for increased memory bandwidth

HT 3.0 with an increase to 4.8GT/s and IMC improvements

5 new part SKUs

Better 2P performance parity with Nehalem-EP

That’s in addition to 50% more cores in the same power envelope: not an insignificant improvement. In side-by-side comparisons with the “Shanghai” quad-core at the same clock frequency, Istanbul delivers 2W lower idle power and 34% better SPECpower_ssj2008 results (1,297 overall), using identical systems with just a processor swap. In fact, Istanbul only exceeded Shanghai’s average power envelope at 80% actual load and beyond, and it remained within 5% of Shanghai even at 100% load.

AMD was gracious enough to invite us to its Reviewer’s Day on May 20th for a final look at “Istanbul” and a discussion of its plans for the product’s upcoming release. Much of the information we received is embargoed until the June 2009 release date. However, we can tell you that we have received a couple of AMD’s new 6-core “Istanbul” Opterons for testing and review. We look forward to seeing “Istanbul” in action in our lab over the next couple of weeks. Our verdict will be available at launch.

Instead of typical benchmarks, we’ll focus on Istanbul’s implications for vSphere before the new Opteron hits the streets. Remember that 6 cores per socket is the limit for the “free” and “reduced capability” vSphere licenses. If what we saw from AMD’s internal testing at Reviewer’s Day is accurate, then our AMD/VMware Eco-System partners are going to be very happy with the results. What we can confirm today is that AGESA 3.5.0.0+ is required to run Istanbul, so start looking for BIOS updates from your vendors as the launch date approaches. The systems we reported on from Tyan back in April will be good to go at launch (our GT28 test systems require a beta BIOS).

SOLORI’s take: We made a somewhat bold prediction on April 30, 2009 that “Shanghai-Istanbul Eco-System looks like an economic stimulus all its own” when comparing the AMD upgrade path to Intel’s (rip and replace) where VMware infrastructures are concerned. That article, Shanghai Economics 101, was one of our most popular AMD-related postings yet, and – judging from what we’ve seen already – it looks like we may have been correct!

While we’re impressed with the ability to flawlessly vMotion from socket 940 to Socket-F, we were even more impressed with the ability to drop an Istanbul into a Barcelona or Shanghai system and immediately realize the benefits. We’re going to look at our review samples and revisit our price-performance data and Watt/VM calculations before making sweeping recommendations. However, we expect Istanbul to be a very good match for on-premise cloud/virtualization initiatives.

SOLORI’s 2nd take: VDI and database consolidation systems running on 4P AMD boxes are about to take a giant leap forward. We can’t wait to see 24-core and 48-core VMmark scores updated over the next two months. Start asking your system vendor for an updated BIOS supporting AGESA 3.5.0.0+ (Tyan, are you listening? Supermicro’s AS2041M is already there), get your 4P test mule updated, and prepare to be amazed…

AMD released an updated technology road-map for its Opteron processor family, beginning with the early availability of Istanbul. Istanbul is AMD’s Socket-F compatible 6-core processor, shipping for revenue in May and available from OEMs in June. This information was delivered in a webcast today.

AMD Istanbul 6-core Processor

“…up to 30 percent more performance within the same power envelope and on the same platform as current Quad-Core AMD Opteron…”

Additionally, AMD announced that its Direct Connect Architecture 2.0 will be available only in the Opteron 4000 and 6000 series (socket C32 and G34, respectively). Companies waiting for the 12-core “Magny-Cours” processor will have to switch to the G34 platform in 2010. AMD said it is already shipping this 45nm part to sampling partners, and some customers will receive parts in 2H/2009. Magny-Cours is expected to be available from OEMs and system vendors in 1H/2010.

“Opteron 4000 series is also planned for introduction in 2010 for 1P and 2P servers and designed to address virtualized Web and cloud computing environments. The 4000 series will launch with 4- and 6-core processors…”

AMD believes that, with core counts on the rise, dense computing (HPC and data center virtualization or cloud) will rely on the 4000 series and its more “green friendly” low-power “EE” parts, which offer comparable performance at 40W average power. This will create a split in the server space between the 4000 and 6000 series, much like the one between 2000 and 8000 today. Unlike 2000/8000, however, the two series will overlap in the 2P market. The 6000 series is envisioned as a “high performance computing” part where power sensitivity is not the major concern.

In Medio Stat Veritas

SOLORI's Take and Quick Take posts express my personal opinion unless explicitly attributed to other sources. Where possible, supporting facts are presented to properly frame and ground these opinions; however, they are presented "AS-IS" without warranty or promise, express or implied.
