Challenging. That is the least you can say about the economic climate for the launch of Intel's newest "Nehalem EP Xeon" platform. However, challenges must be met, and they certainly make things more interesting. Server vendors won't convince many people to buy a new Intel Nehalem (or AMD Shanghai) based server just because "performance is higher". That argument only works in the processing-hungry HPC and rendering worlds, where less time per task results in time and cost savings. Hence, the challenge for AMD and Intel is to convince the rest of the market (95% or so) that the new platforms provide a compelling ROI (Return On Investment).

In general, the most productive or intensively used servers are replaced every 3 to 5 years. Based on its own inquiries, Intel estimates that 40% of the current installed base consists of servers with dual-core CPUs and another 40% of servers with single-core CPUs.

That means that Intel's Nehalem platform (and AMD's Shanghai/Opteron 23xx platform) has to convince people to replace their dual-core Opteron, dual-core Xeon 50xx ("Dempsey"), and single-core Xeon "Irwindale" servers. There are two great ways to turn a much more powerful server into a moneymaking, cost-saving machine. One is to use fewer servers in a cluster, which does not apply to every company. The other, more popular approach is to consolidate more servers onto the same physical machine using virtualization. The most important arguments for upgrading your servers are therefore performance/watt and support for virtualization.

Intel's newest platform promises better virtualization support by adding EPT (Extended Page Tables) and lowering world switch times. However, probably the largest bottleneck in the past was the amount of available memory bandwidth. Bandwidth is frequently an overrated performance factor, as few applications outside the HPC world get a boost from, for example, using three instead of two memory channels. That changes dramatically when you are running tens of virtual machines on a single physical machine: many applications with medium bandwidth demands morph into one big bandwidth-hogging monster. The challenge is thus to provide memory access that is as fast as possible, lower energy consumption, and better support for virtualization. On paper, the Nehalem architecture can definitely play all those trump cards. Anand has provided a detailed description of the Nehalem architecture; the most important improvements for business applications are:

The integrated memory controller talks to its own local memory or to remote memory attached to the other CPU (NUMA). Memory access takes between 27 and 54 ns (80 to 161 cycles). Compare this to the Xeon 5450 at the same clock speed, where memory access via the memory controller in the chipset can take up to 123 ns! The closest competitor (Opteron "Shanghai") needs between 32 and 71 ns.

A native quad-core design with a fast, 33-cycle L3 cache makes it easy for the L2 caches to exchange cache coherency information.

Fast CPU interconnects ensure that the remaining snoops complete very quickly and do not interfere with other traffic.

The memory controller has up to three channels. A dual-CPU configuration has access to 35GB/s of memory bandwidth (measured with STREAM) if you use DDR3-1333. The latest dual Opteron achieves 19.4GB/s with DDR2-800.
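The latency and bandwidth figures in the list above can be cross-checked with some simple arithmetic. The Python sketch below converts latency between cycles and nanoseconds, computes the theoretical peak bandwidth of each memory configuration, and estimates how many virtual machines fit within a measured bandwidth budget. Several inputs are assumptions, not taken from the text: a 3.0 GHz core clock for the latency conversion (the clock is not stated in this passage), 64-bit (8-byte) memory channels, two DDR2 channels per Opteron 23xx socket, and a purely hypothetical per-VM bandwidth demand of 1.5 GB/s. The helper names are illustrative only.

```python
import math


# --- Latency: convert between core clock cycles and nanoseconds ---
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Latency in ns = cycles / clock frequency in GHz."""
    return cycles / clock_ghz


def ns_to_cycles(ns: float, clock_ghz: float) -> float:
    """Latency in cycles = ns * clock frequency in GHz."""
    return ns * clock_ghz


# --- Bandwidth: theoretical peak of a DDR2/DDR3 configuration ---
def peak_bandwidth_gbs(channels: int, mt_per_s: float, sockets: int = 1,
                       bytes_per_transfer: int = 8) -> float:
    """Peak = sockets x channels x transfers/s x bytes per transfer, in GB/s (1e9 B).

    bytes_per_transfer defaults to 8, i.e. an assumed 64-bit channel.
    """
    return sockets * channels * mt_per_s * 1e6 * bytes_per_transfer / 1e9


def stream_efficiency(measured_gbs: float, peak_gbs: float) -> float:
    """Fraction of the theoretical peak reached by a measured STREAM result."""
    return measured_gbs / peak_gbs


# --- Consolidation: whole VMs whose combined demand fits in a budget ---
def vms_within_budget(budget_gbs: float, per_vm_gbs: float) -> int:
    """Number of VMs of equal demand that fit without exceeding the budget."""
    return math.floor(budget_gbs / per_vm_gbs)


if __name__ == "__main__":
    clock = 3.0  # GHz; assumed, the text does not state the clock here
    print(f"80 cycles  ~ {cycles_to_ns(80, clock):.0f} ns")
    print(f"161 cycles ~ {cycles_to_ns(161, clock):.0f} ns")
    print(f"123 ns     ~ {ns_to_cycles(123, clock):.0f} cycles")

    # Dual Nehalem EP: 3 x DDR3-1333 per socket; dual Opteron: 2 x DDR2-800.
    nehalem_peak = peak_bandwidth_gbs(channels=3, mt_per_s=1333, sockets=2)
    opteron_peak = peak_bandwidth_gbs(channels=2, mt_per_s=800, sockets=2)
    print(f"Nehalem EP peak {nehalem_peak:.1f} GB/s, STREAM 35.0 GB/s "
          f"({stream_efficiency(35.0, nehalem_peak):.0%})")
    print(f"Opteron peak {opteron_peak:.1f} GB/s, STREAM 19.4 GB/s "
          f"({stream_efficiency(19.4, opteron_peak):.0%})")

    per_vm = 1.5  # GB/s per VM; hypothetical, for illustration only
    print("VMs within Nehalem EP budget:", vms_within_budget(35.0, per_vm))
    print("VMs within Opteron budget:", vms_within_budget(19.4, per_vm))
```

Under these assumptions, 80 and 161 cycles come out at roughly 27 and 54 ns, and the STREAM results correspond to roughly 55% (Nehalem) and 76% (Opteron) of the theoretical peak. Real per-VM bandwidth demands vary widely, so the VM count only illustrates how bandwidth can cap consolidation.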

Basically, Nehalem is Intel's version of the improvements found in the AMD Barcelona platform, only better (or at least that's the goal). Let's see what it can do in reality.
