Details of Nehalem-EX, the CPU that will finally kill Itanium, were widely released today. It looks to be a monster performer. With eight cores, 16 threads, 24MB of shared L3 cache, integrated memory controllers, four high-bandwidth QPI links and 2.3 billion transistors, Nehalem-EX is due out in 4Q’2009, just 4-6 months from now.

In addition to the raw specs, a new Advanced RAS feature set with a Machine Check Architecture (MCA) recovery mode allows otherwise fatal system errors within the CPU, memory or I/O to be identified. Once an error is identified, the CPU can signal an interrupt to the operating system so that the OS or hypervisor can determine the best way to recover. That may mean shutting down the thread that was running, reloading the faulty memory location, re-executing the last instruction, or even asking the user what to do.
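To make the recovery flow concrete, here is a minimal Python sketch of how OS or hypervisor software might choose among the four outcomes listed above. It is only an illustration: the `ErrorReport` fields, the `Action` names and the ordering of the checks are all hypothetical and are not taken from Intel's MCA interface or from any real operating system.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    """The four recovery outcomes named in the article."""
    KILL_THREAD = auto()
    RELOAD_MEMORY = auto()
    RETRY_INSTRUCTION = auto()
    ASK_USER = auto()


@dataclass
class ErrorReport:
    """Hypothetical summary of a machine-check event handed to the OS."""
    source: str                          # "cpu", "memory" or "io"
    clean_copy_available: bool = False   # e.g. the page can be re-read from disk
    retryable: bool = False              # the failed operation can simply be redone
    owner_is_user_thread: bool = False   # the fault is contained in one user thread


def choose_recovery(report: ErrorReport) -> Action:
    """Pick the least disruptive action (assumed policy, for illustration only)."""
    if report.source == "memory" and report.clean_copy_available:
        return Action.RELOAD_MEMORY
    if report.source == "cpu" and report.retryable:
        return Action.RETRY_INSTRUCTION
    if report.owner_is_user_thread:
        return Action.KILL_THREAD
    # Nothing automatic applies, so fall back to asking the operator.
    return Action.ASK_USER


if __name__ == "__main__":
    print(choose_recovery(ErrorReport("memory", clean_copy_available=True)))
    print(choose_recovery(ErrorReport("io")))
```

The point of the sketch is the division of labour: the hardware only identifies and reports the error, while the policy for what to do next lives in software.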

Nehalem-EX will be sold as the Xeon 7500 and is codenamed “Beckton”. Its 2.3 billion transistors are manufactured on Intel’s 45nm high-k metal gate (HKMG) process, the same one used for the Nehalem-DP parts available today. Intel will be releasing 32nm processors at about the same time Nehalem-EX becomes available, meaning the die shrink is not too far behind.

#1 x86-64 TPC-C Benchmark, first x86 machines to break one million transactions per minute.

#1 VMware 24-Core benchmark

#1 SAP SD Standard Applications two-tier 8-processor result

#1 SAP SD Standard Applications two-tier 4-processor result

#1 Oracle E-Business R12 Large Payroll Batch Benchmark

#1 TPC-C 4-processor benchmark

#1 TPC-E 4-processor benchmark

#1 SPEC CPU2006 benchmarks

2x number of sockets of Xeon 7400

2.7x the number of threads and 1.5x the cache of Xeon 7400

9x memory bandwidth of Xeon 7400

2.5x database performance of Xeon 7400

1.7x integer throughput of Xeon 7400

2.2x floating point throughput of Xeon 7400
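The thread and cache multipliers in the list above can be checked with simple arithmetic. The short Python sketch below uses the Nehalem-EX figures quoted at the top of this article (16 threads, 24MB of L3). The Xeon 7400 baseline of 6 threads and 16MB of L3 is not stated in this article; it is an assumption based on the top-end six-core Xeon 7400 part, which has no Hyper-Threading.

```python
# Nehalem-EX figures come from the article; the Xeon 7400 baseline is an
# assumption (six cores without Hyper-Threading, 16MB L3 on the top part).
NEHALEM_EX = {"threads": 16, "l3_mb": 24}
XEON_7400 = {"threads": 6, "l3_mb": 16}


def ratio(key: str) -> float:
    """Return how many times larger the Nehalem-EX figure is."""
    return NEHALEM_EX[key] / XEON_7400[key]


if __name__ == "__main__":
    print(f"threads: {ratio('threads'):.1f}x")  # prints "threads: 2.7x"
    print(f"cache:   {ratio('l3_mb'):.1f}x")    # prints "cache:   1.5x"
```

Under that assumed baseline, 16/6 rounds to 2.7 and 24/16 is exactly 1.5, which matches the quoted multipliers.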

Intel had planned to release this Nehalem-EX information as a response to AMD’s Istanbul Opteron benchmarks, which were supposed to be released today. However, AMD has reportedly moved the NDA back two weeks.

This CPU will likely seal Itanium’s fate, as it will be closely comparable in performance to the highest-end 9100 Tukwila Itanium. The continued pouring of hundreds of millions of dollars into Itanium will not seem justified for long when compared to the follow-on to Nehalem-EX.

Given Nehalem-EX’s massive performance today, and the likely continued growth from the follow-on product, it is increasingly hard to see how Intel will be able to justify spending so much R&D on Poulson (Tukwila Itanium’s follow-on). At some point shareholders are going to want to see exact financial breakdowns of Itanium’s contribution in revenues, as well as total unit sales, market analysis reports showing future trends, and more.

In fact, all one has to do is look at the top 100 systems today at the Top500.org website to see that Itanium is not cutting the mustard. Its highest ranking is #48, with the only other entry coming in at #63, and only 5 listed in the entire top 500.

With so much performance from Itanium’s 9000-series, you would expect there to be many more top 100 hits. There aren’t. So it’s very likely Nehalem-EX sounds the death knell for Itanium. Tukwila will likely be the last version, assuming it’s ever actually released.

I was sitting back looking at the picture of Nehalem-EX’s die. 2.3 billion transistors, 45nm features, high-k metal gate construction. So many things could go wrong in manufacturing this CPU. I’m wondering what kind of redundant internal features Intel must have introduced into the CPU to get high yields.

Are there extra units here and there so that, while the chip is being configured and binned, any sub-components within the CPU that produce erroneous results can be re-mapped before the configuration is permanently “burned into” existence as part of that chip’s feature set?

Consider also that when I visited Intel’s debug lab in 4Q’2007, they were already debugging Gainestown CPUs (Intel’s latest Nehalem DP released unto the world at the end of March). While the director of the debug lab would not let me take photos, there were several stations which had Gainestowns in place for debugging efforts. Now, that was 1.5 years prior to their official release.

This Nehalem-EX die is flatly beautiful, and it is also something that Intel had designed at least one year ago, if not 1.5 years ago as well. It makes me wonder what the devices they’re working on today will look like when the end of 2010 rolls around, or sometime in early/mid 2011.

I believe the majority of the benchmarks listed are for the Intel Xeon 7400 and not Nehalem-EX.

M.Tahir Khan

Good Info

[OvO]wl

I am sorry but I don’t see that Nehalem will signal the end of Itanium. You see, the sockets for Xeon and Itanium are being standardised. This means that all the reasons for not getting Itanium at the moment will disappear (slow I/O, poor graphics performance, slow RAM, peripheral expense etc).

When people start swapping Xeon and Itanium CPUs in and out of the same motherboards, they will start to realise that Itanium’s FPU capabilities are actually pretty superlative, and that x86 gets most of its integer and floating point speed out of vector processing, an area where Itanium is not so far ahead.

In the future this kind of computation will be less focused on the CPU and more on GPUs for scientific applications. In the past the I/O between RAM and GPUs has been problematic, largely because of NVidia’s reluctance to support the platform, and this was one of the things that really contributed to Itanium’s struggling reputation. Where computation isn’t about repeatedly bashing the same calculations, the explicitly parallel instruction set of Itanium together with the excellent BHT and the superior FPU performance will really hammer x86, particularly when the 8-core 32nm Itanium is released.

The bottom line is that the relative strengths of Nehalem and Itanium are not about cache sizes. Itanium used large caches because its RAM was so slow, because it was ECC. The merits are about the fundamental differences in architecture, and the supremacy of Itanium’s FPU.