Intel pushes Nehalem EXs into 2010

With Advanced Micro Devices admitting that it's getting ready to launch its "Istanbul" Opteron six-shooter, Intel can't afford to let AMD monopolize all the talk about x64 processors. And that is why Intel hosted a conference call today about its eight-core "Nehalem EX" processor for four-socket and larger servers, even though the real news was that the company's partners would not be able to get Nehalem EX systems into the field until early 2010.

Or maybe even a little later than early 2010 for some of the larger Nehalem EX servers. But before we get into all that, let's go over the feeds and speeds Intel divulged today as it previewed the Nehalem EX processors.

Intel's Nehalem EX processor

These high-end server chips - which will probably be called the Xeon 7500s when they come to market and were once known by the code-name "Beckton" - have over 2.3 billion transistors and will be implemented in Intel's 45-nanometer high-k metal gate technology, the same process that Intel used to make the "Nehalem EP" chips for two-socket servers (these were launched back on March 30).

The Nehalem EX chips will pack up to eight processor cores onto a single die, with each core equipped with HyperThreading, Intel's implementation of simultaneous multithreading. (This technology lets each core look like two cores to the system software on the box using the chip. It's a kind of instruction stream virtualization.)

The Nehalem EX chips will also sport Turbo Boost, the technology that debuted on the Nehalem EPs, which allows unused cores on the chip to be quiesced so the remaining cores can have their clock speeds boosted a little. The Nehalem EX will also have 24 MB of shared L3 cache memory, which is actually distributed alongside each of the cores, as you can see from the chip layout diagram to the left.

The QuickPath Interconnect (QPI) point-to-point interconnection technology that debuted in the desktop Core i7 and server Nehalem EP chips will be more fully deployed with the Nehalem EX systems. The four QPI ports on each Nehalem EX socket, along with a pair of I/O hubs in the "Boxboro-EX" chipset that goes with the octo-core chip, allow four-socket or eight-socket systems (with as many as 32 or 64 cores and twice as many threads) to be built "gluelessly," meaning that you don't have to architect another chipset.

This was the promise that AMD always held out for its own Opteron processors and their HyperTransport interconnect, but very few vendors actually delivered machines that put four two-socket boards together as the Opteron design has allowed from day one.

Boyd Davis, general manager of Server Platforms Group marketing at Intel, said that the Nehalem EX design will use standard unbuffered DDR3 main memory rather than fully buffered DIMMs, and will put memory buffers somewhere between the on-chip DDR3 memory controllers and the memory DIMMs, using something Intel calls the Scalable Memory Interconnect. With eight cores per chip and up to 16 memory slots per socket, the memory used with the Nehalem EX chips needs buffering, said Davis.

But the economics apparently favored putting the buffering somewhere in the system rather than on the memory DIMMs themselves. Davis would not elaborate further on this technology, declining to talk about the interfaces used to link these buffers to the DDR3 main memory or how much heat they generate. He did say that the buffering was part of the system and could not be circumvented.

In comparison with the current four-core and six-core "Dunnington" Xeon 7400 processors, announced last September, Davis said that the Nehalem EX systems would support about twice the memory, would have more reliability features at both the chip and system level, would deploy 2.7 times as many threads and 1.5 times the cache memory, and would expand out across twice as many sockets in a single system image. Most significantly, the Nehalem EX systems, based on some internal Intel tests, will have up to nine times the memory bandwidth of those Dunnington machines and their crufty and slow front side bus architecture.
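The thread, cache, and socket ratios Davis quoted can be reproduced with simple arithmetic. The sketch below assumes a top-bin six-core Dunnington with 16 MB of L3 and no HyperThreading, an eight-core Nehalem EX with HyperThreading and 24 MB of L3, and the glueless socket maximums discussed above (four for Dunnington on Intel's own chipset, eight for Nehalem EX). The dictionary layout and function names are illustrative only.

```python
# Back-of-envelope check of the Nehalem EX vs. Dunnington ratios.
# Assumed per-chip specs: six-core Dunnington (Xeon 7400), 16 MB L3, no HyperThreading;
# eight-core Nehalem EX with HyperThreading and 24 MB L3.
# max_sockets is the glueless maximum, not what custom chipsets can reach.

DUNNINGTON = {"cores": 6, "threads_per_core": 1, "l3_mb": 16, "max_sockets": 4}
NEHALEM_EX = {"cores": 8, "threads_per_core": 2, "l3_mb": 24, "max_sockets": 8}


def threads_per_system(chip: dict, sockets: int) -> int:
    """Hardware threads visible to the OS across `sockets` chips."""
    return chip["cores"] * chip["threads_per_core"] * sockets


def ratios(new: dict, old: dict, sockets: int = 4) -> dict:
    """Generation-over-generation ratios, comparing boxes with the same socket count."""
    return {
        "threads": threads_per_system(new, sockets) / threads_per_system(old, sockets),
        "l3_cache": new["l3_mb"] / old["l3_mb"],
        "sockets": new["max_sockets"] / old["max_sockets"],
    }


if __name__ == "__main__":
    for name, value in ratios(NEHALEM_EX, DUNNINGTON).items():
        print(f"{name}: {value:.1f}x")
```

A four-socket box works out to 64 threads versus 24, which is where the "2.7 times" figure comes from. The bandwidth claim cannot be derived this way because it comes from Intel's internal tests.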

Mum on clock speed

Davis would not talk about clock speeds, but said that the Nehalem EX processors would show larger gains on database, integer, and floating point performance than the Nehalem EP chips showed over the Xeon 5400 processors they replaced. (The Nehalem EPs, sold as the Xeon 5500s, had 2.5 times the database performance, 1.7 times the integer throughput, and 2.2 times the floating point throughput of the Xeon 5400s.)

Given that the Nehalem EX comparison pits an eight-core chip against a six-core chip, while the Nehalem EP comparison pitted one quad-core chip against another, you would expect the performance boost to be larger on threaded workloads like the tests Intel is using to make these comparisons.
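The core-count argument can be made concrete. This sketch assumes the Xeon 5400 is quad-core without HyperThreading, the Nehalem EP is quad-core with it, the Dunnington tops out at six cores without it, and the Nehalem EX has eight cores with it. It counts raw hardware resources only and says nothing about per-core speed or memory bandwidth. The `Chip` type is a hypothetical helper.

```python
# Compare the raw core and thread uplift of the two Nehalem transitions.
# Assumed core/thread counts as described above; not Intel benchmark data.

from typing import NamedTuple


class Chip(NamedTuple):
    name: str
    cores: int
    threads_per_core: int

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core


def uplift(new: Chip, old: Chip) -> tuple[float, float]:
    """Return (core ratio, hardware-thread ratio) of the new chip over the old one."""
    return new.cores / old.cores, new.threads / old.threads


XEON_5400 = Chip("Xeon 5400", 4, 1)
NEHALEM_EP = Chip("Nehalem EP", 4, 2)
XEON_7400 = Chip("Xeon 7400", 6, 1)
NEHALEM_EX = Chip("Nehalem EX", 8, 2)

if __name__ == "__main__":
    for new, old in ((NEHALEM_EP, XEON_5400), (NEHALEM_EX, XEON_7400)):
        cores, threads = uplift(new, old)
        print(f"{new.name} vs {old.name}: {cores:.2f}x cores, {threads:.2f}x threads")
```

The EP transition doubled threads at a constant core count. The EX transition adds a third more cores on top of HyperThreading, for roughly 2.67 times the threads per socket, so a bigger jump on threaded tests is the expected outcome.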

Intel is thinking that the ability to make gluelessly connected eight-socket systems will open up a little bit more of the high-end server market to the Xeon platform. While IBM and Unisys/NEC have both created Xeon 7400 machines that span up to 16 processor sockets, four-socket boxes are the norm when it comes to Xeon 7400 servers, and quite frankly, these machines have very little benefit over two-socket Nehalem EP machines at this point. QPI will breathe a little life into the high end of the x64 server market, which really doesn't want to buy Itanium machines for scalability.

A four-socket Nehalem EX box will have 32 processor cores and 64 threads and, using 8 GB DDR3 DIMMs, could support as much as 512 GB of main memory. An eight-socket machine can double that, to 64 cores, 128 threads, and 1 TB of memory, using nothing other than Intel's chips and the Boxboro-EX chipset. That's as big as big iron gets these days, and Intel says this is why it has eight OEMs getting ready to sell these eight-socket boxes, with 15 different designs among them.
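The capacity figures follow directly from the per-socket numbers given earlier: eight cores, two threads per core, and up to 16 memory slots per socket, populated here with 8 GB DIMMs. The constant and function names below are illustrative, and the sketch assumes every slot is filled.

```python
# Capacity of glueless Nehalem EX boxes from the per-socket figures in the text.
# Assumes every DIMM slot is populated with an 8 GB DDR3 DIMM.

CORES_PER_SOCKET = 8
THREADS_PER_CORE = 2
DIMM_SLOTS_PER_SOCKET = 16
DIMM_GB = 8


def box_capacity(sockets: int) -> dict:
    """Cores, threads, and maximum main memory (GB) for a fully populated box."""
    cores = sockets * CORES_PER_SOCKET
    return {
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "memory_gb": sockets * DIMM_SLOTS_PER_SOCKET * DIMM_GB,
    }


if __name__ == "__main__":
    for sockets in (4, 8):
        print(sockets, "sockets:", box_capacity(sockets))
```

Four sockets give 32 cores, 64 threads, and 512 GB. Eight sockets give 64 cores, 128 threads, and 1,024 GB, which is the 1 TB figure.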

IBM demonstrated its System x version of such a configuration at the Intel preview event, and it's committed to making even larger systems based on the Nehalem EX chips with its forthcoming EX5 chipset. Alex Yost, vice president in charge of IBM's System x and BladeCenter business, said during the demonstration that the 64-core configuration (and presumably larger machines with more than eight sockets, since otherwise there would be no need for the EX5 chipset) was in its testing cycle now and that IBM was preparing to deliver "a very exciting product in just a year's time."

May 2010 sure does sound like a long time to wait for high-end Nehalem EX systems. But that's the way it goes in the high-end server racket, as you all know.

There have been some rumors about the Nehalem EX chips slipping into the first quarter of next year, and this is obviously what the talk is referring to. Intel's roadmaps are vague enough that you can't pin down dates with them any more, but the chatter at the end of last year was that Nehalem EX chips would come out in the fall for system deliveries late in the year.

Intel never promised anything that specific publicly, which is why Davis could claim today that the Nehalem EX was "on track" for production in the second half of 2009. Given that AMD is getting ready to put its six-shooter Istanbul chips into the field any day now, there is a little heat on Intel. But the real heat starts in early 2010, when AMD will get its kicker twelve-core "Magny-Cours" chips into the field with its own chipsets, chipsets that had better scale well beyond four sockets if the company has any sense at all.

One of the other features that Davis divulged as part of the Nehalem EX preview was something called machine check architecture (MCA) recovery, which predicts, detects, and corrects processor, memory, and I/O errors inside the Nehalem EX chip. The Itanium family of chips sports a similar MCA feature, and Itanium bashers will see this, along with the performance delivered by the Nehalem EX chips, as yet another reason why Intel should kill off the Itanium chips. (The quad-core "Tukwila" Itanium chips were pushed out to the first quarter of 2010 last week after being delayed a number of times.)

While Davis conceded that Intel had to "keep pushing Xeon as hard as we can" and said that the company was indeed poaching features from Itanium and injecting them into Xeon chips, the company nonetheless expects Itanium to be a viable product "for many years to come."

That may well be true for HP-UX, OpenVMS, and NonStop workloads, which are only available on Hewlett-Packard's Itanium-based Integrity machines, but it is certainly not true for Linux, Windows, or Solaris, which are perfectly happy to run on Xeons. Last December, Fujitsu tested a PrimeQuest Itanium box with 32 of Intel's dual-core "Montvale" Itanium 9150M processors spinning at 1.66 GHz, and it delivered 2.38 million transactions per minute (TPM) of throughput on the TPC-C online transaction processing test.

IBM's eight-socket, 48-core Dunnington box, a System x3950 M2, was able to hit 1.2 million TPM. Now, assume that an eight-socket Nehalem EX box can do 2.5 times that, which Intel says is a low-water mark for the expected performance increase; then we are talking about being able to hit 3 million TPM with an eight-socket Nehalem EX with 64 cores. To be sure, the Tukwilas will double it all up again, but there is a much smaller base of shops that need to do that. And many more shops will probably go with Nehalem EX servers wherever they can. ®
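The projection above is a single multiplication. This sketch spells it out and sets the result next to the Itanium PrimeQuest number. The 2.5x multiplier is the assumed low-water mark from the text, not a published benchmark, and 1.2 million is the rounded x3950 M2 figure as quoted.

```python
# Rough TPC-C projection for an eight-socket Nehalem EX box.
# Baselines are the quoted results (rounded); the uplift is an assumption, not a result.

DUNNINGTON_X3950_M2_TPM = 1_200_000  # IBM System x3950 M2, 8 sockets, 48 cores
PRIMEQUEST_MONTVALE_TPM = 2_380_000  # Fujitsu PrimeQuest, 32 Itanium 9150M chips
ASSUMED_UPLIFT = 2.5                 # assumed Nehalem EX gain over Dunnington


def projected_tpm(baseline_tpm: int, uplift: float) -> float:
    """Scale a measured TPC-C result by an assumed generational uplift."""
    return baseline_tpm * uplift


if __name__ == "__main__":
    nehalem_ex = projected_tpm(DUNNINGTON_X3950_M2_TPM, ASSUMED_UPLIFT)
    print(f"Projected 8-socket Nehalem EX: {nehalem_ex / 1e6:.1f} million TPM")
    print(f"vs. 32-socket Montvale PrimeQuest: {nehalem_ex / PRIMEQUEST_MONTVALE_TPM:.2f}x")
```

Under that assumption, eight Xeon sockets would land about a quarter ahead of the 32-socket Itanium box, which is the heart of the Itanium bashers' case.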