AMD locks and loads 'Istanbul' six-shooter

Gunning for Dunnington, Nehalem

Advanced Micro Devices, after weeks of hinting that its six-core "Istanbul" Opteron processors were right around the corner, is finally firing the kickers to its "Shanghai" quad-core Opterons right at the new and forthcoming Nehalem family of workstation and server chips from archrival Intel.

With the Istanbuls, AMD and Intel have entered the marketing equivalent of the Mutara Nebula, because provided there are no bugs in the current generations of Opteron and Xeon processors, the odds will be even. (Well, more or less.)

Intel has quad-core and six-core "Dunnington" Xeon 7400s, which do not have simultaneous multithreading (what Intel calls HyperThreading, which turns each physical core into two virtual cores and boosts performance by around 30 to 40 percent on many workloads), and the quad-core "Nehalem EP" Xeon 3500 and Xeon 5500 processors for single-socket and two-socket servers, respectively. The Nehalem EPs, also known as the "Gainestown" processors, use Intel's HyperTransport-like QuickPath Interconnect (QPI) to link processors, memory, and I/O, and have four cores, each with HyperThreading.

AMD has decided against using simultaneous multithreading in the Opteron processors, so the Istanbul chips announced today with six cores on a single die do not have their performance goosed with virtualized instruction streams. It is hard to imagine that the Dunningtons will be able to keep pace with the Istanbuls on big x64 iron, given the benefits of the HyperTransport point-to-point interconnect compared to the old frontside bus architecture used in the Dunningtons. (These are the last Intel chips that will use FSB instead of QPI to link processors to memory and I/O.)

With the two-socket server being the workhorse of the IT industry, the main battle between these two enterprises will be fought between the Xeon 5500s and the Opteron 2400s, as the Istanbul chips for two-socket boxes are called. But make no mistake: until Intel gets its "Nehalem EX" Xeon 7500 chips to market in early 2010, AMD is going to try to open fire with the high-end Opteron 8400s in the four-socket and eight-socket space.

There is more than one battle going on in this economic meltdown dust cloud, and this time, the Genesis Effect might be a little something called $5tn in global stimulus spending. (That's probably taking a metaphor too far. It happens.)

The Istanbul Opterons contain 904 million transistors, which implement six cores, each with 64 KB of L1 data cache, 64 KB of L1 instruction cache, and 512 KB of L2 cache. The chip is implemented in a 45 nanometer silicon-on-insulator process and manufactured by AMD's fab spinoff, GlobalFoundries, at its Dresden, Germany fab. Each chip also has 6 MB of L3 cache that is shared by all of the cores, as well as the full AMD-V virtualization and AMD-P power management feature sets. (AMD-V consists of rapid virtualization indexing, tagged TLB, and extended migration, while AMD-P consists of smart fetch, power cap, and CoolCore features.)
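Adding up the cache figures above gives the raw on-die cache capacity of an Istanbul chip. This is a plain sum of the quoted sizes; it ignores how the levels share or duplicate cache lines, and the names in the sketch are illustrative only.

```python
# Per-core cache sizes quoted in the article, in KB.
L1_DATA_KB = 64
L1_INSTR_KB = 64
L2_KB = 512
CORES = 6
L3_SHARED_KB = 6 * 1024  # 6 MB of L3 shared by all cores


def cache_totals(cores: int = CORES) -> dict:
    """Sum the private per-core caches, then add the shared L3 (raw capacity only)."""
    per_core = L1_DATA_KB + L1_INSTR_KB + L2_KB
    private = per_core * cores
    return {
        "per_core_kb": per_core,
        "private_kb": private,
        "l3_kb": L3_SHARED_KB,
        "total_kb": private + L3_SHARED_KB,
    }


totals = cache_totals()
for name, kb in totals.items():
    print(f"{name}: {kb} KB")
```

The private caches come to 3,840 KB across six cores, so the shared L3 still accounts for the larger share of the roughly 9.75 MB total.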

The Istanbul chips use the same Socket F processor socket as the earlier Rev F Opterons, a 1,207-pin organic land grid array (LGA). In plain English (well, American anyway), that means the Istanbuls plug into all of the same machines that the quad-core Barcelona and Shanghai chips do, as well as those built for the prior dual-core Rev F Opterons. The Istanbuls will also plug into AMD's future "Fiorano" platform, which is based on a homegrown SR5690/SP5100 chipset, according to the company.

The Istanbul Opteron has a die size of 346 square millimeters with those 904 million transistors. The Nehalem EP weighs in at 731 million transistors (also implemented in a 45 nanometer process, but in this case, Intel's own cooking) and has a die size of 263 square millimeters. If the cost of making a chip rises with its size, and yields fall as the chip gets bigger, you can see why Intel has decided to deploy HyperThreading on its chips: it gets more threads out of a smaller die. The wonder is why AMD hasn't done its own variant of HyperThreading after all of these years.
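To make the die-size argument concrete, here is a rough sketch using a common textbook approximation for dies per wafer and a simple Poisson yield model. The 300 mm wafer and the defect density of 0.3 defects per square centimeter are assumptions picked for illustration, not figures from AMD or Intel; only the die areas and transistor counts come from the article.

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumption: 300 mm wafers
DEFECTS_PER_CM2 = 0.3      # hypothetical defect density, for illustration only


def dies_per_wafer(die_area_mm2: float, wafer_d_mm: float = WAFER_DIAMETER_MM) -> int:
    """Textbook approximation: wafer area / die area, minus partial dies lost at the edge."""
    r = wafer_d_mm / 2
    gross = math.pi * r * r / die_area_mm2
    edge_loss = math.pi * wafer_d_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


def poisson_yield(die_area_mm2: float, d0_per_cm2: float = DEFECTS_PER_CM2) -> float:
    """Simple Poisson model: probability that a die has zero defects."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * d0_per_cm2)


# (die area in mm^2, transistor count) as quoted in the article
chips = {
    "Istanbul": (346.0, 904e6),
    "Nehalem EP": (263.0, 731e6),
}

for name, (area, transistors) in chips.items():
    dpw = dies_per_wafer(area)
    y = poisson_yield(area)
    good = dpw * y
    density = transistors / area / 1e6
    print(f"{name}: {dpw} candidate dies, yield {y:.0%}, ~{good:.0f} good dies, "
          f"{density:.2f}M transistors/mm^2")
```

Whatever defect density you plug in, the smaller die wins twice: more candidate dies per wafer and a higher fraction of them working, which is why the real numbers matter more than the model.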

The Nehalem EP fits into the 1,366-pin FC-LGA socket from Intel. The future eight-core Nehalem-EX chip from Intel, slated for initial production late this year, will have a hefty 2.3 billion transistors; the die size has not been divulged, but it is going to be a fat chip, no doubt.

Each Istanbul chip, like all prior Opterons, includes on-chip main memory controllers; in this case, they support DDR2 main memory like the prior quad-core "Barcelona" and Shanghai Opterons. The controllers, which run at 2.2 GHz, support registered ECC DDR2 main memory running at 533 MHz, 667 MHz, and 800 MHz. The memory controllers run at the same speed regardless of the clock speed of the processor (this is one of the things that makes putting memory controllers onto chips tricky) and deliver up to 12.8 GB/sec of memory bandwidth per Rev F socket.
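The 12.8 GB/sec figure can be reproduced with peak-bandwidth arithmetic. This sketch assumes two 64-bit DDR2 channels per socket and treats the quoted 533/667/800 MHz speeds as effective transfer rates in millions of transfers per second, which is how DDR2 speed grades are usually named; real sustained bandwidth will be lower.

```python
BUS_WIDTH_BYTES = 8  # one 64-bit DDR2 channel moves 8 bytes per transfer
CHANNELS = 2         # assumption: dual-channel memory controller per socket


def peak_bandwidth_gbs(mega_transfers: int, channels: int = CHANNELS) -> float:
    """Peak bandwidth = transfers/sec x bytes per transfer x channels, in decimal GB/sec."""
    return mega_transfers * 1e6 * BUS_WIDTH_BYTES * channels / 1e9


for speed in (533, 667, 800):
    print(f"DDR2-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/sec per socket")
```

Only the DDR2-800 case reaches the 12.8 GB/sec ceiling; slower DIMMs scale the number down proportionally.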

The Istanbul chips have three HyperTransport 3.0 point-to-point links, with up to 19.2 GB/sec of bandwidth per link, which are used to talk to other processors and I/O in a chip complex. The Istanbuls also include a new feature called HT Assist, which lets a chip in a complex figure out which other chip holds the data it needs and send its request for information only to that chip.
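One way to arrive at 19.2 GB/sec per link is sketched below. The 16-bit link width and 2.4 GHz HyperTransport clock are assumptions not stated in the article, and the result is an aggregate figure counting both directions of the link.

```python
LINK_WIDTH_BITS = 16     # assumption: 16-bit link in each direction
CLOCK_GHZ = 2.4          # assumption: 2.4 GHz HyperTransport 3.0 link clock
TRANSFERS_PER_CLOCK = 2  # data is transferred on both clock edges


def ht_link_gbs(clock_ghz: float = CLOCK_GHZ, width_bits: int = LINK_WIDTH_BITS):
    """Return (per-direction, aggregate) peak bandwidth of one link in GB/sec."""
    per_direction = clock_ghz * TRANSFERS_PER_CLOCK * width_bits / 8
    return per_direction, per_direction * 2


one_way, both_ways = ht_link_gbs()
print(f"{one_way:.1f} GB/sec each way, {both_ways:.1f} GB/sec aggregate per link")
```

Under these assumptions, each direction carries 9.6 GB/sec, so the headline number is best read as bidirectional.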

The HT Assist feature works like this: without it, an Opteron socket that needs some data it doesn't have in its cache broadcasts probes to all of the sockets in the complex, which puts a lot of overhead on the HyperTransport interconnect. With HT Assist, 1 MB of L3 cache is reserved as a directory for the cache lines in use across the system, so the chips don't have to probe every cache. Of course, you have to give up some L3 cache to do this, which can affect performance for other things, like Java.
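The difference between broadcasting probes and consulting a directory can be shown with a toy model. This is a hypothetical illustration of the general directory idea, not AMD's implementation: the real HT Assist directory lives in a 1 MB slice of L3, tracks coherence state, and has finite capacity with its own evictions, none of which is modeled here. The class and function names are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeFilter:
    """Hypothetical directory mapping a cache-line address to the sockets caching it."""
    sharers: dict = field(default_factory=dict)

    def record(self, line: int, socket: int) -> None:
        # Note that `socket` now holds a copy of `line`.
        self.sharers.setdefault(line, set()).add(socket)

    def targets(self, line: int, requester: int) -> set:
        # Only sockets that actually hold the line need a probe.
        return self.sharers.get(line, set()) - {requester}


def broadcast_probes(requester: int, sockets: int) -> set:
    """Without a directory, every other socket has to be probed."""
    return {s for s in range(sockets) if s != requester}


SOCKETS = 8
pf = ProbeFilter()
pf.record(line=0x1000, socket=5)  # only socket 5 caches this line

print("broadcast:", len(broadcast_probes(requester=0, sockets=SOCKETS)), "probes")
print("directory:", len(pf.targets(0x1000, requester=0)), "probe(s)")
print("uncached line:", len(pf.targets(0x2000, requester=0)), "probes")
```

In an eight-socket box, a broadcast costs seven probes per miss, while the directory sends at most one probe per sharer and none for a line no one else has cached. In a two-socket box, the broadcast is already a single probe, so there is little to gain.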

On memory-intensive benchmarks like Stream, the throughput of an Opteron server can increase by as much as 60 percent thanks to HT Assist, and AMD is anticipating big gains for database workloads, too. The HT Assist function doesn't need to be turned on in two-socket boxes (there's only one pipe between the two chips, so they know who they are talking to already), and the feature is implemented at the BIOS level of the systems, so there is no need to tweak operating systems or hypervisors to take advantage of it.