Poulson Itaniums hit 'Replay' for reliability

New instructions, better HyperThreading

Hot Chips The future eight-core "Poulson" Itanium is not just a process-shrink of the current four-core "Tukwila" Itanium 9300. Intel has been working to add new features to Poulson to make it more useful for running enterprise workloads – and to run them more reliably.

Intel already released a lot of Poulson details back in February at the IEEE International Solid-State Circuits Conference in San Francisco. At the Hot Chips conference at Stanford University late last week, the company lifted the veil a little more – and continued to keep its head down in the legal spat between Oracle and HP over Itanium's long-term fate.

Steve Undy, technical lead design engineer for Poulson at Intel, gave a presentation that walked through some of the chip's new features. Perhaps more important than any single feature, Undy confirmed that Poulson is in post-silicon validation and has been booted and tested on multiple operating systems and in different system topologies.

HP's HP-UX, OpenVMS, and NonStop operating systems are expected to be available on the Poulson chips, as are SUSE Linux Enterprise Server and a number of proprietary operating systems from Fujitsu, NEC, and Bull. Poulson is on track for shipment in 2012.

A statement from Intel that was released late last week in conjunction with Undy's Hot Chips presentation said that the new Poulson instructions are intended "to help take future Itanium performance to the next level and to lay the foundation for the future of Itanium computing." The statement ended by saying that the follow-on "Kittson" Itanium processor is under development.

Like the Xeon processors, the Poulson Itaniums have a "core out" design that puts the cores on the outside edges of the chip with a shared L3 cache in the center, all linked together by a fast ring interconnect. On Poulson, the L3 cache weighs in at 32MB, and the chip has two integrated DDR3 main-memory controllers with a total of four Scalable Memory Interface (SMI) links out to memory boards.

Poulson has four full-width and two half-width QuickPath Interconnect (QPI) links, which run at 6.4GT/sec. The chips are baked in a 32-nanometer process, have an area of 544 square millimeters, have 3.1 billion transistors, and have a maximum thermal design point of 170 watts with all cores humming along.

Intel has not yet talked about clock speeds, but the speculation is that they won't change much from the current Itanium 9300s, which were launched in February 2010 and which run at between 1.33GHz and 1.73GHz. These Tukwila Itaniums are made in Intel's 65-nanometer process, have just over 2 billion transistors, and peak out at 185 watts across their four cores.

Schematic of Intel's Poulson Itanium chip

The Poulsons will offer twice the cores of the Tukwilas, QPI and SMI links that run 50 per cent faster, plus 33 per cent more L3 cache on-chip. Like the Tukwilas, the Poulsons will not scale beyond eight sockets in symmetric multiprocessing configurations. Presumably the faster QPI and SMI links will help SMP performance, however.
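As a back-of-the-envelope check on those figures, here is a small Python sketch that uses only the numbers quoted in this article. Two things are assumptions on our part: Tukwila's "just over 2 billion" transistors is rounded to 2.0 billion, and "33 per cent more" L3 cache is read as one third more. The per-core division of the thermal envelope is deliberately crude and ignores the power drawn by cache and I/O.

```python
# Figures quoted in the article. Tukwila's transistor count is rounded
# to 2.0 billion, which is an assumption ("just over 2 billion").
poulson = {"cores": 8, "transistors": 3.1e9, "tdp_watts": 170, "l3_mb": 32}
tukwila = {"cores": 4, "transistors": 2.0e9, "tdp_watts": 185}


def watts_per_core(chip):
    """Crude estimate: whole-chip thermal envelope divided by core count."""
    return chip["tdp_watts"] / chip["cores"]


core_ratio = poulson["cores"] / tukwila["cores"]
transistor_ratio = poulson["transistors"] / tukwila["transistors"]
poulson_wpc = watts_per_core(poulson)
tukwila_wpc = watts_per_core(tukwila)

# If 32MB is one third more than Tukwila's L3, Tukwila's L3 is 3/4 of 32MB.
implied_tukwila_l3_mb = poulson["l3_mb"] * 3 / 4

print(f"cores: {core_ratio:.1f}x, transistors: {transistor_ratio:.2f}x")
print(f"watts per core: Poulson {poulson_wpc}, Tukwila {tukwila_wpc}")
print(f"implied Tukwila L3: {implied_tukwila_l3_mb} MB")
```

On these rough numbers, Poulson doubles the core count inside a smaller thermal envelope, so the per-core power budget falls by more than half.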

The Poulsons will plug into the same sockets used by Itanium 9300 servers, and that might mean customers running HP's Integrity servers will do processor upgrades before they do system upgrades. This may or may not be good news for HP, but at this point, HP has admitted that Oracle's decision back in March to stop development of its database, middleware, and application software for Itanium has adversely impacted Integrity server sales. In some cases, customers are putting off buying machines, and in others they've canceled orders.

There is more to the Poulson chips than just adding cores to the die and hooking them up with a ring interconnect. The Poulson cores themselves are different. Here's what they look like, schematically:

Block diagram of the Poulson Itanium core

The first interesting thing to note is that the Poulson core has fewer transistors than the Tukwila core (89 million versus 109 million) and occupies less than a third of the area, while at the same time maintaining application compatibility and doubling the instruction pipeline width to 12 instructions.

One of the new features in that updated Itanium pipeline is called Instruction Replay Technology, which is designed to improve system uptime. With the IRT feature, Intel has put an instruction buffer in the pipeline. If an instruction goes haywire as it moves down the Poulson pipeline, the errant instruction is re-executed from the instruction buffer rather than crashing the system or corrupting data.
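The replay idea can be illustrated with a toy software model. To be clear, this is only a sketch of the general technique and not Intel's implementation: the `TransientFault` exception, the `run_with_replay` function, the `max_replays` limit, and the flaky executor are all hypothetical names invented here, and the article does not say how Poulson detects errors or how many times it retries.

```python
from collections import deque


class TransientFault(Exception):
    """Stands in for a hardware-detected error (hypothetical)."""


def run_with_replay(instructions, execute, max_replays=3):
    """Toy model: keep each instruction buffered until it completes cleanly."""
    buffer = deque(instructions)
    results = []
    replays = 0
    while buffer:
        instr = buffer[0]              # peek, so the instruction stays buffered
        attempts = 0
        while True:
            try:
                results.append(execute(instr))
                break
            except TransientFault:
                attempts += 1
                if attempts > max_replays:
                    raise              # persistent fault: give up and escalate
                replays += 1           # otherwise re-execute from the buffer
        buffer.popleft()               # retire only after a clean completion
    return results, replays


def make_flaky_executor(faulty_instr):
    """Executor that faults once on one instruction, then behaves."""
    seen = set()

    def execute(instr):
        if instr == faulty_instr and instr not in seen:
            seen.add(instr)
            raise TransientFault(instr)
        return f"done:{instr}"

    return execute


program = ["ld r1", "add r3", "st r3"]
results, replays = run_with_replay(program, make_flaky_executor("add r3"))
print(results, replays)
```

The point of the sketch is the ordering: an instruction leaves the buffer only after it has completed without a fault, so a one-off error costs a retry instead of a crash.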

This instruction buffer in the Poulson pipeline has another important role to play in an improved HyperThreading scheme that will debut with these future Itanium chips. The buffer breaks the pipeline into a front-end and a back-end, creating a dual-domain multithreading scheme in which the front-end and back-end parts of the pipeline can be threaded independently.
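To see why a buffer between the two halves allows them to pick threads independently, consider the toy simulation below. It is a sketch under our own assumptions, not a description of Poulson's logic: the `simulate` function, the round-robin selection, the one-instruction-per-cycle rate, the per-thread buffer `capacity`, and the `backend_stalls` set are all hypothetical. Intel has not described its thread-switch policy at this level of detail.

```python
from collections import deque


def simulate(programs, cycles, backend_stalls=frozenset(), capacity=2):
    """Toy two-domain pipeline: each end picks its own thread every cycle."""
    pending = {t: deque(p) for t, p in programs.items()}  # not yet fetched
    buffers = {t: deque() for t in programs}              # fetched, not issued
    threads = sorted(programs)
    n = len(threads)
    fe_next = be_next = 0          # independent round-robin pointers
    trace = []
    for cycle in range(cycles):
        # Back end: skip threads that are stalled or have nothing buffered.
        issued = None
        for i in range(n):
            idx = (be_next + i) % n
            t = threads[idx]
            if buffers[t] and (cycle, t) not in backend_stalls:
                issued = (t, buffers[t].popleft())
                be_next = (idx + 1) % n
                break
        # Front end: skip threads that are finished or whose buffer is full.
        fetched = None
        for i in range(n):
            idx = (fe_next + i) % n
            t = threads[idx]
            if pending[t] and len(buffers[t]) < capacity:
                instr = pending[t].popleft()
                buffers[t].append(instr)
                fetched = (t, instr)
                fe_next = (idx + 1) % n
                break
        trace.append((cycle, fetched, issued))
    return trace


programs = {0: ["a0", "a1", "a2"], 1: ["b0", "b1", "b2"]}
# Hypothetical scenario: thread 0 cannot issue in cycles 3 to 5.
stalls = frozenset({(3, 0), (4, 0), (5, 0)})
trace = simulate(programs, cycles=9, backend_stalls=stalls)
for row in trace:
    print(row)
```

In this run, cycle 4 shows the two domains working on different threads: the front end fetches for thread 0 while the back end, which cannot issue thread 0 at that moment, issues from thread 1 instead.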

Intel's chip engineers have also added pipeline-specific thread-switch mechanisms to deal with this wider and more complex Poulson pipeline, as well as dual-threaded register files, dual-threaded data-side translation lookaside buffers (TLBs), and a new fairness mechanism.

Intel is also adding a number of new instructions with the Poulson Itaniums to provide better thread control, expand prefetching of data and instructions for the pipeline, and add data-access hints for the L1 caches. The Poulson also has three new integer operations to boost the performance of legacy Itanium code without requiring applications to be recompiled. ®