At a San Francisco event last week, chip maker Intel started previewing some of the features of its next-generation "Penryn" processors and the future "Nehalem" processors that will follow them to market. The preview shows that Intel, awakened after a long slumber by the success of rival Advanced Micro Devices in defining the X64 architecture, is learning from its own mistakes and from the things that its rival has done right.

Pat Gelsinger, senior vice president and general manager of the Digital Enterprise Group at Intel, showed off roadmaps that run out for the next four years. They show that the company plans to use a mix of advancing chip architectures and chip fabrication processes as a two-stroke engine to keep chip innovation moving in the X64 market. The company does not want to let up for even a bit because that would give AMD the kind of openings Intel allowed in the market, first with 64-bit memory addressing and then with multicore processing, which have allowed AMD to get about a fifth of the market for servers and PCs.

As many have been expecting for some time, Gelsinger confirmed that Intel was taking a page out of the playbooks of other processor makers and would finally add main memory controllers and direct core-to-core connections to future generations of Core desktop and Xeon server and workstation processors. The reason Intel is finally doing this is simple: the approach that it has been using to date--relatively slow front-side buses linking to memory, backed by large on-chip caches--is not as thermally efficient at the system level as putting it all on the chip. Given Intel's significant lead in chip process technology--it is way ahead on the 65 nanometer front, and there is no reason to believe Intel cannot maintain its process advantage on 45 nanometer and 32 nanometer processes--Core and Xeon processors with large on-chip caches, integrated memory controllers, zippier streaming media extensions, and chip-to-chip links should give the Athlon and Opteron chips from AMD a run for the money.

Just as server vendors create a server platform that can support two generations of chips, chip makers create fabrication processes that span two generations of processors. But rather than keep the two in phase, as server platform makers do, it is smarter to keep the change in chip architecture and manufacturing process out of phase. This is precisely what Intel has been doing since 2006--the company shrinks a derivative chip with a new process, then puts out a new architecture, then shrinks it again. By doing it this way, Intel does not have to debug a chip design and a fabrication process at the same time, which can significantly delay a chip's market debut. (One might argue that a shrink and an architecture shift happening at the same time almost guarantees a delay.)

In early 2006, Intel created a shrink of its existing Pentium and Xeon processors using its then-new 65 nanometer process. Then, after this process was ramped up, it rolled out the Core architecture and the Core and Xeon derivatives. The next-generation Penryn chips are slightly modified versions of the current dual-core Core 2 and dual-core and quad-core Xeon 5100 and 5300 processors, and they will be implemented in the high-k/hafnium gate 45 nanometer processes that Intel announced earlier this year as its future manufacturing process. Gelsinger said that Intel had 15 chips using this 45 nanometer process in the works, and that it will have two plants in production using the new process by the end of 2007. The shrink from 65 nanometer to 45 nanometer processes will allow about twice as many transistors per chip, and the faster transistor switching speed enabled by the new process will allow Intel to crank the clock speeds by about 20 percent and to boost the number of instructions that can be processed within a clock cycle.
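The "about twice as many transistors" figure follows from simple geometry. Here is a minimal Python sketch, assuming the idealized rule that the area of a transistor scales with the square of the process feature size (real designs rarely scale this cleanly, and the function name is only illustrative):

```python
def density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal gain in transistors per unit area from a linear process shrink.

    Assumption: transistor area scales with the square of the feature size.
    This is a textbook idealization, not a figure supplied by Intel.
    """
    return (old_nm / new_nm) ** 2


if __name__ == "__main__":
    # The two shrinks discussed in this article.
    for old_nm, new_nm in [(65, 45), (45, 32)]:
        gain = density_gain(old_nm, new_nm)
        print(f"{old_nm} nm -> {new_nm} nm: roughly {gain:.2f}x transistors per unit area")
```

Under that assumption, both the 65-to-45 nanometer step and the later 45-to-32 nanometer step work out to roughly a doubling of density.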

Intel expects to have four Penryn chips in production by the second half of 2007. The Penryn family of chips will come in dual-core variants for desktops and laptops with the Core brand, in quad-core variants for high-end PCs with the Core brand, as well as in dual-core and quad-core variants for workstations and servers under the Xeon brand. A quad-core variant of the Penryn Core 2 will have 820 million transistors and will be 25 percent smaller than the current "Kentsfield" quad-core chip, but will still fit in the same thermal envelope.

With the Penryns, Intel is introducing its SSE4 media extensions for graphics processing, on-chip L2 caches that are 6 MB or 12 MB in size (the same as the Itanium 9000s and 50 percent larger than equivalent chips today in the Intel lineup), split loading for the caches, and higher bus speeds (up to 1.6 GHz). The caches are shared across two cores (rather than dedicated to each core, as was the case with the current Core 2 and Xeon line); the quad-core chips are really quasi-quad cores, meaning that Intel is putting two dual-core chips side-by-side in a single package so they can share a single CPU socket.

The Penryn processors will also sport a performance boosting feature called Dynamic Acceleration and a power saving feature called Deep Power Down.

The Dynamic Acceleration feature (which smells like a recycled implementation of the Pellston acceleration feature that was dropped from the Itanium chip) is interesting in that it will allow a multicore chip to balance performance across the cores to maximize performance as workloads run. If one core is under heavy load from a piece of software but another core is idle, the chip knows that it can kick up the performance on the first core to help get that work done without going anywhere near the thermal design limits of the chip. This can help the performance of single-threaded applications, which is still an issue for many IT shops since clock speeds have stagnated at around 3 GHz in the X64 market.
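The idea of lending an idle core's thermal headroom to a busy core can be illustrated with a toy model. The Python sketch below is not Intel's algorithm: it assumes power is simply proportional to clock speed, that idle cores draw nothing, and that the chip's budget equals all cores running at the base clock. The function name and the clock figures in the usage lines are hypothetical.

```python
def boosted_clock(base_ghz: float, busy_cores: int, total_cores: int,
                  max_boost_ghz: float) -> float:
    """Clock each busy core could run at under a fixed chip-wide power budget.

    Toy assumptions (not from Intel): power is linear in clock speed, idle
    cores draw no power, and the budget is total_cores * base_ghz.
    """
    if not 0 <= busy_cores <= total_cores:
        raise ValueError("busy_cores must be between 0 and total_cores")
    if busy_cores == 0:
        return 0.0  # nothing is running, so there is nothing to boost
    budget = total_cores * base_ghz
    # Share the budget among the busy cores, capped by the highest clock
    # the silicon is allowed to reach.
    return min(max_boost_ghz, budget / busy_cores)


if __name__ == "__main__":
    # Hypothetical dual-core part: 2.4 GHz base clock, 2.6 GHz boost ceiling.
    print(boosted_clock(2.4, busy_cores=1, total_cores=2, max_boost_ghz=2.6))
    print(boosted_clock(2.4, busy_cores=2, total_cores=2, max_boost_ghz=2.6))
```

With one core idle, the busy core can climb to the boost ceiling; with both cores busy, each stays at the base clock and the chip remains inside its budget.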

The Deep Power Down feature adds a new state in the processor where the core voltage is dropped down to a bare minimum, the clocks are turned off, and the L1 and L2 caches are flushed and turned off, so that the idle power of the computer is a fraction of that in the current sleep state of a machine using a Core 2 chip.

The desktop variants of the Penryn chips are expected to have top-end clock speeds in excess of 3 GHz. The Penryn Core 2 variants will have a 65 watt TDP and a maximum of 6 MB of shared L2 cache. The quad-core variants will put two of these chips side-by-side in packages that have TDPs of 95 watts and 130 watts. Intel will also make Xeon variants of the Penryns for single-, dual-, and multi-socket servers and workstations. The dual-core Penryn Xeons will come with 40 watt, 65 watt, and 80 watt TDPs, while the quad-core versions of the Xeons will have 50 watt, 80 watt, and 120 watt TDPs. All of the Penryn parts will plug into existing Core 2 and Xeon platforms. No upgrades to hardware will be necessary.

Looking out into 2008, Intel is cooking up the next-generation architecture and family of chips, called Nehalem. The Nehalem architecture is a derivative of the Core design, but it has a few more tweaks. First, it will have multi-level, on-chip caches that have a shared structure, not the sometimes partitioned structure of the current Core 2/Xeon and future Penryn cores. It will also cram more stuff on the chip, thanks to the 45 nanometer shrink.

Specifically, Intel is finally admitting that it will have at least eight and possibly more cores on the Nehalem chips. And Intel will bring controllers for main memory and cache memory onto the chip, and will go so far as to implement chip-to-chip interconnects that will look strangely like AMD's HyperTransport links. (It may even turn out that Intel is using HyperTransport, since it is probably buried in a cross-licensing agreement somewhere from the last time AMD and Intel buried the legal hatchet.) Intel is also putting its HyperThreading back into its chips with the Nehalem chips, which will give it an advantage over AMD, which has never implemented simultaneous multithreading, or SMT as this technology is generically called, in its processors.

SMT means putting some extra electronics into the chip so it presents a single instruction pipeline as two virtualized pipelines as far as the operating system is concerned, and then interleaving instructions in the virtualized pipelines to get anywhere from 20 to 40 percent more work out of the chips. When Intel moved to multicore processors with the Core architecture last year, it abandoned HyperThreading; the Itanium processor just got HyperThreading for the first time with the dual-core "Montecito" Itanium 9000 family.
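To see why interleaving two instruction streams on one pipeline yields extra work, consider the toy simulation below. It is a deliberately simplified Python sketch, not a model of HyperThreading itself: each thread is a string in which "X" is an instruction that needs the pipeline's single issue slot and "." is a stall cycle (waiting on memory, say). The stall pattern and the round-robin selection are assumptions made for illustration.

```python
def serial_cycles(threads: list[str]) -> int:
    """Cycles needed when the threads run one after another on one pipeline."""
    return sum(len(t) for t in threads)


def smt_cycles(threads: list[str]) -> int:
    """Cycles needed when one pipeline interleaves all threads.

    Per cycle, at most one thread issues an 'X'; every thread sitting on a
    '.' lets that stall cycle elapse in the background.
    """
    n = len(threads)
    pos = [0] * n          # next unprocessed symbol in each thread
    next_first = 0         # round-robin starting point for fairness
    cycles = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        cycles += 1
        winner = None
        for k in range(n):
            i = (next_first + k) % n
            if pos[i] >= len(threads[i]):
                continue                   # this thread has finished
            if threads[i][pos[i]] == ".":
                pos[i] += 1                # stall elapses without the issue slot
            elif winner is None:
                pos[i] += 1                # this thread wins the issue slot
                winner = i
            # otherwise the thread is ready but must wait for the next cycle
        if winner is not None:
            next_first = (winner + 1) % n
    return cycles


if __name__ == "__main__":
    work = ["XX..XX..", "XX..XX.."]  # hypothetical instruction/stall mix
    print("serial:", serial_cycles(work), "cycles")
    print("SMT:   ", smt_cycles(work), "cycles")
```

The gain comes entirely from filling one thread's stall cycles with the other thread's instructions; with no stalls to fill, the interleaved run takes exactly as long as the serial one.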

The Nehalem chips will not plug into the same slots as current Core 2 and Xeon processors or the future Penryn variants of them. So platform providers will have to shift to a new set of chipsets and motherboards. It could even turn out that the Itanium processors get equipped with the same interconnections and will be able, as Intel promised many years ago, to co-exist within the same systems.

Looking out even further into 2009, Intel will eventually move to a 32 nanometer shrink of the Nehalem chips, dubbed "Westmere," followed a year later, in 2010, by a tweaked architecture code-named "Gesher," also to be implemented in 32 nanometer processes. As usual, Gelsinger did not say much about these future chips, except to give them names.
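The alternating cadence described in this article can be written down as a small table and checked mechanically. The Python sketch below uses only the years, code names, and process sizes mentioned above; the label for the early 2006 shrink, the `Step` type, and the checking function are illustrative inventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    year: int
    name: str
    process_nm: int
    kind: str  # "shrink" or "architecture"


# Roadmap as described in the article; the first label is descriptive,
# not an Intel code name.
ROADMAP = [
    Step(2006, "Pentium/Xeon shrink", 65, "shrink"),
    Step(2006, "Core", 65, "architecture"),
    Step(2007, "Penryn", 45, "shrink"),
    Step(2008, "Nehalem", 45, "architecture"),
    Step(2009, "Westmere", 32, "shrink"),
    Step(2010, "Gesher", 32, "architecture"),
]


def changes_one_thing_at_a_time(roadmap: list[Step]) -> bool:
    """True if every step changes the process or the architecture, never both."""
    for prev, cur in zip(roadmap, roadmap[1:]):
        process_changed = cur.process_nm != prev.process_nm
        architecture_changed = cur.kind == "architecture"
        if process_changed == architecture_changed:
            return False  # both changed at once, or nothing changed
    return True


if __name__ == "__main__":
    for step in ROADMAP:
        print(f"{step.year}  {step.name:<20} {step.process_nm} nm  {step.kind}")
    print("out of phase:", changes_one_thing_at_a_time(ROADMAP))
```

Each new architecture arrives on a process that has already been proven by the preceding shrink, which is the out-of-phase approach Intel has followed since 2006.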