Into the Core: Intel’s next-generation microarchitecture

Earlier this year at its Developer Forum, Intel unveiled Core, the next- …

Introduction

Over a year ago at the Fall 2005 Intel Developer Forum, Intel formally announced that they would be dropping the Pentium 4's Netburst microarchitecture in favor of a brand new, more power-efficient microarchitecture that would carry the company's entire x86 product line, from laptops up through Xeon servers, into the next decade. Not since April of 2001, when Netburst arrived on the scene to replace the P6 microarchitecture that powered the Pentium Pro, Pentium II, and Pentium III, have all segments of Intel's x86 processor line used the same microarchitecture.

This past IDF saw the unveiling of some significant details about this new microarchitecture, which was formerly called "Merom" but now goes by the official name of "Core." (You'll also see Core called NGMA, an acronym for "next-generation microarchitecture.") Intel disclosed many of these details in its own presentation on Core, and others were obtained by David Kanter of Real World Technologies. The present article draws on both of those sources, as well as my own correspondence with Intel, to paint what is (hopefully) an accessible picture of the new microarchitecture that will soon be powering everything from Windows Vista servers to Apple laptops.

A note

The original Pentium's microarchitecture was called P5. Because the Pentium Pro's microarchitecture was the successor to the P5, it was dubbed P6 by Intel. The P6 was one of the most commercially successful microarchitectures of all time, and it went through a number of changes as it evolved from the Pentium Pro to the Pentium III.

A question of breeding?

Before I get into the more technical discussion of Core's features, I want to quickly spell out how I view Core's relationship to its predecessors. As Intel has repeatedly claimed, Core is a new microarchitecture that was designed from scratch with today's performance and power consumption needs in mind. Nonetheless, Core does draw heavily on its predecessors, taking the best of the Pentium 4 and the Pentium M (Banias) and rolling them into a design that looks much more like the latter than the former.

Because the Pentium M itself is a new design that draws heavily on the P6 microarchitecture, I've chosen to place Core very generally within the P6 "lineage." However, I ask the reader not to read too much into this loosely applied biological metaphor, because my comparing Core to its P6 predecessors and talking about its development in terms of the "evolution" of the "P6 lineage" is really nothing more than a way to organize the discussion for ease of comprehension.

Core, multicore, and the big picture

When Intel's team in Israel set about designing the processor architecture that would carry the company's entire x86 product line for the next five years or so, they had multicore computing in mind. But for Intel, having multicore in mind doesn't mean quite the same thing that it means for Sun or IBM. Specifically, "multicore" doesn't mean "throw out out-of-order execution and scale back single-threaded performance in favor of a massively parallel architecture that can run a torrent of simultaneous threads." Such an aggressive, forward-looking approach is embodied in designs like STI's Cell and Sun's UltraSPARC T1. Instead, Intel's understanding of what it takes to make a "multicore" architecture is significantly more conservative, and very "Intel."

Intel's approach to multicore is not about keeping each individual core's on-die footprint down by throwing out dynamic execution hardware, but about keeping each core's power consumption down and its efficiency up. In this sense, Intel's strategy is fundamentally process-based, which is why I said it's "very 'Intel.'" Intel will rely not on the microarchitectural equivalent of a crash diet, but on Moore's Law to enable more cores to fit onto each die. It seems that from Intel's perspective, there's no need to start throwing hardware overboard in order to keep the core's size down, because core sizes will shrink as transistor sizes shrink.

This talk of shrinking core sizes brings me to my next point about Core: scalability. The Pentium 4's performance was designed to scale primarily with clockspeed increases. In contrast, Core's performance will scale primarily with increases in the number of cores per die (i.e. feature size shrinks) and with the addition of more cache, and secondarily with modest, periodic clockspeed increases. In this respect, Core is designed to take advantage of Moore's Law in a fundamentally different way than the Pentium 4.
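
To make the arithmetic behind this kind of scaling a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It assumes ideal scaling, where a core's area shrinks with the square of the feature size, and it ignores cache, I/O, and everything else that shares the die. The function names and all of the numbers (a 60 mm² area budget for cores, a 30 mm² core at 65 nm) are hypothetical values chosen for illustration, not Intel figures.

```python
def scaled_core_area(core_area_mm2, old_nm, new_nm):
    """Ideal scaling assumption: area shrinks with the square of feature size."""
    return core_area_mm2 * (new_nm / old_nm) ** 2


def cores_per_die(core_budget_mm2, core_area_mm2):
    """Number of whole cores that fit in a fixed area budget."""
    return int(core_budget_mm2 // core_area_mm2)


# Hypothetical numbers for illustration only (not Intel data).
CORE_BUDGET_MM2 = 60.0     # die area set aside for cores
CORE_AREA_65NM_MM2 = 30.0  # area of one core at 65 nm

for node_nm in (65, 45, 32):
    area = scaled_core_area(CORE_AREA_65NM_MM2, 65, node_nm)
    count = cores_per_die(CORE_BUDGET_MM2, area)
    print(f"{node_nm} nm: core area {area:.1f} mm^2, cores in budget: {count}")
```

Under these assumptions, the same unmodified core design goes from two cores in the budget to four and then eight across two process shrinks, without any hardware being thrown overboard. Real shrinks rarely scale this cleanly, so treat the doubling as an idealized trend rather than a prediction.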