CPUs are tricky beasts. There was a time when simply looking at the number of megahertz on a chip was a surefire indication of how well it would perform, but sadly that just isn’t the case any longer. With Intel and AMD at each other’s throats for the biggest piece of the market, their approaches to the technology have taken different paths. Considering the new 64-bit and quad-core chips available now, does clock speed even mean anything?

The question of whether or not to upgrade to a multi-core processor has long been settled: you definitely want the advantages a multi-core proc gives you. You get a payoff right away with extra breathing room for background applications to run without dragging your gaming performance down to a crawl, and then you’ll get another payoff in the future, as more and more games optimized for multi-core processors (like Company of Heroes: Opposing Fronts and Crysis) hit the shelves. And with quad-core procs from Intel hovering around the same price as slightly higher-clocked dual-core chips, going quad core is a little bit of future-proofing that you can’t afford to miss.

Of course, the Intel-versus-AMD debate rages on. AMD has rolled out its “Phenom” series of multi-core processors, along with an announcement that it’s bringing “true quad-core” to the desktop (with the intention of scaring the crap out of potential Intel shoppers). What that refers to is that quad-core Phenom processors have four individual cores on a single piece of silicon, whereas Intel’s quad-core procs are really two dual-core procs stuck together. Does this fundamental design difference matter? We’ll give you the answers.

Q: What exactly is a Penryn?

A: Penryn is the “family” name for Intel’s follow-up to its 65nm Core 2-lineage CPUs. For consumers, Wolfdale will be the dual-core Penryn, Yorkfield will be the quad-core version, and Harpertown will be the quad-core Xeon workstation CPU.

The big enhancement is the process shrink from 65nm to 45nm. Intel calls its move to a 45nm process the “biggest change to computer chips in 40 years.” Intel’s tendency toward self-aggrandizement aside, the 45nm process is a significant jump forward, allowing roughly twice as many transistors to fit in the space of a 65nm chip. The 45nm process also uses high-k gate dielectrics. Not to be confused with L. Ron Hubbard’s Dianetics, the high-k gate using hafnium oxide replaces the silicon dioxide gate that’s been in use since the 1960s. The new transistor leaks less energy, produces less heat, and is able to switch 20 percent faster than a silicon dioxide transistor. This boils down to smaller, faster, more power-efficient CPU cores. How much smaller? The previous Core 2 Extreme quad cores packed 582 million transistors within a space of 286mm². The Yorkfield quad core packs 820 million transistors into 214mm².
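To put the shrink in perspective, here is a quick back-of-the-envelope sketch in Python that turns the transistor counts and die sizes quoted above into transistors per square millimeter. The function and variable names are ours, chosen purely for illustration.

```python
# Transistor counts and die areas as quoted in the text above.
CHIPS = {
    "65nm Core 2 Extreme quad": (582e6, 286.0),  # (transistors, die area in mm^2)
    "45nm Yorkfield quad": (820e6, 214.0),
}

def density_per_mm2(transistors, area_mm2):
    """Return transistors per square millimeter of die area."""
    return transistors / area_mm2

old = density_per_mm2(*CHIPS["65nm Core 2 Extreme quad"])
new = density_per_mm2(*CHIPS["45nm Yorkfield quad"])
ratio = new / old

print(f"65nm: {old / 1e6:.2f} million transistors per mm^2")
print(f"45nm: {new / 1e6:.2f} million transistors per mm^2")
print(f"Density ratio: {ratio:.2f}x")
```

By these figures the 45nm part fits about 1.9 times as many transistors into each square millimeter, which squares with the "nearly twice" claim.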

Q: So what else is new under the hood?

A: Penryn is more than a simple die shrink. The new CPUs are based on the Core 2 microarchitecture with a few tweaks that Intel hopes will keep it ahead of AMD. The headliner of these tweaks is the new SSE4 instruction set designed for media encoding and high-performance computing. Also new is a Super Shuffle Engine, which increases the speed of many SSE media-encoding instructions by doubling the processing units from 64-bit to 128-bit.

Penryn also includes a new Fast Radix-16 Divider that pretty much doubles the speed of division math. Intel also reportedly boosted virtual machine performance by 25 to 75 percent. And Intel added a new feature called Dynamic Acceleration Technology that essentially overclocks one of the cores when the others are sleeping.

The new chip also makes use of all the physical space freed up by the die shrink. (Imagine if all the stuff in your garage shrank by 50 percent!) That’s what accounts for the beefed-up L2 cache, which at 6MB per dual-core die (12MB total on a quad core) is a 50 percent increase over the L2 in 65nm quad cores. The larger L2 cache helps in numerous ways, but its biggest contribution is in ameliorating the potential performance hit caused by the ancient shared front-side bus architecture Intel uses for communication between cores. To keep the front-side bus from bogging down, the large and very efficient L2 cache ensures that the CPU has ample data close at hand so it won’t be data starved. While Intel has certainly proved that the FSB strategy is still workable, the company has stated it plans to adopt an on-die memory controller in its next CPU.

Q: How significant is the new SSE4 instruction set?

A: Instruction sets in CPUs always garner the most attention but, sadly, are usually the last feature to actually add performance benefits. While the Fast Radix-16 Divider and the Super Shuffle Engine in Penryn will increase the performance of many existing applications, the 47 new instructions in SSE4 will not give you any performance boost until applications directly support them. SSE4’s main claim to fame will be in media encoding and high-performance computing (think supercomputers). In fact, Intel’s demonstrations of SSE4-enabled encoders showed incredible performance boosts.
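What "directly support" means in practice is that an application has to check for the new instructions at run time and switch to a code path written for them. Here is a minimal Python sketch of that idea, assuming a Linux-style /proc/cpuinfo listing in which SSE4.1 appears as the `sse4_1` flag. The function names, the code-path labels, and the sample text are all hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags listed in Linux-style /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags

def pick_encoder_path(flags):
    """Choose the fastest code path the CPU supports (path names are made up)."""
    if "sse4_1" in flags:
        return "sse4.1"
    if "sse2" in flags:
        return "sse2"
    return "plain"

# Hypothetical sample text; on a real Linux box you would read /proc/cpuinfo.
SAMPLE = "model name : Example CPU\nflags : fpu mmx sse sse2 ssse3 sse4_1\n"
print(pick_encoder_path(cpu_flags(SAMPLE)))
```

An encoder that never asks this question simply keeps running its older SSE2 code, which is why new instructions tend to pay off late.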

However, those demonstrations have been called into question, with skeptics suggesting that while the alpha build of DivX used for the proof-of-concept benchmarks is faster with SSE4, it’s not a realistic scenario. One developer we spoke with told us: “The applicability of SSE4 for our codecs seems rather limited and the expected gain seems rather small (I expect no more than a 1- to 2-percent speed gain with SSE4) compared to the speed increment we got from SSE on pre-Core 2 Duo and SSE2 on Core 2 Duo. The SSE4-instructions that are often advertised as being especially targeted for video encoding are useless for us, since those instructions are only applicable for exhaustive search algorithm (ESA), which we don’t use because of its inherent inefficiency.”
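For readers wondering what an exhaustive search looks like, here is a toy Python sketch of the general technique as used in video motion estimation: slide a small block over every possible position in a frame and keep the spot with the lowest sum of absolute differences (SAD). This is our own illustration, not the developer's code or Intel's demo, and the tiny frame and block are invented for the example.

```python
def sad(frame, block, top, left):
    """Sum of absolute differences between block and the frame region at (top, left)."""
    return sum(
        abs(frame[top + r][left + c] - block[r][c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )

def exhaustive_search(frame, block):
    """Try every position in the frame and return (best_sad, top, left)."""
    rows = len(frame) - len(block) + 1
    cols = len(frame[0]) - len(block[0]) + 1
    return min(
        (sad(frame, block, top, left), top, left)
        for top in range(rows)
        for left in range(cols)
    )

# Invented 4x4 "frame" with a 2x2 pattern hidden at row 1, column 1.
frame = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 6, 0],
    [0, 0, 0, 0],
]
block = [[9, 8], [7, 6]]
best = exhaustive_search(frame, block)
print(best)
```

The inefficiency the developer mentions is easy to see: the work grows with every candidate position tested, so encoders that use smarter search patterns test far fewer positions and have less to gain from instructions that speed up the brute-force approach.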

Q: Is Penryn faster than the current Core 2 quad cores?

A: We don’t want to give away the punch line but, generally, an equivalent Penryn runs up to 14 percent faster when compared clock-for-clock with the current Core 2 quads. The exact speed increase depends on the benchmark. In some, you’ll see no change in performance; in others, a healthy increase is possible. But remember, Penryn isn’t the big leap forward. Intel’s CPU schedule dictates a little jump one year and then a big jump the next year. This is the little jump. Intel hopes to make a big jump when it introduces its Nehalem CPU in late 2008.
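"Clock-for-clock" simply means dividing out the clock speed before comparing, so a chip doesn't get credit for extra megahertz. A minimal Python sketch of that arithmetic follows. The function name is ours, and the benchmark scores in the example are made up for illustration; they are not measured results.

```python
def clock_for_clock_gain(new_score, new_ghz, old_score, old_ghz):
    """Percent gain in score per GHz, assuming higher scores are better."""
    new_per_ghz = new_score / new_ghz
    old_per_ghz = old_score / old_ghz
    return (new_per_ghz / old_per_ghz - 1.0) * 100.0

# Made-up scores for illustration only; these are not measured results.
gain = clock_for_clock_gain(new_score=1254.0, new_ghz=3.3, old_score=1000.0, old_ghz=3.0)
print(f"{gain:.1f} percent faster per clock")
```

In this invented case the raw score is about 25 percent higher, but part of that comes from the faster clock; per gigahertz, the gain is smaller.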

Q: Will Penryn work in my motherboard?

A: Long-time Intel lovers have been vexed by this for years, as the company’s been in the habit of invalidating perfectly good motherboards by requiring new or updated chipsets to run its latest CPUs. Want a 1,066MHz-bus P4 on a 925X mobo? Sorry, you need a 925XE. Pentium D on a 925XE? Nope, you need a 955X chipset. Pentium 955 EE on a 955X? Guess again: 975X.

Fortunately, Intel has gotten a little better in this area, and there is a very good chance that a QX9650 will work in many existing motherboards. Certainly motherboards that use Intel’s P35 and X38 chipsets will support the new CPU (although a BIOS update might be required). Some Intel 965 and 975X boards might also work with the new CPU and we understand that the majority of 680i boards will be compatible. To be safe, however, before you buy any board/CPU combination, check the manufacturer’s website to see what processors it has validated with the design. Just because the Yorkfield and Wolfdale are LGA775 doesn’t mean they’ll work in the board of your fancy.

Above: Intel’s 45nm die shrink allows engineers to pack nearly twice the number of transistors into the same space as a 65nm CPU