Is Nehalem Efficient?

At this year's IDF in San Francisco, Intel revealed a little discussed but extremely important aspect of Nehalem's circuit design:

The Nehalem design is Intel's first microprocessor in the past two decades to feature absolutely no domino logic; it's a fully static CMOS design. I've explained the differences between dynamic domino and static CMOS design in the past, but simply put: domino logic is a clock speed play. It's incredibly useful for implementing very high speed circuit paths on a chip, and Intel's use of it peaked in the Pentium 4 days. The downside to using such high speed logic is that it requires a lot of power, but in microprocessor design there are always tradeoffs to be made.

There are many other energy efficiency plays within Nehalem

With Nehalem, Intel took the new architecture as an opportunity to revamp its circuit design and removed all remaining domino logic, without impacting the peak clock speed of the architecture. The tradeoff here is one of die size: by using more parallel logic, Intel was able to convert some serial, high speed paths into larger, slower circuits that removed the need for domino logic. Details are unfortunately light and a bit beyond the scope of this review, but the move to an all static CMOS design is bound to reduce power consumption. Do you smell a comparison coming?

Both Nehalem and Penryn are built on the same 45nm process, are available at the same clock speeds and are capable of running the very same applications. In theory, thanks to its static CMOS design, Nehalem should be more power efficient at the same clock speed across the board. To find out, I measured average power consumption over the duration of a handful of the benchmarks I used in this review.

| Performance | POV-Ray 3.7 | Cinebench XCPU | x264 HD | Crysis |
|---|---|---|---|---|
| Intel Core 2 Quad Q9450 (Penryn - 2.66GHz) | 2238 PPS | 11502 CBMarks | 61.5 fps | 34.0 fps |
| Intel Core i7-920 (Nehalem - 2.66GHz) | 3528 PPS | 16211 CBMarks | 74.8 fps | 33.2 fps |
| Nehalem Performance Advantage | 57.6% | 40.9% | 21.6% | -2% |
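The advantage row is just the ratio of the two scores in each column. Here is a minimal Python sketch that reproduces it, using only the numbers from the table; the dictionary and function names are illustrative and not part of the original test methodology.

```python
# Benchmark scores copied from the performance table (higher is better).
penryn = {"POV-Ray 3.7": 2238, "Cinebench XCPU": 11502, "x264 HD": 61.5, "Crysis": 34.0}
nehalem = {"POV-Ray 3.7": 3528, "Cinebench XCPU": 16211, "x264 HD": 74.8, "Crysis": 33.2}


def advantage_pct(new_score, old_score):
    """Percentage by which new_score exceeds old_score (negative if it is lower)."""
    return (new_score / old_score - 1.0) * 100.0


# Nehalem's advantage over Penryn, per benchmark.
advantages = {name: advantage_pct(nehalem[name], penryn[name]) for name in penryn}

for name, pct in advantages.items():
    print(f"{name}: {pct:+.1f}%")
```

Run as written, the first three results round to the 57.6%, 40.9% and 21.6% shown in the table, and Crysis comes out slightly negative.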

I picked these four benchmarks because they show us the range of Nehalem's performance, going from no performance improvement all the way up to a gain of nearly 60%. Now let's look at the power consumption in each of these four benchmarks:

| Power Consumption | POV-Ray 3.7 | Cinebench XCPU | x264 HD | Crysis |
|---|---|---|---|---|
| Intel Core 2 Quad Q9450 (Penryn - 2.66GHz) | 168.1W | 175.2W | 167.5W | 220.8W |
| Intel Core i7-920 (Nehalem - 2.66GHz) | 202.2W | 208.6W | 176.6W | 230.8W |
| Nehalem Power Disadvantage | +34.1W | +33.4W | +9.1W | +10W |

If you actually go through and do the math, you'll find that Nehalem, despite using more power, is more efficient than Penryn. Power drawn per unit of performance is around 24% lower in POV-Ray, 15.5% lower in Cinebench and 13% lower in the x264 HD test; put the other way around, performance per watt is roughly 31%, 18% and 15% better, respectively. Crysis, the only benchmark where Nehalem actually falls behind, also draws more power, and thus Nehalem loses the efficiency battle there.
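The efficiency math can be reproduced directly from the two tables. Below is a minimal Python sketch, assuming efficiency is simply the benchmark score divided by the average measured watts; the names are illustrative. It reports two views of the same data, the gain in performance per watt and the reduction in watts per unit of performance, because the two produce different percentages.

```python
# Per benchmark: (Penryn score, Penryn watts, Nehalem score, Nehalem watts),
# copied from the 2.66GHz performance and power consumption tables.
results = {
    "POV-Ray 3.7": (2238, 168.1, 3528, 202.2),
    "Cinebench XCPU": (11502, 175.2, 16211, 208.6),
    "x264 HD": (61.5, 167.5, 74.8, 176.6),
    "Crysis": (34.0, 220.8, 33.2, 230.8),
}


def efficiency_change(old_score, old_watts, new_score, new_watts):
    """Return (perf-per-watt gain %, watts-per-unit-of-performance reduction %)."""
    ratio = (new_score / new_watts) / (old_score / old_watts)
    gain = (ratio - 1.0) * 100.0          # how much more work each watt buys
    saving = (1.0 - 1.0 / ratio) * 100.0  # how much less power each unit of work costs
    return gain, saving


summary = {name: efficiency_change(*row) for name, row in results.items()}

for name, (gain, saving) in summary.items():
    print(f"{name}: perf/W {gain:+.1f}%, W per unit of performance {-saving:+.1f}%")
```

The watts-per-unit-of-performance reduction works out to roughly 24%, 15.5% and 13% for the first three tests, while the performance-per-watt gain for the same tests is roughly 31%, 18% and 15%. Both views come out negative for Crysis.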

It seems as if Nehalem is even more polarizing than I had thought. Despite the move to a fully static CMOS design, the changes aren't enough to make up for the scenario where Nehalem can't offer more performance; power consumption still goes up, albeit not terribly.

It's also worth noting that the power comparison really depends on the CPU used. Here we've got the same comparison but with the Core i7-965 vs. the Core 2 Extreme QX9770, both clocked at 3.2GHz:

| Performance | POV-Ray 3.7 | Cinebench R10 - XCPU | x264 HD | Crysis |
|---|---|---|---|---|
| Intel Core 2 Extreme QX9770 (Penryn - 3.2GHz) | 2641 PPS | 14065 CBMarks | 73.2 fps | 41.7 fps |
| Intel Core i7-965 (Nehalem - 3.2GHz) | 4202 PPS | 18810 CBMarks | 85.8 fps | 40.5 fps |

| Power Consumption | POV-Ray 3.7 | Cinebench R10 - XCPU | x264 HD | Crysis |
|---|---|---|---|---|
| Intel Core 2 Extreme QX9770 (Penryn - 3.2GHz) | 230.7W | 227.6W | 230.3W | 293.6W |
| Intel Core i7-965 (Nehalem - 3.2GHz) | 233.7W | 230.7W | 196.2W | 248.5W |

It's tough to draw any conclusions based on two CPUs, but it is possible that at higher clock speeds Nehalem's efficiency advantage kicks in. The QX9770 has always been a bit high on the power consumption side, whereas the i7-965, even in situations where it is slower than the QX9770, offers better power efficiency here.
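To put numbers on the 3.2GHz comparison, here is a short Python sketch along the same lines, again assuming efficiency is the benchmark score divided by the average measured watts. The variable and function names are illustrative; the figures are taken from the two 3.2GHz tables.

```python
# Per benchmark: (QX9770 score, QX9770 watts, i7-965 score, i7-965 watts),
# copied from the 3.2GHz performance and power consumption tables.
results_3_2ghz = {
    "POV-Ray 3.7": (2641, 230.7, 4202, 233.7),
    "Cinebench R10 - XCPU": (14065, 227.6, 18810, 230.7),
    "x264 HD": (73.2, 230.3, 85.8, 196.2),
    "Crysis": (41.7, 293.6, 40.5, 248.5),
}


def perf_per_watt_gain(old_score, old_watts, new_score, new_watts):
    """Percentage change in score-per-watt going from the old CPU to the new one."""
    return ((new_score / new_watts) / (old_score / old_watts) - 1.0) * 100.0


gains = {name: perf_per_watt_gain(*row) for name, row in results_3_2ghz.items()}

for name, gain in gains.items():
    print(f"{name}: i7-965 perf/W {gain:+.1f}% vs. QX9770")
```

With these figures every benchmark comes out positive for the i7-965, including Crysis, where it is slightly slower but draws noticeably less power.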

74 Comments

Well, the funny thing is THG got it all messed up, again - they posted a large "CRIPPLED OVERCLOCKING" article yesterday, and today I saw a kind of apology from them - they seem to have overlooked a simple BIOS switch that prevents the load through the CPU from rising above 100A. Having a month to prepare the launch article, they didn't even bother to tweak the BIOS a bit. That's why I'm not taking their articles seriously, not because they are biased towards Intel or AMD - they are simply not up to the standards (especially those here @anandtech).

Now give us those 64-bit benchmarks. We already knew that Core i7 would be faster than Core 2, we even knew how much faster.
Now, it was expected that 64-bit performance would be better on Core i7 than on Core 2. Is that true? Draw a parallel between the following:

Performance jump from 32- to 64-bit on Core 2
vs.
Performance jump from 32- to 64-bit on Core i7
vs.
Performance jump from 32- to 64-bit on Phenom

Either I am not reading things correctly, or the 130W TDP does not look promising for the end user such as myself that requires/wants a low powered high performance CPU.

The future in my book is using less power, not more, and Intel does not right now seem to be going in this direction. To top things off, the performance increase does not seem to be enough to justify this power increase.

Being completely off grid (100% solar / wind power), there seem to be very few options . . . I would like to see this change. Right now as it stands, sticking with the older architecture seems to make more sense.

130W TDP isn't much worse than previous generations of quad core processors, which were ~100W TDP. Also, TDP isn't a measure of power usage, but of the required thermal dissipation of a system to maintain an operating temperature below a set value (e.g. Tjmax). So if Tjmax is lower for i7 processors than it is for past quad cores, it may use the same amount of power, but have a higher TDP requirement. The article indicates that power draw has increased, but usually with a large increase in performance. Page 9 of the article has determined that this chip has a greater performance/watt than its predecessors by a significant margin.

If you are looking for something that is extremely low power, you shouldn't be looking at a quad core processor. Go buy a laptop (or an EeePC-type laptop with an Atom processor). Intel has kept true to its promise of a 2% performance increase for every 1% power increase (i.e. a higher performance per watt value).

Also, you would probably save more power overall if you just hibernate your computer when you aren't using it.