Today Intel started talking about its ISSCC plans, and the conference call included some details on Westmere that I didn't previously know. Most of it has to do with power savings, but there was also some talk about 32nm quad-core Westmere derivatives!

Westmere is Intel’s 32nm Nehalem derivative. Take Nehalem with all of its inherent goodness, add AES instructions, build it using 32nm transistors and you’ve got Westmere.

Westmere's Secret: Power Gated Un-Core

We just recently met the first incarnation of Westmere: Clarkdale, the dual-core processor that’s been branded the Core i3 and Core i5. Later this quarter we’ll meet Gulftown, a six-core Westmere that’ll be sold under the Core i7 label. All of that is old news; now for the new stuff.

With Nehalem Intel started power gating parts of the chip. Stick a power gate transistor in front of the supply voltage to each core and you can effectively shut off power (including leakage power) to the core when it’s not in use. This was a huge step in increasing power efficiency, something that’s evident when you look at Nehalem idle power numbers.

When you shut off a core you need to save the core’s state so that when it wakes back up it knows what to do next. Remember that powering down these cores can happen dozens of times in the course of a second. The cores can’t wake up in a reboot state; they need to simply shut off when they’re not needed and wake back up to continue work when they are needed.

In Nehalem the core’s state (what instruction it’s going to work on next, data in its registers, etc...) is saved in the last level cache - L3. Unfortunately this means that the L3 cache can’t be powered down when the cores are idle, because that’s where they store their state information. Take this one step further and it also means that Nehalem’s L3 cache wasn’t power-gated.

In Westmere, Intel has added a dedicated SRAM to store core state data. Each core dumps its state information into the dedicated SRAM and then shuts off. With the state data kept out of the L3 cache, Westmere takes the next logical step and power gates the L3.

Intel lists this dedicated SRAM as a Westmere-mobile feature, so there’s a chance it’s not present on the desktop chips. But it makes sense. Without a way of powering down the L3 cache, Westmere would be a very power hungry mobile CPU. The dedicated SRAM appears to be what makes Westmere mobile-friendly.
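To make the dependency described above concrete, here is a minimal toy model in Python. It is only a sketch of the reasoning, not Intel's actual power management logic: the Package class, its field names and the can_gate_l3 rule are all hypothetical, and the only thing taken from the article is that core state lives in the L3 on Nehalem and in a dedicated SRAM on (mobile) Westmere.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Toy model of a multi-core package (hypothetical, for illustration only)."""
    num_cores: int
    has_state_sram: bool  # True models Westmere (mobile), False models Nehalem
    core_on: list = field(default_factory=list)

    def __post_init__(self):
        # Every core starts powered on.
        self.core_on = [True] * self.num_cores

    def power_down_core(self, idx):
        """Gate one core and report where its state was saved."""
        self.core_on[idx] = False
        return "sram" if self.has_state_sram else "l3"

    def can_gate_l3(self):
        # The L3 can only be gated when all cores are idle AND
        # none of their saved state is sitting in the L3 itself.
        all_idle = not any(self.core_on)
        return all_idle and self.has_state_sram


nehalem = Package(num_cores=4, has_state_sram=False)
westmere = Package(num_cores=2, has_state_sram=True)
for chip in (nehalem, westmere):
    for i in range(chip.num_cores):
        chip.power_down_core(i)

print("Nehalem can gate L3: ", nehalem.can_gate_l3())
print("Westmere can gate L3:", westmere.can_gate_l3())
```

With every core powered down, the Nehalem-style package still has to keep its L3 alive because that is where the state was saved, while the Westmere-style package is free to gate it.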

Hex and Quad Core Westmere in 2010?

The last bits of information Intel revealed have to do with its high end desktop/workstation/server intentions with Westmere. The 6-core Westmere is a 240mm^2 chip made up of 1.17B transistors.

That’s six cores on a single die, but with 12MB of L3 cache. Remember that Nehalem/Lynnfield have 8MB and Clarkdale has 4MB. Nehalem’s chief architect, Ronak Singhal, told me that he wanted to maintain at least 2MB of L3 per core on the die. A 6-core Westmere adheres to that policy.
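The 2MB-per-core rule is easy to check against the cache sizes quoted above. A quick sketch in Python, using only the core counts and L3 sizes mentioned in this article (the dictionary layout and variable names are just for illustration):

```python
# (cores, L3 cache in MB) as quoted in the article
chips = {
    "Nehalem/Lynnfield": (4, 8),
    "Clarkdale": (2, 4),
    "Westmere 6C": (6, 12),
}

MIN_L3_PER_CORE_MB = 2  # the target Ronak Singhal described

l3_per_core = {name: l3_mb / cores for name, (cores, l3_mb) in chips.items()}

for name, mb in l3_per_core.items():
    status = "meets" if mb >= MIN_L3_PER_CORE_MB else "misses"
    print(f"{name}: {mb:.1f}MB of L3 per core ({status} the target)")
```

Every chip listed works out to exactly 2MB of L3 per core.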

The chip works in existing LGA-1366 sockets, so you still have three DDR3 memory channels. 6C Westmere supports both regular DDR3 (1.5V) and low voltage DDR3 (1.35V). This is particularly useful in servers, where you’ve got a lot of memory present and power consumption should be noticeably lower.

The other big news is that Intel will be releasing 4-core variants of Westmere as well. While I originally assumed this would mean desktop and server, Intel hasn't committed to anything other than a quad-core Westmere. These parts could end up as server only or server and desktop.

The table below shows you the beauty of 32nm. Smaller die, more transistors:

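CPU         | Codename   | Manufacturing Process | Cores | Transistor Count | Die Size
------------|------------|-----------------------|-------|------------------|---------
Westmere 6C | Gulftown   | 32nm                  | 6     | 1.17B            | 240mm^2
Nehalem 4C  | Bloomfield | 45nm                  | 4     | 731M             | 263mm^2
Nehalem 4C  | Lynnfield  | 45nm                  | 4     | 774M             | 296mm^2
Westmere 2C | Clarkdale  | 32nm                  | 2     | 384M             | 81mm^2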

It also shows that there's a definite need for Intel to build a quad-core 32nm chip. Die sizes nearing 300mm^2 aren't very desirable. The question is whether we'll see quad-core 32nm in 2010 desktops or if we'll have to wait for Sandy Bridge in 2011 for that.
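To put a number on the "smaller die, more transistors" point, you can divide each transistor count in the table by its die size. A rough sketch in Python follows; the figures are the ones from the table above, and the variable names are just for illustration. Keep in mind that this is a crude comparison, since the chips carry different mixes of cache and logic.

```python
# (process, transistors in millions, die size in mm^2) from the table
dies = {
    "Gulftown": ("32nm", 1170, 240),
    "Bloomfield": ("45nm", 731, 263),
    "Lynnfield": ("45nm", 774, 296),
    "Clarkdale": ("32nm", 384, 81),
}

# Millions of transistors per square millimeter
density = {name: mtrans / area for name, (_, mtrans, area) in dies.items()}

for name, (process, _, _) in dies.items():
    print(f"{name} ({process}): {density[name]:.2f}M transistors per mm^2")
```

The two 32nm parts land at roughly 4.7 to 4.9 million transistors per mm^2, against roughly 2.6 to 2.8 million for the two 45nm parts.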

1. In which case it's silly to discuss (and compare) power consumption for a supposedly power efficient CPU with peripherals that take up 6-8 times as much power as the CPU, non?

2. SPCR also includes everything in its system power, except for PSU losses. And I refuse to believe that a modern CPU could be only 10% efficient at 100W.

It still does not explain why AT's test systems are drawing so much power at idle. Most modern GPUs are pretty efficient at idle these days. If it's an issue with GPU or other component choice, then the testbed needs to be updated; otherwise it is useless as a metric for CPU power efficiency. Likewise with software/drivers/Windows settings.

And no, it's certainly not "silly" to discuss power consumption of an entire system, as it's a real world test. The i5-750 and i7-860 required a discrete card. SPCR reviewed chips with on-chip graphics, so did not need a discrete card. The i5/i7 comparison is to other chips using as close to the same systems as possible, the differences being unavoidable, i.e. you can't run an i7-920 with 2x2 GB DDR3 and the various chipsets are ALWAYS going to be running along with the processor.

Actually, all the SPCR tests bar one were done with a 9400GT and a Raptor HDD (actual model unspecified).

Looking at GTX285 idle power consumption across various websites, it actually seems to idle quite well, at around 30-40W - the 9400 GT idled around 10W (est) at SPCR.

That still doesn't explain the discrepancy between AT's idle figures and SPCR's. Even factoring in the GTX280/285, and including PSU losses, there's about 40-50W discrepancy in the X3 720 results, which amounts to 90-100% difference from SPCR's figures. I doubt minor component changes (RAM, HDDs) would make up so much difference, especially at idle, when their power draw is lowest.

It may be a matter of software/BIOS/ACPI configuration settings, or it could just be that the testing methodology of the two sites is not directly comparable. But I am curious which, and why.

I am wondering whether I should wait for Advanced Vector Extensions (AVX).
Is it going to double the speed of video encoding?
If so, then it is going to make a huge difference.
I am waiting for the Intel Developer Forum. There will probably be more light shed on the subject there.