Small wonder: inside Intel’s Silverthorne ultramobile CPU

Intel is taking on the embedded space with a brand new microarchitecture: …

SAN FRANCISCO—In an ISSCC session this morning, Intel disclosed the first microarchitectural details of its forthcoming "Silverthorne" processor for ultramobile devices. The new chip, which I'll describe in this post, is Intel's first in-order x86 processor since the original Pentium and the architecture on which the semi giant will pin its ultramobile hopes.

Packaging, process, and power

As we've seen in photos comparing Silverthorne to a penny, Intel's new chip is a tiny 25mm². At only 47 million transistors (about 40 percent of which go to a 512KB, 8-way set-associative L2 cache), it's also quite lean. By way of comparison, the 65nm Core 2 Duo (2MB cache) has around 290 million transistors. The Intel processor that comes closest to Silverthorne in transistor count is the Pentium 4, which had 42 million transistors in its 0.18 micron, 2001 launch incarnation.

Silverthorne's package is a minuscule 14×13mm, and Intel claims that the device has a TDP of 2 watts at 2GHz on 1.0V. At lower speeds, the device gets down to 0.5 watts, but it's not clear how far down Intel will have to ratchet the clock speed to get there.

Architecture overview

Silverthorne is a 64-bit, multithreaded processor with a 16-stage, in-order pipeline. As you can see from the diagram below, instructions flow from the 32KB L1 I-cache into a set of fetch buffers, and from there into the decode unit. The processor's decode unit features two hardware decoders and one microcode decoder, and can decode up to two instructions per cycle.

Silverthorne block diagram. Source: Intel

The processor's front end can dispatch up to two instructions per cycle—from the same thread or from two different threads—into a set of per-thread instruction queues that can presumably collapse any decode- and branch prediction-related instruction bubbles.

On the instruction flow side, the back end of Silverthorne consists of two main clusters (I'd call them "blocks," but "clusters" is what Intel's diagrams use): a floating-point/vector cluster and an integer/address cluster. The floating-point cluster contains two floating-point/SIMD pipes (FP ADD and FP + SIMD MUL/DIV/PERM), and the integer cluster contains two integer/address/branch pipes (AGU/ALU/shift and AGU/ALU/jump). On the data flow side, the memory execution cluster contains two AGU pipes and the bus cluster contains the bus interface unit and the on-die, 512KB unified L2 cache.

Silverthorne's 16-stage pipeline: IF1–IF3 (Instruction Fetch) → ID1–ID3 (Decode) → SC, IS (Dispatch) → IRF (Reg. File) → AG, DC1, DC2 (Data cache read) → EX1 (Execute) → FT1, FT2 (Exceptions & MT) → IWB/DC (Write-back)

Silverthorne's back end

All told, Silverthorne's back end has six pipelines to which the front-end's instruction queue can dispatch instructions at a rate of two per cycle. In this section, I'll zoom in on those pipelines and discuss them one at a time.

Silverthorne's integer cluster is pretty straightforward: two basic 64-bit ALU pipes, one of which also includes a shifter, and the other of which includes the jump execution unit. Neither of these pipes, however, handles multiply or divide operations; integer MUL and DIV operations are sent to the FP/SIMD cluster.

The FP/SIMD cluster consists of two pipes, both of which are fed by a 64-bit data path. The first pipe is just a basic, 64-bit scalar floating-point ALU. The second pipe is the largest, most function-heavy pipe on the entire chip—it handles scalar multiplies and divides, both 32-bit and 64-bit, and it combines with the other FP pipe to handle all 128-bit vector operations.

The end result is that Silverthorne's FP/SIMD cluster can start two 64-bit operations or one 128-bit operation per cycle. Note that it's not clear to me whether 64-bit multiplies and divides are fully pipelined, so I don't know if it's possible to issue a 64-bit MUL/DIV (either integer or floating-point) and a 64-bit floating-point add in the same cycle.
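The dispatch constraint described above can be made concrete with a toy model. This is purely an illustrative sketch, not Intel's disclosed pairing rules: it assumes a 128-bit vector op ties up both FP pipes for a cycle (so it can only pair with an integer-cluster op), and that any other combination can dual-issue.

```python
# Toy model of Silverthorne-style dual issue (illustrative only).
# Instruction kinds: 'int' (integer cluster), 'fp64' (one FP pipe),
# 'simd128' (occupies both FP pipes at once).
def pairs(a, b):
    """Can instructions a and b issue in the same cycle?"""
    # A 128-bit vector op needs both FP pipes, so in this simplified
    # model it can only pair with an integer-cluster op.
    if 'simd128' in (a, b):
        return a == 'int' or b == 'int'
    return True  # assume any other combination can dual-issue

def cycles_to_issue(instrs):
    """Count issue cycles for an in-order stream, two slots per cycle."""
    cycles, i = 0, 0
    while i < len(instrs):
        cycles += 1
        if i + 1 < len(instrs) and pairs(instrs[i], instrs[i + 1]):
            i += 2  # dual-issue this pair
        else:
            i += 1  # single-issue
    return cycles
```

Under these assumptions, `['simd128', 'int']` issues in one cycle while `['simd128', 'fp64']` takes two, which is the kind of pairing question the undisclosed MUL/DIV pipelining details would settle.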

Silverthorne micrograph. Source: Intel

FEC: front-end cluster (plus L1 instruction cache)

FPC: floating-point cluster

IEC: integer execution cluster

MEC: memory execution cluster (plus L1 data cache)

BIU: bus interface unit

On the data side, the memory unit's two AGUs can do two address calculations per cycle; the 24KB L1 data cache is implemented as a register file with two read and two write ports. The bus cluster contains a frontside bus interface that can do 400 MT/s or 533 MT/s, and the 512KB on-die L2 cache with in-line error correction.
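Those transfer rates translate into peak bandwidth straightforwardly, assuming the usual 8-byte-wide Intel frontside bus (the bus width is my assumption; Intel didn't spell it out in this session):

```python
# Peak frontside-bus bandwidth for the two quoted transfer rates.
# BUS_WIDTH_BYTES = 8 is an assumed standard Intel FSB width.
BUS_WIDTH_BYTES = 8

for mt_per_s in (400, 533):
    gb_per_s = mt_per_s * 1e6 * BUS_WIDTH_BYTES / 1e9
    print(f"{mt_per_s} MT/s -> {gb_per_s:.2f} GB/s peak")
```

That works out to roughly 3.2 GB/s and 4.3 GB/s of peak bandwidth, modest by desktop standards but plausible for the ultramobile workloads Intel is targeting.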

Conclusions

When Silverthorne debuts later this year, Intel will offer multiple SKUs at TDP points that range from 0.5W to 2W and speeds that range from 1GHz to 2GHz. The SKUs will also be differentiated by platform-level features in the same way as Intel's desktop and mobile lines: some Silverthorne products will support all of Intel's remote management and virtualization extensions, while others will ship with a more stripped-down feature set.

Silverthorne will need all of the x86-specific, feature-oriented help that it can get, because the competition is tough. It's also the same type of RISC-based competition that Intel already faced and vanquished in the commodity desktop, workstation, and server markets. Silverthorne will face off against in-order and out-of-order cores from ARM, specifically the company's Cortex-A8 (in-order) and Cortex-A9 (out-of-order, multicore) parts. ARM has made some pretty remarkable claims for the A9 in particular, suggesting that the processor will reach speeds north of 1GHz in the same 250mW power envelope as the ARM11.

In order to narrow some of the power gap between its chips and ARM's, Intel has equipped Silverthorne with a new low-power state, called C6. When Silverthorne is in C6, the only components left powered are the SRAM that saves the existing processor state and some circuitry that can wake the processor again when it's needed. (Getting out of C6 takes about 100 microseconds.) Intel claims that its testing shows Silverthorne spending as much as 90 percent of its time in C6; if that's accurate, the chip's average power dissipation will land far below its stated TDP.
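Intel's claimed 90 percent C6 residency makes the averaging easy to sketch. Here's a back-of-envelope estimate using the article's 2W active figure plus an assumed deep-sleep power, since Intel didn't quote a C6 power number:

```python
# Back-of-envelope average power. ACTIVE_W comes from the article's
# stated 2W TDP at 2GHz; C6_W is a hypothetical placeholder, since
# Intel disclosed no C6 power figure.
ACTIVE_W = 2.0       # active power, per the article
C6_W = 0.1           # assumed deep-sleep power (hypothetical)
C6_RESIDENCY = 0.9   # Intel's claimed fraction of time in C6

avg_w = C6_RESIDENCY * C6_W + (1 - C6_RESIDENCY) * ACTIVE_W
print(f"average power ~ {avg_w:.2f} W")
```

With those inputs the average lands around 0.3W, which is the kind of figure Intel needs to be in shouting distance of ARM's power envelopes.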

So Intel is counting on a combination of sleep-enabled lower average power and support for the full, awesome expanse of the extended x86 instruction set architecture to make Silverthorne a compelling basis on which to build a generation of mobile internet devices.

Having tried a few of Intel's Silverthorne-based prototypes, I must say that I wasn't particularly impressed. I own a Nokia N800 and an iPhone, both of which are ARM-based and both of which give a nearly complete Internet experience in a smaller form factor than Silverthorne will ever fit into. Indeed, at one point during a sit-down with Intel the rep told me that the warm, bulky prototype I was holding would give me the "full Internet in your pocket." I started chuckling, pulled out my iPhone, and said, "I already have that." He gamely responded that the iPhone's browser doesn't support Flash (in my opinion that's a feature, not a bug), but my point was made.

So Silverthorne is really a transitional product; it's Intel's first, slightly awkward foray into a market that it intends to eventually dominate by doing what it always does, and that's produce ever smaller, cheaper, and faster chips that support the world's most popular ISA. This recipe may ultimately work for Intel in the embedded market the way that it has worked elsewhere, but that day won't come just yet.

Ultimately, Silverthorne could be compelling for the Asus Eee PC form factor, and at 2GHz there's an outside possibility that it might find a home in a MacBook Air that's relatively underpowered, but has great battery life. But the MID form factor, at least in its Silverthorne combination, is dead on arrival. So Silverthorne is just the start of something, and to ARM, MIPS, and the other established chipmakers who currently own the embedded space, it's Intel's way of saying "game on."