Willamette Performance Revealed

Birth of a New Microarchitecture

Today Intel officially introduced its seventh-generation x86 core, its first entirely new design in five years. Code-named “Willamette” while under development, it will be marketed under the name Pentium 4 (P4). This spanking new microprocessor will naturally be compared to the P6 core, the design it is destined to replace and the basis of the Pentium Pro, Pentium II, and Pentium III (PIII). It will also quickly be sized up against its primary competition, the AMD K7 and its anticipated future descendants.

The announced performance of the Willamette P4, measured with the SPEC2000 benchmark suite, is shown in Table 1 along with official scores for the Coppermine Pentium III and Thunderbird K7 Athlon. Slightly higher (and presumably newer) SPEC scores for the 1.4 and 1.5 GHz P4 have been disclosed by sources ostensibly violating their non-disclosure agreement (NDA) with Intel, but given the uncertain legitimacy of those values I have chosen to use the official P4 performance numbers [1].

Table 1 Performance of P4, PIII, and K7 Measured by SPEC2000

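All scores are absolute performance.

| CPU | Freq (MHz) | Chipset | Compiler(s) | SPECint2k peak | SPECint2k base | SPECfp2k peak | SPECfp2k base |
|------|------|--------|--------------------|-----|-----|-----|-----|
| P4 | 1500 | 850 | IRC 5.0 | 535 | 522 | 558 | 549 |
| P4 | 1400 | 850 | IRC 5.0 | 509 | 499 | 538 | 529 |
| PIII | 1133 | 820 | IRC 5.0 | 464 | 461 | 331 | 320 |
| PIII | 1000 | 840 | IRC 5.0 | 442 | 438 | 335 | 327 |
| PIII | 1000 | 820 | IRC 5.0 | 428 | 426 | 314 | 304 |
| PIII | 933 | 820 | IRC 5.0 | 410 | 407 | 305 | 295 |
| PIII | 867 | 820 | IRC 5.0 | 390 | 388 | 294 | 284 |
| PIII | 800 | 820 | IRC 4.5 | 355 | 352 | 256 | 245 |
| PIII | 800 | 440BX2 | IRC 4.5 | 344 | 340 | 237 | 226 |
| PIII | 733 | 840 | IRC 4.5 | – | 336 | – | 243 |
| PIII | 733 | 820 | IRC 4.5 | 335 | 331 | 244 | 234 |
| PIII | 750 | 440BX2 | IRC 4.5 | 330 | 325 | 230 | 219 |
| PIII | 700 | 440BX2 | IRC 4.5 | 315 | 310 | 223 | 213 |
| PIII | 667 | 820 | IRC 4.5 | 314 | 310 | 233 | 222 |
| PIII | 650 | 440BX2 | IRC 4.5 | 299 | 295 | 215 | 204 |
| K7 | 1200 | GA-7ZM | IRC 4.5/Compaq 6.5 | – | – | 342 | 304 |
| K7 | 1100 | GA-7ZM | IRC 4.5/Compaq 6.5 | – | – | 331 | 311 |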

The 1.4 GHz P4 achieves 19% higher SPECint2000 performance than a 1000 MHz PIII on an 820-based platform, while the 1.5 GHz P4 achieves 25% higher performance. Considering that the P4 is clocked 40% or 50% faster than the PIII, this would at first glance seem to confirm the concern that the P4’s deep pipelining and small 8 KB data cache would cause a significant instructions per clock cycle (IPC) penalty compared to the PIII. Does this mean the P4 is a “bad design”, or that Intel has cut corners in its new core in a blatant attempt to trick computer buyers on the basis of high clock rates? Simple comparison of performance and clock rate cannot be used to support those contentions, as clock normalization is not a valid way to compare two microarchitectures operating at different clock frequencies.
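For readers who want to check the arithmetic, here is a minimal sketch in Python that reproduces the percentages quoted above from the SPECint2000 peak scores in Table 1. The names `TABLE_1`, `percent_gain`, and `per_clock_ratio` are my own illustrative choices, and the per-clock ratio it prints is exactly the naive clock-normalized figure that this article argues should not be taken at face value.

```python
# Clock rates (MHz) and SPECint2000 peak scores copied from Table 1.
TABLE_1 = {
    "P4 1.5 GHz": (1500, 535),
    "P4 1.4 GHz": (1400, 509),
    "PIII 1.0 GHz (820)": (1000, 428),
}

BASELINE = "PIII 1.0 GHz (820)"


def percent_gain(new, old):
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


def per_clock_ratio(cpu, baseline=BASELINE):
    """Naive clock-normalized score of `cpu` relative to `baseline`."""
    mhz, score = TABLE_1[cpu]
    base_mhz, base_score = TABLE_1[baseline]
    return (score / mhz) / (base_score / base_mhz)


base_mhz, base_score = TABLE_1[BASELINE]
for cpu in ("P4 1.4 GHz", "P4 1.5 GHz"):
    mhz, score = TABLE_1[cpu]
    print(
        f"{cpu}: clock +{percent_gain(mhz, base_mhz):.0f}%, "
        f"SPECint2k +{percent_gain(score, base_score):.0f}%, "
        f"naive per-clock ratio {per_clock_ratio(cpu):.2f}"
    )
```

A per-clock ratio below 1.0 is what prompts the “bad design” question. The ratio describes the whole system at its shipping clock rate, memory included, so it says less about the core itself than it appears to.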

For starters, SPEC2000 was designed to have a much larger memory footprint than SPEC95. As a result, memory accesses miss in the 256 KB L2 cache found in both the P4 and PIII in significant numbers, and have to be satisfied by read and/or write operations to main memory. As the clock rate of a PIII or P4 processor is increased, the number of processor cycles needed to access main memory (which doesn’t speed up) increases. An average memory access might take 100 ns or more. That translates to 100 clock cycles on a 1.0 GHz PIII and 150 clock cycles on a 1.5 GHz P4. That is fair for looking at absolute processor performance because, after all, that is how the processors are intended to run. But if you were trying to compare the design efficiency of the two different microarchitectures, you wouldn’t run them at the same frequency and then connect the second to memory 50% slower than the memory connected to the first. That is effectively what you are doing when you compare the performance divided by clock rate (i.e. clock normalized) of two designs with one run at 1.0 GHz and the other at 1.5 GHz.
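The latency argument can be sketched in a few lines of Python. This is only an illustration under the article's own round-number assumption of a fixed 100 ns memory access; the function names are hypothetical and nothing here is measured data.

```python
MEMORY_LATENCY_NS = 100.0  # assumed fixed main-memory latency from the text


def latency_in_cycles(latency_ns, clock_ghz):
    """Convert a fixed latency in ns to processor cycles (1 GHz = 1 cycle/ns)."""
    return latency_ns * clock_ghz


def equivalent_latency_ns(latency_ns, clock_ghz, reference_ghz):
    """Memory latency the faster design would see if both ran at reference_ghz.

    Dividing performance by clock rate keeps the miss cost in cycles fixed,
    so at the reference clock the same cycle count spans more nanoseconds.
    """
    return latency_in_cycles(latency_ns, clock_ghz) / reference_ghz


for name, ghz in (("PIII", 1.0), ("P4", 1.5)):
    cycles = latency_in_cycles(MEMORY_LATENCY_NS, ghz)
    same_clock_ns = equivalent_latency_ns(MEMORY_LATENCY_NS, ghz, 1.0)
    print(
        f"{name} at {ghz:.1f} GHz: {cycles:.0f} cycles per miss, "
        f"like {same_clock_ns:.0f} ns memory at 1.0 GHz"
    )
```

Under these assumptions, the 1.5 GHz P4 pays 150 cycles per miss against the PIII's 100. In a clock-normalized comparison that is the same as running both cores at 1.0 GHz with the P4 attached to 150 ns memory, i.e. memory that is 50% slower.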