Into the Core: Intel’s next-generation microarchitecture

Earlier this year at its Developer Forum, Intel unveiled Core, the next- …

Core's execution core

In addition to its greatly enlarged reservation station (32 entries), Core's execution core has a new issue port scheme with six issue ports (compared with five for P6 and four for Netburst). Unlike its predecessors, the Pentium 4 and the P6, Core has three issue ports dedicated to arithmetic and logical instructions. This section looks in as much detail as is currently possible at the execution hardware attached to each of Core's ports.

Figure: Core's execution core

Note: Intel has yet to release the exact issue port assignments for Core's execution core, so my coverage in this section relies to some extent on my own detective work. Specifically, in the execution unit functional breakdowns below the main diagram, the functions in italics are more speculative and await confirmation from official Intel docs.

Integer execution units

Core has three 64-bit integer execution units, each of which can do single-cycle 64-bit scalar integer operations. It appears that there's one 64-bit complex integer unit (CIU), which does most of the same work as the P6 core's CIU, and two simple integer units (SIUs) that do basic operations like addition. One of the SIUs shares port 2 with the branch execution unit (BEU, which Intel calls the jump execution unit). The SIU on this port is capable of working in tandem with the BEU to execute macro-fused instructions (compare or test + jcc).
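As a rough illustration of what macro-fusion means, the following Python sketch models a pass over an instruction stream that pairs a compare or test with the conditional jump immediately following it and emits a single fused operation. The function name `macro_fuse`, the mnemonic sets, and the `cmp+jne` naming are all hypothetical and chosen only for illustration; this is a toy model, not a description of how Intel's hardware actually identifies fusible pairs.

```python
# Toy model of macro-fusion: a cmp/test immediately followed by a
# conditional jump is merged into one fused operation.
# The mnemonic sets below are illustrative assumptions, not Intel's rules.
COMPARES = {"cmp", "test"}
COND_JUMPS = {"je", "jne", "jl", "jle", "jg", "jge", "ja", "jae", "jb", "jbe"}


def macro_fuse(mnemonics):
    """Return the instruction stream with fusible pairs merged."""
    fused = []
    i = 0
    while i < len(mnemonics):
        current = mnemonics[i]
        following = mnemonics[i + 1] if i + 1 < len(mnemonics) else None
        if current in COMPARES and following in COND_JUMPS:
            fused.append(f"{current}+{following}")  # one op instead of two
            i += 2
        else:
            fused.append(current)
            i += 1
    return fused


stream = ["add", "cmp", "jne", "mov", "test", "je", "cmp", "add"]
print(macro_fuse(stream))
```

In this model the eight-instruction stream shrinks to six operations: the trailing `cmp` stays unfused because the instruction after it is not a conditional jump.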

The ability to do single-cycle 64-bit integer computations is a first for Intel's x86 line, and this feature puts Core ahead of even IBM's PowerPC 970, which has a two-cycle latency for integer operations. Furthermore, because the 64-bit integer ALUs are on separate issue ports, Core can sustain a total throughput of three 64-bit integer operations per cycle.
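To put the latency and throughput figures in concrete terms, here is a back-of-the-envelope Python sketch. It assumes an idealized machine with fully pipelined integer units, and it ignores fetch, decode, and retirement limits, so the numbers are best-case bounds rather than performance predictions. The function names and parameters are illustrative.

```python
import math


def independent_ops_cycles(num_ops, alus=3, latency=1):
    """Best-case cycles to complete num_ops independent integer ops.

    Assumes each ALU accepts one new op per cycle (fully pipelined).
    """
    if num_ops == 0:
        return 0
    issue_cycles = math.ceil(num_ops / alus)
    # The last op issued still needs (latency - 1) more cycles to finish.
    return issue_cycles + latency - 1


def dependent_chain_cycles(num_ops, latency=1):
    """Cycles for a chain where each op needs the previous op's result."""
    return num_ops * latency


# 300 independent ops spread across three single-cycle ALUs.
print(independent_ops_cycles(300))            # 100
# A 100-op dependent chain at one-cycle versus two-cycle latency.
print(dependent_chain_cycles(100, latency=1))  # 100
print(dependent_chain_cycles(100, latency=2))  # 200
```

The second function shows why latency matters independently of throughput: extra ALUs do nothing for a dependent chain, so halving the latency halves the time spent on it.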

All told, Core has a robust integer unit that should serve it well across the very wide range of applications (mobile, server, gaming, etc.) that the architecture will be expected to run.

Floating-point execution units

Core has two floating-point execution units that handle both scalar and vector floating-point arithmetic operations. The execution unit on port 1 handles floating-point adds and other simple operations in the following data formats:

Scalar: single-precision (32-bit), double-precision (64-bit)

Vector: 4x single-precision, 2x double-precision

The floating-point execution unit on port 2 handles floating-point multiplies and divides in the vector and scalar formats listed above.
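The data formats listed above can be made concrete with a short Python sketch that packs values into raw bytes using the standard `struct` module. Both vector formats come out to 16 bytes (128 bits), which is simply 4 x 32 bits or 2 x 64 bits. The variable names are illustrative, and the sketch shows data layout only, not how the execution units process the individual lanes.

```python
import struct

# Scalar formats: one value each.
scalar_single = struct.pack("<f", 1.5)   # 32-bit single-precision
scalar_double = struct.pack("<d", 1.5)   # 64-bit double-precision

# Vector formats: several values packed side by side.
vector_single = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)  # 4x single-precision
vector_double = struct.pack("<2d", 1.0, 2.0)            # 2x double-precision

for name, data in [
    ("scalar single", scalar_single),
    ("scalar double", scalar_double),
    ("vector 4x single", vector_single),
    ("vector 2x double", vector_double),
]:
    print(f"{name}: {len(data) * 8} bits")
```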

Note that in my Core diagrams I've depicted the FADD/VFADD and FMUL/VFMUL pipes as four separate blocks for clarity's sake. The pairs are colored alike, though, to show that the FADD shares hardware with the VFADD, and the FMUL shares hardware with the VFMUL, with the result that these four blocks should really be considered as constituting two pipelines.