Classic.Ars: An Introduction to 64-bit Computing and x86-64

Ars explains the theory and praxis of 64-bit computing.

AMD's 64-bit alternative: x86-64

When AMD set out to alter the x86 ISA in order to bring it into the world of 64-bit computing, they took the opportunity to do more than just widen the GPRs. x86-64 makes a number of improvements to x86, and in this section we'll look at some of them.

Extended registers

I don't want to get into a historical discussion of the evolution of what eventually became the modern x86 ISA as Intel's hardware went from 4-bit to 8-bit to 16-bit to 32-bit. You can find such discussions elsewhere, if you're interested. I'll only point out that what we now consider to be the "x86 ISA" was first introduced in 1978 with the release of the 8086. The 8086 had four 16-bit GPRs and four 16-bit registers that were intended to hold memory addresses but could be used as GPRs. (The four GPRs, though, could not be used to store memory addresses in 16-bit addressing mode.)

With the release of the 386, Intel extended the x86 ISA to support 32 bits by doubling the size of the original eight 16-bit registers. In order to access the extended portion of these registers, assembly language programmers used a different set of register mnemonics. (For more on mnemonics and their relationship to binary opcodes, see this page.)

With x86-64, AMD has done pretty much the same thing that Intel did to enable the 16-bit to 32-bit transition--they've doubled the sizes of the eight GPRs and assigned new mnemonics to the extended registers. However, extending the existing eight GPRs isn't the only change AMD made to the x86 register model.
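The result is that each old register name aliases the low bits of its widened successor: AX is the low 16 bits of EAX, and EAX is in turn the low 32 bits of RAX. Here's an illustrative sketch of that layering (register values are modeled as plain Python integers; this is not real assembly):

```python
# Illustrative model of x86 sub-register aliasing: the legacy names
# address the low-order bits of the widened 64-bit register.
RAX = 0x1122334455667788  # hypothetical 64-bit register contents

EAX = RAX & 0xFFFFFFFF    # EAX is the low 32 bits
AX  = RAX & 0xFFFF        # AX is the low 16 bits
AL  = RAX & 0xFF          # AL is the low 8 bits

print(hex(EAX), hex(AX), hex(AL))
```

Writing to AX touches only those low 16 bits, which is why the old 16-bit and 32-bit code keeps working on the widened register file.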

More registers

One of the oldest and longest-running gripes about x86 is that the programming model has only eight GPRs, eight FPRs, and eight SIMD registers. All newer RISC ISAs support many more architectural registers; the PowerPC ISA, for instance, specifies thirty-two of each type of register. Increasing the number of registers allows the processor to cache more data where the execution units can access it immediately; this translates into a reduced number of LOADs and STOREs, which means less memory subsystem traffic, less waiting for data to load, etc. More registers also give the compiler or programmer more flexibility to schedule instructions so that dependencies are reduced and pipeline bubbles are kept to a minimum. (For more on dependencies and pipeline bubbles, see this article or this article.)
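To make the LOAD/STORE point concrete, here's a deliberately simplified toy model (not a real compiler's allocator): if a stretch of code keeps more values live than there are architectural registers, each excess value must be spilled to memory, costing roughly one STORE and one later LOAD:

```python
# Toy spill model: values that don't fit in registers go to memory.
# "live_values" and the 2-ops-per-spill cost are illustrative
# assumptions, not measurements of any real compiler or CPU.
def spill_memory_ops(live_values: int, regs: int) -> int:
    spilled = max(0, live_values - regs)
    return 2 * spilled  # one STORE plus one LOAD per spilled value

# The same workload under x86's 8 GPRs vs. x86-64's 16:
print(spill_memory_ops(12, 8))   # -> 8 extra memory operations
print(spill_memory_ops(12, 16))  # -> 0; everything stays in registers
```

Real register allocation is far more subtle, but the direction of the effect is the same: more architectural registers means fewer spills and less memory traffic.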

Modern x86 CPUs get around some of these limitations by means of a trick called register renaming. I won't describe this technique in detail here, but it involves putting extra "hidden" internal registers onto the die and then dynamically mapping the programmer-visible registers to these internal, machine-visible registers. The P4, for instance, has 128 of these microarchitectural rename registers, which allow it to store more data closer to the ALU and reduce dependencies. The take-home point is this: of the P4's 128 internal registers, only the traditional eight are visible to the programmer or compiler; the other 120 are visible only to the P4's internal register rename logic, so it's up to the P4's hardware to try to make the best use of them at runtime.
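The core of the renaming idea can be sketched in a few lines. This is a minimal, hypothetical model (the names, table, and free list are illustrative; real hardware like the P4 is far more elaborate): every write to an architectural register is redirected to a fresh physical register, so two independent writes to the same architectural name no longer create a false dependency between them.

```python
# Minimal register-renaming sketch (illustrative, not the P4's actual
# mechanism): a rename table maps each architectural register to the
# physical register holding its current value, and a free list supplies
# fresh physical registers for new writes.
free_list = [f"p{i}" for i in range(128)]  # 128 physical registers
rename_table = {}                          # architectural -> physical

def rename_dest(arch_reg: str) -> str:
    """Allocate a fresh physical register for a write to arch_reg."""
    phys = free_list.pop(0)
    rename_table[arch_reg] = phys
    return phys

# Two back-to-back writes to EAX land in different physical registers,
# so the second instruction need not wait on the first:
first = rename_dest("eax")
second = rename_dest("eax")
print(first, second)  # -> p0 p1
```

A later instruction that reads EAX simply consults the rename table and sees the most recent physical register; retired physical registers eventually return to the free list.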

In spite of the benefits of register renaming, it would still be nicer to have more registers directly accessible to the programmer via the x86 ISA. This would allow a compiler or an assembly language programmer more flexibility and control to statically optimize the code. It would also allow a decrease in the number of memory access instructions (LOADs and STOREs). So in extending x86 to 64 bits, AMD has also taken the opportunity to double the number of GPRs and SIMD registers available via the x86-64 ISA.

When running in 64-bit mode, x86-64 programmers will have access to eight additional GPRs, for a total of 16 GPRs. Furthermore, there are eight new SIMD registers, added for use in SSE/SSE2 code. So the number of GPRs and SIMD registers available to x86-64 programmers will go from eight each to sixteen each. Take a look at a diagram from AMD that shows the new programming model:

Notice that they've left the x87 floating-point stack alone. This is because both Intel and AMD are encouraging programmers to use SSE/SSE2 for floating-point code instead of x87. I've discussed the reason for this before, so I won't recap it here. And finally, notice that the program counter (PC) is extended. This was done because the PC holds the address of the next instruction, and since addresses are now 64 bits wide, the PC must be widened to accommodate them.