On x86 the first four general-purpose registers are named AX, CX, DX, BX. It would be quite intuitive if their indices (those used in instruction encoding) were in alphabetical order, but instead of ABCD we actually have ACDB. E.g. mov bl, 1 is encoded as B3 01 while mov cl, 1 is B1 01.
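To make the ordering concrete, here is a minimal sketch of how `mov r8, imm8` is encoded: the opcode byte is B0 plus the register number. The function name `encode_mov_r8_imm8` and the `REG8` table are just illustrative names, not part of any real assembler.

```python
# Register numbers as used in x86 instruction encodings (8-bit registers).
# The list index is the 3-bit register code.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def encode_mov_r8_imm8(reg: str, imm: int) -> bytes:
    """Encode `mov r8, imm8`: opcode byte is 0xB0 + register number."""
    if not 0 <= imm <= 0xFF:
        raise ValueError("imm8 must fit in one byte")
    return bytes([0xB0 + REG8.index(reg.lower()), imm])


if __name__ == "__main__":
    # Walk the registers in alphabetical order to show the non-alphabetical codes.
    for reg in ("al", "bl", "cl", "dl"):
        encoded = encode_mov_r8_imm8(reg, 1)
        print(f"mov {reg}, 1  ->  {encoded.hex(' ').upper()}")
```

Running it shows BL landing on B3 and CL on B1, i.e. the ACDB order in question.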

Is there any reason why they weren't enumerated in alphabetical order?

You bring as Exhibit A the x86 architecture, and you're asking why it lacks consistency in the mapping between register mnemonics and opcode binary values? :-) Remember, at that time, a new CPU was allowed to be much more of a clean break with the past than what we might expect today. Just look at the interaction between the 80286's real and protected modes (which Intel did fix in the 386, especially by introducing V86 mode).
– a CVn Dec 1 '17 at 15:19

The AX/CX/DX/BX order also makes an appearance in PUSHA, which suggests it might correspond to the internal register file implementation...
– Stephen Kitt Dec 1 '17 at 15:31

I always learned these registers as accumulator, count, data, and base. They weren't ordered alphabetically so much as ordered by usage: AX for most arithmetic operations, CX for loop counters, DX for either leftover arithmetic (think of the remainder or carry for div/mul) or I/O data, and BX for a base pointer to memory. Roughly, ACDB is the order of importance for your average use case.
– Steve Cox Dec 1 '17 at 18:54

@SteveCox Actually, in the time we're talking about, the guys writing the instruction set would prioritize minimal number of gates and then minimal gate depth/delay over just about everything else, I would think. So I wonder if there's an explanation in considering the (non-microcoded) instruction decode logic w.r.t. the special purposes of each "general" register.
– davidbak Dec 1 '17 at 23:36

Consider looking at 8080 and 8086 opcodes since those are the ancestors of x86.
– Thorbjørn Ravn Andersen Dec 2 '17 at 23:53

2 Answers
There are no technical reasons, as any order would work and result in the same number of gates. More likely it originated in the process by which the 8086 was developed. A main goal was to allow easy conversion of 8080 programs, so the development of the 8086 structure started out from an 8080 programming model. 8080 registers were ordered as 16-bit pairs in the sequence of PSW, A, B/C, D/E, H/L (SP and PC), with HL being the general base or memory pointer and DE being a backup (*1). So assigning them in the same order with similar functionality results in:

PSW/A becomes AX, and AL is also the general purpose 8 bit accumulator,

B/C becomes CX, as these were the general-purpose, usually counter, registers,

D/E becomes DX, as a general-purpose 16-bit pair, and finally

H/L becomes BX, as the primary pointer register.
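The assignment above can be written down as a tiny sketch. The pair-to-register mapping is taken straight from the list; the 8086 register numbers are the standard 16-bit register codes. The names `PAIR_TO_8086`, `REG16` and `encoding_order` are made up for illustration.

```python
# 8080 register pairs, in the order given above, and the 8086 register
# each one was assigned to (per the list in this answer).
PAIR_TO_8086 = {
    "PSW/A": "AX",
    "B/C": "CX",
    "D/E": "DX",
    "H/L": "BX",
}

# 8086 16-bit registers; the list index is the 3-bit register code.
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def encoding_order() -> list:
    """Return the 8086 register code for each 8080 pair, in 8080 order."""
    return [REG16.index(reg) for reg in PAIR_TO_8086.values()]


if __name__ == "__main__":
    for pair, reg in PAIR_TO_8086.items():
        print(f"{pair:6} -> {reg} (code {REG16.index(reg)})")
```

Walking the 8080 pairs in their original order yields 8086 codes 0, 1, 2, 3, which is exactly the A, C, D, B sequence.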

I wouldn't be very surprised if early documents revealed 8080-like names used during development.

Remember, while 8086 registers are more adapted for general use than in the 8080, they had dedicated functions, like BX in addressing, and/or optimized encodings for certain uses, like AL/AX as the primary accumulator. This provided a way toward more compact code, and thus faster execution, for programs acknowledging these differences.

More importantly, it also gave programs lifted from 8080 assembler sources to 8086 assembler by automated translation utilities (like Digital Research's XLT86) an encoding comparably compact to 8080 code. This was an important argument toward early adopters, as memory requirements were still a major cost factor back then, and asking them to buy a CPU that needs bigger ((E)P)ROMs than the existing one didn't go over well.

*1 - The physical 8080 model is different, as PSW and A were not part of the register file, which in addition featured a W/Z pair for internal operation (like buffering memory access, or addresses during 16-bit operations).

There is also a slightly mnemonic quality to some of the register names: Accumulator, Branch control (B as used in the Z80 instruction DJNZ), Count as in the CX register used by the x86 LOOP instruction and REP modifiers, and a general-purpose Data register. I would assume the naming of H:L is simply based on "high" and "low".
– njuffa Dec 2 '17 at 15:58

@njuffa: So BX / "base" may have been named just to fit into the alphabetic pattern. Other letters could have been chosen, like "P" for pointer or T for table, but weren't. Similar to how the new segment regs in 386 were named FS and GS (meaningless letters), mostly to follow 8086's SS (stack) / CS (code) / DS (data), ES (extra? used implicitly only for string / rep-string insns).
– Peter Cordes Dec 3 '17 at 4:25

@PeterCordes x86 has SP (stack pointer) and BP (base pointer) registers, so the 'P' for pointers is there in some sense. I agree that BX as an indexing base, to be used in conjunction with the SI (source index) and DI (destination index) registers is also mnemonic.
– njuffa Dec 3 '17 at 5:01

@njuffa: Funny to imagine if we'd had PX and PA / PH, and then later EPX / RPX. Then 386 setcc would have given us instructions like setp ah and seta ph. >.< And 8086 already had PF (as a bit in FLAGS). I'm glad they (or Stephen Morse specifically) chose B for base :)
– Peter Cordes Dec 3 '17 at 5:22

Well, not all his wishes came through; for example, in his papers he called what later became Convert Byte to Word "Sign EXtend".
– Raffzahn Dec 3 '17 at 13:18

"In the time we're talking about, the guys writing the instruction set would prioritize minimal number of gates and then minimal gate depth/delay over just about everything else, I would think." (quoting @davidbak's comment on the question)

BX is the only one of ACDB that can be used in a 16-bit addressing mode: [ (BX|BP) + (DI|SI) + (0 | disp8 | disp16) ], where each of the 3 components is optional.

BX is also the only 16-bit-addressing-mode register that has a low/high half. So maybe in 8086 it was physically on the boundary between the split low/high registers and the address-capable registers that the AGU had to read.

(off topic re: low-8 of other registers) In x86-64, a REX prefix changes the meaning from AH/CH/DH/BH to SPL/BPL/SIL/DIL, in that order (Intel manual vol.2, Appendix B.1.4.2, Table B-5). In 16/32 bit modes, 16-bit operand size was the smallest for SP/ESP and the other non-X registers. (Making registers more uniform helps compilers, except that compilers sometimes end up wasting a REX prefix by picking a register that needs one to access the low 8 component.)
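As a rough sketch of that REX behaviour: for 3-bit register codes 4..7 in an 8-bit operand, the name depends only on whether any REX prefix is present. The function name `reg8_name` is hypothetical, and this ignores the REX.R/REX.B extension bits that select r8b..r15b.

```python
# 8-bit register names for 3-bit codes 0..3 (same with or without REX).
LOW8 = ["AL", "CL", "DL", "BL"]
# Codes 4..7 without a REX prefix select the legacy high-byte registers.
LEGACY_HIGH8 = ["AH", "CH", "DH", "BH"]
# Codes 4..7 with any REX prefix select the new low-byte registers.
REX_LOW8 = ["SPL", "BPL", "SIL", "DIL"]


def reg8_name(code: int, rex_present: bool) -> str:
    """Name of the 8-bit register for a 3-bit code (no REX.R/REX.B extension)."""
    if not 0 <= code <= 7:
        raise ValueError("register code must be 0..7")
    if code < 4:
        return LOW8[code]
    table = REX_LOW8 if rex_present else LEGACY_HIGH8
    return table[code - 4]


if __name__ == "__main__":
    for code in range(8):
        print(code, reg8_name(code, False), reg8_name(code, True))
```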

The pusha/popa ordering matches too, and while that's interesting, the internal implementation probably uses a counter and goes through the same fetch-by-index logic as explicit register operands. So it doesn't add new information that pusha goes in order of the encodings.

It makes sense that the physical layout of the register file would match the register-number encodings, though, to keep the decoding logic simple. I had a look at the addressing-mode encodings, to see if there was a similar pattern there. (I haven't looked at 8086 gate diagrams, but maybe the AGU fetches directly from the last 4 registers in the register file without going through the full indexing that can select any of the 8.)

It's complicated by the fact that 16-bit addressing doesn't have a SIB byte, so base and base+index modes share the same 3-bit R/M field in the ModR/M byte.

So BX alone is at the opposite end from BX+SI and BX+DI, but it's one wrap-around away from being adjacent to the other two codes that involve BX. When SI and DI are involved, bit0=0 means SI, bit0=1 means DI.

When there are both base and index registers (bit2=0), then bit1=0 means BX, bit1=1 means BP. So that's maybe consistent with BX being earlier in the physical layout of the register file (lower numbers to access it). But R/M fields clearly need significant decoding before they turn into register fetches.
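For reference, the bit patterns described above can be tabulated and checked with a short sketch. The table is the standard 16-bit R/M encoding; the names `RM16` and `describe_rm` are just illustrative.

```python
# 16-bit addressing: (base, index) selected by the 3-bit R/M field.
# The list index is the R/M code. Note: with mod=00, code 110 means
# "disp16 only, no base" instead of [BP].
RM16 = [
    ("BX", "SI"),  # 000
    ("BX", "DI"),  # 001
    ("BP", "SI"),  # 010
    ("BP", "DI"),  # 011
    (None, "SI"),  # 100
    (None, "DI"),  # 101
    ("BP", None),  # 110
    ("BX", None),  # 111
]


def describe_rm(rm: int) -> str:
    """Return a string like 'BX+SI' for a 3-bit R/M code."""
    base, index = RM16[rm]
    return "+".join(part for part in (base, index) if part)


if __name__ == "__main__":
    for rm in range(8):
        print(f"{rm:03b}  [{describe_rm(rm)}]")
    # bit2=0: base+index modes; bit1 picks the base, bit0 picks the index.
    for rm in range(4):
        base, index = RM16[rm]
        assert base == ("BP" if rm & 0b010 else "BX")
        assert index == ("DI" if rm & 0b001 else "SI")
```

In this table BX alone is 111, while BX+SI and BX+DI are 000 and 001, which is the wrap-around adjacency mentioned above.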

Still, I think it's plausible that the AGU in 8086 has a "back door" into the register file that can only select from the last 4 registers (in /r field and pusha encoding order). Note that 8086 uses the adder in its regular ALU for address calculations, but the addressing-mode decoding hardware might use different paths to fetch inputs for the ALU's address calculations. (Totally guessing; certainly possible it just decodes those addressing modes to the usual 3-bit register codes and drives the normal ALU through regular register-fetch paths.)

In 32/64-bit addressing modes, the encoding matches the usual register encoding, so presumably (in CPUs without out-of-order / register renaming, like 80386) the AGU can access the register file with the same 3-bit address as the ALU.

(fun fact, 32/64-bit shares the same pattern of [e/rbp] being the escape code for disp32 with no base (or RIP relative), which is why EBP always needs at least a disp8 = 0. This is why disassembly looks like [ebp+0] vs. [edx].) Same for r13 in x86-64 mode, because it has the same code as rbp (except for the extra bit in the REX prefix).
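A small sketch of that escape-code effect, for `mov r32, [base]` (opcode 8B /r) in 32-bit mode. The helper names `modrm` and `encode_mov_load` are hypothetical, and ESP is left out because it needs a SIB byte.

```python
# 32-bit registers; the list index is the 3-bit register code.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the mod (2 bits), reg (3 bits) and r/m (3 bits) fields."""
    return (mod << 6) | (reg << 3) | rm


def encode_mov_load(dst: str, base: str) -> bytes:
    """Encode `mov dst, [base]` with 32-bit addressing and no SIB byte."""
    reg = REG32.index(dst)
    rm = REG32.index(base)
    if base == "esp":
        raise NotImplementedError("[esp] needs a SIB byte; not covered here")
    if base == "ebp":
        # mod=00 with r/m=101 means disp32 with no base register,
        # so [ebp] has to be encoded as [ebp + disp8] with disp8 = 0.
        return bytes([0x8B, modrm(0b01, reg, rm), 0x00])
    return bytes([0x8B, modrm(0b00, reg, rm)])


if __name__ == "__main__":
    for base in ("edx", "ebp"):
        code = encode_mov_load("eax", base)
        print(f"mov eax, [{base}]  ->  {code.hex(' ').upper()}")
```

The [ebp] form comes out one byte longer than [edx], which is where the [ebp+0] in disassembly comes from.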

Nice work, but it begins with a little flaw: Stephen Morse, who was the lead architect and single-handedly wrote most of the specs, was a software guy. He 'invented' the ModR/M byte and set the opcodes to be used before any hardware at the level of detail needed for the above reasoning was defined, let alone designed. Reading his 1980 book '8086 Primer' gives some insight into the way he saw the CPU. It also describes part of the reasoning for the reg field the OP asked about. Last but not least, when looking at other ModR/M encodings, the above theory would result in more gates.
– Raffzahn Dec 2 '17 at 14:03

The 8086 Primer is BTW a very important book for anyone who wants to understand the 8086. Just as Stephen Morse's development of the 8086 is special, since he is not a hardware guy and wasn't involved in any CPU project before or after the 8086 (not even the 8088), the book is special, since the description of the 8086 it gives differs quite a bit in structure and wording from all Intel material. The way it's written reveals a lot about his view when designing and which parts he emphasized.
– Raffzahn Dec 2 '17 at 14:20

@Raffzahn: More gates for the AGU to have a back door into the register file? Yeah, I wondered about that; fetching through the regular indexing logic would presumably be fewer gates. Maybe more total gates on chip, but fewer gate-delays in some of the critical paths? Thanks for the info that the ModR/M encodings were set before HW design by a software guy. I'm a software guy, too, though, and I thought of this, so perhaps he did consider BX as on the boundary between data and pointer.
– Peter Cordes Dec 2 '17 at 14:28

I wonder how the cost of the 8x88's approach compared with the cost of something like the 6809's approach which could IIRC use some bits of an extension byte as part of an offset and others for register/segment selection. In many programs, the fraction of instructions that would need an ES: prefix is much higher than the fraction of instructions that would need a displacement of +/-32..127, or beyond +/-16384, so adding an extra byte to instructions needing a longer displacement would be a small price to pay for having some addressing modes imply ES: [the way BP implies SS:].
– supercat Dec 5 '17 at 21:30