Assembly Evolution Part 1: Accessing Memory and the strange case of the Intel 4004

While writing assembly has become far less relevant for developers outside of systems programming than it was a few decades ago, CPUs have nevertheless made it much more comfortable to do so. Today we take a lot for granted: fancy indirect addressing modes with scaling, general purpose registers galore instead of an accumulator and maybe one or two crippled index registers, conditional execution for nearly every instruction (on ARM)...

But the basics themselves have evolved, too. Let's take a look at what programmers of the past had to put up with for entirely simple, everyday tasks. We'll start with the most trivial one: writing to memory.

Our goal is to write a single immediate value of 3 into memory location 5. In light of paging, segmentation and bank switching, we'll use whatever is convenient as a definition for "memory location". Also, we'll let the CPU decide what the word size should be. Since you only need 2 bits to represent a 3, it should fit into every CPU's word size (except for 1-bit CPUs, which actually existed, but that's a story for another posting). If we have the choice, we'll just take the smallest.

We'll work backwards, from the present to the past, to explore the wonders of direct addressing in Intel CPUs. (One precautionary warning though: I only really tested the 4004 code in an emulator, and my habits are highly tainted by current Intel CPUs. So if I made some mistake somewhere, kindly point it out and I'll fix it!)

x86

On a modern x86 CPU, it is of course fairly easy to write the value 3 to memory cell 5. You just do it:

mov byte [5], 3

A single instruction, simple and obvious. I cheated a bit: I didn't use a segment prefix, nor did I set up any segment registers/selectors beforehand. But assuming a common modern OS environment in protected mode, you probably don't want to fiddle with those selectors anyway.
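
To see what "a single instruction" means at the byte level, here is a Python sketch that hand-encodes mov byte [5], 3. It uses the generic C6 /0 ib form ("move imm8 to r/m8") with a displacement-only ModRM byte. The function name is made up for illustration, and only 16- and 32-bit addressing are covered. Both the address and the immediate value end up inside the one instruction.

```python
def encode_mov_byte_abs_imm8(address: int, value: int, mode_bits: int = 32) -> bytes:
    """Hand-encode 'mov byte [address], value' using opcode C6 /0 ib (illustrative sketch)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("imm8 out of range")
    if mode_bits == 16:
        # ModRM 0x06: mod=00, reg=000 (the /0 extension), r/m=110 -> disp16 only
        return bytes([0xC6, 0x06]) + address.to_bytes(2, "little") + bytes([value])
    if mode_bits == 32:
        # ModRM 0x05: mod=00, reg=000 (the /0 extension), r/m=101 -> disp32 only
        return bytes([0xC6, 0x05]) + address.to_bytes(4, "little") + bytes([value])
    raise ValueError("only 16- and 32-bit addressing are sketched here")


print(encode_mov_byte_abs_imm8(5, 3).hex(" "))      # c6 05 05 00 00 00 03
print(encode_mov_byte_abs_imm8(5, 3, 16).hex(" "))  # c6 06 05 00 03
```

In 64-bit mode the same ModRM value means RIP-relative addressing, so assemblers add a SIB byte (C6 04 25 ...) to express an absolute address instead.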

8085

The Intel 8085 is somewhat of a direct predecessor to the 8086, the first in the line of the enormously successful x86 processors. Unlike the 8086, it has no instruction that moves an immediate value to a directly specified address. One way to circumvent this simple limitation is to use the accumulator to hold our value (thanks to gp2000 for pointing out that there actually are load and store instructions that take a direct 16-bit address):

MVI A,3
STA 0005h
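
Hand-assembled, this two-instruction sequence is five bytes long. The following Python sketch uses the 8080/8085 opcodes 3Eh (MVI A,d8) and 32h (STA a16); the helper name is hypothetical. Note that the 16-bit address is stored low byte first.

```python
# Opcodes from the 8080/8085 instruction set
MVI_A = 0x3E  # MVI A,d8: load an 8-bit immediate into the accumulator
STA = 0x32    # STA a16: store the accumulator at a direct 16-bit address


def assemble_store_const(address: int, value: int) -> bytes:
    """Assemble 'MVI A,value' followed by 'STA address' (hypothetical helper)."""
    return bytes([MVI_A, value & 0xFF, STA]) + address.to_bytes(2, "little")


print(assemble_store_const(0x0005, 3).hex(" "))  # 3e 03 32 05 00
```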

The number of instructions with a direct addressing mode is very limited, though. Apart from branches, they can do nothing but load and store values.

For any kind of arithmetic, however, you aren't forced to load the value from memory into a register, perform your arithmetic instruction and store the result back. Virtually all instructions with a register addressing mode (that is, instructions that directly operate on the contents of a register) also allow register indirect memory addressing through a pseudo register called "M".

This pseudo register is in reality just backed by the registers H and L paired together, each 8 bits wide, and accessing it accesses the memory location they point at (you may take a guess which register supplies the High byte, and which the Low byte of the address).
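
The following toy model in Python illustrates the idea (and spoils the guess): H supplies the high byte and L the low byte of the address, and every access to M goes to that memory cell. The class and method names are invented for this sketch. It models no flags, timing or other CPU behavior. The comments name the 8085 instructions each step stands in for.

```python
class I8085Memory:
    """Toy model of the 8085's H and L registers and the M pseudo register."""

    def __init__(self) -> None:
        self.h = 0
        self.l = 0
        self.mem = bytearray(0x10000)  # full 64 KiB address space

    @property
    def hl(self) -> int:
        # H supplies the high byte, L the low byte of the address
        return (self.h << 8) | self.l

    def read_m(self) -> int:
        return self.mem[self.hl]

    def write_m(self, value: int) -> None:
        self.mem[self.hl] = value & 0xFF


cpu = I8085Memory()
cpu.h, cpu.l = 0x00, 0x05           # LXI H,0005h
cpu.write_m(3)                      # MVI M,3
acc = 10                            # accumulator already holds 10
acc = (acc + cpu.read_m()) & 0xFF   # ADD M: add memory[HL] to the accumulator
print(cpu.mem[5], acc)              # 3 13
```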

We continue going further back, skipping the 8080 because it wasn't very different in that regard, and arrive at its direct predecessor instead, the Intel 8008. The 8080 and 8085 were source compatible with the 8008 (which, mind you, is not the same as binary compatible... also it may or may not have required some light automated translation), but going backwards, we find something vital taken from us:
The only instructions that were allowed to contain 14-bit (that's the width of its address bus) immediate values at all are jumps and branches. Consequently, we are left with no way to completely specify our destination address in one instruction!

Instead, we have to load H and L, which together form the address behind pseudo register M, one at a time:

LHI 00h
LLI 05h
LMI 3
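
Here is a Python sketch of the same three steps. It rests on one stated assumption: because the 8008 only has a 14-bit address space, the model simply ignores the top two bits of H when forming the address. The class is hypothetical and covers only the three instructions used above.

```python
ADDRESS_BITS = 14                       # width of the 8008 address space
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1  # 0x3FFF


class I8008:
    """Toy model of the 8008's H/L-based memory access (LHI, LLI, LMI only)."""

    def __init__(self) -> None:
        self.h = 0
        self.l = 0
        self.mem = bytearray(1 << ADDRESS_BITS)  # 16 KiB

    def m_address(self) -> int:
        # Assumption: only the low 6 bits of H take part in the 14-bit address
        return ((self.h << 8) | self.l) & ADDRESS_MASK

    def lhi(self, value: int) -> None:
        self.h = value & 0xFF  # load H immediate

    def lli(self, value: int) -> None:
        self.l = value & 0xFF  # load L immediate

    def lmi(self, value: int) -> None:
        self.mem[self.m_address()] = value & 0xFF  # load M immediate


cpu = I8008()
cpu.lhi(0x00)
cpu.lli(0x05)
cpu.lmi(3)
print(cpu.mem[5])  # 3
```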

By the way, the 8008 had a 7-level-deep internal stack, which I will cover another time. Because of that, and because it lacks any direct addressing except for branches, it seems that memory is only ever addressed through either the program counter (for fetching instructions and the immediate values following them) or through HL. I wonder whether the CPU designers took any shortcuts there in hardware. For example, is HL always on the bus when the CPU is outside of its instruction fetch cycles?

4004

It's hardly possible to go back further than the Intel 4004, at least if you are only considering single-chip CPUs (at the time of its conception in the early 70s, there were already famous multi-chip CPUs with comfortable orthogonal instruction sets, notably the PDPs). Indeed, it was the first widely available single-chip CPU. This little thing was a 4-bit CPU with some strange quirks, which we will explore further. Overall, it bears little to no resemblance to its successor in name, the Intel 8008 (except for the internal stack).

But let's just look at the code for writing a value of 3 into the memory location at 5 first:

As a 4-bit CPU, the 4004 has 4-bit wide registers and addresses 4-bit nibbles as words in memory. It has only one accumulator, on which the majority of operations are performed, but sixteen index registers (R0-R15).

Those index registers are handy for accessing memory. Besides loading immediate values from ROM into them (FIM), there is an instruction that loads data indirectly (FIN): it puts the content of register pair 0 on the address bus and fetches the ROM byte found there. Another instruction (JIN) performs an indirect jump instead. Other than that, you can only increment index registers, although there is the interesting "ISZ" instruction that not only increments, but also branches if the result is not 0.
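
ISZ makes for compact counting loops. You preload a 4-bit register with 16 minus the desired number of iterations, and the branch falls through once the register wraps around to 0. The following Python simulation sketches that idea. Its helpers are hypothetical and model only the counter register, not the rest of the CPU.

```python
def isz(regs: list[int], r: int) -> bool:
    """ISZ semantics: increment 4-bit register r; True means the branch is taken (result != 0)."""
    regs[r] = (regs[r] + 1) & 0xF
    return regs[r] != 0


def run_loop(n: int) -> int:
    """Count loop body executions when R0 is preloaded with 16 - n (1 <= n <= 16)."""
    regs = [0] * 16
    regs[0] = (16 - n) & 0xF  # preload the 4-bit counter
    iterations = 0
    while True:
        iterations += 1        # the loop body would go here
        if not isz(regs, 0):   # ISZ R0,loop_start
            break
    return iterations


print(run_loop(4))   # 4
print(run_loop(16))  # 16 (counter starts at 0)
```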

Because the 4004 uses 8-bit addresses for its 4-bit nibbles, every two consecutive index registers (R0/R1, R2/R3, and so on) form a pair, which is then used for memory references.

Note that I explicitly said ROM above. This is because in the 4004 architecture, ROM and RAM are actually vastly different beasts, at least from the assembly programmer's perspective. You cannot directly access RAM. It always involves index register pairs: you manually send their content to the address bus (with the strangely named instruction "SRC", which for some reason stands for "send register control") and then issue another instruction that transfers data from or to the accumulator.
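
The following Python sketch walks through one plausible instruction sequence for writing 3 into RAM nibble 5:

- FIM P0,05h loads register pair 0.
- SRC P0 sends it out as the address.
- LDM 3 puts the value into the accumulator.
- WRM writes the accumulator to the selected RAM character.

This is a reconstruction under stated assumptions, not necessarily the exact code used for this article. The model assumes a single RAM bank (bank selection via DCL is ignored). It decodes the SRC address the way the 4002 RAM chip is organized: 2 bits select the chip, 2 bits the register, and 4 bits the character. Status characters and I/O ports are left out, and the class is hypothetical.

```python
class I4004RamSketch:
    """Toy model of 4004 RAM access via register pairs, SRC and WRM/RDM (one bank only)."""

    def __init__(self) -> None:
        self.regs = [0] * 16   # R0..R15, 4 bits each
        self.acc = 0           # 4-bit accumulator
        self.src_address = 0   # 8-bit address latched by the last SRC
        # 4 RAM chips x 4 registers x 16 main characters
        self.ram = [[[0] * 16 for _ in range(4)] for _ in range(4)]

    def fim(self, pair: int, data: int) -> None:
        # FIM Pn,data: high nibble into the even register, low nibble into the odd one
        self.regs[2 * pair] = (data >> 4) & 0xF
        self.regs[2 * pair + 1] = data & 0xF

    def src(self, pair: int) -> None:
        # SRC Pn: send the pair's 8-bit content out as the RAM address
        self.src_address = (self.regs[2 * pair] << 4) | self.regs[2 * pair + 1]

    def ldm(self, data: int) -> None:
        self.acc = data & 0xF  # LDM d: load immediate nibble into the accumulator

    def _cell(self) -> tuple[int, int, int]:
        chip = self.src_address >> 6
        register = (self.src_address >> 4) & 0x3
        character = self.src_address & 0xF
        return chip, register, character

    def wrm(self) -> None:
        chip, register, character = self._cell()
        self.ram[chip][register][character] = self.acc

    def rdm(self) -> None:
        chip, register, character = self._cell()
        self.acc = self.ram[chip][register][character]


cpu = I4004RamSketch()
cpu.fim(0, 0x05)  # FIM P0,05h
cpu.src(0)        # SRC P0
cpu.ldm(3)        # LDM 3
cpu.wrm()         # WRM
print(cpu.ram[0][0][5])  # 3
```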

Interestingly, accessing regular RAM nibbles is not your only choice among the transfer instructions. You can also transfer to and from I/O ports. But the CPU itself does not have any I/O ports; instead, they are located on both the RAM and the ROM chips! You can also read and write "RAM status characters", which to me look like plain regular RAM cells within another namespace. If someone knows, I'd love to hear what they were used for (and whether they behaved differently from normal RAM).

Take a look at the data sheet. Of its mere 9 pages, pages 4 and 5 lay out the instruction set. Especially considering that reasonably orthogonal instruction sets appear to have already been available in multi-chip CPUs, this first single-chip CPU is clearly a strange specialization towards the desk calculator it was meant for (the Busicom 141-PF). It has the aforementioned index-register-centered RAM access, a separate ROM (although there is a transfer instruction that strangely refers to some optional "read/write program memory"), a three-level internal stack that is almost useless for general purpose programming, and a lone special-purpose instruction for "keyboard process" (KBP).

15 comments:

The 8085 (and 8080) do have instructions that can reference memory locations directly. Here's one such sequence:

MVI A,3
STA 5

The "5" can be any 16 bit address.

The accumulator can load and store with the BC or DE register pair and there are several other ways to access memory. I'd expect the 8085 to have some internal effective address latches for all these different modes with no special distinction for the HL register pair.

Thanks for pointing that out! At one point, I had different sections for the 8080 and the 8085 in that article, but I collapsed them into one because they were too similar, and I got a little confused about which addressing mode was available where.

Regarding the 8085 (and maybe the 8008), the memory gets to see the full 16 bits of the address only while the ALE pin signals it. At other times, the pins that carry the lower 8 bits of the address are used as the data bus. Most hardware would use ALE to trigger a latch (external to the 8085) that holds the low byte of the address during a memory read or write. The memory chips designed to work with an 8085 had this latch on the chip.

For an example memory fetch, see page 6-17, "Figure 9: Basic System Timing", of the 8085 datasheet. PC[L] is the low byte of the program counter, A8-A15 carry the high byte of the address, and AD0-7 are the pins shared between address and data.

When you look at the 8080/8085, you can see why we switched en masse to the Z80 when it arrived on the scene. Its assembly language was much more homogeneous:

LD HL,5
LD (HL),3

Or directly to memory:

LD A,3
LD (5),A

Note that all the MOV, MVI, STA, LXI nonsense of the 8080 is replaced with a single 'LD' mnemonic.

You also asked how the address in HL was transferred to the address bus. All these processors had a hidden address register (often referred to as MAR, memory address register) that held the memory address. So, for example, during an instruction fetch cycle the MAR was loaded from the PC register (program counter), during stack operations the MAR was loaded from the SP register (stack pointer), etc.

Was that similar with the 8008? It's where the question about what happens on the address bus between the instruction fetch cycles came up, because it doesn't have an external stack, for example, and overall seems to have really no other way to access memory than through the instruction pointer or (HL) (correct me if I'm wrong, please).

The 8080 (and 8085) actually share the same machine language with the Z80. The Z80 adds some new instructions, but EVERY 8080 opcode is available on the Z80. Zilog wrote their own assembler and changed the mnemonics because Intel had copyrighted them. While the Zilog assembler syntax was different, the resulting machine code would be the same. You could write code to run on the 8080 using the Zilog assembler if you limited yourself to the 8080 subset. TDL labs wrote their own assembler for the Z80 that added Intel-style mnemonics and syntax for the new Z80 instructions!

Your example isn't great. Memory-to-memory moves are relatively uncommon, and many, many CPUs (including very modern ones) can't move a constant value into a constant memory address without putting one or both into (some sort of) register first. That's almost one of the key points of RISC: one instruction gets to do no more than one memory reference...

Thanks, WestfW, but the point of the article was really more to work towards missing direct addressing modes, followed by the 4004 with its even more laborious way of accessing memory and its other quirks. I could have omitted the x86 example and focused on register-to-memory moves, but I thought it was a nice way to start off with a very un-RISCy current generation CPU.

Could somebody maybe tell me how the Busicom multiplies? Is there a special assembly program for it? If so, how can I get it? I need to analyse this for my essay and I have no clue. Thank you very much!

Nice. Just, calling the 8008 the successor of the 4004 is a bit misleading. The 4004 (and its follow-up, the 4040) were a completely separate development, done by another team with independent designs. The 4004 is a design by Hoff/Faggin at Intel, while the 8008 is a single-chip implementation of the Datapoint architecture. They are in no way related.

Besides that, it's possible to go further back, at least somewhat: the TMS1000 is an improved version of the TMS0100.

Then again, there is the whole world of 'bigger' machine designs before the 4004.

The PDP-11 had a really wonderful instruction set, and it heavily influenced the language C -- the postincrement and pre-decrement i++ and --i C syntax operations are PDP-11 instructions. The Intel, Fairchild, and MOS single-chip CPUs were big steps backward. If you want to learn the history of assembly language and compiler design, study the PDP-11 heavily.