How I transform Assembly Code into Machine Code for CPU Execution

In today’s blog posting I want to talk in a little more detail about how I transform my custom assembly code into machine code for CPU execution. To give you an idea of this challenge, have a look at the following assembly code in my own assembly language (which targets my 8-bit TTL-based CPU):

    MOV D, 00000001b
    MOV E, 10000000b
    SHL D
    SHR E

This assembly code loads two 8-bit values into the general-purpose registers D and E, and then performs a SHL (Shift Left) on D and a SHR (Shift Right) on E. When I assemble this simple program with my Assembler, the output is the following machine code (binary code that can be executed by my CPU):

    11100000 // SET A, "0000"
    11110001 // SET B, "0001"
    10001111 // MOV8
    11011000 // MOV_ALU_OUT D

    11101000 // SET A, "1000"
    11110000 // SET B, "0000"
    10001111 // MOV8
    11011001 // MOV_ALU_OUT E

    10110000 // MOV_ALU_IN A, D
    10000110 // SHL
    11011000 // MOV_ALU_OUT D

    10110001 // MOV_ALU_IN A, E
    10001100 // SHR
    11011001 // MOV_ALU_OUT E

As you can see from the machine code, all the mnemonics are gone, and the output is just zeros and ones that tell the CPU what to do for each instruction. Before we go into the details of my assembler, we also have to talk about the differences between CISC and RISC instruction sets.

CISC vs. RISC

If you are an Intel guy (like me) and have written assembly programs (x86, x64), you have always dealt with so-called Complex Instruction Set Computing (CISC). "Complex" means that the CPU has to do a lot of work to decode and finally execute the various instruction opcodes. They can differ in length, they can even store immediate values as part of the opcode itself, and so on. Therefore the CPU manufacturers needed really complex circuits to do all this. The CPU dies grew in size, and they also needed a lot of power. The following simple assembly program shows you a CISC instruction that loads some content from memory into the AH register:

    MOV AH, [SomeMemoryAddress]

It’s just one instruction. Quite simple!

On the other hand, there is also the idea of Reduced Instruction Set Computing (RISC). One very prominent platform that is based on a RISC architecture is ARM. The idea here is that the instruction set itself is simplified. Opcodes normally have the same length, and in the strictest form of the idea there are no variations in the opcode itself (no immediates, no different addressing modes, nothing). The advantages of this reduction are quite impressive: you need a less complicated CPU circuit, less die space, and less power. Perfect for mobile devices, where power consumption is always a big problem. The following pseudo assembly program shows you a load operation from memory on a RISC architecture:

    MOV MAR, SomeMemoryAddress
    LOAD
    MOV AH, LoadedMemoryContent

As you can see from this example you need 3 RISC instructions to do the same work:

You load the memory address that you want to read into a so-called Memory Address Register (MAR)

The LOAD opcode loads the memory content from the address stored in the MAR into some temporary space

Finally you transfer the loaded memory content into the destination register (AH in our case)
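A toy model can make these three steps concrete. The following Python sketch is purely illustrative: the class name ToyRiscMachine, the latch attribute and the memory contents are made up for this example and do not describe any real CPU.

```python
class ToyRiscMachine:
    """Hypothetical machine with a Memory Address Register and a load latch."""

    def __init__(self, memory):
        self.memory = memory  # maps address -> byte
        self.mar = 0          # Memory Address Register
        self.latch = 0        # temporary space filled by LOAD
        self.ah = 0           # destination register

    def mov_mar(self, address):
        # Step 1: place the address to read into the MAR.
        self.mar = address

    def load(self):
        # Step 2: copy the byte at the MAR address into the temporary latch.
        self.latch = self.memory[self.mar]

    def mov_ah_latch(self):
        # Step 3: transfer the loaded byte into the destination register.
        self.ah = self.latch


machine = ToyRiscMachine({0x1234: 0x5A})
machine.mov_mar(0x1234)
machine.load()
machine.mov_ah_latch()
print(hex(machine.ah))  # 0x5a
```

Each method does exactly one small thing, which is the point of the RISC idea: the work of the single CISC load is spread over three trivial operations.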

The downside of a RISC architecture is that you can’t generate dense machine code. You just need more assembly instructions (and therefore more RAM) to perform the same thing as on a CISC platform.

What do I use in my CPU?

The question now is which architecture I use in my CPU project: CISC or RISC? Both! When you write your assembly programs, you write them in a CISC language (as you know it from the Intel platform). But the assembler itself transforms the CISC opcodes into RISC opcodes, which can finally be executed by my CPU. Impressive, isn’t it?

When you look back at the beginning of this blog posting, you can see that the CISC assembly code that you write consists of only 4 instructions. But the generated machine code consists of 14 binary opcodes. That’s the difference between CISC and RISC. It would be awfully complex (at least for me) to implement a TTL-based CPU that can natively work with CISC opcodes...

The various stages of my Assembler

Let’s now have a look at the various stages of my Assembler, and how the CISC code gets transformed into executable machine code.

Caution: I have never ever written a professional Assembler or Compiler in my life! So I (currently) have no idea about the usual design patterns, or how you structure an Assembler/Compiler. The approach that I describe here is a *simple*, straightforward way in which *I* currently generate *my* machine code. Never ever use this approach in real life, because I’m only working here with string manipulations, nothing more!

The assembly language (CISC and RISC) is described through 2 ANTLR grammar files (I’m using Visual Studio 2012 with the .NET Framework and C# as my development environment for the assembler). The following picture shows you part of the .g4 file, which describes multiple variations of the MOV instruction.

The ANTLR grammar files are also used to validate the syntax of the input assembly files and of the generated RISC code. The high-level processing pipeline of my Assembler looks like the following:

Run the Preprocessor

Convert CISC instructions to RISC instructions

Generate Memory Addresses

Generate Jump Addresses for conditional and unconditional Jumps

Generate Machine Code

Generate the Arduino Initialization Code

Let’s now have a look at each of these stages.

Preprocessor

Yes, my Assembly language supports Include Files! Let’s have a look at the following assembly code:

    #INCLUDE "FUNCTIONS.inc"

    ; Initialize the stack pointer and the base pointer
    MOV XL, 0xFF
    MOV XH, 0xFF
    MOV SP, X
    MOV XL, 0
    MOV XH, 0
    MOV BP, X

    CALL :_MAIN
    HLT

As you can see from the previous listing, I’m including a common file called FUNCTIONS.inc. During the pre-processing phase, the assembler just generates a new assembly file in which the instructions stored in the various include files are inserted in place. Nice 🙂
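The in-place inclusion can be sketched in a few lines of Python. This is not the C# code of the assembler, just a minimal illustration: the function name expand_includes, the files dictionary (standing in for reading include files from disk) and the content of FUNCTIONS.inc used below are all assumptions.

```python
import re

INCLUDE_RE = re.compile(r'^\s*#INCLUDE\s*"([^"]+)"\s*$')


def expand_includes(source, files, _active=()):
    """Return the lines of `source` with every #INCLUDE line replaced
    by the lines of the named file.

    `files` maps include names to their text; a real tool would read
    the files from disk instead.
    """
    output = []
    for line in source.splitlines():
        match = INCLUDE_RE.match(line)
        if match is None:
            output.append(line)
            continue
        name = match.group(1)
        if name in _active:
            raise ValueError(f"circular include: {name}")
        # Recurse so that include files may include other files.
        output.extend(expand_includes(files[name], files, _active + (name,)))
    return output


# Hypothetical include file content, for illustration only.
files = {"FUNCTIONS.inc": ":_MAIN\nHLT"}
program = '#INCLUDE "FUNCTIONS.inc"\nCALL :_MAIN\nHLT'
print(expand_includes(program, files))
# [':_MAIN', 'HLT', 'CALL :_MAIN', 'HLT']
```

The guard against circular includes is an extra that the posting does not mention; without it, two files that include each other would recurse forever.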

Converting CISC to RISC

Now the real fun begins: the conversion of CISC instructions to RISC instructions! As you know, a CISC instruction consists of multiple simpler RISC instructions, which in combination do the same thing as the equivalent CISC instruction. When I thought about that challenge, I came up with the following idea:

My CISC instructions are only macros. Macros – nothing more!

Therefore the assembler just has to expand these "macros" to convert CISC instructions to RISC instructions. Easy, isn’t it? And that’s how I do it. The following C# code shows you how I convert a "MOV D, [MemoryAddress]" CISC instruction into multiple RISC instructions:

As I have already said, I’m only working here with string manipulations. I don’t have any ASTs or DAGs in place, nothing. Just string manipulations. As easy as possible. The variable "assembly" is just a global variable of the type "List", which stores the final converted RISC instructions (the token ";;" is the start of a comment in my assembly language). The generated RISC instructions are afterwards written to a file again (perfect for troubleshooting!). In our case, the generated RISC instructions for the initial 4 CISC instructions look like this:

    SET A, "0000"
    SET B, "0001"
    MOV8
    MOV_ALU_OUT D

    SET A, "1000"
    SET B, "0000"
    MOV8
    MOV_ALU_OUT E

    MOV_ALU_IN A, D
    SHL
    MOV_ALU_OUT D

    MOV_ALU_IN A, E
    SHR
    MOV_ALU_OUT E

This code is still readable and understandable by a human, but a CPU has no idea how to execute the various RISC mnemonics. Therefore we now have to translate our RISC instructions into the binary opcodes, which the CPU itself can finally understand and execute. But before we generate the binary opcodes we need memory addresses!
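The "CISC instructions are only macros" idea from this section can be sketched in Python as follows. It is a minimal illustration using string manipulation only, not the C# code of the assembler. The function name expand and the two regular expressions are assumptions; the emitted RISC lines and the BEGIN/END comment tags follow the listings in this posting, with register A receiving the upper nibble and register B the lower nibble, as in the final machine-code listing.

```python
import re

MOV_CONST = re.compile(r'^MOV\s+([A-Z]+)\s*,\s*([01]{8})b$')
SHIFT = re.compile(r'^(SHL|SHR)\s+([A-Z]+)$')


def expand(cisc_line):
    """Expand one CISC instruction into a list of RISC lines."""
    line = cisc_line.strip()
    match = MOV_CONST.match(line)
    if match:
        register, bits = match.groups()
        body = [
            f'SET A, "{bits[:4]}"',  # upper nibble of the constant
            f'SET B, "{bits[4:]}"',  # lower nibble of the constant
            'MOV8',                  # follows the two SETs in every listing
            f'MOV_ALU_OUT {register}',
        ]
        tag = 'MOV_CONST_BINARY'
    else:
        match = SHIFT.match(line)
        if match is None:
            raise ValueError(f'unsupported instruction: {line}')
        op, register = match.groups()
        body = [f'MOV_ALU_IN A, {register}', op, f'MOV_ALU_OUT {register}']
        tag = f'{op}_R8'
    # ";;" starts a comment, so the markers do not become instructions.
    return [f';; BEGIN {line} ({tag})', *body, f';; END {line} ({tag})']


program = ['MOV D, 00000001b', 'MOV E, 10000000b', 'SHL D', 'SHR E']
risc = [r for line in program for r in expand(line) if not r.startswith(';;')]
print(len(risc))  # 14
```

The 4 CISC instructions expand to 4 + 4 + 3 + 3 = 14 RISC instructions, which is the count mentioned at the beginning of this posting.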

Generating Memory Addresses

Therefore the next step in my assembler is the generation of memory addresses, assigning a memory address to each RISC instruction.

Caution: I don’t have any program loader in place yet, I don’t have any memory layout, nothing. Therefore every binary is loaded at the starting address 0x00, and program execution also starts there. I currently have a simple flat memory model!

My memory addresses are currently 16 bits long (64K address space), but the CPU is designed so that the address registers can be extended to 24 bits, which will eventually give me an address space of 16MB. During the memory address generation I’m just looping over the List variable and sequentially assigning a memory address to every RISC instruction. The result of this phase looks like the following:

    0000000000000000 SET A, "0000"
    0000000000000001 SET B, "0001"
    0000000000000010 MOV8
    0000000000000011 MOV_ALU_OUT D

    0000000000000100 SET A, "1000"
    0000000000000101 SET B, "0000"
    0000000000000110 MOV8
    0000000000000111 MOV_ALU_OUT E

    0000000000001000 MOV_ALU_IN A, D
    0000000000001001 SHL
    0000000000001010 MOV_ALU_OUT D

    0000000000001011 MOV_ALU_IN A, E
    0000000000001100 SHR
    0000000000001101 MOV_ALU_OUT E

The first column is just the 16-bit memory address.
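A minimal Python sketch of this sequential numbering could look like the following. The function name assign_addresses and the start parameter are assumptions; the sketch also assumes that every RISC instruction occupies exactly one address, and that comment lines (starting with ";;") and blank lines occupy none.

```python
def assign_addresses(risc_lines, start=0):
    """Prefix each RISC instruction with a 16-bit binary address.

    Comment lines and blank lines take no memory and pass through unchanged.
    """
    result = []
    address = start
    for line in risc_lines:
        if not line.strip() or line.lstrip().startswith(';;'):
            result.append(line)
            continue
        if address > 0xFFFF:
            raise OverflowError('program exceeds the 64K address space')
        result.append(f'{address:016b} {line}')
        address += 1
    return result


for row in assign_addresses(['SET A, "0000"', 'SET B, "0001"', 'MOV8']):
    print(row)
# 0000000000000000 SET A, "0000"
# 0000000000000001 SET B, "0001"
# 0000000000000010 MOV8
```

Extending the address space to 24 bits would only change the format width and the overflow limit in this sketch.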

Generating Jump Addresses

Things get complicated when your assembly code contains jumps (conditional and unconditional), because the jump opcode has to know where to jump. The destination memory address is stored in a register called J. But during the RISC code generation (which happens earlier in the pipeline) I don’t yet know the target memory address of the jump. Therefore I introduce a placeholder value during the RISC code generation, which is now finally replaced by the real memory address that the jump goes to. To better understand this approach, let’s have a look at the following CISC assembly code:

    :START
    MOV D, 10101010b
    MOV E, 11110000b

    ; Jump back to the beginning of the program
    JMP :START

    HLT

Yes, my assembler even supports symbolic labels for jump destinations! The generated RISC code for this program looks like the following:

    NOP :START

    ;; BEGIN MOV D, 10101010b (MOV_CONST_BINARY)
    SET A, "1010"
    SET B, "1010"
    MOV8
    MOV_ALU_OUT D
    ;; END MOV D, 10101010b (MOV_CONST_BINARY)

    ;; BEGIN MOV E, 11110000b (MOV_CONST_BINARY)
    SET A, "1111"
    SET B, "0000"
    MOV8
    MOV_ALU_OUT E
    ;; END MOV E, 11110000b (MOV_CONST_BINARY)

    ;; BEGIN JMP :START
    SAVE_FLAGS
    SET A, "1111" :START_LN2
    SET B, "1111" :START_LN1
    MOV8
    MOV_ALU_OUT XL
    SET A, "1111" :START_HN2
    SET B, "1111" :START_HN1
    MOV8
    MOV_ALU_OUT XH
    MOV16 J, X
    RESTORE_FLAGS
    JMP :START
    ;; END JMP :START

    HLT

As you can see from this code, I’m using the labels :START_LN2/LN1/HN2/HN1 as markers where the real memory address must be placed. My CPU needs two instructions (one per nibble) to load an 8-bit value into a target register, because each 4-bit nibble must itself be encoded in the 8-bit opcode. Therefore I need 4 SET opcodes to load the 4 nibbles of the 16-bit target memory address. These placeholder values are now replaced with the real target memory address, which I finally know at this stage of the assembly pipeline (I’m just performing a lookup with the jump label to get the real memory address).
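The placeholder replacement can be sketched as a two-pass lookup in Python. This is an illustration under stated assumptions, not the assembler's C# code: the function name resolve_jumps and the (address, instruction) tuple layout are made up, and the sketch assumes that LN1/LN2 mean the lower/upper nibble of the low address byte and HN1/HN2 the lower/upper nibble of the high address byte. The label :LOOP and its address in the usage example are hypothetical.

```python
import re

LABEL_DEF = re.compile(r'^NOP\s+(:\w+)$')
PLACEHOLDER = re.compile(r'^(SET [AB]), "[01]{4}"\s+(:\w+)_(LN1|LN2|HN1|HN2)$')

# Assumed meaning of the suffixes: bit offset of each nibble in the address.
SHIFTS = {'LN1': 0, 'LN2': 4, 'HN1': 8, 'HN2': 12}


def resolve_jumps(rows):
    """rows: list of (address, instruction). Returns rows with real nibbles."""
    # Pass 1: remember the address of every label definition.
    labels = {}
    for address, text in rows:
        match = LABEL_DEF.match(text)
        if match:
            labels[match.group(1)] = address

    # Pass 2: patch each placeholder with the matching address nibble.
    resolved = []
    for address, text in rows:
        match = PLACEHOLDER.match(text)
        if match:
            opcode, label, part = match.groups()
            nibble = (labels[label] >> SHIFTS[part]) & 0xF
            text = f'{opcode}, "{nibble:04b}"'
        resolved.append((address, text))
    return resolved


rows = [
    (0x012A, 'NOP :LOOP'),
    (0x012B, 'SET A, "1111" :LOOP_LN2'),
    (0x012C, 'SET B, "1111" :LOOP_LN1'),
    (0x012D, 'SET A, "1111" :LOOP_HN2'),
    (0x012E, 'SET B, "1111" :LOOP_HN1'),
]
for _, text in resolve_jumps(rows):
    print(text)
# NOP :LOOP
# SET A, "0010"
# SET B, "1010"
# SET A, "0000"
# SET B, "0001"
```

In the :START example above, the label sits at address 0, so all four placeholders would simply become "0000".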

Generating Machine Code

After the generation of the memory addresses and the fixing of the jump destinations we are now ready to translate the RISC instructions into binary opcode values. I’m using here again ANTLR to perform the translation. Let’s have a look at the following C# code, which generates the binary opcode for the SHL instruction:

It’s quite simple: every RISC instruction has a corresponding binary opcode, which is finally decoded and executed by the Instruction Decoder of the CPU. The generated file is now the code that can be executed on the CPU:

    ;; BEGIN MOV D, 00000001b (MOV_CONST_BINARY)
    0000000000000000 11100000 ; SET A, "0000"
    0000000000000001 11110001 ; SET B, "0001"
    0000000000000010 10001111 ; MOV8
    0000000000000011 11011000 ; MOV_ALU_OUT D
    ;; END MOV D, 00000001b (MOV_CONST_BINARY)
    ;; BEGIN MOV E, 10000000b (MOV_CONST_BINARY)
    0000000000000100 11101000 ; SET A, "1000"
    0000000000000101 11110000 ; SET B, "0000"
    0000000000000110 10001111 ; MOV8
    0000000000000111 11011001 ; MOV_ALU_OUT E
    ;; END MOV E, 10000000b (MOV_CONST_BINARY)
    ;; BEGIN SHL D (SHL_R8)
    0000000000001000 10110000 ; MOV_ALU_IN A, D
    0000000000001001 10000110 ; SHL
    0000000000001010 11011000 ; MOV_ALU_OUT D
    ;; END SHL D (SHL_R8)
    ;; BEGIN SHR E (SHR_R8)
    0000000000001011 10110001 ; MOV_ALU_IN A, E
    0000000000001100 10001100 ; SHR
    0000000000001101 11011001 ; MOV_ALU_OUT E
    ;; END SHR E (SHR_R8)
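The mapping from RISC mnemonics to opcodes in this listing can be reproduced with a simple lookup table. The Python sketch below uses a plain dictionary instead of ANTLR, so it is a swapped-in technique and not how the assembler itself does it. The function name encode and the table names are assumptions; the bit patterns are copied from the listing above and cover only the instructions that appear in it.

```python
import re

# Opcodes without an embedded operand, taken from the listing above.
FIXED_OPCODES = {
    'MOV8': '10001111',
    'SHL': '10000110',
    'SHR': '10001100',
    'MOV_ALU_OUT D': '11011000',
    'MOV_ALU_OUT E': '11011001',
    'MOV_ALU_IN A, D': '10110000',
    'MOV_ALU_IN A, E': '10110001',
}
# SET carries its 4-bit operand in the lower half of the opcode.
SET_PREFIX = {'A': '1110', 'B': '1111'}
SET_RE = re.compile(r'^SET ([AB]), "([01]{4})"$')


def encode(instruction):
    """Return the 8-bit opcode string for one RISC instruction."""
    match = SET_RE.match(instruction)
    if match:
        return SET_PREFIX[match.group(1)] + match.group(2)
    # Raises KeyError for instructions that are not in the table.
    return FIXED_OPCODES[instruction]


print(encode('SET A, "1000"'))  # 11101000
print(encode('SHL'))            # 10000110
```

The SET case shows why an 8-bit constant needs two instructions: only 4 bits of each opcode are free to hold data.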

Generation of the Arduino Initialization Code

This code is nice, but the real question is how to write this binary code into the SRAM memory chip of the CPU for execution. My answer to this question is quite simple: I’m using an Arduino board (hooked up to the address and data bus of the CPU) to do the required initialization during the startup phase of the CPU. Therefore the final step of the assembler is the generation of the Arduino C code that performs the SRAM initialization. The resulting C code looks like the following:

Summary

This blog posting gave you an overview of how I generate machine code from my custom assembly language. It’s a long, complicated process, but it works. As I said in the beginning, I’m not using a very sophisticated approach here. I mainly perform string manipulations, nothing more. But I already have the Dragon book in my library, and I’m looking forward to reading it.

In the future I will also retarget a C compiler (LCC), which will generate CISC assembly code. This would mean that I can program my own CPU in a high-level programming language. Or I take the other route and implement an AOT (Ahead of Time) compiler that converts Microsoft CIL code to my CISC assembly code. Who knows...