Pseudo Assembly Language

Now that we are done with the file format, we have to define our pseudo assembly language.

This includes both the definition of the commands and the instruction encoding. Since this VM is designed only to encode/decode short text messages, there is no need to develop a full-scale command set.

All we need is MOV, XOR, ADD, LOOP and RET.
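To make the command set concrete, here is a minimal C sketch that assigns the five commands opcode numbers. The numbering (OP_MOV through OP_RET) and the opcode_name helper are hypothetical, because the article does not list the actual values. The static assertion checks that the whole set fits in a 5-bit opcode field, which is the width the INSTRUCTION structure below uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode numbering; only the mnemonics come from the text. */
typedef enum {
    OP_MOV,
    OP_XOR,
    OP_ADD,
    OP_LOOP,
    OP_RET,
    OP_COUNT        /* number of opcodes defined above */
} Opcode;

/* A 5-bit opcode field can hold at most 32 distinct opcodes. */
static_assert(OP_COUNT <= 32, "opcodes must fit in a 5-bit field");

/* Return the mnemonic for an opcode, or NULL if it is out of range. */
static const char *opcode_name(Opcode op)
{
    static const char *const names[OP_COUNT] = {
        "MOV", "XOR", "ADD", "LOOP", "RET"
    };
    return (unsigned)op < OP_COUNT ? names[op] : NULL;
}
```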

Before we start writing the macros that represent these commands, we have to think about instruction encoding.

This is not going to be difficult: we are not trying to be Intel. For simplicity, every instruction will be two bytes long, followed by one or more immediate arguments if there are any.
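As a rough illustration of the "two bytes plus immediates" rule, the sketch below computes the encoded length of an instruction. It assumes, hypothetically, that the 2-bit size code selects 1, 2, 4 or 8 bytes (SZ_BYTE through SZ_QWORD are made-up names) and that every immediate argument of an instruction has that same width.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size codes for a 2-bit size field. */
enum { SZ_BYTE = 0, SZ_WORD = 1, SZ_DWORD = 2, SZ_QWORD = 3 };

/* Every instruction starts with a fixed two-byte header. */
#define HEADER_SIZE 2u

/* Bytes occupied by one immediate of the given size code: 1, 2, 4 or 8. */
static size_t imm_bytes(unsigned size_code)
{
    assert(size_code <= SZ_QWORD);
    return (size_t)1u << size_code;
}

/* Total encoded length: header plus n_imm immediates of one size. */
static size_t instruction_length(unsigned n_imm, unsigned size_code)
{
    return HEADER_SIZE + (size_t)n_imm * imm_bytes(size_code);
}
```

Under these assumptions, RET with no arguments occupies 2 bytes, while an instruction carrying a single DWORD immediate occupies 6.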

These two bytes are enough to encode all the needed information, such as the opcode, the types of the arguments, the size of the arguments and the direction of the operation:

```c
/* 5 + 2 + 2 + 2 + 2 + 2 + 1 = 16 bits: the two-byte instruction header */
typedef struct _INSTRUCTION{
    unsigned short opCode:5;    /* Opcode value */
    unsigned short opType1:2;   /* Type of the first operand, if present */
    unsigned short opType2:2;   /* Type of the second operand, if present */
    unsigned short opSize:2;    /* Size of the operand(s) */
    unsigned short reg1:2;      /* Index of the register used as the first operand */
    unsigned short reg2:2;      /* Index of the register used as the second operand */
    unsigned short direction:1; /* Direction of the operation */
}INSTRUCTION;
```
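One caveat with INSTRUCTION: the C standard leaves the order of bit-fields within a storage unit implementation-defined, so a bytecode producer and the VM built with different compilers could disagree about the layout. One way around that, sketched below, is to pack and unpack the 16-bit header with explicit shifts. The LSB-first order used here is an assumption (it matches what common little-endian x86 compilers such as MSVC and GCC do with the structure above); the Fields type and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unpacked header fields, in the same order as INSTRUCTION. */
typedef struct {
    unsigned opCode, opType1, opType2, opSize, reg1, reg2, direction;
} Fields;

/* Pack the fields LSB-first: opCode in bits 0-4, opType1 in 5-6,
   opType2 in 7-8, opSize in 9-10, reg1 in 11-12, reg2 in 13-14,
   direction in bit 15. */
static uint16_t pack_instruction(Fields f)
{
    assert(f.opCode < 32 && f.opType1 < 4 && f.opType2 < 4 &&
           f.opSize < 4 && f.reg1 < 4 && f.reg2 < 4 && f.direction < 2);
    return (uint16_t)(f.opCode
                    | f.opType1   << 5
                    | f.opType2   << 7
                    | f.opSize    << 9
                    | f.reg1      << 11
                    | f.reg2      << 13
                    | f.direction << 15);
}

/* Inverse of pack_instruction: mask each field back out. */
static Fields unpack_instruction(uint16_t w)
{
    Fields f;
    f.opCode    =  w        & 0x1Fu;
    f.opType1   = (w >> 5)  & 0x3u;
    f.opType2   = (w >> 7)  & 0x3u;
    f.opSize    = (w >> 9)  & 0x3u;
    f.reg1      = (w >> 11) & 0x3u;
    f.reg2      = (w >> 13) & 0x3u;
    f.direction = (w >> 15) & 0x1u;
    return f;
}
```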

There is no reason to list all the macros defining our pseudo assembly opcodes here, as that would be a waste of space.

I will list just one as an example, the definition of the MOV instruction:

Figure: Constants to be used with our pseudo assembly language

Figure: Macro defining the MOV instruction
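Since the MOV macro itself is only available as an image, here is a hypothetical C analogue of what a mov_rm macro has to emit: the two-byte header followed by the address as an immediate. The constant values, the little-endian byte order, the LSB-first field layout, the 32-bit address width, and the names emit_le and emit_mov_rm are all assumptions, not the article's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant values. */
enum { OP_MOV = 0 };
enum { T_REG = 0, T_IMM = 1, T_MEM = 2 };         /* operand types */
enum { SZ_BYTE = 0, SZ_WORD = 1, SZ_DWORD = 2 };   /* operand sizes */
enum { REG_A = 0, REG_B = 1, REG_C = 2, REG_D = 3 };

/* Append the low n bytes of v to buf at *pos, least significant first. */
static void emit_le(uint8_t *buf, size_t *pos, uint32_t v, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        buf[(*pos)++] = (uint8_t)(v >> (8 * i));
}

/* C analogue of mov_rm: load a value of the given size from memory at
   addr into register reg. Returns the number of bytes written. */
static size_t emit_mov_rm(uint8_t *buf, unsigned reg, uint32_t addr,
                          unsigned size)
{
    size_t pos = 0;
    assert(reg < 4 && size < 4);
    uint16_t header = (uint16_t)(OP_MOV
                                 | T_REG << 5    /* opType1: register */
                                 | T_MEM << 7    /* opType2: memory */
                                 | size  << 9    /* opSize */
                                 | reg   << 11); /* reg1; reg2, direction = 0 */
    emit_le(buf, &pos, header, 2);               /* two-byte header */
    emit_le(buf, &pos, addr, 4);                 /* 32-bit address immediate */
    return pos;
}
```

With these assumptions, emit_mov_rm(buf, REG_A, 0x12345678, SZ_WORD) writes 6 bytes: 00 03 78 56 34 12.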

As you can see in the code above, I have been lazy again and decided that it would be easier to specify the size of the arguments explicitly, rather than write extra code to determine their size automatically.

In addition, the name of each instruction tells what that specific instruction is intended to do. For example, mov_rm moves a value from memory to a register, and the letters 'r' and 'm' tell which types of arguments are in use (register, memory). Thus, moving a WORD from memory to a register looks like this:

mov_rm REG_A, address, _WORD
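The naming convention can also be read mechanically. The sketch below decodes the operand letters after the underscore: 'r' for a register and 'm' for memory. Treating a trailing 'i' as "the memory address is held in a register" is an interpretation based on the mov_rmi description at the end of this section, and OperandType, OperandInfo and parse_suffix are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operand-type codes. */
typedef enum { OPT_NONE, OPT_REG, OPT_MEM } OperandType;

typedef struct {
    OperandType op1, op2;
    bool indirect;      /* trailing 'i': address held in a register */
} OperandInfo;

/* Map one suffix letter to an operand type. */
static OperandType letter_type(char c)
{
    switch (c) {
    case 'r': return OPT_REG;
    case 'm': return OPT_MEM;
    default:  return OPT_NONE;
    }
}

/* Decode the letters after '_' in a mnemonic such as "mov_rm".
   Returns false if there is no recognizable two-letter suffix. */
static bool parse_suffix(const char *mnemonic, OperandInfo *out)
{
    assert(mnemonic != NULL && out != NULL);
    const char *s = strchr(mnemonic, '_');
    if (s == NULL || strlen(s + 1) < 2)
        return false;
    s++;                                /* skip the underscore */
    out->op1 = letter_type(s[0]);
    out->op2 = letter_type(s[1]);
    out->indirect = (s[2] == 'i');      /* s[2] is '\0' for "rm" */
    return out->op1 != OPT_NONE && out->op2 != OPT_NONE;
}
```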

The whole code section (which currently contains only one function) is shown in the image below:

Figure: The code section

This code loads the address of the message as an immediate value into register B, loads the length of the message from the address labeled message_len into register C, then iterates as many times as the value stored at message_len, applying XOR to every byte of the message. "mov_rmi" performs the same operation as "mov_rm", except that the address is taken from the register specified as the second parameter.
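For reference, the following C sketch mirrors what the code section does, using ordinary C statements: B receives the message address, C receives the length read from message_len, and each byte is XORed in place. The key value (KEY), the Regs struct, and the assumption that LOOP decrements C and repeats while it is non-zero (as x86 LOOP does with ECX) are hypothetical, since the text gives neither the key nor the exact LOOP semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the VM's registers A-D. */
typedef struct {
    uintptr_t a, b, c, d;
} Regs;

/* Hypothetical XOR key; the text does not give the real one. */
#define KEY 0x5Au

/* XOR every byte of the message in place, following the steps the
   code section is described as performing. */
static void xor_message(uint8_t *message, const uint32_t *message_len)
{
    Regs r = {0, 0, 0, 0};
    assert(message != NULL && message_len != NULL);
    r.b = (uintptr_t)message;     /* B = address of the message (immediate) */
    r.c = *message_len;           /* C = length loaded from message_len */
    while (r.c != 0) {            /* repeat while C is non-zero */
        uint8_t *p = (uint8_t *)r.b;
        r.a = *p;                 /* load the byte at the address in B */
        r.a ^= KEY;               /* XOR it with the key */
        *p = (uint8_t)r.a;        /* write the result back */
        r.b += 1;                 /* advance to the next byte */
        r.c -= 1;                 /* LOOP-style decrement of C */
    }
}
```

Because XOR with the same key is its own inverse, running xor_message twice restores the original message, which is why a single function can both encode and decode.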