
A Universal Cross Assembler

By Al Williams, January 28, 2010

Rolling your own cross assembler

The genasm Function

The soloasm.c program calls a function named genasm twice. On the first pass, genasm receives an argument of 1 and performs a "dry run" so that labels can be assigned correct values. On the second pass it, unsurprisingly, gets an argument of 2 and actually fills the memory array with the correct values.
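To make the two-pass idea concrete, here is a minimal hand-written sketch of what such a function might look like. It is not the code from soloasm.c or soloasm.inc: the names memory, done_label, and assemble, the array size, and the opcode values are all invented for illustration. The point is only that a forward reference to a label is unknown on pass 1 and resolved by pass 2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_WORDS 16u            /* hypothetical size of the output image */
#define OP_NOP  0x00000000u      /* invented opcode values, not One-Der's */
#define OP_JMP  0x20000000u
#define OP_HALT 0xFF000000u

static uint32_t memory[MEM_WORDS]; /* stand-in for the assembler's memory array */
static unsigned done_label;        /* a label is just a variable holding an address */

/* Stand-in for the function the target macros would generate. */
static unsigned genasm(int pass)
{
    unsigned addr = 0;             /* current code generation location */
    assert(pass == 1 || pass == 2);

    /* JMP done: a forward reference, so done_label is still 0 on pass 1 */
    if (pass == 2) memory[addr] = OP_JMP | done_label;
    addr++;

    /* NOP */
    if (pass == 2) memory[addr] = OP_NOP;
    addr++;

    /* done: the label takes the current address on both passes */
    done_label = addr;

    /* HALT */
    if (pass == 2) memory[addr] = OP_HALT;
    addr++;

    return addr;                   /* number of words laid out */
}

/* Driver: a dry run to assign labels, then the real run that fills memory. */
static unsigned assemble(void)
{
    memset(memory, 0, sizeof memory);
    genasm(1);
    return genasm(2);
}
```

Calling assemble() lays out three words. Because done_label was set to 2 during the dry run, the jump stored in memory[0] on the second pass carries the right target even though the label appears later in the source.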

Who writes genasm? You do! The target processor macros do, actually. So, for example, for the One-Der processor, the ORG macro looks like this:

This creates the genasm function and also sets some crucial information in global variables (which all start with _solo). You can probably guess what each variable does, but just in case you can't, see Table 1 for the definitions. Note that you can't use ORG more than once, so soloasm.inc provides REORG in case you want to change the code generation location after the start. Of course, you also need an END directive, which closes off the genasm function:

That macro puts the right opcode into the array. After genasm returns the second time, the code in soloasm.c emits the array in the format you asked for (Intel hex, raw bytes, etc.). The macros can be arbitrarily complex. For example, the soloasm.inc file makes a special provision: if you set the _SOLO_XSYM define (the shell script passes defines), a symbol listing is written to the stderr stream.
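For readers unfamiliar with the Intel hex output mentioned above, the following sketch formats a single record. It is an illustration of the file format, not the emitter in soloasm.c, and the function name ihex_record is hypothetical. Each record is a colon, a byte count, a 16-bit address, a record type, the data bytes, and a checksum chosen so that all the bytes of the record sum to zero modulo 256. The sketch works on bytes; how a 32-bit word is split into bytes is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format one Intel HEX record into out, which must hold 2*len + 12 chars.
   type 0 is a data record and type 1 is the end-of-file record.
   Returns the number of characters written, not counting the NUL. */
static int ihex_record(char *out, uint16_t addr, uint8_t type,
                       const uint8_t *data, uint8_t len)
{
    uint8_t sum;
    unsigned i;
    int n;

    assert(out != NULL);
    assert(len == 0 || data != NULL);

    /* The checksum covers the count, both address bytes, the type, and the data. */
    sum = (uint8_t)(len + (addr >> 8) + (addr & 0xFF) + type);
    n = sprintf(out, ":%02X%04X%02X",
                (unsigned)len, (unsigned)addr, (unsigned)type);

    for (i = 0; i < len; i++) {
        n += sprintf(out + n, "%02X", (unsigned)data[i]);
        sum = (uint8_t)(sum + data[i]);
    }

    /* Two's complement of the running sum makes the record total zero. */
    n += sprintf(out + n, "%02X", (unsigned)(uint8_t)(0u - sum));
    return n;
}
```

A caller would typically walk the memory array in chunks of 16 or 32 bytes, emit one type 0 record per chunk, and finish with a type 1 record of length zero.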

That's basically it. Your assembly code transforms into macros and those macros form a C function that fills in an array that is dumped out by a boilerplate C program. You can use C expressions just about anywhere you like. For example:

ldi 0xA<<10

You can also pass lines directly to the C compiler by prefixing them with the # sign. This gives you a powerful (though somewhat cumbersome) macro capability:

##define CT 5
# { int i; for (i=3;i<3+CT;i++) {
LDRIQ i,R(i)
# } }

The above snippet will load registers 3 to 7 with the values 3 to 7. You can even define new opcodes this way:

##define MOVE MOV
##define CLEAR(r) MOV(FZERO,r)

The downside is that you can't use C reserved words for things like labels. This is a small price to pay for the ease of use. Of course, you can always use uppercase yourself, make the script force everything to uppercase, or adopt a naming convention (like starting all labels with "_") if you really don't want to reserve the C keywords.

If you study the soloasm.inc file, you'll have a pretty good idea how to create a different target for nearly any common microcontroller or processor. For example, the online listings include targets for the RCA1802 (the first microprocessor I owned) and the Microchip PIC16F84. Note these CPUs are all very different. One-Der is a 32-bit machine, the 1802 is an 8-bit oldie but goodie, and the PIC uses a 14-bit instruction size (pesky Harvard architecture).

Sure, there are some limitations. Obviously, if you stopped forcing opcodes to uppercase, opcode names could conflict with C's built-in keywords (labels still have that problem, since it is awkward to force arbitrary input to uppercase). If you wanted to assemble really large programs (One-Der could address 4GB of memory, although I've never built one with anything over 1MB), you'd need to either accept a big memory footprint on the host or adopt some sparse array technique (maybe make the array pointer a list of function pointers that manipulate the sparse array). But for the jobs I ask the assembler to perform, none of this matters, and I've found it a useful tool. I hope you do too.
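One possible sparse technique is sketched below. It uses a simple page table allocated on demand, which is a different approach from the function pointer idea floated above and is not part of the published assembler. The names sparse_write and sparse_read and the page size are assumptions made for this example. Only pages that are actually written consume host memory, and unwritten addresses read back as zero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BITS  16u                         /* 65,536 words per page (assumed) */
#define PAGE_WORDS (1u << PAGE_BITS)
#define PAGE_COUNT (1u << (32u - PAGE_BITS))   /* enough pages for 32-bit addresses */

static uint32_t *pages[PAGE_COUNT];            /* NULL means "never written" */

/* Store one word, allocating its page on first use.
   Returns 0 on success or -1 if the host is out of memory. */
static int sparse_write(uint32_t addr, uint32_t value)
{
    uint32_t page = addr >> PAGE_BITS;
    assert(page < PAGE_COUNT);                 /* defensive bounds check */

    if (pages[page] == NULL) {
        pages[page] = calloc(PAGE_WORDS, sizeof(uint32_t));
        if (pages[page] == NULL)
            return -1;
    }
    pages[page][addr & (PAGE_WORDS - 1u)] = value;
    return 0;
}

/* Fetch one word; addresses on untouched pages read as zero. */
static uint32_t sparse_read(uint32_t addr)
{
    const uint32_t *p = pages[addr >> PAGE_BITS];
    return p ? p[addr & (PAGE_WORDS - 1u)] : 0u;
}
```

With this layout, a program that places a few words at the top of a 4GB address space costs one 256KB page plus the fixed page table, rather than the whole range. The output stage would then skip NULL pages when dumping the image.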

Of course, once you have an assembler, the next step is a high-level language. That is why I wrote The Commando Forth Compiler, a lazy Forth cross compiler that is built on top of the assembler and makes programming the One-Der CPU a breeze.
