lcc for Magic-1

To build your own, simply grab the lcc 4.2 distribution
here (and buy the book
if you're interested in compilers), edit bind.c to
add an m1 target as follows:

xx(m1, m1IR) \

and add my machine description file, m1.md, to
the src directory (renamed to "m1.md", with the appropriate makefile additions).
I made some local modifications to the IR. If you don't have my other
changes (done for D16/M), delete or comment out the "/* byte width */" line in
the IR declaration. Then, build lcc and invoke the m1 backend as follows:

lcc -c -S -target=m1 testfile.c

The assembly output will show up in testfile.asm (though I'm going to change
that from x.asm to x.s soon).

Retargeting Notes

Following are some random thoughts, problems and solutions encountered during
the retargeting. Thanks go to Richard Man for his suggestions about better
handling 8-bit operations.

Bugs?

7/4/2007 - Fixed what I believe to be a bug in the handling of ANSI integral
promotion. ANSI C's rules for promoting unsigned integer types are
somewhat different from pre-ANSI practice: promotion is value-preserving.
When an unsigned short is widened to an int, it normally becomes signed -
but when sizeof(short) is the same as sizeof(int), an int cannot hold every
unsigned short value, so the change of sign should not happen. LCC made this
change anyway, which broke some code in Minix (and pretty much rendered
unsigned short useless). My fix was to add an if clause to binary() that
allows an unsigned short to remain unsigned iff short and int widths are the
same.

Sizes

I'm going with a 16-bit int as my basic inttype. Char is 1, short is 2,
long is 4 and long long is also 4. On the floating point side, I've chosen
to make all of them 4 bytes. I don't expect to make much use of
longs & floats, so I'm not all that concerned about efficiency. In lcc, I
emit pseudo-ops for all 4-byte data types, which via assembler macros will be
converted into combinations of inline code and "millicode" calls. The
latter term comes from my HP days, and refers to runtime support subroutine
calls that follow a lightweight calling convention. In my case, the
convention is no frame built, incoming parameters in a & b as either int data
(for conversions) or pointers to the operands in memory. Because I've only
got two useful registers here, operations such as add, sub, etc will treat
operand 1 as both an input and the result.

As far as the other flags in the interface record, I'm asking for ARGB and
CALLB (which means I'm explicitly handling structures as arguments and function
results), big-endianness, left-to-right argument evaluation and signed chars.

Register usage

lcc was really designed for RISC-like architectures with lots of orthogonal
registers. Register usage presents a bit of a challenge for m-1.
We declare 2 16-bit registers, a and b, and tell lcc they are both to be used
for temps, and that no registers are to be used for register variables. We
also tell lcc that we have 4 32-bit integer registers, and 4 32-bit floating
point registers. Of course, m-1 doesn't actually have these.
Instead, we allow lcc to generate code using them, but we will end up mapping
these virtual registers to frame locals. The interesting code to handle
this deception is in function(). After code gen (but before code emit), we
look to see which, if any, of these virtual registers have been used. For
those that have been used, we allocate space in the frame for them and then
change the name of the register to actually be the stack offset of the location.
For example, when we first set up the virtual registers, we create a register "R_L0".
If that register is actually used, we'll change its name to "50(sp)" - or
something like that depending on the actual offset. All long/float
assembly instructions are generated as something like:

ADD32 L1,L0

which get emitted as something like:

ADD32 24(sp),28(sp)

which the macro processor will convert into:

lea a,24(sp)
lea b,28(sp)
call $add32

8-bit operations

M1 has native support for both 8 and 16 bit operations. lcc always generates
code to convert data to its supertype prior to arithmetic operations.
This means code generated for char operations has lots of sign extends, and
would never use my 8-bit operations at all (always extending to 16 bits).
A partial solution for this is to have conversion rules that simply pass along
their input operand, doing nothing. However, sometimes you really need the
extends, so we have the notion of two intermediate rule types: reg and reg8.
Here are some sample rules:
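The actual rules live in m1.md; an illustrative sketch in lburg style (the nonterminals, mnemonics and costs here are guesses, not the real contents):

```
reg8: INDIRI1(addr)      "ld.8\ta,%0\n"    1
reg8: CVII2(reg8)        "%0"              0
reg:  CVII2(reg8)        "sext\ta\n"       1
reg8: ADDI2(reg8,reg8)   "add.8\ta,b\n"    1
```

The second rule passes its kid through at zero cost (suppressing the extend); the third emits a real sign extend for consumers that need a full-width reg.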

In short, we have rules which suppress the sign extends producing a "reg8"
result. All 8-bit operations then require reg8 operands and return reg8
results. If something actually does need to be extended, it will use the
rule which produces the "reg".

The big problem with this scheme is that when we suppress the CVII2, we lose
our register targeting. This may simply be a problem of my
misunderstanding and there's a better solution, but it seems to work reasonably
well to have target() also target the kid's operand if it is a CVII2 or any other
operator we may suppress. For example:

In the above code snippet, we naturally make sure both the left operand and
target are register a (which M-1 requires) and also if the left operand is the
result of a conversion operator, we make sure that the conversion operator's
input is also register a. If anyone knows of a better solution, please
email me.

Compare & branch

One of my favorite features of M-1 is the rich set of compare and branch
instructions. These are a natural match for lcc, which uses compare and
branch operators. Because of the space they occupied in my 1-byte opcode
encoding, I couldn't afford to have all possible conditions and sizes.
Equality and non-equality work just fine for either unsigned or signed data, and
I fully support these. For >, <, >= and <=, though, I only directly support
< and <= for signed operations. By reversing the compare-and-branch
operands when doing a > or >=, I can fully support signed compare-and-branches
(note: "a > b" is the same as "b < a", and "a >= b" is the same as "b <= a").
So, my rules for GT and GE reverse the operands and map to LT and LE for all
data types.

Note that M-1's cmpb instructions only permit a 1-byte signed branch target
displacement. If the branch is too far, we need to do a regular cmp op
followed by the appropriate branch on condition code (br.eq, br.ne, br.lt,
br.le, br.ltu, br.leu). For unsigned lt,gt,ge,le, we would always want to
generate the long-form cmp/branch sequence. The problem is that our
compiler isn't going to know whether the branch will reach; only the
assembler knows that. The solutions include emitting macros for
compare-and-branches and letting the assembler choose, or always emitting the
worst case and letting the assembler optimize down to the short cases when
possible.

I decided to do something similar to the first option. I'll always emit
a cmpb instruction - even for the unsigned cases in which there really isn't a
cmpb to map to. In the M-1 ISA, I have a lot of variant forms of instructions
- long versions for big immediates and short versions for small immediates.
I've decided to treat the cmpb in a similar fashion. I can pretend that I
have cmpb forms for everything. The assembler will then choose the optimal
template based on immediate size. Even though the long form for cmpb is
actually two separate instructions, I will just create a template that contains
both the cmp and br.

Note that usually you'd want to emit worst-case and then optimize, but I
decided the output assembly file would be easier to read if I used the cmpbs.

Segments

I ran into a bit of trouble dealing with code, lit, bss and data
segments. Ideally, I would have liked to combine code and lit and
have them mapped to a process's code page table, and combine data and bss mapped
to the data page table. My problem is that I also wanted to emit
position independent code, and to be able to have multiple processes map to the
same code/lit at possibly different virtual addresses. This means that I
can't have embedded absolute code & literal addresses in the code or lit
segments. Such initializers must still be supported, but by ensuring that all
such initialized structures live in the data segment, where load-time fixups
can set up the correct values.

It turns out that lcc does emit absolute code addresses in lit for its switch
statement branch tables. Rather than rewrite that code to emit pc-relative
offsets, I just decided to combine lit, bss and data and put them in the data
side of the house. Inelegant, but it works. Perhaps I'll revisit
that later.

However, if I do revisit the problem later, I'll have to make sure I can tell
which segment a symbol is located within by looking at its symbol table entry.
This is probably easily doable, but I did run into some problems figuring it
out. There is a seg field, p->u.seg, but it lives in the union space of
the record and isn't always valid. The reason for needing to distinguish
is that I must generate different code to materialize an address in code space
(lea r,xx(pc)) vs. data space (lea r,xx(dp)).

Calling convention

Handling my calling convention in lcc was pretty simple - what I wanted to do
looked a lot like what lcc assumed. I thought about passing the last
incoming argument in via register a, but decided against it (just a bit messy,
and not much upside). So, all incoming arguments are stored in the
caller's frame. Function return values come back in register a (for 8/16
bits) or registers a & b (for 32 bits). In the latter case, register a
will contain the least significant word, and register b the most significant.

Handling this in lcc was simple - nothing unusual needed.

Macros

I intend to handle a lot of details using macros and a smart macro assembler.
Until I write it, I'm just using m4. So, at the moment I have lcc generate
a bunch of m4 macros at the head of each file.
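As a sketch of the sort of thing that header contains, an ADD32 macro matching the expansion shown earlier might be defined like this (quoting per m4 convention; the macros lcc actually emits may differ):

```
define(`ADD32',`lea     a,$1
        lea     b,$2
        call    $add32')
```

Invoking ADD32(`24(sp)',`28(sp)') then expands into the three-instruction lea/lea/call sequence shown earlier.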