Added comment to C_CALLS signature documenting the responsibility of the
client to handle sign extension. Also added a "naturalIntSz" value
that the client can use to determine the integer promotion size.
Updated the implementations to match the changed signature and removed
sign extension from the x86 implementation.

Tested the jump chain elimination on all architectures (except the
hppa). This is on by default right now and is profitable for the
alpha and x86; however, it may not be profitable for the sparc and ppc
when compiling the compiler.
The gc test will typically jump to a label at the end of the cluster,
where there is another jump to an external cluster containing the actual
code to invoke gc. This is to allow factoring of common gc invocation
sequences. That is to say, we generate:
    f:
        testgc
        ja   L1      % jump if above to L1
    L1:
        jmp  L2
After jump chain elimination the 'ja L1' instruction is converted to
'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
up being implemented in their long form (if L2 is far away) using:
        jbe  L3      % jump if below or equal to L3
        jmp  L2
    L3:
        ...
For large compilation units L2 may be far away.
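
The retargeting step described above can be sketched as follows. This is
only an illustrative model in Python, not the MLRISC implementation: the
names resolve, eliminate_jump_chains and the 'trampolines' table (labels
whose block is nothing but an unconditional jump) are assumptions made
for the example.

```python
# Hypothetical model: instructions are (opcode, target) pairs, and
# `trampolines` maps a label whose block is a lone `jmp` to that jump's target.
JUMPS = ("ja", "jbe", "jmp")


def resolve(label, trampolines):
    """Follow jump-only labels until reaching a label with real code.

    The `seen` set guards against a cycle of jumps (L1 -> L2 -> L1).
    """
    seen = set()
    while label in trampolines and label not in seen:
        seen.add(label)
        label = trampolines[label]
    return label


def eliminate_jump_chains(instrs, trampolines):
    """Retarget every jump to the final destination of its chain."""
    out = []
    for op, target in instrs:
        if op in JUMPS:
            target = resolve(target, trampolines)
        out.append((op, target))
    return out
```

With trampolines = {"L1": "L2"}, the 'ja L1' in the gc test becomes
'ja L2'. Whether that is a win then depends on whether the target
architecture can still encode the branch in its short form.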

In order to support the block placement optimization, the first
cluster that is generated (called the linkage cluster) contains a jump
to the entry point for the compilation unit. The linkage cluster
contains only one 'function', so block placement will have no effect on
the linkage cluster itself, but all the other clusters have full
freedom in the manner in which they reorder blocks or functions.
On the x86 the typical linkage code that is generated is:
----------------------
        .align 2
    L0:
        addl  $L1-L0, 72(%esp)
        jmp   L1
        .align 2
    L1:
----------------------
72(%esp) is the memory location for the stdlink register. This
must contain the address of the CPS function being called. In the
above example, it contains the address of L0; before
calling L1 (the real entry point for the compilation unit), it
must contain the address for L1, and hence
        addl  $L1-L0, 72(%esp)
I have tested this on all architectures except the hppa. The increase
in code size is negligible.
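
As a worked check of the stdlink adjustment, the arithmetic performed by
the addl can be modelled as below. This is a sketch only: the function
name and the sample addresses are made up for illustration, and the
32-bit wraparound is an assumption about x86 addl semantics.

```python
def adjust_stdlink(stdlink, addr_l0, addr_l1):
    """Model of `addl $L1-L0, 72(%esp)`.

    stdlink holds the address of L0 on entry; adding the link-time
    constant (L1 - L0) leaves it holding the address of L1.
    """
    return (stdlink + (addr_l1 - addr_l0)) & 0xFFFFFFFF


# Hypothetical addresses: L0 at 0x1000, L1 at 0x1010.
new_stdlink = adjust_stdlink(0x1000, 0x1000, 0x1010)
```

The constant L1-L0 is independent of where the code is loaded, which is
why the adjustment can be emitted as an immediate operand.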

Compilers that generate assembly code may produce global labels
whose value is resolved at link time. The various peephole optimization
modules did not take this into account.
TODO. The Labels.addrOf function should really return an option
type so that clients are forced to deal with this issue, rather
than an exception being raised.
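
The idea in the TODO can be sketched as follows, in Python rather than
SML. None stands in for the NONE case of an option type, and the names
addr_of, can_use_short_jump and the 'reach' parameter are hypothetical,
not part of the Labels module.

```python
from typing import Dict, Optional


def addr_of(label: str, table: Dict[str, Optional[int]]) -> Optional[int]:
    """Return the label's address, or None if it is only known at link time."""
    return table.get(label)


def can_use_short_jump(from_addr: int, label: str,
                       table: Dict[str, Optional[int]],
                       reach: int = 127) -> bool:
    """A peephole-style client that must handle the unresolved case."""
    target = addr_of(label, table)
    if target is None:
        # Global label resolved by the linker: be conservative.
        return False
    return abs(target - from_addr) <= reach
```

Because the unresolved case shows up in the return type, the client
cannot forget it, whereas an exception is easy to leave unhandled.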

Pulled out various utility modules that were embedded in the modules
of the register allocator. I need these modules for other purposes, but
they are not complete enough to put into a library (just yet).

A bug fix from Allen.
A typo caused extra fstp %st(0)'s to be generated at compensation
edges, which might cause stack underflow traps at runtime. This
occurred in fft, where there were extraneous fstps right before the 'into'
trap instruction (in this case they are harmless since none of the
integers overflow).

1. Since COPY instructions are no longer native to the architecture,
a generic functor can be used to implement the expandCopies function.
2. Allowed EXPORT and IMPORT pseudo-op declarations to appear inside a
TEXT segment.

Removed the native COPY and FCOPY instructions
from all the architectures and replaced them with the
explicit COPY instruction from the previous commit.
It is now possible to simplify many of the optimization
modules that manipulate copies. This has not been
done in this change.

Changed the representation of instructions from being fully abstract
to being partially concrete. That is to say:
from
    type instruction
to
    type instr                  (* machine instruction *)
    datatype instruction =
        LIVE of {regs: C.cellset, spilled: C.cellset}
      | KILL of {regs: C.cellset, spilled: C.cellset}
      | COPYXXX of {k: CB.cellkind, dst: CB.cell list, src: CB.cell list}
      | ANNOTATION of {i: instruction, a: Annotations.annotation}
      | INSTR of instr
This makes the handling of certain special instructions that appear on
all architectures easier and uniform.
LIVE and KILL say that a list of registers is live or killed at the
program point where they appear. No spill code is generated when an
element of the 'regs' field is spilled; instead the register is moved to
the 'spilled' field (which is present more for debugging than anything else).
LIVE replaces the (now deprecated) DEFFREG instruction on the alpha.
We used to generate:
    DEFFREG f1
    f1 := f2 + f3
    trapb
but now generate:
    f1 := f2 + f3
    trapb
    LIVE {regs=[f1,f2,f3], spilled=[]}
Furthermore, the DEFFREG (hack) required that all floating point instructions
use all registers mentioned in the instruction. Therefore f1 := f2 + f3
defines f1 and uses [f1,f2,f3]! This hack is no longer required, resulting
in a cleaner alpha implementation. (Hopefully, Intel will not get rid of
this architecture.)
COPYXXX is intended to replace the parallel COPY and FCOPY available on
all the architectures. This will result in further simplification of the
register allocator that must be aware of them for coalescing purposes, and
will also simplify certain aspects of the machine description that provides
callbacks related to parallel copies.
ANNOTATION should be obvious, and now INSTR represents the honest to God
machine instruction set!
The <arch>/instructions/<arch>Instr.sml files define certain utility
functions for making porting easier -- essentially converting upper case
to lower case. All machine instructions (of type instr) are in upper case,
and the lower case form generates an MLRISC instruction. For example on
the alpha we have:
    datatype instr =
        LDA of {r:cell, b:cell, d:operand}
      | ...
    val lda : {r:cell, b:cell, d:operand} -> instruction
    ...
where lda is just (INSTR o LDA), etc.
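
The relation between the upper case constructors and the lower case
helpers can be modelled roughly as below. This is a Python sketch of the
SML datatypes above, not the actual code. The field types are simplified
to strings and integers, and machine_instr is a hypothetical helper added
to show how a client might see through ANNOTATION wrappers.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class LDA:
    """A machine instruction (type instr); fields simplified for the sketch."""
    r: str
    b: str
    d: int


@dataclass(frozen=True)
class INSTR:
    """Wraps a machine instruction as a generic MLRISC instruction."""
    i: Any


@dataclass(frozen=True)
class ANNOTATION:
    """An instruction paired with an annotation."""
    i: Any
    a: str


def lda(r, b, d):
    """Lower case form: the analogue of (INSTR o LDA)."""
    return INSTR(LDA(r, b, d))


def machine_instr(insn):
    """Strip annotations; return the wrapped machine instruction, or None."""
    while isinstance(insn, ANNOTATION):
        insn = insn.i
    return insn.i if isinstance(insn, INSTR) else None
```

Code that builds instructions uses the lower case helpers and so always
gets the generic instruction type, while architecture-specific code can
still match on the upper case constructors.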

Implemented a complete redesign of MLRISC pseudo-ops. Now there
ought never to be any question of incompatibilities with
the pseudo-op syntax expected by host assemblers.
For now, only modules supporting GAS syntax are implemented
but more should follow, such as MASM, and vendor assembler
syntax, e.g. IBM as, Sun as, etc.

A CVS update record!
Changed type cell from int to datatype, and numerous other changes.
Affects every client of MLRISC. Lal says this can be bootstrapped on all
machines. See smlnj/HISTORY for details.
Tag: leunga-20001207-cell-monster-hack

Slight cleanup on the Alpha.
Added a bunch of instructions to the x86 instruction set.
The module ra-rewrite-with-renaming has been improved.
These should have no effect on SML/NJ.
CVS tag=leunga-20000515-alpha-x86-ra

More assembly output problems involving the indexed addressing mode
on the x86 have been found and corrected. Thanks to Fermin Reig for the
fix.
The interface and implementation of the register allocator have been changed
slightly so that the register allocation phases can be skipped
completely, going directly to memory allocation. This is needed
for C-- use.
This fix only affects the x86 assembly output.