"Randall Hyde" <rhyde@cs.ucr.edu> wrote in message news:02-03-127...
> >There were two major problems which were unresolved which blocked
> >that line of development [of C-AS]: (1) a comprehensive scheme for
> >properly handling the weirdness associated with the way assemblers
> >allow for references to yet-to-be-defined addresses but yet allows them
> >to be used in assembler directives and expressions (a VERY nasty
> >recursion issue lurks beneath this),
>
> This problem doesn't get resolved via multi-phase assembly? Maybe I
> don't understand the exact nature of the problem you are describing.

In the following, I'll describe the most general situation and then
outline an actual definition for a universal code generator.

In the most general situation, the tool in question will generate a
system of equations

The general problem of finding a solution (merely FINDING a solution,
not finding an 'optimal' one) is equivalent to that of solving
systems of equations over a boolean algebra, since the expressions
above are in a finitary arithmetic which, itself, can always be
embodied within a boolean algebra.

Equation solving over boolean algebra is the standard archetype of
NP-complete problems.
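
To make the search character of the problem concrete, here is a
minimal Python sketch (not anything taken from C-AS) that finds the
solutions of a small cyclic system by plain enumeration. The 4-bit
word size and the two equations are made up for illustration; the
point is only that, without restrictions, nothing better than trying
assignments is guaranteed to work.

```python
from itertools import product

WORD = 16  # hypothetical 4-bit arithmetic, chosen to keep the search tiny

# Hypothetical cyclic system: each variable is defined via the other.
EQUATIONS = {
    "X": lambda v: (v["Y"] * v["Y"] + 3) % WORD,
    "Y": lambda v: (v["X"] ^ 5) % WORD,
}

def solve_by_search(equations, word=WORD):
    """Return every assignment that satisfies all equations at once."""
    names = sorted(equations)
    solutions = []
    for values in product(range(word), repeat=len(names)):
        env = dict(zip(names, values))
        if all(env[n] == rhs(env) for n, rhs in equations.items()):
            solutions.append(env)
    return solutions

if __name__ == "__main__":
    print(solve_by_search(EQUATIONS))
    # prints [{'X': 4, 'Y': 1}, {'X': 7, 'Y': 2}]
```

The search space grows as word ** (number of variables), which is why
real tools fall back on restrictions like the ones listed next.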

Different technologies will handle this situation in different ways
ranging from:

(A) Forbidding any cyclic references in the system of
equations.

This is too restrictive for most assembly languages, where instruction
mapping (e.g. for multi-size instructions like "jmp") necessarily
involves cyclic references in the corresponding equation list.

(B) Restricting cyclic references to such a form that
the corresponding equations can be written in the
form X1 = Z1; X2 = Z2; ...; Xn = Zn; where the
expressions Z1, Z2, ..., Zn are expressions over a
more restricted finitary algebra.

The usual restriction is to permit only sums of the
form X + X + ... + X + Number for the Z's. The equations
are then simple linear equations which can be readily solved
(or readily proven to have no solution).
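
As a rough sketch of case (B), the following Python solves equations
of the form X = X + X + ... + X + Number by Gaussian elimination. The
function name, the (variables, constant) representation and the use of
exact rationals are my own assumptions; a real tool works in machine
arithmetic and has to deal with free variables as well. The sketch
returns None whenever there is no unique solution, without telling
apart "no solution" from "infinitely many".

```python
from fractions import Fraction

def solve_linear(equations):
    """Solve X = (sum of variables) + number for every X.

    `equations` maps each variable to (list_of_variables, constant);
    every variable used must also be defined. Returns a dict of values,
    or None when the system has no unique solution.
    """
    names = sorted(equations)
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    # Augmented matrix for  X_i - sum(terms) = constant.
    rows = []
    for name in names:
        terms, const = equations[name]
        row = [Fraction(0)] * (n + 1)
        row[index[name]] += 1
        for t in terms:
            row[index[t]] -= 1
        row[n] = Fraction(const)
        rows.append(row)
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return None  # singular system
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return {name: rows[index[name]][n] for name in names}

# A cyclic pair: A = B + 2 and B = A + A - 10 gives A = 8, B = 6.
print(solve_linear({"A": (["B"], 2), "B": (["A", "A"], -10)}))
# X = X + 1 has no solution, so the result is None.
print(solve_linear({"X": (["X"], 1)}))
```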

The question of what restrictions apply is inessential. The
general tool gives you an environment that looks like this
(and this is the kind of tool that C-AS was meant to evolve into):

This is universal and AS has absolutely no knowledge of any
application it might be being used for (i.e., AS doesn't
even know it's an assembler).

LK: O,O,...,O -> O -- resolve & fold multiple O files.

LK is a universal linker. It too doesn't even know it's
a linker or what application it's being used for.

Mutual references between different O's can be used to
perform "equation folding".

In general, the O file contains a system of symbolic equations, as
well as a symbolic representation of the mapping & reservation
directives. When multiple O files are merged, the free variables of
one O file may become bound to expressions via definitions in another
O file.
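
A minimal Python sketch of that merging step, under assumptions of my
own: an O file is modelled as a dict from symbol name to expression,
and an expression is a number, a symbol name, or a tuple (operator,
operand, ...). None of these names or shapes come from C-AS, and the
mapping and reservation directives are left out entirely.

```python
def symbols(expr):
    """Collect the symbol names referenced by an expression tree."""
    if isinstance(expr, str):
        return {expr}
    if isinstance(expr, tuple):          # (operator, operand, ...)
        return set().union(*(symbols(arg) for arg in expr[1:]))
    return set()                         # a plain number

def free_variables(ofile):
    """Symbols used on a right-hand side but not defined in this O file."""
    used = set().union(*(symbols(e) for e in ofile.values()))
    return used - set(ofile)

def link(*ofiles):
    """Merge O files; each symbol may be defined in only one of them."""
    merged = {}
    for ofile in ofiles:
        for name, expr in ofile.items():
            if name in merged:
                raise ValueError(f"duplicate definition of {name}")
            merged[name] = expr
    return merged

a = {"start": ("+", "base", 4)}   # "base" is free in this file
b = {"base": 0x100}               # ... and bound by this one
print(free_variables(a))          # {'base'}
print(free_variables(link(a, b))) # set()
```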

Equation folding removes all non-cyclic references of expressions from
the right hand sides of the equations. The result is a system of
recursive equations.
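
One way to picture folding, as a Python sketch with hypothetical names
and a made-up tuple representation (operator, operand, ...): every
definition that does not depend on itself is substituted into the
other right-hand sides, so that what remains refers only to the
genuinely cyclic symbols and to free variables.

```python
def symbols(expr):
    """Collect the symbol names referenced by an expression tree."""
    if isinstance(expr, str):
        return {expr}
    if isinstance(expr, tuple):
        return set().union(*(symbols(arg) for arg in expr[1:]))
    return set()

def substitute(expr, name, value):
    """Replace every occurrence of symbol `name` in `expr` by `value`."""
    if expr == name:
        return value
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(substitute(a, name, value) for a in expr[1:])
    return expr

def is_cyclic(eqs, name):
    """True if `name` depends on itself, directly or indirectly."""
    seen, stack = set(), list(symbols(eqs[name]))
    while stack:
        s = stack.pop()
        if s == name:
            return True
        if s in eqs and s not in seen:
            seen.add(s)
            stack.extend(symbols(eqs[s]))
    return False

def fold(eqs):
    """Substitute every non-cyclic definition into the right-hand sides."""
    eqs = dict(eqs)
    for name in [n for n in eqs if not is_cyclic(eqs, n)]:
        for other in eqs:
            eqs[other] = substitute(eqs[other], name, eqs[name])
    return eqs

eqs = {"size": 2, "a": ("+", "b", "size"), "b": ("-", "a", 1)}
print(fold(eqs))
# {'size': 2, 'a': ('+', 'b', 2), 'b': ('-', 'a', 1)}
```

After folding, "a" and "b" still refer to each other, which is the
recursive residue that some later stage has to solve or reject.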

All assemblers contain this structure implicitly, though it may not
be fully manifest; it may instead be fused with the rest of the
structure described below.

The non-universal elements corresponding to the environment where this
tool is being used convert O files to either OBJ files or EXE files.
There are actually 2 kinds of environments, each requiring a different
kind of tool.

Hosted link environment:
X: O -> OBJ

The "X" tool is target-dependent. This is the place where any
restrictions on the forms of allowable equations are enforced.

In general, every native object file format is, itself, an implicit
account of a system of symbolic equations interspersed with the
reservation/mapping primitives. So, what X is doing is stepping down
the generality of the allowable systems contained in O files to those
allowed in OBJ files.

You can even have different X's for the same environment, depending on
how much equation-solving power you want to put into X, itself. So,
greater or smaller ranges of expressions Z1,Z2,...,Zn may be allowed
in the O file.

The "Y" tool (O -> EXE) is also target-dependent. It functions almost
exactly like the X tool, except that the general EXE file format is
much more restrictive as to what kinds of unresolved references it
allows.

This is also the place where you see the "unresolved reference"
errors. The O file which gets converted to an EXE generally has no
free variables (other than loader-defined symbols).

--------------------------

So... what does the SRC language look like in general? Again, this is
the most general situation that you see in assemblers, and every
actual tool out there will embody some kind of restriction of this.

Generally, everything is a directive built out of the mapping
primitive db A,B,...,Z; the reservation primitive ds A; and the
object $, which represents the current location.
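
A toy Python model of these three ingredients, assuming the usual
reading of db as "place these bytes here" and ds as "skip this many
locations". The class and attribute names are hypothetical, and the
values are concrete numbers here, whereas the real O file would keep
them symbolic.

```python
class Segment:
    """A toy address space with a location counter (the `$` of the text)."""

    def __init__(self, origin=0):
        self.image = {}       # address -> byte value, filled in by db
        self.here = origin    # the current location, `$`

    def db(self, *values):
        """Mapping primitive: place bytes at `$` and advance it."""
        for v in values:
            self.image[self.here] = v & 0xFF
            self.here += 1

    def ds(self, count):
        """Reservation primitive: advance `$` without mapping anything."""
        self.here += count

seg = Segment(origin=0x20)
seg.db(1, 2, 3)      # occupies 0x20..0x22
seg.ds(4)            # reserves 0x23..0x26
seg.db(0xFF)         # lands at 0x27
print(hex(seg.here)) # 0x28
```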

Statements are built out of directives in the following
ways with the following syntax:

An example of the latter: if "code" denotes an address type,
then code 0x20 would be address 0x20 in the corresponding
address space.

The label definition "X T A" is just shorthand for "X equ T A".

Symbolic operator symbols can be defined with the same syntax
as in Prolog, and arities and precedences likewise; e.g.,

xf ":", 200

defines ":" as a (non-associative) postfix operator with
precedence 200,

yfx "+", 800

would define "+" as a left-associative infix operator with
precedence 800. A standard configuration would exist for
the full range of C-like prefix, infix and postfix operators.
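
For reference, the Prolog operator types are xfx, xfy, yfx (infix),
fx, fy (prefix) and xf, yf (postfix); a "y" marks the side on which an
operand of the same precedence may appear. The Python sketch below
decodes such a type string. The function name and the table layout are
hypothetical, not C-AS syntax.

```python
VALID_TYPES = {"xfx", "xfy", "yfx", "fx", "fy", "xf", "yf"}

def describe(op_type):
    """Classify a Prolog-style operator type as (position, associativity)."""
    if op_type not in VALID_TYPES:
        raise ValueError(f"not a Prolog operator type: {op_type!r}")
    f = op_type.index("f")
    left, right = op_type[:f], op_type[f + 1:]
    if left and right:
        position = "infix"
    elif left:
        position = "postfix"
    else:
        position = "prefix"
    # A `y` operand may itself be an operator term of equal precedence,
    # so the expression can nest on that side; an `x` operand may not.
    if left == "y":
        assoc = "left"
    elif right == "y":
        assoc = "right"
    else:
        assoc = "none"
    return position, assoc

# Hypothetical operator table: symbol -> (type, precedence).
OPERATORS = {":": ("xf", 200), "+": ("yfx", 800)}
for sym, (op_type, prec) in OPERATORS.items():
    print(sym, prec, describe(op_type))
```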

A macro operation or mnemonic is defined by a macro, whose definition
takes the general form:

define Op(Ex,Ex,...,Ex) St

where the free variables in the corresponding expressions are
matched upon macro call by pattern-matching.
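
The pattern-matching step could be sketched in Python as below. The
representation is an assumption of mine: terms are nested tuples,
lowercase strings are literal atoms, and strings starting with an
uppercase letter are free variables (borrowing the Prolog convention).
The "mov" operand shapes are invented for the example.

```python
def match(pattern, term, bindings=None):
    """Match `term` against `pattern`; return bindings, or None on mismatch."""
    bindings = dict(bindings or {})
    if isinstance(pattern, str) and pattern[:1].isupper():  # free variable
        if pattern in bindings and bindings[pattern] != term:
            return None                  # same variable, different value
        bindings[pattern] = term
        return bindings
    if isinstance(pattern, tuple):
        if not isinstance(term, tuple) or len(term) != len(pattern):
            return None
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None  # atom or number

# Hypothetical macro head: mov(a, #N)
head = ("mov", "a", ("#", "N"))
print(match(head, ("mov", "a", ("#", 5))))   # {'N': 5}
print(match(head, ("mov", "b", ("#", 5))))   # None
```

A macro call would then try each definition's head in turn and expand
the body St of the first one that matches, with the bindings applied.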

A useful expedient, especially in this context, is the idea
of "anonymous labels", which for example could be represented
as

number:

with references made by "numberF" or "numberB", where the former
refers to the next occurring number: label, and the latter to the
previous occurring number: label. This is something that's
actually incorporated in the C-AS assembler.
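
A small Python sketch of how such references can be resolved. This is
not the C-AS implementation; the function name, the line-oriented
input and the mnemonics are made up, and I assume that a label on the
same line as a "B" reference counts as the previous one.

```python
import re

def resolve_anonymous(lines):
    """Resolve "nF"/"nB" references to the line index of their label.

    Returns a list of (line_index, reference, target_line_index).
    """
    label_at = {}                        # line index -> label number
    for i, line in enumerate(lines):
        m = re.match(r"\s*(\d+):", line)
        if m:
            label_at[i] = m.group(1)
    resolved = []
    for i, line in enumerate(lines):
        for num, direction in re.findall(r"\b(\d+)([FB])\b", line):
            if direction == "F":         # next occurrence after this line
                hits = [j for j in sorted(label_at)
                        if j > i and label_at[j] == num]
            else:                        # closest occurrence at or before it
                hits = [j for j in sorted(label_at, reverse=True)
                        if j <= i and label_at[j] == num]
            if not hits:
                raise ValueError(f"line {i}: no label for {num}{direction}")
            resolved.append((i, num + direction, hits[0]))
    return resolved

source = [
    "1: dec a",
    "   jnz 1B",
    "   jmp 1F",
    "1: inc a",
    "   jnz 1B",
    "1: ret",
]
print(resolve_anonymous(source))
# [(1, '1B', 0), (2, '1F', 3), (4, '1B', 3)]
```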

The AS tool generally would have no knowledge that it's even
an assembler. That means the actual address types are
themselves defined too, rather than built in.

I won't specify the syntax for address type definitions in
detail, but will try to illustrate what it might look like for
the two most immediate examples: the 8051 and 8086.

An 8051 configuration might look like this:

An "i" space is one where the mapping primitive <- can be used.
The reservation primitive <<- can be used with all spaces.