This paper was written while the author was with YOURDON Inc.
This document has been made available by courtesy of the author.

It has been re-typed by
Phil Budne,
who is to blame for any transcription errors.

A Little Implementation Language

by
P.J. Plauger

ABSTRACT

A language is described that was implemented on a PDP-11 computer for
writing system-level code for the PDP-11 family of minicomputers. The
Little Implementation Language LIL offers a number of features that
facilitate writing structured, high-level code with no
sacrifice in efficiency over assembly language. The discussion
ends with a harsh evaluation of its future usefulness.

INTRODUCTION

There is a broad gap between the
fanciest macro assemblers and the simplest system implementation
languages (SILs), a gap that needs filling. No matter how elaborate
an assembler is, it is still just an assembler, with all the
limitations on productivity attendant on writing code at too detailed
(and unstructured) a level. SILs, on the other hand, seem to impose a
layer of insulation between programmer and machine that causes more
trouble than benefit in an environment where program size and
execution time are very important, as in mini- and microcomputers.
What is desired, then, is a language that lets one talk about the
matters that concern systems level programmers, but at a level that
permits economic expression and encourages the well structured coding
style we now know to be important.

It is no surprise that such
languages are not common. People who think in terms of high-level
languages have traditionally been problem-oriented; people who write
systems have been reluctant to yield control of code generation to a
language processor. It is only recently that, thanks to the success
of some of the earlier SILs, the use of compilers has
earned some grudging respect from systems writers. The cobbler's
children are always the last to get shoes.

This paper describes a compiler,
for the PDP-11 family of computers, that lies in this region. It can
be considered a narrative assembler, in the sense that the
user knows rather precisely what code is generated, but can specify
it with an economy of notation. It can also be considered a little
implementation language, because it deals with multi-line, nested
control structures rather than line-at-a-time code, and because it uses
data type attributes to help direct the generation of instructions.

The LIL language was defined in
Backus-Naur Form, which was used by a compiler-compiler to produce a
parser. The parser was combined with semantic routines written in the
SIL called C (developed by Dennis Ritchie) running under the
UNIX operating system. Significant portions of
the LIL compiler were subsequently rewritten in LIL to prove its
effectiveness, and the resulting language was used to help write a
small PDP-11/10 operating system at Bell Labs. The compiler is
one-pass and produces object code that may be freely intermixed with
that of C or the UNIX assembler.

In its final form LIL probably
most closely resembles Wirth's PL/360, in that it
is a structured, machine dependent language with data types. Its
control flow syntax, however, is modeled closely after that of C, for
the simple reason that C was successful and there was no good excuse
for introducing a new notation. A number of other ideas were carried
over from an earlier (unstructured) language written by A. G. Fraser
and P. D. Jensen for the GTE Tempo I.

SALIENT FEATURES

What makes LIL unique, and
uniquely suited for minicomputer programming, is the way its various
design features work together to do what is wanted without also doing
many things unwanted. But to understand the language as a whole, it
is necessary first to understand its parts. Here are its principal
concepts, taken separately:

The data types of the
language correspond rather directly to the addressing modes of
the PDP-11. (For another machine, a different set of modes is
defined.) Thus there are the basic types
register (R),
memory reference (M), and
immediate (I), and the derived types
indexed (I[R] or M[R]),
autoincrement ([R]++) and
autodecrement (--[R]).
The latter two modes support pushdown stacks and array walking in the
PDP-11. Any of the above can further be made
indirect ([I[R]], [M], [[R]], etc.),
or of type byte.
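For readers who know C, the derived modes map loosely onto familiar pointer idioms. The following sketch is an analogy only, not LIL code; the function names are invented for illustration:

```c
#include <assert.h>

/* Loose C analogues of the PDP-11 derived modes that LIL exposes
   as types.  Names here are illustrative, not part of LIL. */

static int stack[8];          /* storage for a pushdown stack */
static int *sp = stack + 8;   /* stack grows downward, as on the PDP-11 */

void push(int v) { *--sp = v; }    /* autodecrement: --[R] */
int  pop(void)   { return *sp++; } /* autoincrement: [R]++ */

int indexed(int *m, int r)  { return m[r]; }  /* indexed:  M[R]  */
int indirect(int **p)       { return **p; }   /* indirect: [[R]] */
```

The autoincrement and autodecrement pair is exactly the stack discipline the text describes: pushing decrements the pointer before the store, popping increments it after the fetch.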

R3+X;

generates an instruction that adds X to R3. This need not be an
add instruction; if X is known at compile time to have
the value +1, an inc is generated; if X is -1, a
dec is used. Similarly,

R3=X;

generates a clr if X is zero, rather than a
mov. Such decisions ensure that the best instruction is
used whenever there is a choice, thereby encouraging the use of
parameters in place of flags with ``magic'' values like zero and one.
It means that the user needs to know only a handful of operators
instead of a page full of opcodes, a service that would be even more
valuable on more irregular machines such as the IBM 360.
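The selection rule described above can be modeled in a few lines of C. This is a hypothetical sketch of the decision, not the compiler's own code; the function and opcode-string names are assumptions:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of LIL-style instruction selection: given an
   operand whose value is known at compile time, pick the cheapest
   PDP-11 opcode for the operator. */

const char *select_add(long k)      /* for R+k */
{
    if (k == 1)  return "inc";      /* R+1  -> inc R */
    if (k == -1) return "dec";      /* R+-1 -> dec R */
    return "add";                   /* general case  */
}

const char *select_move(long k)     /* for R=k */
{
    if (k == 0) return "clr";       /* R=0 -> clr R */
    return "mov";
}
```

A sketch like this is all the machinery the rule needs: the operand's compile-time value, when there is one, steers the choice among equivalent instructions.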

The compiler will not, however,
accept an arbitrary combination of operands with an operator. While
it is not a rigorous rule, it is generally true that at most one
instruction will ever be generated for each infix operator. That way
the user is assured that his code will not puff up unmanageably in
translation, so he can remain aware of the limitations of the machine
and still have a uniform notation for describing what is
possible. (The ``at most'' part of the rule leaves the door open for
selective optimization.)

An interesting rule is applied to
multiple-operator expressions: one cannot say

R3:=X+Y+Z;

since that implies that the right hand side is computed someplace
else, then moved into R3. At the lowest level there is no
``someplace else.'' So all operators, including the move
operators = and ->, are given equal weight; an expression is
evaluated strictly left to right using the leftmost operator
repeatedly. Thus

R3=X+Y+Z;

is the same as

R3=X;R3+Y;R3+Z;

and long sequences of code invoking a single accumulation can be
written naturally on one line

R0=X+Y[R1=i**1] + (R4=Z) -> W;

Note the use of subexpressions inside brackets and parentheses to
alter the left to right flow long enough to set up other operands.
This line is equivalent to

R0=X;R1=i;R1**1; # shift R1 left one
R0+Y[R1];R4=Z;
R0+R4;R0->W; # -> same as W=R0
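The strict left-to-right rule can be modeled in C as a single accumulator updated once per operator. The following is a sketch of the evaluation order under my own assumed encoding of the expression, not a piece of the compiler:

```c
#include <assert.h>

/* Model of LIL's strict left-to-right rule: every operator has equal
   weight, so the destination accumulates one step per operator,
   exactly as R0=X+Y+Z expands to R0=X;R0+Y;R0+Z.  The ops string
   holds one character per operator, applied in order. */

long eval_left_to_right(const char *ops, const long *args, int n)
{
    long acc = args[0];              /* R0 = first operand */
    for (int i = 1; i < n; i++) {
        switch (ops[i - 1]) {        /* one "instruction" per operator */
        case '+': acc += args[i]; break;
        case '-': acc -= args[i]; break;
        }
    }
    return acc;
}
```

Note that no temporary other than the destination itself ever exists, which is the whole point: at this level there is no ``someplace else.''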

Conditional expressions determine the conditional branches to
be generated in control flow statements. These can be
relational

if (R0<0)R2=-R2; # negate R2 if<0

or present conditions

if (carry){msp+1;carries+1;}

or expressions

if (i<MAXI && control>0) control+1;

The conditional operators && and || bind in the usual manner
but guarantee strict left to right testing with no unnecessary (or
unwanted) evaluation. Thus

while(0<=R0&&R0<=MAXI&&A[R0]~=target)
R0+1;

will never reference A[R0] if R0 is out of range.
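C makes the same guarantee for &&, so the loop above can be transcribed almost literally. This rendering (with C's != in place of LIL's inequality operator) is mine, for illustration:

```c
#include <assert.h>

/* The same short-circuit guarantee in C: && evaluates left to right
   and stops at the first false term, so A[r0] is never referenced
   when r0 is out of range. */

#define MAXI 4
static int A[MAXI + 1] = {1, 2, 3, 4, 5};

int scan(int r0, int target)
{
    while (0 <= r0 && r0 <= MAXI && A[r0] != target)
        r0 += 1;
    return r0;   /* index of target, or first out-of-range value */
}
```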

The control flow statements each control a single statement, but as
can be seen above that ``statement'' may be an arbitrary group in
braces. if...else is provided as well as
loops of all flavors (tests at top, bottom, or in the middle). There
is a goto, but labels are introduced only in conjunction
with groups:

linecount{0;}
doread{R0=READCODE;SYSREAD;RTS;}

This encourages one to think through the scope of code headed by
a label, whether it be a data area, a subroutine, or even a block of
code jumped to. Note, by the way, how constants are merely specified
in line, so that special instructions need no special treatment.

One final feature worth
particular attention is the compile time expression. Enclosing
an expression in double quotes instructs the compiler to do the
arithmetic right away instead of generating run-time code. Thus

R0="A-2"[R1];

generates

mov A-2(R1),R0

and not a subtract instruction in addition to the move.
Conditional expressions can even be used at compile time to
conditionally generate code:

if ("MACHINE<35")
R0**1**1**1;
else
R0**3;

Only one of the two lines is actually compiled, and no branches are
generated.

OPTIMIZATION

LIL makes a limited attempt to
optimize the code produced, principally in conjunction with tests and
branches, since that is the only code that it tries seriously to
control. It will pick the best of a choice of instructions or
addressing modes, but it would never change the three single shifts in
the last example into one triple shift. It assumes in general that
the user knows what he is doing and does not want to be protected from
himself.

Some optimization is performed just to encourage
more uniform notation and to avoid the need for extra operators. So

if (a>0)

must generate a tst but

if (a+b>0)

need not, since the condition code is known to be set by the
immediately preceding add. And, by the way,

if (>)

is legal and assumes the condition code is properly set.
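One way to picture the tst-elision rule is a compiler that remembers which operand the condition codes currently describe. The following model is hypothetical; none of these names are LIL's internals:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the tst-elision peephole: remember the
   operand whose value the condition codes reflect; a test of that
   same operand needs no tst instruction. */

static char last_cc[32] = "";   /* operand the condition codes reflect */

void emit_arith(const char *dest)   /* e.g. the add in LIL's "a+b" */
{
    strncpy(last_cc, dest, sizeof last_cc - 1);
    last_cc[sizeof last_cc - 1] = '\0';
}

int need_tst(const char *operand)
{
    return strcmp(operand, last_cc) != 0;  /* stale codes: emit tst */
}
```

Since LIL's a+b accumulates into a, the add leaves the condition codes describing a, and the immediately following test of that value costs nothing.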

Conditional branches in the
PDP-11 can only be specified to a 512-byte region around the current
location; beyond that a conditional jump (of the opposite sense) over
a two word jump must be used. LIL maintains a 256-word ``delay line''
for the emitted code to detect as many short branches as possible.
(It does not catch them all.) Beyond that, it strives to produce the
most economical set of branches for a given test and eliminates most
of the branches to branches that occur in structured programming.
Often the code produced is shorter than a hand coder would dare
produce, because of those savings.
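The span check the compiler must make can be sketched directly from the branch format: a PDP-11 conditional branch carries a signed 8-bit word offset, counted from the word after the branch. The helper names and word counts below are my own illustration of the trade-off, not LIL's code:

```c
#include <assert.h>

/* Sketch of the branch-range decision.  Addresses are in words. */

int fits_short_branch(int from_words, int to_words)
{
    int off = to_words - (from_words + 1);  /* offset from next word */
    return -128 <= off && off <= 127;       /* signed 8-bit word offset */
}

/* Words emitted for the test: 1 for a short conditional branch,
   3 for a conditional branch (of the opposite sense) over a
   two-word jmp. */
int branch_cost(int from_words, int to_words)
{
    return fits_short_branch(from_words, to_words) ? 1 : 3;
}
```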

Optimization, then, is basically
of the ``peephole'' type only. This seems to be all that is
necessary or desirable at the systems level.

EVALUATION

The use of LIL makes for very
readable code that is much easier to write, maintain and modify than
assembly language. Despite its considerable power in some dimensions
it is indeed a little language -- the conventions can be
summarized in a few coherent rules and the compiler itself comprises
less than a thousand lines of code. More important, it requires
no runtime support and permits the user to write code at a high
level with no
sacrifice in space or speed. For many minicomputer and microcomputer
software development projects it appears to be well suited.

LIL is, however, a failure.

Its stiffest competition at Bell
Labs is the language C, which is higher level and machine
independent. Every time it looked like C was too expensive to use for
a particular project, LIL was considered. But almost every time, it
proved easier (and more rewarding) to improve C, or its runtime
support, or the hardware, than to invest time in yet another language.

LIL is superior to assembly code,
to be sure, but there are few remaining crannies where C has not
penetrated. Since C compiles to assembly language, why not just use
the assembler to handle the few hundred lines of intractable code? It
has to be around anyway.

Why not indeed? A machine
independent language is always superior -- even for writing machine
dependent code (it's easier to find trained programmers) -- so long as
the overhead can be endured. It is clear now that writing
straightforward code and then measuring it is the formula for the
best end product. At worst there will be 5-15 per cent overhead,
which is seldom critical. Once system writers become mature enough to
recognize this basic truth, they gravitate naturally toward machine
independent SILs.

If there is a role at all for
languages like LIL, it will be in microcomputers and very small
minicomputer configurations. Not so much because the need is more
legitimate, but because the desirability of a SIL such as C will be
more slowly recognized. Otherwise it looks like the little
implementation language is an idea whose time has come -- and gone.