dave@yost.com (Dave Yost) writes:>[Is it time to make compilation as easy as a subroutine call?]

David Keppel responds:

> Various languages have offered related facilities for years. Without> going in to a lot of detail (or being very precise :-), LISP> [McCarthy78] data and code use the same structure, so programs can> generate code on the fly; LC^2 [Mitchell70] allowed programs to> generate strings and then call routines to convert them to executable> code;

> SNOBOL-4 [GPP71] allowed programs to execute strings; there is no> command interpreter in Smalltalk-80 [GR83], instead all input> expressions are compiled and then executed. Most extensible systems> either use some kind of dynamic compilation or translate extension code> in to a form that could be compiled but is instead interpreted [NG87].

I am not sure of the history of actual compilation in LISP. However our
1968 implementation of POP-2 (Package for On-line Programming) supported
the incremental generation of actual machine code blocks (admittedly a
limited set of instructions). These blocks were, and had to be,
garbage-collectable, because that was required by the design of the
Multipop timesharing system. The subroutine call was -POPVAL- (shades of
EVAL of course,...), and took a linear (and tail-lazy) list of tokens,
which could be derived by applying INCHARITEM, which was the tokeniser, to
a character repeater derived from a string.

Subseqent work at Edinburgh and Sussex Universities on incremental
compilation includes:

** The development (principally by David Warren) of the first Prolog
compiler. This addressed various issues of compiling pattern-matching and
backtracking. The ways in which the operation of procedure activation are
taken apart and recombined by the Prolog people are very interesting, and
not without wider relevance.

** The development of the Poplog system by Sussex University. This
supports code-generation via a suite of procedure calls for operating on a
stack-machine, which is then transformed into moderately efficient machine
code for the target machine (efficient at least for integer arithmetic and
pointer manipulation). It has been used to implement Common Lisp, Prolog
and SML, as well as POP-11, which is the current development along the
POP-2 lines.

On the subject of efficiency, it is not obvious that non-garbage
collectable languages will eventually win out even on efficiency. With
modern architectures, the additional freedom that a garbage collector has
to relocate objects can improve cache locality in a way that more than
compensates for the increasingly residual costs of fully-automatic storage
management. Simon Peyton-Jones' STG machine, conceived of as an approach
to compiling non-strict languages, is impressive in the way it surmounts
the traditional "boxing" problems of garbage-collected systems.

>I think it would be great if there were shared libraries that>offered incremental compilation services. You would call a>service function and pass it a code object, which includes slots>for:> * Source code in one or more representations, e.g.> * some visual representation
My Pantechnicon system is aimed at this.> * source C> * preprocessed C> * assembler

The only trouble I have with unconstrained C in this kind of environment
is that it is particularly difficult to make it both safe and efficient.

The Poplog abstract representation for generating machine code is thus the
Poplog Virtual Machine (PVM), which consists of the suite of code-planting
procedures. E.g. sysPUSH(2); sysPOP("x"); generates code for pushing the
integer -2- on the stack, and then popping it to the variable -x-. This
will in fact be optimised to a move in machine-code. Other people have
used, e.g. SCHEME as an intermediate language.

Tools for performing the transformation into the PVM were initially absent
in POPLOG, except in-so-far as POP-11 provided a tolerable envrironment
for programming them. But the coming release has a yacc-style
parser-generator incorporated.

Typically, however, one needs an abstract form that is intermediate
between source text and the PVM, as a basis of type-checking/inference
strictness and update analysis etc. Independently of Sussex, I have also
developed a parser-generator that supports some of the above attributes,
and allows one to define the intermediate form painlessly. Error messages
for syntactic errors are actually quite helpful - the edit cursor is
placed at the error, and a token that could have given rise to a legal
parse is printed out. However support for integration with the POPLOG
source-reference debugger is not currently provided.

>Some services might include:

> * translate source from language syntax xyz into AR.> This service, traditionally called the "front end" would be> left for the implementor of the source language to provide,> though translators for ISO C would proabably be widely available.

My parser-generator uses productions such as:

<while_stmt> -> while <expr> do <statement> 'while(^t1,^t2)';

to accomplish this. Here the '...' sequence, by default, creates abstract
syntax using the "mk_term" function to build a tree. If desired a
conventional semantic function can be called instead.

> * translate AR into machine code for one or more target CPU architectures
This is of course what the POPLOG VM does. But only for the machine on
which it is running. Incremental cross-compilation is not supported, except
by running POPLOG on the target machine.

> * link two or more code objects together for common execution

Poplog supports limited incremental linking of external code. Ordinary
code objects are "linked" via the name spaces of the various languages.
Much work has been done on being able to maintain consistency between
Poplog representation of data-objects and that of external languages.

> * disassemble AR into assembly source code

The parser generator ought to be able to do this (or tell you it can't be
done unambiguously ... you might for example generate the same AR for for
and while loops. Mine can't, but I have thought about the problems. With
a little annotation about indentation conventions it should be possible
automatically to generate some equivalent source. This assumes that the AR
building functions are, in functional language terms, -constructors-. For
example:

> * disassemble machine code into AR
Some Prolog implementations have been known to do this...> * control execution of the machine code (set breakpoints, step, etc.)

>A traditional batch compiler could be implemented as a shell>that sends source text to some services from the shared library,>then dumps out asm and/or object files.

>Compilation could become more ubiquitous and more interactive.>Programming environments supporting incremental compilation could>flourish--they could concentrate on making great tools for>humans, starting at a much higher level.

Well, yes...have the SmallTalk and Lisp communities particularly not been
doing this? On the Poplog front, the latest development is HIP, Hypermedia
In Poplog, which uses Pop-11 as the scripting language.

>C is being used as a portable assembler today by compilers for>such languages as Eiffel, Modula-3, C++, Scheme, etc. If we had>an extensible, standardized object definition such as what I>propose, with services made available in shared libraries,>implementers of new languages could break free of the batch>nature of existing C tools, and they could easily implement>dynamic programming environments on top of existing compilation>and debugging tools.

Primarily I see the objection being C programs under development have a
finite lifetime before they break. So a language should be at least
dynamically type-safe (e.g. LISP or POP-11) and preferably statically
type-safe too.

For an ASCII-based formalism, a -restricted- C would be quite attractive.
Personally, I like the general flavour of C syntax. But a language should be
designed for type-safety, which C is not.

Werner Assmann writes:

> 1. You has to use always the same environment for updating your source text.> This seems to be a simple thing but restricts your freedom. It's more> difficult to reach consistency. Note: there are a lot of people in love> with 'vi'.

What is needed here is protocols whereby an editor communicates -incremental
changes- to a compiler, which then performs an -incremental parse-.

> 2. The implementation of such systems is not so simple as people thinks.> Especially in the case of any errors it's sometimes a little difficult> to avoid inconsistent states.

How difficult depends on the language. With "flat" and dynamically
typed languages like LISP, POP-2 and Prolog, problems were minimal.
With a statically typed language, a change in a type declaration can propagate
to every piece of code, making incremental recompilation impossible for
certain changes. One solution to consistency problems is to write
functionally, with memoisation providing the incremental properties.

The SML compilers are a compromise which is not too happy, in my opinion.
Consider the sequence:

fun fred x = 34;
fun joe x = fred x + 45;
fun fred x = 45;

Now in POP-11 or LISP, the redefinition of -fred- would be reflected in the
meaning of -joe-. But not in SML, which takes the no-doubt-rigorous view that
the meaning of -fred- in -joe- is what it was when joe was compiled.

> 3. The most interesting thing in incremental compilers is the high> compilation speed to avoid waiting time (== lost time). Some people say:> the current hardware is so very fast that all such tricks are not necessary.> I'm not convinced by this argument because I'm dayly waiting a lot of> minutes for the results of compilation processes.

Well, I don't wait minutes except in some applications developing X-windows
and other applications that involve linking in lots of C to POPLOG.
But of course run-time performance is not always adequate with Poplog's
incremental compiler. Which means I do spend some time hacking C. But it is
usually only a small core of an application.