The P4 compiler

The P4 compiler is the second of the set of portable compilers that originated
at ETH Zurich, the other being Pascal-S.

P4 is the last of a series of compilers known as Pascal-P. The versions of the compiler
were:

Pascal-P1 1973

Pascal-P2 1974

Pascal-P3 1976

Pascal-P4 1976

There were no further versions produced in Zurich beyond P4. You will find a full overview
of the Pascal-P systems in PUG newsletter #4, page 81.

Whereas Pascal-S was designed to load, compile and interpret Pascal
programs in a single program, P4 implemented the same idea as two separate
programs, one that compiled and one that interpreted. This was possible because an
ideal machine was defined, and the first pass output assembly code for
that machine. The second pass then assembled the code into memory and interpreted
it.
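
The two-stage split described above can be sketched in a few lines of Python. This is only an illustration under assumed names: the mnemonics (push, add, mul, print, halt) and the functions assemble and interpret are invented here, and are not the P4 instruction set or the structure of pint.pas.

```python
# Hypothetical two-stage sketch: text "assembly" for an ideal stack machine
# is first assembled into an in-memory program, then interpreted.
# The mnemonics are invented for illustration; they are not P4's.

def assemble(source):
    """Turn assembly text into a list of (opcode, operand) tuples."""
    program = []
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        op = parts[0]
        arg = int(parts[1]) if len(parts) > 1 else None
        program.append((op, arg))
    return program


def interpret(program):
    """Run the assembled program on a simple operand stack."""
    stack, output, pc = [], [], 0
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "print":
            output.append(stack.pop())
        elif op == "halt":
            return output
        else:
            raise ValueError("unknown instruction: " + op)


# What a first pass might emit for the expression (2 + 3) * 4.
ASSEMBLY = """
push 2
push 3
add
push 4
mul
print
halt
"""

result = interpret(assemble(ASSEMBLY))
```

Because the text form is the only thing the two stages share, the interpreter half could in principle be swapped for something that emits real machine code.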

P4 is often called the first "bytecode" virtual machine, but this
is not correct. P4's instruction stream was not organized into bytes, as that
of the JVM (Java Virtual Machine) typically is.

The two advances of P4 over Pascal-S were that a larger portion of the
complete Pascal language was implemented, and that defining an intermediate
language and a parser for it allowed the back end (the interpreter) to be replaced
by a true code generator, giving a true compiler. The P4 compiler,
then, was designed to get Pascal up and running on machines other than
the CDC 6400 with the least effort.

The components of the original "P4 porting kit" were:

The source for the compiler and the interpreter (pcom.pas and pint.pas).

The "assembly language" source for the compiler (as translated by the compiler
itself).

Wirth had several methods in mind to get P4 running on a new architecture:

Create a new assembler/interpreter using the assembly language for the
target processor, or another language.

Use a macro assembler to implement the intermediate.

Hand translate the intermediate.

If you understand N. Wirth, you will also understand why he did not consider
the last option to be amazingly painful. The traditional method to port
a new language to an unfamiliar computer is to create a compiler on another
computer in that language that targets the new computer, then have the
compiler compile itself to the new computer, and move the tape to the new
computer to run it.

If Wirth had any takers on his novel porting method, I would like to
hear about them. Actually, there were probably a few university projects
that used the method.

P4 was not the only means used to port early Pascal compilers based on Niklaus
Wirth's work. The other method was to modify the CDC 6000 compiler to generate
code for another machine, then bootstrap the compiler to the new system.

Compiling and using P4

The P4 set can be compiled with any ISO standard compiler. P4 itself compiles
a subset of standard Pascal, with the following omissions/changes:

Procedure/function parameters.

Interprocedural gotos (goto must terminate in the same procedure/function).

Only files of type "text" can be used, and then only the ones
that are predefined by P4, which are "input", "output",
and two special files defined so that P4 can compile itself.

"mark" and "release" instead of "dispose".

Curly bracket comments {} are not implemented.

The predeclared identifiers maxint, text, round, page, dispose, and the functions they represent, are not present.

The procedures reset, rewrite, pack and unpack are not implemented
(they are recognized as valid predefined procedures, but give an 'unimplemented' error
on use).

Undiscriminated variant records.

Output of boolean types.

Output of reals in "fixed" format.

Set constructors using subranges ('0'..'9').

"mark" and "release" are dummy functions
in the compiler, since they have no meaning on an ISO standard compiler.
What this means is that dynamic space, once allocated, is not freed. This
is not a big problem for the kinds of small programs you would typically
run with P4.
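
To make the consequence concrete, here is a minimal Python model of a mark/release heap, with a switch that turns "release" into a dummy. The class Heap and the function cells_in_use are hypothetical names made up for this sketch; they do not appear in the P4 sources.

```python
# Hypothetical model of a mark/release heap; names are illustrative only.

class Heap:
    def __init__(self, release_is_dummy=False):
        self.top = 0                      # index of the next free cell
        self.release_is_dummy = release_is_dummy

    def new(self, size):
        """Allocate `size` cells and return the address of the first."""
        addr = self.top
        self.top += size
        return addr

    def mark(self):
        """Remember the current heap top."""
        return self.top

    def release(self, marked_top):
        """Free everything allocated since the matching mark."""
        if not self.release_is_dummy:
            self.top = marked_top


def cells_in_use(heap):
    """Mark, allocate a temporary block, release, and report the heap top."""
    m = heap.mark()
    heap.new(100)
    heap.release(m)
    return heap.top


working = cells_in_use(Heap())                       # space is reclaimed
dummy = cells_in_use(Heap(release_is_dummy=True))    # space stays allocated
```

With a working release the heap top returns to where it was marked. With the dummy version every allocation stays in place for the rest of the run.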

P4 also has some interesting quirks. "array [1:10] of char" is
a valid declaration in P4, and the '..' and ':' tokens are aliases of each
other. The reason is probably lost in history.

The limitations of P4 vs. the full language were deliberate. The idea was
to remove any language detail that P4 itself didn't need, so that it could
self compile. Remember that P4 was primarily designed to be a bootstrapper for
the language. Unfortunately, some of the limits of P4 persisted into actual
implementations of the language based on the P-system, which is a good lesson
for language designers: don't implement subsets of your language if you don't
want to see that as permanent somewhere.

I placed the files used by the compiler into the headers of the programs.
In many Pascals, that allows you to associate a name with the file. If
you have a compiler that does not, simply use another method to assign
names to these files.

P4 would be a very limited compiler to use on a day-to-day basis. It
does, however, have uses as:

A toy compiler, to see how compilers work.

A starting basis for your own compiler.

A historical item.

As an example of a real compiler for Pascal, I would also recommend Per
Brinch Hansen's book on compilers.

Note that the PUG newsletter #11, page 70 has a collection of bugs or
limitations and their solutions for P4.

Note that the error numbers given by the P4 compiler were listed in
the "Pascal User Manual and Report" [Jensen and Wirth] second edition on
page 119. This information was removed in later editions of the book. I recommend
that serious users of P4 get an old copy of the book. Oddly, it is still available
new (a new version of the old second edition). Also note (as Steve Pemberton
states in his book) that error '399' changed meaning from 'variable dimension
arrays not implemented' to simply 'unimplemented', and is used for several unimplemented
features in the compiler.

Note that the compiler contained in the book "A Model Implementation
of Standard Pascal" [Jim Welsh and Atholl Hay] is a P-machine that implements
a full ISO 7185 compiler/interpreter. This probably qualifies as an implementation
of the theoretical "P5" compiler. You can find that
book here:

The "Model Implementation" isn't just a modified P4 compiler, it
is extensively parameterized, commented, and has a high degree of portability.

P4 or P5?

P4 was changed slightly to compile under ISO 7185, but itself
only compiles a subset of the full Pascal language. If you want to use P4 as
the basis of a serious compiler, I recommend you start with the P5 project:

In addition, the goal of the P4 adaptation here was to make the minimal
changes needed for it to run on currently available ISO 7185 Pascal compilers.
Many bugs in P4 were fixed in P5, but left uncorrected in P4.

Validation and checkout of P4

In the form that P4 was obtained from Steve Pemberton's site, P4 didn't run
correctly on my 80386/Windows installation. On 2007/11/14, I finished a series
of modifications and testing that resulted in a fully working and checked out
compiler. The links on this page have been updated accordingly.

The method used to check the P4 compiler was the same as for Pascal-S, a
"cut down" version of my ISO 7185 test suite as detailed here:

What was removed from the test were the language features that were not implemented
in P4 (remember P4 is designed to implement a subset of Pascal, not full Pascal).

Changes required

The changes needed for P4 to compile and work under Windows are detailed
in the source. The biggest change comes from the nature of its CDC 6000 dependencies.
P4 assumes that integers, characters and booleans are interchangeable with respect
to the space they occupy, which is a 60 bit CDC 6000 word. The 80386, and indeed
the vast majority of processors being manufactured today, use "byte addressability",
meaning they can address objects as small as a byte. The compiler used to compile
P4 represents characters and booleans as bytes, which means that if you start
interpreting a character as an integer, you will see the extra bits beyond the
8 bits of the character as garbage. This comes about in P4 because, for example,
it treats "ord" as a no-op, and expects an undiscriminated variant
record change from character to integer to work, as it would on a CDC 6000.
P4 does include the ability to treat each type of data differently; it was just
not implemented in the P4 compiler as it stood.
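
The effect is easy to reproduce outside Pascal. The following Python sketch imitates a variant record overlay with a 4-byte buffer. The 32-bit little-endian integer layout and the starting value are assumptions chosen for the demonstration, not details taken from the P4 sources or from any particular compiler.

```python
import struct

# A 4-byte cell that previously held an integer
# (assumed 32-bit, little-endian, arbitrary old value).
cell = bytearray(struct.pack("<i", 0x12345678))

# Store the character 'A' the way a byte-addressed compiler would:
# only the first byte of the cell is written.
cell[0] = ord("A")

# Reading the cell back as an integer picks up the stale upper bytes.
as_integer = struct.unpack("<i", bytes(cell))[0]

# Masking to 8 bits recovers the character code.
as_char_code = as_integer & 0xFF
```

On a machine where a character fills the whole word, as on the CDC 6000, the first read would already give the right answer, which is why treating "ord" as a no-op was harmless there.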

Self compilation

When I finished checking and testing P4, I wanted to have it compile itself.
Although I added the changes required to make P4 ISO 7185 compliant, these changes
could be commented out so that it could compile itself. However, P4 was not
quite able to compile itself, for several reasons:

P4 passes parameters of type "text", for example in the routine
readi(var f: text). P4 cannot declare any file; it must use the default
header files.

The CDC 6000 routine "halt" is used (exit program immediately).
P4 does not implement that.

P4 uses a jump to the program end (an alternative to "halt").
P4 cannot compile such interprocedural jumps.

There are also a series of more obscure factors. For example, P4, as listed
both here and in Steve Pemberton's book, has a serious bug that prevents it
from actually reading from the prd file correctly (the prd is the input file
used to pass its "assembly" language). I won't spoil the fun of finding
it yourself. There are more such bugs discussed in the PUG newsletters. Let's
just say that the P4 machine, as given to me from Steve's site,
has clearly not been used to compile itself for some time.

All this means is that you would have to modify P4 to get it to compile itself.
This would be non-trivial, especially for the "text" declarations,
so I decided that I would indeed modify P4 to compile itself, but the result
was best called "P5", instead of P4. In other words, creating a self
compiling version of the compiler is a much easier prospect if I remove the
idea of trying to maintain it in as close to its "historical" condition
as possible.

In the meantime, P4 has reached a high degree of workability in the present
version here, and has passed several large tests.

Space efficiency

P4 takes more space than it needs to on a current machine. The reason is that
the "store" array, where all data for the running program is kept, has a single
record that covers all of the integer, real, character, boolean and set formats.
On the CDC 6000 each of these was indeed the same size, a 60 bit machine word.
A set of 60 elements would suffice for that machine, because the CDC 6000 used
a special character set with only upper case. Thus, "set of char" was still
possible.
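
As a rough picture of how a set fits in a single machine word, here is a Python sketch that keeps one bit per element in a 60 bit word. The function names and the element numbering are assumptions for illustration; the ordinals used are arbitrary and are not the CDC character codes.

```python
# Hypothetical sketch: a Pascal set held in one 60-bit word, one bit per element.
WORD_BITS = 60
WORD_MASK = (1 << WORD_BITS) - 1


def make_set(ordinals):
    """Build a set word from element ordinals in the range 0..59."""
    word = 0
    for n in ordinals:
        if not 0 <= n < WORD_BITS:
            raise ValueError("element does not fit in one word")
        word |= 1 << n
    return word


def contains(word, n):
    """Membership test: is bit n set?"""
    return 0 <= n < WORD_BITS and (word >> n) & 1 == 1


def union(a, b):
    """Set union is a bitwise OR of the two words."""
    return (a | b) & WORD_MASK


# Arbitrary ordinals standing in for two groups of characters.
digits = make_set(range(10))
letters = make_set(range(10, 36))
alnum = union(digits, letters)
```

Any element whose ordinal is 60 or more simply has no bit to live in, which is why a larger character set forces a larger set representation.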

Even the CDC 6000, however, would be wasteful with characters, since they weren't
represented as packed. Each character of a constant string would take 60 bits.
To be fair, this is also true of booleans, but booleans are not commonly
represented as arrays. The waste of space with string constants is handled in P4 by setting
the total string length limit fairly small. It was 16 characters in the original
P4 source. Also, P4 avoids the use of string constants whenever possible. Error
messages are numeric, and printing of string constants is kept to a minimum.

On a typical microprocessor today, a set capable of representing "set of char"
in ASCII is a minimum of 128 bits or 16 bytes, and probably 256 bits or 32 bytes,
which means the 8th bit of the character does not have to be dealt with
specially. This means that each location of "store" would have to take that much
room. A character string would take the number of characters times 32 bytes,
and you can see that the space requirements mount up rapidly.
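
The arithmetic behind these figures can be written out as follows. The constants come from the text above; the helper names are made up for this sketch, and it assumes the set is the largest member of a store cell, so that the set size fixes the cell size.

```python
# Space arithmetic for the figures quoted above.
CDC_WORD_BITS = 60      # one CDC 6000 word per store cell
ASCII_SET_BITS = 128    # minimum for "set of char" in 7-bit ASCII
FULL_SET_BITS = 256     # covers all 8-bit character values
STRING_LIMIT = 16       # string length limit in the original P4 source


def cell_bytes(set_bits):
    """Assumed: a store cell is as large as its largest member, the set."""
    return set_bits // 8


def string_bytes(length, set_bits):
    """Each character of a string occupies one whole store cell."""
    return length * cell_bytes(set_bits)


cdc_string_bits = STRING_LIMIT * CDC_WORD_BITS                 # on the CDC 6000
ascii_string_bytes = string_bytes(STRING_LIMIT, ASCII_SET_BITS)
full_string_bytes = string_bytes(STRING_LIMIT, FULL_SET_BITS)
packed_string_bytes = STRING_LIMIT                             # one byte per character
```

A 16 character string therefore costs 960 bits on the CDC 6000 and 512 bytes with 32 byte cells, against 16 bytes if the characters were packed.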

With gigabyte RAM stores common, I was able to accommodate P4 by simply turning
up the constant values until it was able to accommodate my large test programs.
However, the space requirements of P4 are something that needs to be addressed
in a P5 version of the compiler.

Source

pcom.pas. The compiler program. This is my version
that I have modified to make it more standard.

pint.pas. The interpreter program. This is my
version that I have modified to make it more standard.

Will compile and run the program "program", which is specified
without an extension. Note that you need to hit return when the program starts
to produce any output. This is part of the famous problem with older Pascals
that they needed to read input before they could print anything (which was solved with
"lazy I/O").

If you want to run the individual programs:

pcom output.p4 < input.pas

The input file should be a file such as "hello.pas", with the extension
specified. The output should be the intermediate assembly file, like "hello.p4".
This is where the assembly code for the virtual machine is placed, and can be
displayed in a standard ASCII editor.

pint input.p4 output.txt

The input file should be the intermediate assembly file from pcom, like "hello.p4".
The output is where you want output from the program to go. This is used by
P4 when self compiling. This is where the output P4 machine assembly code goes.