The P5 compiler

The P5 compiler has existed for a long time as an idea. P4, the last of the
Zurich series P-system compilers, left off before the ISO 7185 standard in 1982.
Not only did it not comply with standard Pascal, it also implemented only a subset,
albeit a substantial one, of full revised Pascal. As an example, or "model"
implementation of Pascal, it would have made sense to update the compiler to
ISO 7185 status, and that was basically done as "A Model Implementation
of Standard Pascal" [Welsh&Hay] before 1986. In fact, the project was
designed to support the ISO 7185 project.

However, there were a few reasons that a true P5, a straightforward update
of the old P4, was a good idea. First, the source code of the "Model Implementation"
is not generally available. Second, the "Model Implementation" is
a complete scratch rewrite of the compiler, and shares virtually nothing
with the original P4. This was important because several books, articles
and online resources exist for the P4 compiler.

What I wanted for P5 was a compiler that both accepted ISO 7185 standard
Pascal and was also written in it. The compiler is an extended version of P4
and uses the same intermediate codes where possible.

P5 now accepts the full ISO 7185 language, and has also been remade as a
byte oriented machine, similar to what was done for both the UCSD compiler and
the "Model Implementation". This is the key to achieving a high
efficiency implementation that runs with compact code.

P5 also runs the PAT, or Pascal Acceptance Test, and self compiles.

P5 correctly runs the BSI (British Standards Institution) tests.

The meaning of P5

P5 is a very important milestone for Pascal. To understand why, it is a good
idea to review why P4 was important. P4 was meant to accomplish the following:

Gave an example compiler for the Pascal language.

Gave a "model" of Pascal more complete than any description
(i.e., the effect of any program construct could ultimately be determined
by running it on P4).

Provided a bootstrapping kit to create new Pascal compilers.

To understand why P5 is important, you must understand that P4 didn't completely
accomplish the above goals. First of all, P4 was a subset of the full language.
It was never designed to run the full language, only a minimal subset that could
be ported to a new machine. The idea was to finish out the full language on
the target machine.

Unfortunately, that meant there was not a concrete model of some of the more
advanced (and hard to implement) features of full Pascal, for example interprocedural
gotos, and procedure and function parameters. These and other features of Pascal
left out of P4 were often left out of target compilers, and when they were implemented,
they were implemented wrong.

The other issue is that P4 was designed to be a minimal bootstrap implementation.
If you examine P4, you will see that it makes little use of strings, and keeps
them short. This is because it is very inefficient when it comes to storing
them in memory. They are stored one character to a word (60 bits on the CDC
6000). Pascal has packing, but that is not implemented in P4.

Finally, P4 is very much oriented to the CDC 6000 that it originated on.
Everything is stored in 60 bit words, and there is a packing system designed
to store two instructions per word.

The reason P4 had these limitations is that memory was very limited back
in the 1970s, when P4 came about, even on the CDC 6000. The authors of P4 worked
hard to get P4 down in size and memory requirements so that it would self compile.

By the time of the ISO 7185 standard, many people understood that P4 was
limited for its purposes. The "Model Implementation of Standard Pascal"
[Welsh & Hay] was the answer, and it contained a compiler for the full ISO
7185 Pascal language. Further, it implemented the interpreter as a byte oriented
machine (sometimes called a "bytecode machine"). Unfortunately, it
got sucked up into the BSI, who have effectively killed it (there appear to
be no internet copies of it, and the BSI has not been forthcoming concerning
it). Another issue with the "Model Implementation" (with apologies
to Jim Welsh and Atholl Hay) is that the MI is written in the "self documented"
form (advocated by D. E. Knuth and others) where the entire documentation exists
in the same file, intermixed with the code. This is a beautiful method to present
code as a finished product, but it tends to make the code fairly difficult to work on
and change. Finally, the MI was a complete break with P4, and had nothing in common
with it. This meant that MI used completely new methods and documentation, whereas
P4 was already documented in the common media and well understood.

P5 is both a break with the past and an embrace of it:

P5 is a straightforward extension of P4, and so most of the documentation
and methods used with P4 are applicable to P5.

P5 completely implements ISO 7185.

P5 serves as a complete model for the implementation of Pascal.

P5 can be used to bootstrap new compilers, and can efficiently
self compile without limitations.

P5 is oriented toward byte machines, which are virtually all machines
available today.

P5 can be used as a working interpreter, useful for running real programs.

The PAT and PRT

The PAT or Pascal acceptance test is a series of tests in one file that go
through each feature of ISO 7185 Pascal. If an ISO 7185 Pascal implementation
can compile and run this correctly, then it is substantially compliant with
ISO 7185 Pascal.

There are two types of tests, the PAT and the PRT, or Pascal Rejection Test.
The PAT test should compile and run correctly, and is a "positive"
indication that the implementation compiles standard structures and gives standard
results. The PRT is the opposite. It is designed to either fail to compile or
generate runtime errors or both. It is a "negative" test that makes
sure that the implementation rejects non-standard structures.
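
The two pass criteria can be written down as a small sketch. This is only an illustration of the accept/reject logic described above: the `RunResult` record and both function names are hypothetical, and checking that a PAT run also prints the expected results is a separate step not shown here.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical summary of one compile-and-run attempt."""
    compiled: bool        # did the compiler accept the program?
    runtime_error: bool   # did execution stop with a runtime error?


def pat_passes(result: RunResult) -> bool:
    """Positive test: the program must compile and run without error."""
    return result.compiled and not result.runtime_error


def prt_passes(result: RunResult) -> bool:
    """Negative test: the program must fail to compile, fail at run time, or both."""
    return (not result.compiled) or result.runtime_error
```

A program that compiles and runs cleanly therefore passes as a PAT case and fails as a PRT case.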

Only the PAT is represented here (for now).

Relationship to the BSI test suite

The BSI test suite [covered in Wichmann&Ciechanowicz] includes both positive
and negative testing, and appeared in its original version in the Pascal
User's Group. After a great deal of trouble I was able to OCR a copy of
that test, which was published free and clear of restrictions.

However, both the test suite and the model compiler bore copyright notices at one point,
and both were given to the BSI (British Standards Institution) to keep and distribute.
The BSI no longer distributes either, at any price, and whether they have kept
them is also in doubt. In fact, recently I have been calling them about once a
month to find out any information about the pair of programs. Both of them were
created at universities outside of the BSI, and both were intended to be distributed,
not locked in a vault to be eventually discarded.

I don't and won't distribute the BSI test without permission, and I don't
have access to the model compiler. Even with the BSI status of "openly
published, but rights kept", I don't feel comfortable putting it up on
this site. However, because it was in fact openly published, I don't feel that
I, as an individual, am barred from running the tests, either.

Now, the reason that all of this matters is that with P5, we have effectively
replaced the material imprisoned by the BSI. P5 upgrades P4, which never bore
a copyright, was public domain, and was distributed openly. I put my own work
into upgrading P4 into P5, but I donate that work back to the public domain.
As P4 was, P5 is free of copyright and charges. Use as you see fit.

The PAT was created entirely by me and is original work. However, I also
donate this to the public domain. It was created back in the early 1990s, and used
to validate both my own and other Pascal compilers. The PAT effectively replaces
the positive testing side of the BSI. I also intend to create a negative test,
the PRT, and also make that public domain.

Further, the PAT and PRT form a collection point for tests, including tests
that were made in reaction to the failures seen while running the BSI tests.
In other words, if the BSI test found a failure, then an equivalent test was
added to the PAT (not copied from the BSI!). This is a work in progress, so
not all failure points have yet been addressed.

Thus, the PAT and PRT are designed to be full replacements for the BSI tests.

Format and working of the PAT

The PAT is designed to execute a small amount of code, then print the results.
Each "test point" tests one feature of ISO 7185 Pascal, and is numbered
according to type and sequence. Here is an example from the test:

write('Control6: ');
if true then write('yes') else write('no');
writeln(' s/b yes');

This prints:

Control6: yes s/b yes

So you see the number and type of the test, control structures number 6,
the result, 'yes', and finally what the result should be.
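
As a rough illustration of that line format, the sketch below splits one output line into its test type, number, result and expected value. The regular expression and the function name `check_line` are assumptions made for this example only; real PAT output contains many other line shapes, and results that differ only in spacing would need extra handling.

```python
import re

# Assumed line shape: "<type><number>: <result> s/b <expected>".
# A space between type and number is tolerated.
LINE_RE = re.compile(r"^\s*([A-Za-z]+)\s*(\d+):\s*(.*?)\s+s/b\s+(.*?)\s*$")


def check_line(line):
    """Return (type, number, matches) for a test point line, or None otherwise."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    kind, number, actual, expected = m.groups()
    return kind, int(number), actual == expected


print(check_line("Control6: yes s/b yes"))
```

A line whose result and "should be" text differ yields `False` in the third position, which is the case a reader would otherwise have to spot by eye.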

The PAT is designed to be verified manually, that is, you read it and check
that the printed results equal the "should be" column. The PAT can
be easily automated for regression purposes by redirecting the output to a file,
then comparing a saved "gold" version of the result file to the current
file.
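
A minimal sketch of such a regression comparison, using Python's standard `difflib` module, is shown here. The function name and the idea of passing the two files as strings are choices made for this example and are not part of the P5 distribution.

```python
import difflib


def compare_to_gold(gold_text, current_text):
    """Return unified-diff lines; an empty list means the run matches the gold file."""
    return list(difflib.unified_diff(
        gold_text.splitlines(),
        current_text.splitlines(),
        fromfile="gold",
        tofile="current",
        lineterm="",
    ))


# Example with in-memory text standing in for the two result files.
diff = compare_to_gold("Control6: yes s/b yes\n", "Control6: no s/b yes\n")
for line in diff:
    print(line)
```

In practice the two strings would be read from the saved gold file and the freshly redirected output; any non-empty diff marks a regression to inspect.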

Self compile

P5 is capable of compiling itself. This takes different steps for each of the
sections, pcom and pint. The resulting intermediate files are listed in the
files section below.

pcom

I was able to get pcom.pas to self compile. This means compiling pcom.pas
and then executing the result in the simulator, pint. That copy is then fed its
own source and compiles itself into intermediate code, which is compared to the
intermediate code for pcom as output by the regular compiler. It's a good self
check, and in fact found a few bugs.

The Windows batch file to control a self compile and check is:

cpcoms.bat

What does it mean to self compile? For pcom, not much. Since it does not
execute itself (pint does that), it is simply operating on the interpreter,
and happens to be compiling a copy of itself.

Changes required

Pcom won't directly compile itself the way it is written. The reason is that
the "prr" file, the predefined file it uses to represent its output
file, is only predefined in P5 itself. The rules for ISO 7185 are that each file
named in the program header that is not one of the predefined files ("input"
or "output") must also be declared in a var statement. This makes
sense, because if it is not a predefined file, the compiler must know what type
of file (or even non-file) is being accessed externally to the program.

Because the requirements are different from a predefined special compiler
file to a file that is simply external, the source code must be different for
a P5 file vs. another compiler. A regular ISO 7185 Pascal compiler isn't going
to have a predefined prr file.

The change is actually quite small, and marked in the source. You simply
need to remove, or comment out, the following statements:

{ !!! remove this statement for self compile }
prr: text; { output code file }

and

{ !!! remove this statement for self compile }
rewrite(prr); { open output file }

In the batch file above, this modified file is represented as pcomm.pas,
or "modified" pcom.pas.

All of the source code changes from pcom.pas to pcomm.pas are automated in
cpcoms.bat.
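
How the batch file performs these edits is not shown here. Purely as an illustration, a marker-driven removal could be sketched as below. The function name, the rule "drop the first non-blank line after a marker", and the assumption that each marked statement sits on a single line are all hypothetical. The line is deleted rather than wrapped in braces because it already contains a comment, and ISO 7185 comments do not nest.

```python
MARKER = "!!! remove this"


def strip_marked_statements(source):
    """Drop the first non-blank line that follows each marker comment."""
    kept = []
    drop_next = False
    for line in source.splitlines():
        if drop_next and line.strip():
            drop_next = False   # this is the marked statement: skip it
            continue
        if MARKER in line:
            drop_next = True    # the next non-blank line will be skipped
        kept.append(line)
    return "\n".join(kept) + "\n"


sample = (
    "var\n"
    "{ !!! remove this statement for self compile }\n"
    "prr: text; { output code file }\n"
    "x: integer;\n"
)
print(strip_marked_statements(sample))
```

The marker comments themselves are kept, so the modified file still documents where the original statements were.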

pint

Pint is more interesting to self compile, since it is running (being interpreted)
on a copy of itself. Unlike the pcom self compile, pint can run a copy of itself
running a copy of itself, etc., to any depth. Of course, each time the interpreter
runs on itself, it slows down by orders of magnitude, so it does not take many
levels to make it virtually impossible to run to completion. I ran a copy of pint
running on itself, which in turn interpreted a copy of iso7185pat. The result of the
iso7185pat run is then compared to the "gold" standard file.

As with pcom, pint will not self compile without modification. It has the
same issue with predefined header files. Also, pint cannot run on itself unless
its storage requirements are reduced. For example, if the "store"
array, the byte array that is used to contain the program, constants and variables,
is 1 megabyte in length, the copy of pint that is hosted on pint must have a
1 megabyte store minus all of the overhead associated with pint itself.

The Windows batch file required to self compile pint is:

cpints.bat

These are the changes required in pint:

{ !!! Need to use the small size memory to self compile, otherwise, by
definition, pint cannot fit into its own memory. }
maxstr = 2000000; { maximum size of addressing for program/var }
{maxstr = 200000;} { maximum size of addressing for program/var }

and

{ !!! remove this next statement for self compile }
prd,prr : text;(*prd for read only, prr for write only *)

and

{ !!! remove this next statement for self compile }
reset(prd);

and

{ !!! remove this next statement for self compile }
rewrite(prr);

All these changes were made in the file pintm.pas.

Pint also has to change the way it takes in input files. It cannot read the
intermediate from the input file, because that is reserved for the program to
be run. Instead, it reads the intermediate from the "prd" header file.
The interpreted program may also need to use the same prd file, so the solution is to "stack
up" the intermediate files. The intermediate for pint itself appears first,
followed by the file that is to run under that (iso7185pat). It works because
the intermediate has a command that signals the end of the intermediate file,
"q". The copy of pint that is reading the intermediate code for pint
stops there, then the interpreted copy of pint starts and reads in the other part
of the file. This could, in fact, go to any depth.
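
The stacking idea can be sketched as follows. This is not pint's loader: the placeholder text lines and the assumption that the end command is a line containing only "q" are made up for the example, and the real intermediate format may differ.

```python
import io


def read_intermediate(stream):
    """Read lines up to and including the first 'q' line, leaving the rest unread."""
    lines = []
    # readline() consumes exactly one line at a time, so whatever follows
    # the terminator stays in the stream for the next reader.
    for line in iter(stream.readline, ""):
        lines.append(line.rstrip("\n"))
        if line.strip() == "q":
            break
    return lines


# Two stacked intermediates in one "prd" stream (placeholder contents).
prd = io.StringIO("pint code 1\npint code 2\nq\npat code 1\nq\n")
outer = read_intermediate(prd)   # loaded by the host copy of pint
inner = read_intermediate(prd)   # loaded next by the interpreted copy of pint
print(outer)
print(inner)
```

Each reader stops at its own terminator, so further levels only require appending another intermediate to the same file.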

All of the source code changes from pint.pas to pintm.pas are automated in
cpints.bat.

Self compiled files and sizes

The resulting sizes of the self compiled files are:

pcomm.p5

Storage areas occupied
=====================================
Program            0- 114657 ( 114658)
Stack/Heap    114658-1987994 (1873337)
Constants    1987995-2000000 (  12005)

pintm.p5

Storage areas occupied
=====================================
Program            0-  56194 (  56195)
Stack/Heap     56195-1993985 (1937791)
Constants    1993986-2000000 (   6014)
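
The arithmetic behind these listings can be checked with a short sketch. It assumes each range is inclusive at both ends; the helper names are made up for this example and the address ranges are copied from the listings above.

```python
def area_size(first, last):
    """Number of store locations in the inclusive range first..last."""
    return last - first + 1


def is_contiguous(areas):
    """True if each area starts right after the previous one ends."""
    return all(prev[2] + 1 == nxt[1] for prev, nxt in zip(areas, areas[1:]))


PCOMM = [("Program", 0, 114657),
         ("Stack/Heap", 114658, 1987994),
         ("Constants", 1987995, 2000000)]
PINTM = [("Program", 0, 56194),
         ("Stack/Heap", 56195, 1993985),
         ("Constants", 1993986, 2000000)]

for name, areas in (("pcomm.p5", PCOMM), ("pintm.p5", PINTM)):
    print(name, "contiguous:", is_contiguous(areas))
    for label, first, last in areas:
        print(f"  {label:<10} {area_size(first, last):>8}")
```

Under this inclusive reading the Program and Stack/Heap sizes match the listings exactly. The Constants rows come out one larger than the printed figures (12006 and 6015 against 12005 and 6014), which suggests the top address of the store is counted differently in the printed output.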

Files for the P5 system

Pascal-P5 is now hosted entirely on SourceForge. Please see the site for all sources:

This will get the entire P5 file tree and place it into the target directory
"p5".

A note about versioning

My common practice is to "bump" the version number after any changes
are made to a certified release. The idea is that the new version number will
be the version of the next release to come. This of course can
cause confusion. However, the rule is: if it is not in the above release list,
then it is a development version.

Getting started

For all versions, see the readme.txt file in the root directory.

Compiling and using P5

The P5 compiler/assembler is much easier to use than P4 in one respect.
There are no limitations to remember versus ISO 7185 Pascal. If it is legal
Standard Pascal, it will compile and run.

To run P5, use the following format:

pcom intermediate.int < source.pas

pint intermediate.int program.out

All files must be specified.

This is what the batch file p5.bat given above does.

What is... and is not in P5

While upgrading P4 to P5, I specifically tried to avoid any
temptation to "improve" the code, such as add functions or features,
or reformat the code to be more presentable, etc. There is a time and place
for that. I simply wanted P5 to be a full language compiler for Pascal, instead
of a subset compiler. The one exception I allowed for is the addition of a routine
that dumps all of the error codes that were used in the source compile along
with their text equivalents. I have found this to be a great improvement on
trying to search the various documents for what error code means what.

Of course, virtually all implementations improve on the original
Pascal, including the original CDC 6000 compiler. The extensions consist of
a combination of features best left defined to a particular implementation,
and also usability extensions to Pascal in general.

There's a lot more that can be done with P5. However, I have
left that for the P6 project. P6 is the next step for the P-series, and includes
a series of extensions to the base ISO 7185 language.