Understanding of the basics of SPARC assembly language and its run-time
environment.

Ability to build a Pascal-like compiler, written in C, for the SPARC
architecture in the UNIX programming environment.

REQUIREMENTS/POLICIES

Although attendance is not mandatory, students are responsible
for all course materials covered in lectures and any exams given during class
periods. Students that need to make up missing course work must follow the
official Clarkson policies regarding these matters. All students must submit
their own work; the exchange of ideas are encouraged but ultimately the
submitted work must be the student's own. Please refer to the Clarkson
University Regulations for more guidelines on academic integrity and related
matters.

PROJECT

Pascal lex scanner & yacc parser:

adapt the Pascal language description from Appendix A (Dragon book) to
create a lex scanner and yacc parser

some grammar changes include: allowing subprograms to nest within each
other, allow array access on RHS of assignment statements, handling of unary
minus, handling of procedure calls with no arguments, etc.

connect the symbol table module with the yacc parser to detect
identifier problems, such as: redeclaration of names within the same scope,
usage of undeclared names, potential improper usage of names, etc.

semantic analyzer & type checker:

build a syntax (parse) tree representation of parts of the input program

type checking of expressions and statements

see the list of checks that we need
to handle (note the important distinction placed by Pascal with respect to
functions and procedures).

The real Pascal compiler is alive and well and it is called pc.
It's best to not call your executable pc to avoid unintentional
overloading. Use names like yapc (Yet Another Pascal Compiler) or
mypc or related names.

For Linux users: Use yacc or bison (for yacc) and flex (for
lex).The only library flag we need is -lfl (instead of -ll -ly).On
older versions, the yacc/bison parser uses the prefix filename to create its C
and header files. So the command yacc -dv pc.y will create
pc.tab.c and pc.tab.h (and pc.output) instead of y.tab.c,
y.tab.h and y.output.We might not be able to override certain things (like
yyerror, etc.).Unused token attributes will create errors.

OPTIONAL TOPICS

Code optimization on DAGs

Adding (variant) records and pointers to the Pascal source language

Using C for the source language

Using Intel as the underlying architecture

READINGS/NOTES (under construction)

Review of course and project: we are going to build our own Dragon
(compiler) for a Pascal-like programming language (source language) that will
produce SPARC assembly (target language) output. Our compiler (language
translator) will be implemented in C. Core components of the compiler include:

Scanner: module that tokenize the input stream (of characters) into a
stream of tokens (legal lexical elements of the language). This component
can be speficied using the tools of Regular Expressions and the module
itself is a DFA program. We will build our scanner using lex (or
flex under Linux/BSD).

Parser: module that checks whether the stream of tokens is syntactically
correct according to the grammar of the source language. This component can
be specified using the tools of Context-free Grammars and the module itself
is a PDA program. A very important task of the parser is also to build the
parse (syntax) tree representation of the input program. We will build our
parser using yacc (or bison under Linux/BSD)

Symbol Table: module that stores information about names that appear in
the input program. A common data structure used for this module is a hash
table with chaining.

Semantic analyzer: module that performs semantic checks on the parse
tree. Typical checks include undeclared names, type mismatches, illegal
usage of subroutines, and others. An important sub-module called the Type
Checker is usually part of this module.

Code generator: module that converts an intermediate representation of
the input program (syntax tree, in our case) to a sequence of commands in
the target language (SPARC assembly code).

(Theory) Read Chapter 1 (Dragon book).

(Practice) Read and familiarize yourself with C: either pickup Kernighan
& Ritchie's classic book or check the books that I've put on reserve at
the library; I will also give some examples in class. Things to focus about C:
I/O (file or otherwise), pointers and structures, memory allocations,
preprocessor directives, and style. Some UNIX-related things that might
be useful for us: make and makefiles, splitting source files, debugging
tools like gdb, profiling tools, and accessing the manual pages using
man.

(Practice) Read and familiarize yourself with lex and yacc. The O'Reilly
book is quite helpful. Experiment with the calculator program example. Then
write the lex & yacc files for the Pascal grammar given in Appendix A
(Dragon book).

(Theory) Read Chapter 4 (Dragon book) on various method for parsing. The
sequence that we will cover includes LL(1) top-down parsing, SLR(1) bottom-up
parsing, and LR(1) parsing. If time permits, we will cover LALR(1) parsing,
the method that yacc uses. We will skip the material on Operator Precedence
parsing.

(Practice) Build your stack of symbol (hash) tables. Use the hashpjw()
function from Chapter 7 (due to Peter J. Weinberger). Make sure you build a
scaffolding around this module before you connect it to the yacc Pascal parse
(not the lex scanner!).

(Theory) Read bits and pieces of Chapters 5 and 6 (Dragon book). There is
a nice theory about building the abstract syntax trees and defining general
type expressions. We will only require the most basic things about syntax
trees and type expressions but it is good to know the bigger picture.

(Practice) Build your syntax trees, mainly for statements in the grammar
and some of the declarative rules. The main reason we are building these trees
are so that we can perform semantic checks easily and so that the code
generator does not have a heart attack when it is called. The type checking
that has to be done is quite strict (and hence simple) but you might want to
think about some nice structures to capture "types".

(Theory) Read Chapter 7 (Dragon book) on Run-Time Environments. Basically,
we need to get familiar with memory allocations, the run-time stack, the
notion of an activation record, flow of control, and some basic sense of the
von Neumann model.

(Practice) We will look at many examples of some SPARC assembly programs
(especially ones that are generated by the GNU C compiler). We will follow
almost all of the conventions of this open source compiler (most importantly,
its convention for the activation record). The main reason is that we will use
GCC as our assembler!

(Theory) Read Chapter 9 (Dragon book). Our focus will be on a simple
algorithm for generating code from DAG (read: syntax trees) given in Section
10. We will adapt this for our SPARC-based compiler. Details will be covered
in lectures.

enter the Dragon

THE DRAGON PROJECTThe goal of the project is
to build a compiler for a large chunk of the Pascal programming
language that will output a SPARC assembly program. This project will
give you a chance to apply skills you learned from other courses along with
hopefully new ones:

To synchronize the compiler project, make your
compiler take a single input Pascal file (with suffix extension of ".p", say
"main.p") and make it write the SPARC assembly output to file with the same name
except for the suffix (which is changed to ".s", so "main.s"). We will then use
GCC to produce the executable by "gcc main.s". So if your compiler executable is
called yapc then the following sequence should be used
to test all our final executables.

yapc main.pgcc main.sa.out

SUBMISSIONFull
report containing at least the following items:

Specification document (describes the software)

Design document (describes the overall design of the software)

Testing document (describes the testing and verification process
for the software; also bug status report)

User document (user manual for compilation and proper usage of the
software)

Complete source listing (hardcopy and compressed tgz file via
email)

PRESENTATIONDuring the last week, schedule a meeting
to present the online demonstration of the compiler. Be prepared for some extra
testing and questions about your compiler project.