Due: April 16, 2013

Objective

In this lab, you will add variables to your compiler's source
programming language, with the ability to declare variables, assign
values to them, and use them in expressions.

Starting point

Copy your c2 project from the previous lab as c3, because your new
compiler will build on your work from the prior lab. If you had
difficulty with the previous lab, you should talk with me to make sure
you have a solid foundation to build on.

Basic declaration and assignment statements

Initially you should add just simple declaration statements and
assignment statements to your language. A declaration statement at
this point should contain only a single variable and not have any
initializer. An example would be

int exampleVariable1;

An assignment statement assigns the value of an expression to a
variable. An example would be

fellerNumber = 10 + 7;

In order to add these features to the source language, you will
need to make additions to the lexical analyzer and parser, as well as
adding two new subclasses of Stmt for declaration
statements and assignment statements.

The toLLVM method for a declaration statement should
return a string of LLVM assembly code that allocates stack memory for the
variable and makes an LLVM name point to that memory space, as in
this example:

%t0 = alloca i32

Human readers of the LLVM assembly code (such as yourself) might
benefit if you added a comment containing the source variable name.
A comment in LLVM assembly language starts with
; and extends through the end of the line.

The toLLVM method for a declaration
statement should have one additional effect. The method should insert the declared name into
a symbol table along with the LLVM name that was allocated to point to that
variable's storage. (So, for example, the symbol table might
indicate that exampleVariable1 has been
allocated %t0 as its LLVM pointer.)

Because the symbol table is used to communicate between
declarations and uses, such as in assignment statements, it needs to
be accessible to all these AST nodes. The simplest reasonable approach to the symbol table would to use a
separate class, SymbolTable, which you define with just
static methods (for declaring a variable and for looking up its
LLVM counterpart) and a static variable that holds the table itself. Using
Java generics, a declaration for the table might show that
it maps strings (names of source-code variables) to strings (names of
LLVM pointers):

(This presumes you have imported the names
Map and HashMap from java.util.)

You can temporarily ignore the possibilities that a program might
declare the same variable twice or might use a variable (in an
assignment statement) without a previous declaration.

For each assignment statement, you will want to generate an
LLVM store instruction. For example,

store i32 %t3, i32* %t0

would store the value named %t3 into the memory
location pointed to by the pointer %t0.

You should test your compiler at this stage in its evolution. (In
fact, you could even have tested with just declarations, before adding
assignment statements.) Because you have not yet added the ability to
use variables in expressions, you will need to look at the assembly
language output (in testProgram.ll) to see if assignment
statements are correctly storing their expressions' values into the
locations pointed to by the variables' pointers.

The remaining portions of this lab are independent of each other
and so can be done in any order. You should test your work as you
complete each section.

Allowing variables to serve as expressions

Having paused for testing at the first point where the language was
at all usable, you can now proceed to make the language actually
useful. Modify the grammar to allow a variable to serve as an
expression and create a corresponding new subclass of
Expr. The LLVM code will make use of
the load instruction, as shown here:

%t4 = load i32* %t0

You should now be able to test using programs that
declare variables, assign values to them, and then make use of the
variables in further computations.

Handling errors

Now you need to cope with the possibility that a program being
compiled might contain repeated declarations for the same variable and
might contain uses of undeclared variables. You should report an error
message for each of these situations.

Error messages should be specific, which means they should include
the name of the variable in question and also the position within the
input at which the variable was redeclared or used undeclared. To get
the position, you will need to pass additional information from the
parser into the AST node constructor. In the parser, if a grammar
production refers to a terminal symbol using something like

IDENTIFIER:name

then in the accompanying Java action code, you can not only make
use of name to refer to
the attribute of the IDENTIFIER token, you can also use
nameleft to refer to the character position at which that
token begins.

Reporting up to five error messages per run of your compiler is
reasonable. After the fifth error message, the compilation should be
terminated without checking for any further errors. If any errors at
all are found (even less than five), the compilation should terminate
without producing any assembly output and with a non-zero exit code,
which will cause ant to stop without running
subsequent steps. These features will be easiest to add if you
provide a central error-reporting method, perhaps a static method in
the Compiler class.

If a variable is reported as undeclared, the user will generally
not be interested in seeing further error messages regarding that same
variable. The easiest way to suppress further messages for the same
variable is to pretend the variable is declared.

Making the syntax more flexible

Rather than only allowing variables to be declared one at a time,
and rather than forcing initialization to take place in separate
assignment statements, it would be nice to support a syntax like

int x, fellerNumber = 10 + 7, answer = fellerNumber + 25;

This change can be made purely in your lexical analyzer and parser;
you can generate the same AST as if the declaration statement had been
broken apart into