Over the course of the semester we will be implementing a working
interpreter for a simple language. The project will be divided into several
smaller parts, each building on previous parts. In this first section we
will begin with something (hopefully) straightforward and give you a chance to get comfortable with programming.

Scanner Support

One of the simple but important tasks in implementing a compiler or
interpreter is managing the identifiers supplied by the user. This
is generally done with a hash table of identifiers, where the characters
of each identifier are stored once in a string space. Because each string
is stored only once, most lookups and comparisons can work on references
rather than on characters, which makes them very fast. For
this work you will implement a StringSpace class and a StringTable class
as described below:

StringSpace - a string space consists of a series of pages of data
each consisting of a series of unique strings. The string space is used
by the compiler whenever a user supplies an identifier (and we will also
use it for reserved words). If the string is new, the compiler will add
it to the top page of data and from that point on use a pointer to the
page and the offset on that page as the unique reference to that string
(so that character-by-character string comparisons can mostly be avoided).
To this end you should implement this data structure.

A StringSpace object maintains (internally) a set of data pages on which
it stores strings
(as many as will fit on each page). For a real system, the size of the
page would be determined by system parameters (page sizes in memory and/or
on disk). For our work, the page size should be a parameter used to create
the initial StringSpace object (you can set the page size small when
testing your programs and increase the size later when using this module).
A StringSpace provides one function to its users: an insert_string
function that takes a pointer to a string, stores the string on some
page, and returns a reference to where the string is stored in the form
of a related data structure called a StringSpaceEntry. A StringSpaceEntry
maintains (internally) a pointer to the page of data containing the string
and an offset on that page where the string can be found. A StringSpaceEntry
should include a function to return the actual string corresponding to that
entry, as well as functions to compare that entry to a new string (and
possibly to another StringSpaceEntry).

Data in the string space is maintained in pages. Each page has an array
of characters it can use to store one or more strings. When a request to
insert a string comes in, you should add the string to the top page if there
is enough space; otherwise, add a new page as the top page and then add the
string to that page.
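
As a starting point, the page and offset scheme described above might be sketched as follows. This is only one possible layout, not a required design: the Page struct, the c_str, equals, and same_as function names, and the page_count accessor are all hypothetical, and the handling of a string longer than a page (it gets an oversized page of its own) is an assumption that the description above does not settle. Note that this sketch does not check for duplicates; it assumes the StringTable looks a string up before inserting it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// One page of character storage (hypothetical helper type).
struct Page {
    std::vector<char> data;   // fixed-size buffer, never resized after creation
    std::size_t used = 0;     // number of bytes already occupied
    explicit Page(std::size_t capacity) : data(capacity) {}
};

// Reference to a stored string: the page plus the offset on that page.
class StringSpaceEntry {
public:
    StringSpaceEntry(const Page* page, std::size_t offset)
        : page_(page), offset_(offset) {}

    // Returns the stored, null-terminated string.
    const char* c_str() const { return page_->data.data() + offset_; }

    // Compares the stored string with a new string, character by character.
    bool equals(const char* s) const { return std::strcmp(c_str(), s) == 0; }

    // Two entries name the same stored string if page and offset match.
    bool same_as(const StringSpaceEntry& other) const {
        return page_ == other.page_ && offset_ == other.offset_;
    }

private:
    const Page* page_;
    std::size_t offset_;
};

class StringSpace {
public:
    explicit StringSpace(std::size_t page_size) : page_size_(page_size) {
        assert(page_size > 0);
    }

    // Copies s onto the top page, adding a new top page if it does not fit.
    StringSpaceEntry insert_string(const char* s) {
        const std::size_t need = std::strlen(s) + 1;  // include the '\0'
        if (pages_.empty() ||
            pages_.back()->used + need > pages_.back()->data.size()) {
            // Assumption: a string longer than a page gets an oversized page.
            const std::size_t capacity = std::max(page_size_, need);
            pages_.push_back(std::unique_ptr<Page>(new Page(capacity)));
        }
        Page& top = *pages_.back();
        const std::size_t offset = top.used;
        std::memcpy(top.data.data() + offset, s, need);
        top.used += need;
        return StringSpaceEntry(&top, offset);
    }

    // Hypothetical accessor, useful mainly for testing.
    std::size_t page_count() const { return pages_.size(); }

private:
    std::size_t page_size_;
    std::vector<std::unique_ptr<Page>> pages_;  // pages keep stable addresses
};
```

Each page is held through a pointer so that a StringSpaceEntry handed out earlier remains valid when more pages are added. With a small page size such as 8, a handful of short inserts is enough to exercise the "add a new top page" path.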

StringTable - a string table is a hash table used to store strings
encountered by the compiler. For our work we will also use it to recognize
keywords (to make scanning a bit easier). When a string is encountered
during scanning, the compiler will look up that string in the StringTable.
If the string is not already in the table, it will be added to the StringTable.
A string is added to the string table by adding the string to the string
space and then using the StringSpaceEntry as the data for the hash table.
Your hash entry should also include a token number. You should plan on
implementing the hash table using linked lists to deal with collisions
(one list per bucket).
The number of buckets in the hash table should be a parameter used in
creating the StringTable so that you can set it small for testing and make
it larger later. For the moment, the token number stored in a hash table
entry can be an arbitrary value. During scanning we will start by inserting
all of the reserved words into the StringTable along with their corresponding
token numbers. Then, when we find an identifier in the program we will
determine if that identifier corresponds to a reserved word by looking it
up in the string table.
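
The lookup-then-insert behavior described above might be sketched as follows. This is a rough illustration under several assumptions, not a required design: the Node struct, the lookup and insert names, and the simple multiply-by-31 hash are all hypothetical choices. So that the sketch compiles on its own, it uses a stand-in class (StandInStringSpace) in place of your real StringSpace, and a plain pointer in place of a StringSpaceEntry.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Stand-in for the real StringSpace so this sketch compiles on its own.
class StandInStringSpace {
public:
    const std::string* insert_string(const std::string& s) {
        strings_.push_back(s);   // deque growth keeps earlier elements in place
        return &strings_.back();
    }
private:
    std::deque<std::string> strings_;
};

class StringTable {
public:
    // One link in a bucket's collision list.
    struct Node {
        const std::string* entry;  // reference into the string space
        int token;                 // token number for this string
        Node* next;                // next node in the same bucket
    };

    explicit StringTable(std::size_t bucket_count) : buckets_(bucket_count) {
        assert(bucket_count > 0);
    }

    ~StringTable() {
        for (Node* head : buckets_) {
            while (head != nullptr) {
                Node* next = head->next;
                delete head;
                head = next;
            }
        }
    }

    StringTable(const StringTable&) = delete;
    StringTable& operator=(const StringTable&) = delete;

    // Returns the node for s, or nullptr if s is not in the table.
    const Node* lookup(const std::string& s) const {
        for (const Node* n = buckets_[hash(s)]; n != nullptr; n = n->next) {
            if (*n->entry == s) return n;
        }
        return nullptr;
    }

    // Returns the existing node if s is already present; otherwise stores s
    // in the string space and links a new node at the head of its bucket.
    const Node* insert(const std::string& s, int token) {
        if (const Node* found = lookup(s)) return found;
        const std::size_t b = hash(s);
        Node* n = new Node{space_.insert_string(s), token, buckets_[b]};
        buckets_[b] = n;
        return n;
    }

private:
    // Assumption: a simple polynomial hash reduced to a bucket index.
    std::size_t hash(const std::string& s) const {
        std::size_t h = 0;
        for (unsigned char c : s) h = h * 31 + c;
        return h % buckets_.size();
    }

    StandInStringSpace space_;
    std::vector<Node*> buckets_;
};
```

A scanner would typically insert the reserved words first, each with its own token number, and then call insert for every identifier it meets; a returned node carrying a reserved-word token indicates a keyword. Creating the table with very few buckets (for example 4) forces collisions and so exercises the linked lists.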

Implementation Details - you should implement your code in C++, but make it as general as possible, as you may use something similar in Java eventually. I recommend that you use an IDE; one good example is the Eclipse IDE, which you can download at
http://www.eclipse.org . If you wish to use it on a Windows machine, you can also download Cygwin at
http://www.cygwin.com . During installation, make sure that at the package selection step you select gcc, g++, make, and gdb. In addition, after setting it up you will need to add the Cygwin bin directory to your path (to allow access to these tools).

Testing - make sure to carefully test your code (as your later code
will depend on this). You should plan to implement test programs to test
both your string space and string table implementations and submit results
from this testing to demonstrate that your program is working.
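
One possible way to produce results you can submit is a small helper that prints one line per check. The sketch below is only a suggestion; the TestLog name and its check and summary functions are hypothetical, and any format that clearly shows what was tested and whether it passed would serve.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical helper that records named checks and prints one line per check.
struct TestLog {
    int passed = 0;
    int failed = 0;

    // Records the outcome of one check and prints PASS or FAIL with its name.
    void check(bool condition, const std::string& name) {
        assert(!name.empty());  // every check needs a name to be readable
        if (condition) {
            ++passed;
        } else {
            ++failed;
        }
        std::cout << (condition ? "PASS " : "FAIL ") << name << '\n';
    }

    // Prints the totals at the end of a test run.
    void summary() const {
        std::cout << passed << " passed, " << failed << " failed\n";
    }
};
```

A test program would create one TestLog, call check once per property (for example, that a string read back from a StringSpaceEntry matches the string that was inserted), and finish with summary.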

Writeup - document your code and your testing. For this part of the project you will be working separately. Later we will introduce teams as part of the project.