The purpose of this case study is to give an example of a compiler/interpreter front-end written in C using Lex and Yacc. An interpreter is used since it allows a working program to be created with minimal extra effort (after the construction of the front-end). This code could be developed into a compiler by replacing the last phase with a compiler back-end.

The code is shown in an order that follows the process of building a compiler, so the same file will be shown several times as it is developed.

The case study develops an interpreter for Very Tiny Basic, which is specified in the Case Study 1 section of the book.

The following steps shall be taken to complete the interpreter:

Lexical Analysis

Syntax Analysis

Lexical Analysis with Semantics

Abstract Syntax Trees

Generating Abstract Syntax Trees

Interpreting

Many important features of a useful compiler/interpreter have been left out, for brevity and simplicity, including:

Dealing with Errors

Optimization

Some processes also do not apply to a language as simple as Very Tiny Basic. For example, name tables are unnecessary because variables are named by single characters only.

The first draft of the lex file identifies each kind of token by returning a distinct value from the associated C action. For keywords and operators this is simple; identifiers, values and comments are trickier.

You may wonder where all the values we returned come from. They will be generated by Yacc when it processes the grammar file.

There are some differences from the Very Tiny Basic specification in Case Study 1, for instance MULT and DIV in place of MULDIV. This is because the later stages need to tell multiplication from division. LineNumber and WholeNumber are lexically identical, and so cannot be separated at this time. Defining the use and category of tokens is left until the next stage.
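The idea of returning one value per kind of token can be sketched in plain C as a hand-written stand-in for the scanner that lex would generate. Everything here is an assumption made for illustration: the function `next_token`, the numeric token codes (in the real build Yacc generates them), and the upper-case keyword spellings `PRINT` and `GOTO`. Note that line numbers and whole numbers both come back as `NUMBER`, and that `*` and `/` get separate codes.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; in the real build Yacc generates these
   from the %token declarations in the grammar file. */
enum token {
    END = 0, NUMBER = 258, IDENTIFIER, PRINT, GOTO,
    PLUS, MINUS, MULT, DIV, UNKNOWN
};

/* Scan one token starting at *src, advance *src past it,
   and return the token's code. */
static int next_token(const char **src)
{
    const char *p = *src;

    while (*p == ' ' || *p == '\t')        /* skip blanks */
        p++;
    if (*p == '\0') {
        *src = p;
        return END;
    }
    if (isdigit((unsigned char)*p)) {      /* LineNumber or WholeNumber */
        while (isdigit((unsigned char)*p))
            p++;
        *src = p;
        return NUMBER;
    }
    if (isupper((unsigned char)*p)) {      /* keyword or one-letter variable */
        const char *start = p;
        size_t len;

        while (isupper((unsigned char)*p))
            p++;
        len = (size_t)(p - start);
        *src = p;
        if (len == 5 && strncmp(start, "PRINT", 5) == 0) return PRINT;
        if (len == 4 && strncmp(start, "GOTO", 4) == 0)  return GOTO;
        return len == 1 ? IDENTIFIER : UNKNOWN;
    }
    *src = p + 1;                          /* single-character operator */
    switch (*p) {
    case '+': return PLUS;
    case '-': return MINUS;
    case '*': return MULT;
    case '/': return DIV;
    default:  return UNKNOWN;
    }
}
```

Calling `next_token` repeatedly on the text `10 GOTO 20` yields `NUMBER`, `GOTO`, `NUMBER` and then `END`; deciding that the first number is a line number is left to the parser.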

In this version of the lexer, the header file generated by yacc/bison is included. The header file defines the return values and the union that is used to store the semantic values of tokens. It is created from the %token declarations and the %union part of the grammar file. The lexer now extracts values from some types of tokens and stores them in the yylval union.
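As a rough sketch of what storing a semantic value means, the following C fragment mimics the pattern without using the generated header. The union `semantic_value`, the variable `yylval_sketch` (standing in for yylval), the function `classify` and the token codes are all hypothetical names chosen for this example; the real definitions come from the yacc/bison output.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token codes standing in for the generated ones. */
enum { NUMBER = 258, IDENTIFIER };

/* Stand-in for the union that %union would produce. */
union semantic_value {
    int  number;   /* value of a NUMBER token */
    char name;     /* letter of an IDENTIFIER token */
};

/* Stand-in for the global yylval shared by lexer and parser. */
static union semantic_value yylval_sketch;

/* Given the matched text of a token, store its semantic value
   and return its token code, as a lex action would. */
static int classify(const char *text)
{
    if (isdigit((unsigned char)text[0])) {
        yylval_sketch.number = atoi(text);
        return NUMBER;
    }
    yylval_sketch.name = text[0];
    return IDENTIFIER;
}
```

The return value tells the parser what kind of token was seen, while the union carries the extra detail (which number, which variable) that the parser rules can later read.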

Abstract syntax trees are an intermediate representation of the code, built in memory out of data structures. The grammatical structure of the language, which has already been defined and written down as a Yacc grammar file, is translated into a tree structure. Using Yacc and C, this means that most grammar rules and tokens become nodes. Comments are an example of something that is not put in the tree.

The grouping of operands is clear from the structure of the tree, so tokens such as parentheses do not have to be present in it. The same applies to tokens that end blocks of code. This is possible because the rules in the grammar file can use these tokens to create the tree in the correct shape.

Illustration of grouping in abstract syntax trees

    (1+3)*4        1+3*4
       *             +
      / \           / \
     4   +         1   *
        / \           / \
       1   3         3   4
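The two expressions illustrated above can be built and evaluated with a very small tree type. This is a minimal sketch, not the node layout used by the interpreter itself: the names `struct node`, `number`, `binary`, `evaluate` and `free_tree` are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

enum node_kind { NODE_NUMBER, NODE_ADD, NODE_MUL };

struct node {
    enum node_kind kind;
    int value;                   /* used by NODE_NUMBER only */
    struct node *left, *right;   /* used by operator nodes only */
};

static struct node *make_node(enum node_kind kind, int value,
                              struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);

    if (n == NULL)
        abort();                 /* out of memory: give up */
    n->kind = kind;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

/* Leaf node holding a constant. */
static struct node *number(int value)
{
    return make_node(NODE_NUMBER, value, NULL, NULL);
}

/* Interior node for an operator with two operands. */
static struct node *binary(enum node_kind kind,
                           struct node *left, struct node *right)
{
    return make_node(kind, 0, left, right);
}

/* Walk the tree bottom-up; the shape alone decides the grouping. */
static int evaluate(const struct node *n)
{
    switch (n->kind) {
    case NODE_NUMBER: return n->value;
    case NODE_ADD:    return evaluate(n->left) + evaluate(n->right);
    case NODE_MUL:    return evaluate(n->left) * evaluate(n->right);
    }
    return 0;                    /* not reached for valid trees */
}

static void free_tree(struct node *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

Building `binary(NODE_MUL, binary(NODE_ADD, number(1), number(3)), number(4))` and evaluating it gives 16, while `binary(NODE_ADD, number(1), binary(NODE_MUL, number(3), number(4)))` gives 13. No parenthesis node is needed in either tree.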

In this interpreter the Primary/Secondary expression structures could be discarded by collapsing them. This would add complexity to the code, so it is currently not implemented. Rem statements are also kept (without the comment text) since the definition of VTB implies that they are a valid target for Goto. A Goto to a non-existent line, by contrast, is undefined, so this interpreter issues an error.
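One way to picture why Rem lines stay in the tree is to look at how a Goto target might be resolved. The sketch below assumes the program is held as an array of numbered lines; `struct line`, `enum stmt_kind` and `find_line` are hypothetical names for this example, and the interpreter's own data structures may differ.

```c
#include <assert.h>
#include <stddef.h>

enum stmt_kind { STMT_REM, STMT_PRINT, STMT_GOTO };

struct line {
    int number;            /* the line number written in the source */
    enum stmt_kind kind;   /* Rem lines are kept, minus their text */
};

/* Return the index of the line numbered `target`,
   or -1 if the program has no such line. */
static int find_line(const struct line *program, size_t count, int target)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (program[i].number == target)
            return (int)i;
    return -1;             /* caller reports the error */
}
```

With lines 10 (Print), 20 (Rem) and 30 (Goto), a lookup of 20 succeeds because the Rem line is still present, whereas a lookup of 25 returns -1 and the interpreter can report the error.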

Since we are working with standard *nix tools, and the normal build system on *nix is make, it is useful to write a Makefile for the interpreter. Keep in mind that most compilers/interpreters are very large and need a more advanced build system than this example. They may require CVS, autoconf and many makefiles distributed across different directories. Since this one uses only five files, its Makefile is quite simple.

VTB.y - Version 2

The lexer now gives values for tokens, and the abstract syntax tree structure has been written. Next, the grammar file is updated to construct the trees from the rules and semantic values. All the tree node types are added to the union declaration. Rules must be given types, and each must return the correct type.