What good is a cpu without an assembler? In this chapter we use flex and bison (i.e. lex and yacc) to make an assembler for our cpu. The assembler generates a program image file that can be loaded by the Verilog $readmemh statement (which reads hexadecimal text). In a synthesized FPGA, we would use the same file to initialize the instruction memory inside the FPGA's internal memory blocks.
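To make the output format concrete, here is a minimal sketch in C of writing instruction words in a layout that $readmemh accepts: one hexadecimal word per line. The helper name write_hex_image and the 32-bit word width are assumptions made for illustration; they are not taken from the assembler built in this chapter, so adjust the width to match the real cpu.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not from the chapter's assembler): write `count`
 * instruction words to `out`, one zero-padded hexadecimal word per line.
 * $readmemh reads whitespace-separated hex values, so this layout loads
 * into consecutive memory addresses.
 * ASSUMPTION: instructions are 32 bits wide. */
static int write_hex_image(FILE *out, const uint32_t *words, size_t count)
{
    assert(out != NULL && words != NULL);
    for (size_t i = 0; i < count; i++) {
        if (fprintf(out, "%08" PRIx32 "\n", words[i]) < 0)
            return -1;  /* report an I/O failure to the caller */
    }
    return 0;
}
```

On the Verilog side, a file written this way would be loaded with a call of the form $readmemh("file name", memory_array).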

Flex and Bison

Flex and Bison are used to create compilers for many languages and were also used in creating the GCC compiler collection. Flex and Bison are the modern implementations of Lex and Yacc, which were originally created for Unix systems. Yacc is an acronym for "Yet Another Compiler-Compiler", which whimsically states its purpose.

Text lexer/scanner using Flex

We begin creating the lexer/scanner for our assembler by coding a flex input file. The terms scanner and lexer can be used interchangeably. This file defines how the text of the input assembler program is broken down into tokens, identifiers and constants. The main section of this file lists a collection of regular expressions to match against the input text, each with a corresponding action that generates a token and its value to pass on to the parser. The action is coded as regular C/C++, but typically only a line or two of code is required for each action. This lexer input file is used by the flex program at compile time to generate a C file containing your specific text input scanner. In fact, we add an extra step in the makefile to automatically regenerate our scanner C file any time the lexer file is changed. The scanner C file is considered an intermediate file, so it does not need to be included in a distribution, and issuing a 'make clean' also deletes it.
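To illustrate what "a token and its value" means, the following is a hand-written C sketch that mimics the calling convention of a generated scanner: each call returns an integer token code and, for some tokens, leaves a value in a shared variable. It is not the code flex generates and not this assembler's token set. The names next_token, sketch_lval, and the TOK_ constants are hypothetical; in a real flex/bison project the token constants come from bison and the shared value variable is yylval.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; in a real project bison generates these. */
enum { TOK_END = 0, TOK_IDENT = 258, TOK_NUMBER, TOK_COMMA };

/* Stand-in for yylval: the value that accompanies certain tokens. */
static union {
    long ival;       /* value of a numeric constant */
    char name[32];   /* text of an identifier (truncated if longer) */
} sketch_lval;

/* Return the next token code from *cursor and advance past it.
 * Returns TOK_END (0) at end of input, the same convention yylex() uses. */
static int next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;

    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;                                    /* whitespace yields no token */
    if (*p == '\0') {
        *cursor = p;
        return TOK_END;
    }
    if (*p == ',') {                            /* token that carries no value */
        *cursor = p + 1;
        return TOK_COMMA;
    }
    if (isdigit((unsigned char)*p)) {           /* plays the role of [0-9]+ */
        char *end;
        sketch_lval.ival = strtol(p, &end, 10);
        *cursor = end;
        return TOK_NUMBER;
    }
    if (isalpha((unsigned char)*p) || *p == '_') {  /* [A-Za-z_][A-Za-z0-9_]* */
        size_t n = 0;
        while (isalnum((unsigned char)p[n]) || p[n] == '_')
            n++;
        size_t max = sizeof sketch_lval.name - 1;
        size_t keep = n < max ? n : max;
        memcpy(sketch_lval.name, p, keep);
        sketch_lval.name[keep] = '\0';
        *cursor = p + n;
        return TOK_IDENT;
    }
    *cursor = p + 1;
    return (unsigned char)*p;                   /* other characters stand for themselves */
}
```

Calling next_token repeatedly on the made-up input line "add r1, 42" yields TOK_IDENT (value "add"), TOK_IDENT (value "r1"), TOK_COMMA, TOK_NUMBER (value 42), and finally TOK_END. A flex input file expresses the same decisions declaratively: each branch above corresponds to one regular expression and its one- or two-line action.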

Here are the regular expressions and actions that implement our first assembler. In the top section, before the %% line, we define some simple regular expressions (shortcuts) for common text sequences we expect to encounter. After that line, we define the actual scanner tokens and the associated action to execute for each occurrence. Some of the actions simply return a constant integer, defined elsewhere, that specifies the token that was encountered. Others, such as the definitions of an identifier and a string, also pass a value to the parser via the yylval variable.

Above, we simply declare some regular expression shortcuts for text we expect to encounter. Below are the actual scanner expressions and associated actions. The two sections are separated by a %% line in the source file.