What is Lex? What is Yacc?

What is Lex?

Lex is officially known as a "Lexical Analyser" generator:
it builds a lexical analyser from patterns that you supply.

The main job of a lexical analyser is to break up an input
stream into more usable elements.

Or, in other words, to identify the "interesting bits" in
a text file.

For example, if you are writing a compiler for the C
programming language, the symbols { } ( ) ;
all have significance on their own. The letter a
usually appears as part of a keyword or variable name, and
is not interesting on its own. Instead, we are interested in
the whole word. Spaces and newlines are uninteresting,
and we want to ignore them,
unless they appear within quotes "like this".

All of these things are handled by the Lexical Analyser.
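
To make this concrete, here is a minimal hand-written sketch in C of
the kind of work a lexical analyser does. It is not what Lex generates,
and the names tokenize, MAX_TOKENS and MAX_TOKEN_LEN are made up for
this illustration. It assumes only three kinds of token: words (letters,
digits and underscores), quoted strings, and single punctuation
characters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 64      /* illustrative limits, not part of Lex */
#define MAX_TOKEN_LEN 64

/* Hypothetical helper: split src into tokens, copying each one into
 * out as a NUL-terminated string. Returns the number of tokens stored.
 * Tokens longer than MAX_TOKEN_LEN - 1 characters are truncated. */
size_t tokenize(const char *src, char out[][MAX_TOKEN_LEN], size_t max_tokens)
{
    size_t n = 0;

    while (*src != '\0' && n < max_tokens) {
        const char *start = src;

        if (isspace((unsigned char)*src)) {
            src++;                      /* whitespace outside quotes is ignored */
            continue;
        }

        if (*src == '"') {
            /* A quoted string is one token, spaces included. */
            src++;
            while (*src != '\0' && *src != '"') {
                if (*src == '\\' && src[1] != '\0')
                    src++;              /* step over the backslash of an escape */
                src++;
            }
            if (*src == '"')
                src++;                  /* include the closing quote */
        } else if (isalnum((unsigned char)*src) || *src == '_') {
            /* A whole word (or number) is one token. */
            while (isalnum((unsigned char)*src) || *src == '_')
                src++;
        } else {
            src++;                      /* symbols like { } ( ) ; stand alone */
        }

        size_t len = (size_t)(src - start);
        if (len >= MAX_TOKEN_LEN)
            len = MAX_TOKEN_LEN - 1;
        memcpy(out[n], start, len);
        out[n][len] = '\0';
        n++;
    }
    return n;
}
```

Calling tokenize on the text { int a; s = "like this"; } stores nine
tokens. The quoted string is kept as a single token, spaces and all,
while the spaces between the other tokens are dropped.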

What is Yacc?

Yacc is officially known as a "parser" generator:
it builds a parser from a grammar that you supply.

The parser's job is to analyse the structure of the input stream, and
operate on the "big picture".

In the course of its normal work, the parser also
verifies that the input is syntactically sound.

Consider again the example of a C compiler.
In the C language, a word can be a function name or a variable, depending
on whether it is followed by a ( or by a = sign.
There should be exactly one } for each {
in the program.
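
The two checks just described can be sketched in C as follows. This is
only an illustration, not code produced by Yacc. It assumes the input
has already been split into an array of token strings, and the names
classify_word, braces_balanced and word_kind are invented for the
example.

```c
#include <stddef.h>
#include <string.h>

enum word_kind { WORD_FUNCTION, WORD_VARIABLE, WORD_UNKNOWN };

/* Decide what a word is by looking at the token that follows it. */
enum word_kind classify_word(const char *next_token)
{
    if (strcmp(next_token, "(") == 0)
        return WORD_FUNCTION;           /* name ( ...  is a function call */
    if (strcmp(next_token, "=") == 0)
        return WORD_VARIABLE;           /* name = ...  is an assignment */
    return WORD_UNKNOWN;                /* this sketch knows no other cases */
}

/* Return 1 if every { has a matching } and no } arrives before its {. */
int braces_balanced(const char *const *tokens, size_t n)
{
    long depth = 0;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(tokens[i], "{") == 0)
            depth++;
        else if (strcmp(tokens[i], "}") == 0 && --depth < 0)
            return 0;                   /* closing brace with nothing open */
    }
    return depth == 0;
}
```

A real parser does not run these as separate passes. Both checks fall
out of the grammar rules it is given, but the underlying idea is the
same: look at the tokens around a word to decide what it is.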

YACC stands for "Yet Another Compiler Compiler".
This is because this kind of analysis of text files
is normally associated with writing compilers.

However, as we will see, it can be applied to almost
any situation where text-based input is being used.

For example, a C program may contain something like:

{
int int;
int = 33;
printf("int: %d\n",int);
}

In this case, the lexical analyser would break the
input stream into a series of "tokens", like this:

{
int
int
;
int
=
33
;
printf
(
"int: %d\n"
,
int
)
;
}

Note that the lexical analyser has already determined that
where the keyword int appears within quotes,
it is really just part of a literal string.
It is up to the parser to decide if the token int
is being used as a keyword or variable. Or it may
choose to reject the use of the name int
as a variable name.
The parser also ensures that each statement ends with a ;
and that the brackets balance.
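
As a rough illustration of these decisions, here is a small hand-written
parser in C for the token list above. It is a sketch under simplifying
assumptions, not the code Yacc would generate: the grammar covers only
declarations, assignments and function calls, it chooses to allow int as
a variable name, and all the names (parse_block, parse_statement, match
and so on) are invented for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parser state: the token array and a cursor into it. */
struct parser {
    const char *const *tok;
    size_t n;
    size_t pos;
};

/* Look at a token without consuming it; "" means end of input. */
static const char *peek(const struct parser *p, size_t ahead)
{
    return (p->pos + ahead < p->n) ? p->tok[p->pos + ahead] : "";
}

/* Consume the next token if it is exactly text. */
static int match(struct parser *p, const char *text)
{
    if (strcmp(peek(p, 0), text) != 0)
        return 0;
    p->pos++;
    return 1;
}

static int is_word(const char *t)
{
    return isalpha((unsigned char)t[0]) || t[0] == '_';
}

/* A value is a word, a number or a string literal. */
static int is_value(const char *t)
{
    return is_word(t) || isdigit((unsigned char)t[0]) || t[0] == '"';
}

/* Consume the next token if the predicate accepts it. */
static int match_if(struct parser *p, int (*pred)(const char *))
{
    if (!pred(peek(p, 0)))
        return 0;
    p->pos++;
    return 1;
}

/* statement := "int" WORD ";" | WORD "=" VALUE ";" | WORD "(" args ")" ";" */
static int parse_statement(struct parser *p)
{
    /* int followed by a word: int is the keyword, the word is the name. */
    if (strcmp(peek(p, 0), "int") == 0 && is_word(peek(p, 1))) {
        p->pos += 2;
        return match(p, ";");
    }
    /* Otherwise the word, even int, is treated as an ordinary name. */
    if (!match_if(p, is_word))
        return 0;
    if (match(p, "="))
        return match_if(p, is_value) && match(p, ";");
    if (match(p, "(")) {
        if (!match(p, ")")) {
            do {
                if (!match_if(p, is_value))
                    return 0;
            } while (match(p, ","));
            if (!match(p, ")"))
                return 0;
        }
        return match(p, ";");           /* every statement must end with ; */
    }
    return 0;
}

/* block := "{" statement* "}". Returns 1 if the input is one valid block. */
int parse_block(const char *const *tokens, size_t n)
{
    struct parser p = { tokens, n, 0 };

    if (!match(&p, "{"))
        return 0;
    while (!match(&p, "}")) {
        if (!parse_statement(&p))
            return 0;                   /* also reached if } is missing */
    }
    return p.pos == p.n;
}
```

Given the sixteen tokens listed above, parse_block returns 1. Remove a
; or the final } and it returns 0. A stricter parser could reject int as
a variable name simply by dropping the fallback to an ordinary name.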

Flex and Bison

Lex and Yacc are part of BSD Unix. Free, enhanced versions called
Flex and Bison are also available; Bison comes from the GNU project,
while Flex is maintained separately. I'll keep referring to "Lex" and "Yacc",
but you can use Flex and Bison as "drop-in" replacements in most cases.
In fact, the additional features of Flex and Bison make them
an irresistible choice.