These tools are freely available in
the GNU/Linux world, usually free, in some cases licensed under the GPL.
It should be pointed out that lexical and syntactic analysis are two of
the primary jobs to be performed by a language translator. A lexer and
parser built with the above tools do not automatically accomplish a third
crucial task, target code generation. However, these tools provide the
programmer with convenient 'hooks' for incorporating target code generation.

Later in this series of articles
I hope to introduce two of these tools, JFlex and CUP, in a tutorial setting.
I will enlist the help of several students in my compiler design course
as coauthors. Ultimately, I hope to persuade those unfamiliar with these
tools that they are very practical. I've chosen JFlex and CUP because they
produce java code and it is high time for me to learn some java. This unfamiliarity
with java will also provide me with a scapegoat when I do something stupid
- I can blame it on that unfamiliarity.

This first article will provide background
for the series. The next article, which should follow fairly soon, will
show how/where to get these tools, give a very specific installation scenario,
and produce a simple application as a development example. A third article
will give a practical real world example (to be described below). If the
third article becomes unwieldy, it may be broken into parts.

The Lexical Analyzer (a.k.a. 'lexer'
or 'scanner')

Language translation converts source code from some language into target
code in some other language. The 'traditional' compiler may convert source
code into assembly language or even machine code - although the later articles
in this series will focus on other targets than these. The first task of
language translation is akin to examining an English essay to make sure
that the words are spelled correctly. The lexer performs this job on our
source code by recognizing a series of contiguous symbols as valid or not
e.g. the lexer might

recognise 'while' as a keyword

recognize '47452' as a decimal integer literal

complain that 'fr%$glp' is not recognized

The lexer is analogous to a spelling checker for a source program.

A utility such as JFlex builds a lexer from a specification file the
programmer writes to define the 'words' (lexical tokens) in the desired
language. Let's say the programmer defines a new langauge called pronto
and writes a file 'pronto.flex' which defines valid lexical tokens for
pronto. Then the command line operation 'JFlex pronto.flex' will
produce a java version of a lexical analyzer, say, "Lexer.java'.

In its most primitive deployment, the lexer merely indicates that the
source file consists of all valid lexical tokens or not - a boolean yes
or no. The family of lexers under discussion are built to do more and,
in particular, can cooperate with the parser (to be discussed under the
next heading). In the typical application the lexer is, in fact, called
by the parser and the lexer can do these jobs:

recognize a lexical token and return to the parser an identification code
indicating the token type

pass to the parser related information e.g. the actual string recognized,
or the value of an integer literal

perform other programmer coded actions upon recognition of a token

The first of the three items above allows the lexer to support the parser's
central task, syntactic analysis. The other two items are especially useful
in helping the parser in ultimately generating target code.

The Syntactic Analyzer (a.k.a. the 'parser')

Continuing the analogy that began the previous section, just as the lexer
checks words for spelling, the parser examines the source to assure that
the 'words' are arranged in valid grammatical constructs. For example,
for some programmer's new language the lexer might pass these six valid
tokens to the parser:
{ if + while ] } - the lexer only worries about token validity,
not the arrangement.
However, the parser sees the arrangement '{ if + while ] }' as invalid.
Just as the lexer is a source code spelling checker, the parser is a grammar
checker.

[ Note: The compilerati will realize that the lexer is actually checking
the source code validity against a very simple 'regular grammar' specification
and that the parser is checking the source code against a less simple 'context
free grammar' specification - as defined in the Chomsky hierarchy. Typical
compiler design books discuss the theory at length.]

A utility such as CUP builds a parser from a specification file the
programmer writes to define the syntactic structure in the desired language.
For the fictitious new language called pronto, the programmer might
write the specification file as 'pronto.cup'. Then the command line
operation 'java java_cup.Main < pronto.cup' will produce several files
one of which is a java version of the desired syntactic analyzer, "parser.java'.

In its most primitive deployment, the parser indicates that the source
file is either grammatically correct or not - again, a boolean yes or no.
The family of parsers under discussion can do an additional task - whenever
a valid grammatical construct is found, perform any code that the programmer
cares to encode. This 'hook' is typically used to support target code generation
in two execution styles:
generated code written to a file to be executed later
generated code to be executed while the parser operates

Application Specific Languages

The compiler construction tools under discussion can be used to develop
a full-blown language translator e.g. for C, Pascal, FORTRAN, Perl, etc.
These would comprise major development projects. Here I'd like to discuss
translators for 'Application Specific Languages', typically a more modest
project, yet quite useful. I'll define an 'Application Specific Language'
(ASL) operationally, by means of two examples.

Example 1 - A generalized industrial control language

Let's say Fred works for a company that produces industrial controllers,
which are driven by a computer. When Fred is hired, the company has already
developed and deployed a powerful, general pupose control language based
on generalized, parallel state machines. However, customers must become
programmers to use the controller; customers formally trained as chemical
engineers, mechanical engineers, technicians etc. with little desire or
time to learn a new general purpose programming language. The product is
very general pupose, useful in many niche industries - automotive, petroleum,
logging mills, satellite control, etc.

Fred has been hired to put a front end on the language for every one
of the exploitable niche markets. In each case, the front end is to be
tailored to the terminology used by the niche market customer and to be
easy to use. The front end might be of the 'fill in the blanks' variety,
a GUI, whatever. The front ends are really new languages all with the same
target language (the general purpose control langauge). Each front end
exemplifies an ASL. By using the compiler construction tools (e.g.
JFlex and CUP), Fred capitalizes on several benefits including:

the capability of checking the customer's source code for lexical and syntactic
errors and returning meaningful error messages

the development of each ASL will have certain similarities

the specification files for the lexer and parser are highly maintainable

Example 2 - A generalized Fuzzy Logic based decision package

Fuzzy Logic has proved useful, not only in its traditional role in industrial
control, but also in a decision making role. It can be used to play the
stock market (and lose money more elegantly?), to choose from among a corporation's
research or marketing strategies, to aid in avalanche prediction, etc.

Let's suppose that Fred's significant other, Sally, has programmed a
general pupose Fuzzy Logic decision maker. To apply it to different problems
it is initialized from appropriately different initialization data files.
Sally markets this product to various niches, but finds former customers
a constant burden on her time. The problem is really inherent in the way
Fuzzy Logic works. The customer is the expert in his/her particular problem
space. A Fuzzy Logic model is initialized by incorporating the expertise
of the user. The user gains more expertise as the model is used and will
constantly want to tweak the model. The crux of Sally's problem is that
the initialization file must be prepared with great programming care. The
customers are not programmers and easily make mistakes (most likely syntactic)
in preparing such a data file. They always run into problems, call on Sally
for help, and expect her to do it at a fairly low margin. She must respond
to maintain her reputation and, hence, her financila success.

Her solution is to make an 'idiot-proof' front end that will automate
writing the initialization data file. The front end is tailored to the
niche customer's terminology and problem space. The front end is an ASL
with the initialization data file as the target language. The translator
can be written with the help of the compiler construction tools, just as
done by Fred for the industrial control scenario. The translator guarantees
that the lexical and syntactic content of the data file is correct.

If the front end is cleverly defined the customer will find it useful.
Note that the customer is an expert on the problem semantics, where programmer
Sally would be weakest. The customer will solve the semantic problems,
and the ASL translator will avoid the lexical and syntactic problems related
to the initialization data file.

Upshot

The two preceding examples have an obvious common thread. It should
be emphasized that in designing the front end ASL's, Fred and Sally must
interact strongly with the niche customers. After all, the ASL is to be
useful to a programming novice who, at the same time, has expertise in
the problem space. If this interaction fails, the ASL will not be well
received and may fail in its intended market.