Earlier, I posted a description of a Regular Expression -> DFA algorithm.
That algorithm has since been modified slightly and clarified, so that it
now generates DFA's essentially by applying the subset construction
directly to the regular expression itself.

The minimization step is still kept separate. It's possible to merge
this step with the other two, but unlike the merging of those two, this
would not bring any significant gains.

You can get a copy of the software by anonymous ftp at csd4.csd.uwm.edu
in the directory pub/regex. You WILL need an ANSI-C compiler to compile
it. The library functions in <stdlib.h> (particularly realloc()) are
assumed to have the semantics indicated by ANSI. The program regex.c is
the older version, and regex2.c is the newer version.

THE ALGORITHM
Regular Expressions are defined using the following syntax:

The merge operation combines expressions starting with the same symbol,
e.g.:

merge (a 0 | b 1, b 2 | c 3) = a 0 | b (1 | 2) | c 3.

The formal definition is fairly straightforward.
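
As a rough illustration of the merge operation, here is a minimal sketch
in C. It is not taken from regex.c or regex2.c. The Term layout, the
bitmask encoding of the targets (so that "1 | 2" is bits 1 and 2 set) and
the name merge_terms() are all assumptions made for this example, and both
input sums are assumed to be sorted by symbol.

```c
#include <stddef.h>

/* One term "x A" of a sum: input symbol x followed by a target A.
 * The target is a bitmask of expression ids, so the alternation
 * (1 | 2) is just bits 1 and 2 set.  (Hypothetical layout.) */
typedef struct {
    char     symbol;
    unsigned targets;
} Term;

/* Merge two sums whose terms are sorted by symbol.  Terms starting
 * with the same symbol are combined by OR-ing their target sets.
 * `out` must have room for na + nb terms; returns the term count. */
size_t merge_terms(const Term *a, size_t na,
                   const Term *b, size_t nb, Term *out)
{
    size_t i = 0, j = 0, n = 0;

    while (i < na && j < nb) {
        if (a[i].symbol < b[j].symbol) {
            out[n++] = a[i++];
        } else if (a[i].symbol > b[j].symbol) {
            out[n++] = b[j++];
        } else {
            /* Same leading symbol: x A | x B becomes x (A | B). */
            out[n].symbol  = a[i].symbol;
            out[n].targets = a[i].targets | b[j].targets;
            n++; i++; j++;
        }
    }
    while (i < na) out[n++] = a[i++];   /* leftovers from the first sum */
    while (j < nb) out[n++] = b[j++];   /* leftovers from the second sum */
    return n;
}
```

Applied to the sums a 0 | b 1 and b 2 | c 3, this produces three terms,
with the middle one carrying both targets 1 and 2.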

The expression, A, in each term of the form x A in the sum is considered a
state in the DFA. Each newly generated state is reduced to normal form in
the same way, and this in turn may generate more new states. When all the
states that were generated from the top-level expression have been fully
reduced, the result is a DFA representing the original expression. This
process will always halt.
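
The state-generation loop just described can be sketched as a simple
worklist. This is only an outline under stated assumptions: states are
plain integers, the step() callback is a hypothetical stand-in for
"reduce the expression to normal form and read off the term for symbol x",
and demo_step() is a toy automaton (remainder mod 3 of a binary number),
not a regular expression. The linear search for known states is there for
brevity only.

```c
#include <stddef.h>

#define MAX_STATES 64
#define NSYMS 2

/* Hypothetical stand-in for normal-form reduction: given a state and an
 * input symbol, return the id of the successor state.  This toy tracks
 * the remainder mod 3 of the binary number read so far. */
static int demo_step(int state, int sym) { return (2 * state + sym) % 3; }

/* Discover every state reachable from `start`.  States are numbered in
 * discovery order; delta[s][x] receives the number of the successor of
 * state s on symbol x.  Returns the state count, or 0 on overflow. */
size_t build_dfa(int start, int (*step)(int, int), int delta[][NSYMS])
{
    int known[MAX_STATES];      /* known[i] = id of the i-th state found */
    size_t count = 0, done = 0, k;
    int x, s, t;

    known[count++] = start;
    while (done < count) {      /* states past `done` are not yet reduced */
        s = known[done];
        for (x = 0; x < NSYMS; x++) {
            t = step(s, x);
            for (k = 0; k < count; k++)     /* already generated? */
                if (known[k] == t) break;
            if (k == count) {               /* no: a new state */
                if (count == MAX_STATES) return 0;
                known[count++] = t;
            }
            delta[done][x] = (int)k;
        }
        done++;
    }
    return count;
}
```

The loop halts once no state produces a successor that has not been seen
before, which mirrors the halting condition stated above.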

In any implementation of this algorithm, it is ABSOLUTELY essential to
avoid the formation of duplicate expressions. Subexpressions are
therefore hashed and a table lookup precedes the formation of any
expression. Without this, the example above would get into an infinite
loop creating, for instance, an infinite number of copies of E = (x0 x1).
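
One common way to get this "look up before you build" behaviour is a
hash-consing table. The sketch below is an assumption about how such a
table might look, not the code in regex.c or regex2.c: the Tag values, the
Exp layout, the bucket count and the names hash_exp() and make_exp() are
all invented for illustration.

```c
#include <stdlib.h>

typedef enum { SYM, CAT, ALT, STAR } Tag;   /* hypothetical node kinds */

typedef struct Exp {
    Tag tag;
    int a, b;            /* operand ids, or the symbol in `a` for SYM */
    int id;              /* unique id handed back to the caller */
    struct Exp *next;    /* next node in the same hash bucket */
} Exp;

#define NBUCKETS 257
static Exp *buckets[NBUCKETS];
static int next_id = 0;

static unsigned hash_exp(Tag tag, int a, int b)
{
    return ((unsigned)tag * 31u + (unsigned)a) * 31u + (unsigned)b;
}

/* Return the id of the expression (tag, a, b), creating the node only
 * if an identical one is not already in the table. */
int make_exp(Tag tag, int a, int b)
{
    unsigned h = hash_exp(tag, a, b) % NBUCKETS;
    Exp *e;

    for (e = buckets[h]; e != NULL; e = e->next)
        if (e->tag == tag && e->a == a && e->b == b)
            return e->id;               /* duplicate: reuse it */

    e = malloc(sizeof *e);
    if (e == NULL) abort();
    e->tag = tag; e->a = a; e->b = b;
    e->id = next_id++;
    e->next = buckets[h];
    buckets[h] = e;
    return e->id;
}
```

With this arrangement, two structurally identical expressions always get
the same id, so a state that reappears during reduction is recognized by a
plain integer comparison instead of being built again.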

The software I wrote takes these steps to derive a DFA from the regular
expression E. It so happens that, for this example, E is the only
"state" in the DFA, so this is already the minimal DFA.

Compare with the method described in Berry and Sethi ("From Regular
Expressions to Deterministic Automata" Theoretical Computer Science 1986).

After the conversion, standard techniques are used to minimize the DFA.