It's much easier if you treat regular expressions as Finite State
Automata, rather than trying to treat them as a high level language.
Certainly that is the approach I have taken in the past; I simply
built a dynamic FSA from the regular expression and used the FSA to do
the matching.

You would be better off using lex than yacc, although I'm not certain
an unaugmented lex is quite up to the task, since lex is really best
suited for static analysis. I've always had to build my own regular
expression "parser". As indicated, alternatives pose an interesting
problem, though they are straightforward to process using nested FSAs.

1. Define and build an extended FSA interpreter that is sufficient to process
the necessary regular expressions
2. Build a translator ("parser") to convert the regular expression
into a compatible extended FSA

You can also generate code directly from the translator if you are after the
utmost in execution speed, but that limits portability. If execution speed
isn't an issue, it may be interesting to attempt dynamically generating a lex
specification from the regular expression, as an academic exercise.

FSAs, Finite State Automata, are sometimes referred to as Finite State
Machines, or FSMs.

-- Terry Greyzck

Bruce Ediger <bediger@teal.csn.net> wrote:

>Dr Dobb's Journal, April 1999 carried an article by Brian Kernighan
>and Rob Pike titled "Regular Expressions". I read the article, and
>I was struck by the beginning sentences of the second-to-last paragraph:
>
>  For regular expressions with parentheses and alternatives,
>  the implementation must be more sophisticated. One approach
>  is to compile the regular expression into a parse tree that
>  captures its grammatical structure.
>
>How the heck do they do that compilation? I've tried to come up with
>a "yacc" grammar for regular expressions based on the definition of
>"regular expression" in "Automata and Formal Languages" by Dean
>Kelley, ISBN 0-13-497777-7, Prentice Hall, 1995.