After reading the spec for ECMAScript regular expressions, I put together some code in C# 4 that produces immutable, functionally pure character matcher sequences. Here is a simple example that will match a social security number.

morphine wrote: I have to ask, though: did you have a very good reason for writing your own regexp parser, or did you do it as an exercise?

Well, to be a bit pedantic, my code does not do any parsing. It fluently produces a series of continuations (functions) that test portions of a string. As to your question, I am trying to learn and develop new techniques for lexical analysis that involve pure functional programming. My end goal is to mold this into a series of continuations that will produce tokens that I can annotate with extra information such as errors, line number, and character column. I am also trying to prove how simply regular expressions can be implemented (prove to myself, I mean).
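To make the idea of "functions that test portions of a string" concrete, here is a minimal sketch in Python rather than C#. It is not the code from the original post: the names `digit`, `literal`, `repeat`, `sequence`, and `matches` are hypothetical, and it uses plain function composition, which may differ from the continuation style described above.

```python
from typing import Callable, Optional

# A matcher is a pure function: given the text and a start index, it returns
# the index just past the match, or None if there is no match at that index.
Matcher = Callable[[str, int], Optional[int]]


def digit() -> Matcher:
    """Match exactly one ASCII digit."""
    def match(text: str, pos: int) -> Optional[int]:
        if pos < len(text) and text[pos] in "0123456789":
            return pos + 1
        return None
    return match


def literal(ch: str) -> Matcher:
    """Match exactly the single character `ch`."""
    def match(text: str, pos: int) -> Optional[int]:
        if pos < len(text) and text[pos] == ch:
            return pos + 1
        return None
    return match


def sequence(*parts: Matcher) -> Matcher:
    """Match each part in order, threading the position through."""
    def match(text: str, pos: int) -> Optional[int]:
        current: Optional[int] = pos
        for part in parts:
            current = part(text, current)
            if current is None:
                return None
        return current
    return match


def repeat(part: Matcher, count: int) -> Matcher:
    """Match `part` exactly `count` times in a row."""
    return sequence(*([part] * count))


def matches(matcher: Matcher, text: str) -> bool:
    """True when the matcher consumes the whole text."""
    return matcher(text, 0) == len(text)


# Hypothetical SSN pattern, equivalent to the regex \d{3}-\d{2}-\d{4}.
ssn = sequence(
    repeat(digit(), 3), literal("-"),
    repeat(digit(), 2), literal("-"),
    repeat(digit(), 4),
)
```

Each matcher closes over its configuration and never mutates anything, so a pattern such as `ssn` can be built once and shared freely. Calling `matches(ssn, "123-45-6789")` tests the whole string, while `ssn(text, pos)` tests a portion starting at `pos`.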

That being said, I have also written a regexp parser/compiler that I integrated into the ECMAScript (JavaScript/JScript/ActionScript) implementation I am developing.

Chaospandion wrote: Well, to be a bit pedantic, my code does not do any parsing. It fluently produces a series of continuations (functions) that test portions of a string. As to your question, I am trying to learn and develop new techniques for lexical analysis that involve pure functional programming. My end goal is to mold this into a series of continuations that will produce tokens that I can annotate with extra information such as errors, line number, and character column. I am also trying to prove how simply regular expressions can be implemented (prove to myself, I mean).

I haven't had a chance to look at your code, but from the description above -- evaluating a regular expression through a series of continuations -- it sounds like you're using the theory of "regular expression derivatives", either by chance or by design. From the link above:

Modern times need non-blocking lexers. Whenever programmers write non-blocking applications, they often end up re-inventing hand-rolled continuations. Fortunately, using a concept called "the derivative of regular expressions," a programmer can roll their own non-blocking version of lex that hides all of that machinery. In fact, derivatives make it possible to do this in just a few hundred lines of code!

So, what's the insight that gives us this engineering win? The continuation of a finite state machine is equivalent to the derivative of the regular expression from which it came.
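For readers who have not seen derivatives of regular expressions, here is a minimal sketch of the textbook (Brzozowski) construction in Python. It is not taken from the linked article or from the C# code discussed in this thread; the class and function names are hypothetical, and it skips the simplification step a practical lexer would need to keep the derived expressions small.

```python
from dataclasses import dataclass


class Regex:
    """Base class for a tiny regular expression syntax tree."""


@dataclass(frozen=True)
class Empty(Regex):
    """Matches no string at all."""


@dataclass(frozen=True)
class Eps(Regex):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Char(Regex):
    ch: str


@dataclass(frozen=True)
class Seq(Regex):
    first: Regex
    second: Regex


@dataclass(frozen=True)
class Alt(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Star(Regex):
    inner: Regex


def nullable(r: Regex) -> bool:
    """True when r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Char)):
        return False
    if isinstance(r, Seq):
        return nullable(r.first) and nullable(r.second)
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    raise TypeError(f"unknown regex node: {r!r}")


def derive(r: Regex, c: str) -> Regex:
    """Return the regex that matches what may follow after r consumes c."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Char):
        return Eps() if r.ch == c else Empty()
    if isinstance(r, Alt):
        return Alt(derive(r.left, c), derive(r.right, c))
    if isinstance(r, Star):
        return Seq(derive(r.inner, c), r)
    if isinstance(r, Seq):
        head = Seq(derive(r.first, c), r.second)
        # If the first part can be empty, c may also start the second part.
        return Alt(head, derive(r.second, c)) if nullable(r.first) else head
    raise TypeError(f"unknown regex node: {r!r}")


def matches(r: Regex, text: str) -> bool:
    """Derive once per character, then check for the empty string."""
    for c in text:
        r = derive(r, c)
    return nullable(r)
```

The value returned by `derive` plays the role of the continuation in the quoted passage: it is an ordinary immutable value describing "the rest of the match", so a caller can consume one chunk of input, keep the derived expression, and resume when more input arrives. For example, `matches(Star(Seq(Char("a"), Char("b"))), "abab")` checks the input against `(ab)*`.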