Marpa resources

Sun, 02 May 2010

I've just released Marpa,
a general BNF parser generator, to CPAN.
(A general BNF parser generator is a parser generator which works for any grammar you can express in BNF.)
Previously, CPAN did not have a general BNF parser generator suitable
for practical use.
Then again, neither did much of anybody else.

What was the issue?
General BNF parsers have been around since the 60's.
But the first ones were slow, and the reputation stuck.
It's time to take a new look.
As
Wikipedia says,
Earley's takes
"linear time for almost all LR(k) grammars.
It performs particularly well when the rules are written left-recursively."

LR(k) grammars
include all those parseable by Bison, YACC, recursive descent,
or regular expressions.
Chances are that
Marpa runs in linear time on your grammar, or can
easily be made to do so.
If Marpa doesn't run your grammar in linear time,
chances are that the alternatives you are considering
do not run your grammar at all.

What about Wikipedia's "almost all"?
Wikipedia is referring to right-recursive grammars.
Specifically, the grammars that
Earley's (and Marpa) run more slowly on are the right-recursive grammars.
(Marpa is quadratic on unambiguous right-recursive grammars.)

In practical uses, left recursion (which Marpa handles well)
is much more important than right recursion.
Right recursion is usually easy to eliminate.
Three simple strategies take care of most cases.
At least one of these strategies for eliminating right recursion
probably works on your grammar.

First and most obviously, you can change the right recursion to a left recursion.
If the semantics are those of a simple sequence, this works fine.

Second, you can eliminate right recursion by
modifying the language so that it explicitly terminates the construct.
This is why English has periods at the end of sentences -- they eliminate
right recursion.
If the right recursion involves the whole file, you can eliminate it
with an explicit end-of-file.

Third, you can use look-ahead.
Marpa makes it convenient for its lexers to use look-ahead --
that is how
Marpa's HTML parser works.

It seems possible to modify Marpa to run in linear time on right-recursive grammars
as well.
There are suggestions in the academic literature on how to do this.
But I try to keep the focus with Marpa practical.
And so far I've been unable to find a grammar of practical interest
where it's not convenient to simply
eliminate the right recursion.