Marpa resources

Mon, 12 Sep 2011

Some time ago I released
Marpa::XS 0.010000.
The core Marpa algorithm had already been converted to C,
speeding it up considerably.
Marpa::XS 0.010000 cleans up a lot of code left over from development,
further speeding things up.

What is Marpa?

Marpa is an advance over recursive descent
and yacc.
I hope the Marpa algorithm
will become the standard parser for
problems too
big for regular expressions.

Marpa parses, in linear time, all
classes of grammar that are currently in practical use.
(See Note 1).

The importance of parse-time debugging is often underestimated.
Marpa's parse-time error detection breaks new ground -- Marpa is
fully aware of exactly where in the parse it is at all times,
and of exactly what input it expects and why.
This means parse-time error detection, once a desperate last
resort, now can be used as
a parsing technique in itself.

Marpa is a general BNF parser -- that means if you feed it anything
written in BNF, it "just parses" it.
This includes grammars which are left-recursive, right-recursive and
ambiguous -- even infinitely ambiguous.

Marpa never goes exponential -- worst case, even for highly ambiguous
grammars, is O(n**3), which is considered optimal
(See Note 2).

What is Next with Marpa?

I had planned to skip this announcement and wait to announce
Marpa::XS 0.012000.
I expected
the release of Marpa::XS 0.012000
to be just days away.
And I did produce
its release candidate
almost immediately.
But in the meantime cpantesters was hit with a prolonged outage,
which continues as of this writing.
I've always waited for a few days of cpantesters results on
the release candidate before cutting an official release
and I hope cpantesters will be back online soon.

Thanks to git,
delay in getting one release out is no major obstacle to work
on its successor, and work toward
Marpa::XS 0.014000 is well underway.
At this point (assuming the cpantesters results
will be positive) very little remains to be done before Marpa::XS
can go beta.

Marpa::XS 0.014000 will be about bug fixes.
I will do one or two additional applications, just to
see if any problems surface.
Marpa's current test suite is already quite extensive --
besides the usual specific and regression tests,
it aggressively exercises a full parser for liberal HTML,
and it runs a couple of large tests with a prototype Perl parser.
So, while it may be unwise to make this kind of prediction in public,
I frankly do not expect a new application or two to find
much in the way of bugs.

What I had not done for prior releases
was look proactively for
leaks and other memory issues.
I am now
most of the way through doing that for Marpa::XS 0.014000.
So far, I have
found and fixed one slow leak.
No memory
overruns have turned up.
I will complete the memory debugging and leak detection
before taking Marpa::XS beta.

Limitations and Comparisons

Currently, the major limitation of Marpa::XS is
that it is alpha.
Development is well advanced, but the interface remains
subject to change.
For a comparison of Marpa to other parsers,
which is careful to point out situations
where older parsing algorithms may still be superior,
see the
"Limitations and Comparisons" section
in my announcement of Marpa::XS 0.008000.

Notes

Note 1:
To be specific, Marpa parses any LR-regular grammar in linear time -- O(n).
LR-regular is a vast class of grammars that includes LALR,
LR(k) for all k,
and LL(k) for all k.
This means that Marpa parses,
in linear time,
every grammar parsed by yacc,
recursive descent and regular expressions.

Note 2:
The phrase "considered optimal" elides some
irrelevant bits of theory.
It would be a bit of a
surprise if it is possible to
do general BNF parsing
in O(n), but nobody has proved that it can't be done.
The Valiant algorithm parses general BNF
and is O(n**2.376...) or better.
But Valiant's algorithm is only faster for huge problems,
and for those it
needs a machine with many terabytes of main memory
to deliver on its speed claims.
So it won't be competing with Marpa any time soon.