Marpa resources

Sun, 11 Dec 2011

I am very happy to announce that
the latest release of Marpa::XS
is a release candidate for the first full release,
Marpa::XS 1.000000.
Most user's experience with the previous beta releases
seems to have been trouble-free.
The one significant issue that was identified
was a failure to properly evaluate null symbols under
an unusual combination of circumstances.
This problem
(a one line error in the C rewrite of the parse engine)
is fixed in this release.
Unusual as the issue is,
when it does occur it results in a parse failure,
so that
I recommend that all users of Marpa::XS upgrade to
the latest release.

Marpa::XS is being kept stable.
Bug fixes, even of minor and cosmetic bugs,
will be made, as
will changes that improve maintainability.
But no new features will be added.
Interface changes will be especially avoided.

What is Marpa?

Marpa is an advance over recursive descent
and yacc.
I hope the Marpa algorithm
will become the standard parser for
problems too
big for regular expressions.

Marpa parses,
in linear time,
those
classes of grammar that are currently in practical use.

The importance of parse-time debugging is often underestimated.
Marpa's parse-time error detection breaks new ground -- Marpa is
fully aware of exactly where in the parse it is at all times,
and of exactly what input it expects and why.
This means parse-time error detection, once a desperate last
resort, now can be used as
a parsing technique in itself.

Marpa is a general BNF parser -- that means if you feed it anything
written in BNF, it "just parses" it.
This includes grammars which are left-recursive, right-recursive and
ambiguous -- even infinitely ambiguous.

Marpa never goes exponential -- worst case, even for highly ambiguous
grammars, is O(n**3), which is
considered optimal.

In some previous Marpa::XS releases, the
documentation, while part of the distribution,
did not install automatically.
As of this release, that problem is fixed.
The documentation now
installs, as it should,
along with the rest of Marpa::XS.

No interface has been deprecated since
Marpa::XS went beta -- the interface has remained stable.
But many interfaces deprecated BEFORE Marpa
went beta were used in the test suite.
To make the test suite more useful for readers,
I eliminated deprecated practices
except in code whose purpose it is
to test that deprecated practice.
Where tests continue to use a deprecated practice,
comments explicitly point this out.

What is next with Marpa?

Based on the feedback,
I have confidence that Marpa::XS have been extensively
used and found reliable.
With the fixes for this release,
I expect that Marpa::XS can be taken out
of beta and into a full 1.000000 release shortly.

Development of new features for
Marpa continues, but in another distribution:
Marpa::R2.
This isolates Marpa::XS users from the accidental changes
and bugs that can be the side effect of active development.

Limitations

Marpa::XS
installation requires a C compiler,
and many of the GNU utilities and libraries as well.
Marpa::XS has been tested on a wide variety of POSIX systems.
In theory Marpa::XS is NOT restricted to POSIX systems --
all the tools it uses have Windows versions, for example.
However, Marpa::XS has yet to be installed
on a non-POSIX system.

Notes

"in linear time":
To be specific, Marpa parses any LR-regular grammar in linear time -- O(n).
LR-regular is a vast class of grammars that includes LALR,
LR(k) for all k,
and LL(k) for all k.
This means that Marpa parses,
in linear time,
every class of grammar
parsed by yacc,
recursive descent and regular expressions.

"considered optimal":
The phrase "considered optimal" elides some
irrelevant bits of theory.
It would be a bit of a
surprise if it is possible to
do general BNF parsing
in O(n), but nobody has proved that it can't be done.
The Valiant algorithm parses general BNF
and is O(n**2.373...) or better.
But Valiant's algorithm is only faster for huge problems,
and for those it
needs a machine with many terabytes of main memory
to deliver on its speed claims.
So it won't be competing with Marpa any time soon.

In internalese,
reproducing the Jirotka issue requires,
at a minimum,
a rule which creates a Leo chain with multiple rules
and which has a certain pattern of nullables.
The bug itself boiled down to using the wrong variable
at a single point in the code for handling
nullables and Leo chains.
This incorrect variable often had either
the same value as the correct variable,
and or a value which had the same effect.
This meant that,
even if a test exercised
Leo chains with multiple rules,
and even if the nullables fell into the right positions,
that test would not necessary surface the Jirotka issue.

My actual fix was not just that one line,
but included some refactoring.
The refactoring made the logic clearer,
so that I could examine the code carefully,
think the issue through,
and assure myself
that no other problems lurked
in nearby or associated logic.

A bug report like
Tomáš Jirotka's
is a good sign --
it indicates that
Marpa::XS
has been very thoroughly
exercised.
This bug report also shows off a nice property
of Marpa.
Since Marpa parses every BNF grammar,
it is very easy to determine whether Marpa's failure to
parse a grammar is a bug or not --
if Marpa does not parse it, it's a bug.
Most parsers in current use
parse only a subset of the
grammars that can be written in BNF,
and whether
a particular grammar is one of those in the subset
can be very difficult to determine.

"GNU utilities and libraries":
These dependences can be an inconvenience, I admit, but
the alternative is installing
my attempt to portably re-create
all the things the GNU people have developed.
I think that it is clear that the GNU software is the easier
and more reliable alternative.

If you browse the package, you may see that it uses TeX as well.
TeX is ONLY needed is you are working on libmarpa,
the highly mathematical, low-level core library that provides
the parse engine.
To do this, you'd need to have studied a lot of the mathematics
of parsing -- and you'd understand why I feel forced to do the
documentation in TeX.
All the non-mathematical parts are either in Perl, or in C code
which can be read and changed on systems which do not have TeX
installed.