Marpa resources

Wed, 17 Nov 2010

I'm converting Marpa
to XS and have started to use Cweb.
Cweb is
the C version of the "literate programming" system pioneered
by Don Knuth.
I'm pleasantly surprised by it.
Cweb adds fun to the programming experience
and is helping in more ways than the phrase
"literate programming" would suggest.

One very important feature of Cweb
is something that would seem to be a nuisance,
or at best an implementation detail.
The .w file
which contains the Cweb is now the "source".
The
.c and
.h files are now "built files".
I am no longer working with the C language
"translation units".
("Translation unit"
is standards-committee-speak for
the text which is
directly input into the C compiler.
The term is used in attempt to distance the standard from
file conventions.)

Moving one step back from the "translation unit"
separates the presentation of the code
from the issues of linkage and scope.
Over the years,
I'd gotten used to
organizing my code
around the visibility rules for the compiler and
the linker.
With Cweb,
I can
lay out C code
in the same way that I think about it.
It surprises me how liberating this is.

Suppose we are adding an element to a C structure.
Typically required might be:

A typedef for a type
that is particularly complex.

The entries in the struct.

Initialization of these entries.

"Destruction" of those entries: freeing any memory or other resources
they use.

Definition of the function bodies for mutators, accessors, etc.

Public prototypes for some of the functions.

Private prototypes for other functions.

Often, each of these would be put at a different location.
But all these bits of code are written and debugged together.
If you ever want to change the data structures,
all these bits of code would have to be tracked down again.
True, skill at picking good names
and at performing searches on
the source can make this sort of thing tractable.
But it is also true that some of the programming skills
we've developed over the years could
aptly be called symptoms.

Here is some C logic which, while very small, nonetheless has 6 of the 7
components listed above.
In Cweb I was able to put them all together. Here's the bottom of one page of my "woven" Cweb code:

... and here is the top of the next:

For some readers, these samples may adequately illustrate my point.
For those curious about the code, I'll close with a few
explanations. The code, as I said, is for the XS version of Marpa.
For those not familiar, Marpa
is a general BNF parser generator -- it parses from any grammar that you can write in BNF.
If the grammar is of any of the kinds currently in practical use
(yacc, LR(k), LALR, LL, recursive descent, etc.), this parsing is in linear
time.

Marpa uses Perl callbacks, and so the XS version must be able to call back
to Perl closures.
So where is all the Perl logic in my example?

For the XS conversion, I'm separating the code into three layers:

libmarpa
is a "pure C" library, which implements the core
of the Marpa algorithm.
libmarpa
is agnostic about whether it is called from Perl,
from another high level language, and even from other C code.

There's a "glue" layer to tie the Perl code to libmarpa.
This is in Perl's XS language.
XS can be thought of as a C dialect.
Special preprocessing converts XS into C code,
which is then compiled.

I am only using Cweb for libmarpa.
(Cweb only understands C language.)

Since libmarpa has
no Perl-specific logic, its role in dealing with Perl callbacks is very simple.
libmarpa needs code to store the callback (which from its point
of view is just a C function pointer),
and some other code to perform the callback.
C does not have closures, so to implement a poor man's version
of these, the callback is allowed an argment,
which can be examined and set.
It comes out to a couple of dozen lines of code,
the majority of which are declarations and data definitions.

One thing is missing from the example -- "destructor" logic.
The pointers in the example do not "own" the things that they point to,
so there is no need to deallocate anything.
When allocation and deallocation are separate, it is easy to forget
whether you omitted deallocation logic because it was unnecessary,
or because you forgot.
Using Cweb, I always put allocation and deallocation together.
Perhaps that's not enough to make memory leaks a thing of the past,
but it certainly makes life easier.