Libmarpa: The Marpa low-level library

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the “Software”),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

1 About this document

1.1 How to read this document

This is essentially a reference document,
but its early chapters lay out concepts
essential to the others.
Readers will usually want to read the
chapters up and including
Introduction to the external interface
in order.
Otherwise, they should follow their interests.

1.2 Prerequisites

This document is very far from self-contained.
It assumes the following:

The reader knows the C programming language
at least well
enough to understand function prototypes and return values.

The reader
has read the documents for one of Libmarpa’s upper layers.
As of this writing, the only such layer is Marpa::R2
or Marpa::R3,
in Perl.

1.3 Parsing theory

This document assumes some acquaintance
with parsing theory.
The reader’s
level of knowledge is probably adequate
if he can
answer the following questions,
either immediately or after a little reflection.

What is a BNF rule?

What is a Marpa sequence rule?

As a reminder,
Marpa’s sequence rules are implemented
as left recursions.
What does that mean?

Take a Marpa sequence rule at random.
What does it look like when rewritten in BNF?

What does the sequence look like when rewritten
in BNF as a right-recursion?

2 About Libmarpa

Libmarpa implements the Marpa parsing algorithm.
Marpa is named
after the legendary 11th century Tibetan translator,
Marpa Lotsawa.
In creating Marpa,
I depended heavily on previous work by Jay Earley,
Joop Leo,
John Aycock and Nigel Horspool.

Libmarpa implements the entire Marpa algorithm.
This library does
the necessary grammar preprocessing, recognizes the input,
and produces parse trees.
It also supports the ordering, iteration
and evaluation of the parse
trees.

Libmarpa is very low-level.
For example, it has no strings.
Rules, symbols, and token values are all represented
by integers.
This, of course, will not suffice for many applications.
Users will very often want
names for the symbols, non-integer values for
tokens, or both.
Typically, applications will use arrays to
translate Libmarpa’s integer ID’s to strings or other
values as required.

Libmarpa also does not implement most of the semantics.
Libmarpa does have an evaluator (called a “valuator”),
but it does not
manipulate the stack directly.
Instead, Libmarpa,
based on its traversal of the parse tree,
passes optimized step by step stack manipulation
instructions to the upper layer.
These instructions indicate the token or rule involved,
and the proper location for the true token value or
the result of the rule evaluation.
For rule evaluations, the instructions include the stack location
of the arguments.

Marpa requires most semantics to be
implemented in the application.
This allows the application total flexibility.
It also puts
the application is in a much better position to prevent errors,
to catch errors at runtime or,
failing all else,
to successfully debug the logic.

3 Architecture

3.1 Major objects

The classes of
Libmarpa’s object system fall into two types:
major and numbered.
These are the Libmarpa’s major classes,
in sequence.

Configuration:
A configuration object is
a thread-safe way to hold configuration variables,
as well as the return code from failed attempts
to create grammar objects.

Grammar:
A grammar object contains rules and symbols,
with their properties.

Recognizer:
A recognizer object reads input.

Bocage:
A bocage object is a collection of
parse trees, as found by a recognizer.
Bocages are similar to parse forests.

Ordering:
An ordering object
is an ordering of the trees
in a bocage.

Tree:
A tree object is a bocage iterator.

Value:
A value object is a tree iterator.
Iteration of tree using a value object
produces “steps”.
These “steps” are
instructions to
the application on how
to evaluate the semantics,
and how to manipulate the stack.

The major objects have one letter abbreviations,
which are used frequently.
These are, in the standard sequence,

Configuration: C

Grammar: G

Recognizer: R

Bocage: B

Ordering: O

Tree: T

Value: V

3.2 Time objects

All of Libmarpa’s major classes,
except the configuration class,
are “time” classes.
Except for objects in the grammar class,
all time objects are created from another time
object.
Each time object is created from a time object
of the class before it in the sequence.
A recognizer cannot be created without a precomputed grammar;
a bocage cannot be created without a recognizer;
and so on.

When one time object is used to create a second
time object,
the first time object is the parent object
and the second time object is the child object.
For example, when a bocage is created from a
recognizer,
the recognizer is the parent object,
and the bocage is the child object.

Grammars have no parent object.
Every other time object has exactly one parent object.
Value objects have no child objects.
All other time objects can have any number of children,
from zero up to a number determined by memory or
some other machine-determined limit.

Every time object has a base grammar.
A grammar object is its own base grammar.
The base grammar of a recognizer is the grammar
that it was created with.
The base grammar of any other time object is the base
grammar of its parent object.
For example,
the base grammar of a bocage is the base
grammar of the recognizer that it was created
with.

3.3 Reference counting

Every object in a “time” class
has its own, distinct, lifetime,
which is controlled by the object’s reference count.
Reference counting follows the usual practice.
Contexts which take a share of the
“ownership” of an object
increase the reference count by 1.
When a context relinquishes its share of
the ownership of an object, it decreases the reference
count by 1.

Each class of time object has a “ref” and an “unref”
method, to be used by those contexts which need to
explicitly increment and decrement the reference count.
For example, the “ref” method for the grammar class is
marpa_g_ref()
and the “unref” method for the grammar class is
marpa_g_unref().

Time objects do not have explicit destructors.
When the reference count of a time object reaches
0, that time object is destroyed.

Much of the necessary reference counting
is performed automatically.
The context calling the constructor of a time object
does not need to explicitly increase the reference
count, because
Libmarpa time objects are
always created with a reference count of 1.

Child objects “own” their parents,
and when a child object is successfully created,
the reference count of its parent object is
automatically incremented to reflect this.
When a child object is destroyed, it
automatically decrements the reference count of its parent.

In a typical application, a calling context needs only
to remember
to “unref” each time object that it creates,
once it is finished with that time object.
All other reference decrements and increments are taken
care of automatically.
The typical application never needs to explicitly
call one of the “ref” methods.

More complex applications may find it convenient
to have one or more contexts share ownership of objects
created in another context.
These more complex situations
are the only cases in which the “ref” methods
will be needed.

3.4 Numbered objects

In addition to its major, “time” objects, Libmarpa also has
numbered objects.
Numbered objects do not have lifetimes of their own.
Every numbered object belongs to a time object,
and is destroyed with it.
Rules and symbols are numbered objects.
Tokens values are another class of numbered
objects.

4 Input

4.1 Earlemes

4.1.1 The traditional model

In traditional Earley parsers, the concept of location is very simple.
Locations are numbered from 0 to n, where n is the length of
the input.
Every location has an Earley set, and vice versa.
Location 0 is the start location.
Every location after the start location has exactly one input token
associated with it.

This document assumes that the reader knows the concepts
behind Libmarpa’s
alternative input models, either from the documentation
of a higher level interface, such as
Marpa::XS,
Marpa::R2,
or Marpa::R3,
or from Marpa’s
theory document.

As a reminder,
in Libmarpa a location is called a earleme.
The number of an Earley set is the ID of the Earley set,
or its ordinal.
In the traditional model, the ordinal of an Earley set and
its earleme are always exactly the same, but in Libmarpa
they will be different.

4.1.2 The earleme variables

The important earleme variables are the current earleme, the furthest earleme
and the latest earleme.
The current earleme is the earleme that Libmarpa is currently working on.
More specifically, it is the one at which new tokens will start.
Since tokens are never zero length, a new token will always end after the
current earleme.
The current earleme is initially earleme 0.
Every call to marpa_r_earleme_complete() advances the
current earleme by 1.

The furthest earleme is the highest numbered (and therefore “furthest”)
earleme at which a token ends.
The furthest earleme is initially earleme 0.
With every call to marpa_r_alternative(), the end of the token
it adds is calculated.
A token ends at the earleme location current+length,
where current is the current earleme,
and length is the length of the newly added token.
After a call to marpa_r_alternative(),
the furthest earleme is its value before the call,
or current+length,
whichever is greater.

The latest earleme is the earleme of the latest
Earley set.
The latest Earley set is the last Earley set completed.
This is always the highest numbered Earley set.
If there is an Earley set at the current earleme,
it is the latest Earley set and the latest earleme
is equal to the current earleme.
There is never an Earley set after the current earleme.

After every call to the marpa_r_earleme_complete() method
that adds a token,
the value of the latest earleme is
same as the value of the current earleme.
After every call to the marpa_r_earleme_complete() method
that does not add a token,
the value of the latest earleme is unchanged
from its value before the call.

4.1.3 The significances of the earleme variables

The current earleme tracks the advance of the recognizer through the input.
Input tokens always start at the current earleme.
An application can advance past the current earleme,
by calling marpa_r_earleme_complete(), which
increments the current earleme by 1.
After initialization,
marpa_r_earleme_complete() is
the only way to manipulate the value of the current earleme.

The furthest earleme tracks how “far out” tokens can be found.
In the standard input model, calling
marpa_r_earleme_complete() after each
marpa_r_alternative() call is sufficient to process
all inputs,
and the furthest earleme’s value
can be typically be ignored.
In alternative input models, if tokens have lengths greater than
1, calling
marpa_r_earleme_complete() once after the last token
is read may not be enough to ensure that all tokens have been processed.
To ensure that all tokens have been processed,
an application must advance the current earleme
by calling marpa_r_earleme_complete(),
until the current earleme is equal to the furthest earleme.

The lastest earleme is the earleme of the last Earley set.
The latest earleme is different from the current earleme if and only if
there is no Earley set at the current earleme.
A different end of parsing can be specified,
but by default, parsing is of the input
in the range
from earleme 0 to the latest earleme.

4.1.4 The initial earleme settings

All input models have the same initial values.
Initially the current, latest and furthest earleme
are always earleme 0.

Understanding the
settings of current, latest and furthest earleme is
crucial to working with advanced input models,
and for this reason the next sections will go
through the possibilities carefully.
The presentation will start with the most traditional
and restrictive models.
It will proceed to less restrictive models.

4.1.5 The standard model of input

In the standard model of input,
calls to marpa_r_alternative()
and marpa_r_earleme_complete() are
made in pairs.
There is, first, exactly one call
to marpa_r_alternative(),
and it is for a token with length 1.
Following it must be a call
to marpa_r_earleme_complete().
For an input of length n, there will be
exactly n such paired calls.

Suppose, in the standard model that,
for a call
to marpa_r_alternative(),
the following is true:

The current earleme before the call was c.

Because this is the standard model,
this means that we also know that

The latest earleme before the call was c.

The furthest earleme before the call was c.

In that case,
after the call to
marpa_r_alternative(),
the following will be true:

The current earleme will still be c.

The latest earleme will still be c.

The furthest earleme will be c+1.

Suppose, in the standard model that,
for a call
to marpa_r_earleme_complete(),
the following is true:

The current earleme before the call is c.

Because this is the standard model,
this means that we also know that

The latest earleme before the call was c.

The furthest earleme before the call was c+1.

In that case,
after the call
to marpa_r_earleme_complete(),
the current, latest and furthest earlemes will
be the same.
Specifically,
the following will be true:

The current earleme will be advanced to c+1.

The latest earleme will be advanced to c+1.

The furthest earleme will remain at c+1.

4.1.6 Ambiguous input

As a first loosening of the standard model,
we no longer require calls to marpa_r_alternative()
to be paired with calls to
marpa_r_earleme_complete().
Instead,
we allow a series of one or more calls
to marpa_r_alternative()
before each call to
marpa_r_earleme_complete().
We still require that there be at least one call
to marpa_r_alternative()
before each call to
marpa_r_earleme_complete(),
and we still require that all tokens have
a length of 1.

In the ambiguous input model, the behavior of the current,
latest and furthest earlemes are exactly
as described for the standard model, with one minor
exception:
Where the current earleme is c,
and a call to marpa_r_alternative() is not the first
of a series,
the value of the furthest earleme before the call will be
c+1.

4.1.7 Variable length tokens

Our next loosening of the restrictions is to allow
variable length tokens.
That is, instead of requiring that all tokens
be of length 1,
we allow tokens to be of length 1 or longer.
This does change the behavior of the earleme variables.

Suppose, in the variable length token model that,
for a call
to marpa_r_alternative(),
the following is true:

The current earleme before the call was c.
In this model, this means that
the latest earleme before the call is also c.

The furthest earleme before the call was old_f.

The length of the token is length.

In that case,
after the call to
marpa_r_alternative(),
the following will be true:

The current and latest earlemes will still be c.
The current and latest earlemes are never changed by a call
to marpa_r_alternative().

The furthest earleme will be either old_f or
c+length, whichever was greater.

Suppose, in the variable length token model that,
for a call
to marpa_r_earleme_complete(),
the following is true:

The current earleme before the call is c.
In this model, this means that
the latest earleme before the call is also c.

The furthest earleme before the call is old_f.

In that case,
after the call
to marpa_r_earleme_complete(),
the following will be true:

The current earleme will be advanced to c+1.

The latest earleme will also be c+1.

The furthest earleme will still be old_f.
The furthest earleme is never changed by a call
to marpa_r_earleme_complete().

4.1.8 The generalized model

To fully generalize the input model,
we now need to remove only one restriction.
We now allow empty earlemes — earlemes with
no tokens and no Earley set.
For the generalized input model,
the effect on the earleme variables of
a call
to marpa_r_alternative() is exactly
the same as it is for the variable length input model.
The effect on the earleme variables of a call to
to marpa_r_earleme_complete() depends on
whether or not that call creates an empty earleme.
A call
to marpa_r_earleme_complete()
creates an empty earleme if and only if
it falls into one of these two cases:

There has been no call
to marpa_r_alternative() since
recognizer initialization.

There has been no call
to marpa_r_alternative() since
the previous call
to marpa_r_earleme_complete().

Suppose, in the generalized input model that,
for a call
to marpa_r_earleme_complete()
that creates an empty earleme,
the following is true:

The current earleme before the call is c.

The latest earleme before the call is old_l.

The furthest earleme before the call is old_f.

In that case,
after the call
to marpa_r_earleme_complete(),
the following will be true:

The current earleme will be advanced to c+1.

The latest earleme will remain at old_l.
This means that the latest earleme will be less than
the current earleme.

The furthest earleme will remain at old_f.
The furthest earleme is never changed by a call
to marpa_r_earleme_complete().

Suppose, in the generalized input model that,
for a call
to marpa_r_earleme_complete()
that does not create an empty earleme,
the following is true:

The current earleme before the call is c.

The latest earleme before the call is old_l.

The furthest earleme before the call is old_f.

In that case,
after the call
to marpa_r_earleme_complete(),
the following will be true:

The current earleme will be advanced to c+1.

The latest earleme will be advanced to c+1.

The furthest earleme will remain at old_f.
The furthest earleme is never changed by a call
to marpa_r_earleme_complete().

4.1.9 General rules for the earleme variables

At this point, the most generalized input model has been
introduced.
Here are some useful facts about the earleme variables that will always be the case,
no matter which of the input models is in use.

The current earleme is always greater than
or equal to the latest earleme.

The furthest earleme is always greater than
or equal to the latest earleme.

If the furthest earleme is greater than the current earleme,
the parser is not exhausted.

If the furthest earleme is less than the current earleme,
the parser is exhausted.

If the parser is not exhausted,
the following will always be true,
no matter which of the input models is in use.

The furthest earleme is greater than
or equal to the current earleme.

In an exhausted parser, the following will always be true,
no matter which of the input models is in use.

The furthest earleme is less than
or equal to the current earleme.

4.2 Terminals

A terminal symbol is a symbol which
may appear in the input.
Traditionally,
all LHS symbols, as well as
the start symbol, must be non-terminals.
This is Marpa’s behavior, by default.

Marpa allows the user to eliminate the distinction
between terminals and non-terminals.
In this, it
differs from traditional parsers.
Libmarpa
can arrange for a terminal
to appear on the LHS of one or more rules,
or for a terminal to be the start symbol.
However,
since terminals can never be zero length,
it is a logical contradiction for a nulling
symbol to also be a terminal
and Marpa does not allow it.

Token values are int’s.
Libmarpa does nothing with token values except accept
them from the application and return them during
parse evaluation.

5 Semantics

Libmarpa handling of semantics is unusual.
Most semantics are left up to the application,
but Libmarpa guides them.
Specifically, the application is expected to maintain the evaluation
stack.
Libmarpa’s valuator provides instructions on how to handle the stack.
Libmarpa’s stack handling instructions
are called “steps”.
For example, a Libmarpa step might tell the application that the value
of a token needs to go into a certain stack position.
Or a Libmarpa step might tell the application that a rule is to be evaluated.
For rule evalution, Libmarpa will tell the application where the operands
are to be found,
and where the result must go.

The detailed discussion of
Libmarpa’s handling of semantics is in the reference chapters
of this document,
under the appropriate methods and classes.
The most extensive discussion of the semantics
is in the section that deals with the methods of the value time class.
See Value methods.

6 Threads

Libmarpa is thread-safe,
given circumstances as described below.
The Libmarpa methods are not reentrant.

Libmarpa is C89-compliant.
It uses no global data,
and calls only the routines
that are defined in the C89 standard
and that can be made thread-safe.
In most modern implementations,
the default C89 implementation is thread-safe
to the extent possible.
But the C89 standard does not require thread-safety,
and even most modern environments allow the user
to turn thread safety off.
To be thread-safe, Libmarpa must be compiled
and linked in an environment that provides
thread-safety.

While Libmarpa can be used safely across
multiple threads,
a Libmarpa grammar cannot be.
Further, a Libmarpa time object can
only be used safely in the same thread
as its base grammar.
This is because all
time objects with the same base grammar share data
from that base grammar.

To work around this limitation,
the same grammar definition can be
used to a create a new
Libmarpa grammar
time object in each thread.
If there is sufficient interest, future versions of
Libmarpa could allow thread-safe
cloning of grammars and other
time objects.

7 Fatal Errors

Libmarpa avoids fatal errors,
leaving that decision up to the application,
with one exception.
Currently, if malloc fails to allocate memory,
Libmarpa terminates the program with a fatal error.

While this is in keeping with current practice,
future versions of Libmarpa are likely to both allow
an alternative memory allocator to be specified,
and to allow the user to specifier a handler to
be called when an out-of-memory condition occurs.

8 Introduction to the external interface

The following chapters describe Libmarpa’s external
interface in detail.

8.1 About the overviews

The reference sections for each major class begin with an overview.
These sections name the
most important Libmarpa methods
in the order in which they are typically used,
and very briefly describe their purpose.
The overviews are intended as “cheat sheets”.

The overview sections often speak of
an “archetypal” application.
The archetypal Libmarpa application
implements a complete logic flow,
starting with the creation of a grammar,
and proceeding all the way
to the return of the final result from a value object.
In the archetypal Libmarpa application,
the grammar, input and semantics are
all small but non-trivial.

8.2 Return values

Return values are discussed method by method,
but some general practices are worth
mentioning.
For methods that return an integer,
Libmarpa usually reserves -1 for special purposes,
such as indicating loop termination in an iterator.
In Libmarpa, methods usually indicate failure
by returning -2.
Any result greater than -2 usually indicates success.

If a method returns an pointer value,
NULL usually indicates failure.
Any other result usually indicates success.

On failure, a Libmarpa method always sets
the error code.
On success, a Libmarpa method may take
one of three actions:

Set
an informational error code.

Leave
the error code as it is.

Reset the error code to MARPA_ERROR_NONE.

Which of the three actions the method
takes on success
should be regarded as unspecified,
unless stated in the description of the method
in this document.
If a method sets an informational error code on
success, that will always be stated in
its description, and the description will
specify,
for each informational error code,
the specific error code value,
the circumstances under which it is set,
and what return values will accompany it.

Informational error codes are set
because some applications may find them convenient.
Typically, informational error codes are redundant information,
which may be ignored.

A method can have many reasons for
failing, and many of the reasons
for failure are common to
large number of methods.
Full descriptions of the error codes
that are returned by the external methods
are given in their own section.
See External error codes.

8.3 Naming conventions

Methods in Libmarpa follow a strict naming convention.
All methods have a name beginning with marpa_,
if they are part of the
external interface.
If an external method is not a static method,
its name is prefixed with one of
marpa_c_,
marpa_g_,
marpa_r_,
marpa_b_,
marpa_o_,
marpa_t_ or
marpa_v_,
where the single letter between underscores
is one of the Libmarpa major class abbreviations.
The letter indicates which class
the method belongs to.

Methods that are exported,
but that are part of
the internal interface,
begin with _marpa_.
Methods that are part of the internal interface
(often called “internal methods”)
are subject to change and are intended for use
only by Libmarpa’s developers.

Libmarpa reserves the marpa_
and _marpa_ prefixes for itself,
with all their capitalization variants.
All Libmarpa names visible outside the package
will begin with a capitalization variant
of one of these two prefixes.

9 Static methods

Checks that the Marpa library in use is compatible with the
given version. Generally you will pass in the constants
MARPA_MAJOR_VERSION,
MARPA_MINOR_VERSION,
MARPA_MICRO_VERSION
as the three arguments to this function; that produces
a check that the library in use is compatible with
the version of Libmarpa the application or module was compiled
against.
Compatibility is defined by two things:
first the version
of the running library is newer than the version
required_major.required_minor.required_micro.
Second
the running library must be binary compatible with the
version
required_major.required_minor.required_micro
(same major version.)

Return value: MARPA_ERR_NONE if the Marpa library is compatible with the
requested version. If the library is not compatible,
one of the following is returned, indicating the nature of the mismatch:

MARPA_ERR_MAJOR_VERSION_MISMATCH,

MARPA_ERR_MINOR_VERSION_MISMATCH

MARPA_ERR_MICRO_VERSION_MISMATCH

Function: Marpa_Error_Codemarpa_version(int* version)

Returns the version number in version,
which must have room for three int’s.

Return value: A non-negative number on success,
-2 on failure.

10 Configuration methods

The configuration object is intended for future extensions.
These may
allow the application to override Libmarpa’s memory allocation
and fatal error handling without resorting to global
variables, and therefore in a thread-safe way.
Currently, the only function of the Marpa_Config
class is to give marpa_g_new()
a place to put its error code.

Marpa_Config is Libmarpa’s only “major”
class which is not a time class.
There is no constructor or destructor, although
Marpa_Config objects do need to be initialized
before use.
Aside from its own accessor,
Marpa_Config objects are only used by marpa_g_new
and no reference to their location is not kept
in any of Libmarpa’s time objects.
The intent is to that it be convenient
to have them in memory that might be deallocated
soon after marpa_g_new returns.
For example, they could be put on the stack.

Function: intmarpa_c_init( Marpa_Config* config)

Initialize the config information to “safe” default
values.
Unspecified behavior will result
if an initialized
configuration is used to create a grammar.

Error codes are usually kept in the base grammar,
which leaves marpa_g_new() no place to put
its error code on failure.
Objects of
the Marpa_Config class provide such a place.
p_error_string is reserved for use by
the internals.
Applications should set it to NULL.

Return value:
The error code in config.
Always succeeds.

11 Grammar methods

11.1 Overview

An archetypal application has a grammar.
To create a grammar, use the marpa_g_new() method.
When a grammar is no longer in use, its memory can be freed
using the
marpa_g_unref() method.

To be precomputed,
a grammar must have one or more symbols.
To create symbols, use the
marpa_g_symbol_new() method.

To be precomputed,
a grammar must have one or more rules.
To create rules, use the
marpa_g_rule_new() and
marpa_g_sequence_new() methods.

For non-trivial parsing,
one or more of the symbols must be terminals.
To mark a symbol as a terminal,
use the
marpa_g_symbol_is_terminal_set() method.

To be precomputed,
a grammar must have exactly one start symbol.
To mark a symbol as the start symbol,
use the
marpa_g_start_symbol_set() method.

Before parsing with a grammar, it must be precomputed.
To precompute a grammar,
use the
marpa_g_precompute() method.

11.2 Creating a new grammar

Function: Marpa_Grammarmarpa_g_new( Marpa_Config* configuration )

Creates a new grammar time object.
The returned grammar object is not yet precomputed,
and will have no symbols and rules.
Its reference count will be 1.

Unless the application calls marpa_c_error,
Libmarpa will not reference the location
pointed to by the configuration
argument after marpa_g_new returns.
The configuration argument may be NULL,
but if it is,
there will be no way to determine
the error code on failure.

Return value: On success, the grammar object.
On failure, NULL,
and the error code is set in configuration.

Function: intmarpa_g_force_valued( Marpa_Grammar g )

It is recommended that
this call be made immediately after the
grammar constructor.
It turns off a deprecated feature.

The marpa_g_force_valued forces all the
symbols in a grammar to be “valued”.
The opposite of a valued symbol is one about whose value
you do not care.
This distinction has been made in the past in hope
of gaining efficiencies at evaluation time.
Current thinking is that the gains do not repay the extra
complexity.

Return value: On success, a non-negative integer.
On failure, a negative integer.

11.3 Tracking the reference count of the grammar

Function: Marpa_Grammarmarpa_g_ref(Marpa_Grammar g)

Increases the reference count by 1.
Not needed by most applications.

Return value:
On success, the grammar object it was called with;
NULL on failure.

This return value of this call indicates whether sym_id
is the start symbol.

Success value:
Returns 1 if sym_id is the start symbol.
Returns 0 if sym_id is not the start symbol,
either because the start symbol is different from
sym_id,
or because the start symbol has not been set yet.

Failure values:
If sym_id is well-formed, but there is no
such symbol, -1.
On other failure, -2.

These methods, respectively, set
and query the “terminal status” of a symbol.
To be used as an input symbol
in the marpa_r_alternative() method,
a symbol must be a terminal.
This function flags symbol sym_id as a terminal if
value is 1,
or flags it as a non-terminal if value is 0.

Once set to a value with the
marpa_g_symbol_is_terminal_set() method,
the terminal status of a symbol is “locked” at that value.
A subsequent call to
marpa_g_symbol_is_terminal_set() that attempts
to change the terminal status
of sym_id to a value different from its current one
will fail.
The error code will be MARPA_ERR_TERMINAL_IS_LOCKED.

By default, a symbol is a terminal if and only if it
does not appear on the LHS of any rule.
An attempt to flag a nulling symbol
as a terminal will cause a failure,
but this is not necessarily detected before precomputation.

Success returns:
When successful,
these methods return
1 if symbol sym_id
is a terminal symbol after the
call, 0 otherwise.

Failure returns:
If sym_id is well-formed, but there is no
such symbol, -1.
On all other failures, -2.
Other failures include
when value is not 0 or 1;
when the terminal status is locked and value
is different from its current value;
and,
in the case of marpa_g_symbol_is_terminal_set(),
when the grammar g is precomputed.

Function: Marpa_Symbol_IDmarpa_g_symbol_new(Marpa_Grammar g)

Creates a new symbol.
The symbol ID will be a non-negative integer.

Return value: On success, the ID of a new symbol;
On failure, -2.

11.5 Rules

Function: intmarpa_g_highest_rule_id(Marpa_Grammar g)

Return value: On success, the numerically largest rule ID
of g.
On failure, -2.

A rule is a loop rule if it non-trivially
produces the string of length one
which consists only of its LHS symbol.
Such a derivation takes the parse back to where
it started, hence the term “loop”.
“Non-trivially” means the zero-step derivation does not count — the
derivation must have at least one step.

The presence of a loop rule makes a grammar infinitely ambiguous,
and applications will typically want to treat them as fatal errors.
But nothing forces an application to do this,
and Marpa will successfully parse and evaluate grammars with
loop rules.

Return value: On success,
1 if rule rule_id is a loop rule, 0 if not.
If rule_id is well-formed, but there is no
such rule, -1.
If the grammar
is not precomputed, or on other failure, -2.

A rule is productive if it can produce a string of terminals.
A rule is productive if and only if all the symbols on
its RHS are productive.
The empty string counts as a string of terminals,
so that a nullable rule is always a productive rule.
For that same reason,
an empty rule is considered productive.

Return value: On success,
1 if rule rule_id is productive, 0 if not.
If rule_id is well-formed, but there is no
such rule, -1.
If the grammar is not precomputed, or on other failure, -2.

Creates a new external BNF rule in grammar g.
The ID of the new rule will be a non-negative integer,
which will be unique to that rule.
In addition to BNF rules, Marpa also allows sequence rules,
which are created by another method:
marpa_g_sequence_new().
Sequence rules and BNF rules are numbered in the same series,
so that a BNF rule will never have the same rule ID as a sequence
rule, and vice versa.

The LHS symbol is lhs_id,
and there are length symbols on the RHS.
The RHS symbols are in an array
pointed to by rhs_ids.

Possible failures, with their error codes, include:

MARPA_ERR_SEQUENCE_LHS_NOT_UNIQUE: The LHS symbol is the same
as that of a sequence rule.

MARPA_ERR_DUPLICATE_RULE: The new rule would duplicate another BNF
rule.
Another BNF rule is considered the duplicate of the new one,
if its LHS symbol is the same as symbol lhs_id,
if its length is the same as length,
and if its RHS symbols match one for one those
in the array of symbols rhs_ids.

Return value: On success, the ID of new external rule.
On failure, -2.

Returns the ID of the symbol in position ix
in the RHS of rule rule_id.
The RHS position, ix, is zero-based.

Return value: On success,
the ID of the symbol in position ix
on the rules RHS.
If rule_id is well-formed, but there is no
such rule, -1.
If ix is greater than or equal to the length of
the rule,
or on other failure, -2.

11.6 Sequences

Success returns:
marpa_g_rule_is_proper_separation()
succeeds if and only if rule_id is valid.
If rule rule_id
is a sequence rule where the proper separation flag
is set, returns 1.
On other success,
including the case
where rule rule_id
is not a sequence rule, returns 0.

The marpa_g_rule_is_proper_separation()
does not distinguish rules without proper
separation from rules for which separation
is not defined, because they are not
sequence rules.
Applications for which this distinction is
important should use
the
marpa_g_sequence_min() method.

Failure returns:
If rule_id is well-formed, but there is no
such rule, returns -1.
On other failure, -2.

Returns the mininum length of a sequence rule.
This accessor can be also be used to test whether
or not a rule is a sequence rule.
-1 is returned if and only if the rule is valid
but not a sequence rule.

Return value:
If rule rule_id is a sequence rule, its minimum length.
If rule rule_id is valid, but is
not the rule ID of sequence rule, -1.
If rule_id is well-formed, but there is no
such rule,
or on other failure, -2.

Adds a new sequence rule to grammar g.
The ID of the new sequence rule will be a non-negative integer,
which is unique to that rule.
Sequence rules and BNF rules are numbered in the same series,
so that a BNF rule will never have the same rule ID as a sequence
rule, and vice versa.

The sequence is lhs_id,
and the item to be repeated in the sequence is rhs_id.
The sequence must be repeated at least min times,
where min is 0 or 1.
If separator_id is non-negative,
it is a separator symbol.

If flags & MARPA_PROPER_SEPARATION is non-zero,
separation is “proper”, that is,
a trailing separator is not allowed.
The term proper is based on the idea that
properly-speaking, separators should actually separate items.

Some higher-level Marpa interfaces offer the ability to
discard separators in the semantics,
and in fact will do this by default.
At the Libmarpa level, sequences always “keep
separators”.
It is up to the programmer to arrange
to discard separators,
if that is what is desired.

The sequence RHS, or item,
is restricted to a single symbol,
and that symbol cannot be nullable.
If separator_id is a symbol, it also cannot
be a nullable symbol.
Nullables on the RHS of sequences are restricted
because they lead to highly ambiguous grammars.
Grammars of this kind are allowed by Libmarpa, but
they must be expressed using BNF rules, not sequence rules.
This is for two reasons:
First, sequence optimizations would not work
in the presence of nullables.
Second, since it is not completely clear what
an application intends
when it asks for a sequence of identical items,
some of which are nullable,
the user’s intent can be more clearly expressed
directly in BNF.

The LHS symbol cannot be the LHS of any other rule,
whether a BNF rule or a sequence rule.
On an attempt to create an sequence rule with a duplicate
LHS,
marpa_g_sequence_new() fails,
setting the error code to
MARPA_ERR_SEQUENCE_LHS_NOT_UNIQUE.

Sequence rules do not add to the classes of grammar parsed
by Libmarpa —
a sequence can always be written as BNF rules.
When a rule is created
with the marpa_g_sequence_new() method,
Libmarpa will understand that it is a sequence,
and will optimize accordingly.
The speedup is often considerable.

Return value: On success, the ID of the external rule.
On failure, -2.

Return value:
If rule rule_id is a sequence rule, its separator.
If rule rule_id is a sequence rule, but there is no
separator, -1.
If rule_id is
not a sequence rule,
does not exist
or is not well-formed;
or on other failure, -2.

11.7 Ranks

These methods, respectively, set
and query the rank of a rule rule_id.
When rule_id is created, its rank
initialized to the default rank of the grammar.

Initially, the default rank of the grammar is 0.
Methods to reset the grammar’s default rank
are currently in the experimental stage.

Return value: On success, returns
the rank after
the call,
and sets the error code to
MARPA_ERR_NONE.
On failure, returns -2,
and sets the error code to an appropriate
value, which will never be
MARPA_ERR_NONE.
Note that when the rank is -2,
the error code is the only way to distinguish
success from failure.
The error code can be determined by using the
marpa_g_error() call.

These methods, respectively, set
and query the “null ranks high” setting of the rule rule_id.
The “null ranks high” setting is either 0 or 1.
When rule_id is created, its “null ranks high”
setting is initialized to 0.

The “null ranks high” setting affects the ranking of rules
with properly nullable symbols on their right hand side.
If a rule has properly nullable symbols on its RHS,
each instance in which it appears in a parse will have a pattern
of nulled and non-nulled symbols.
Such a pattern is called a “null variant”.

If the “null ranks high” setting is 0 (the default),
nulled symbols rank low.
If the “null ranks high” setting is 1,
nulled symbols rank high.
Ranking of a null variants is done from left-to-right.

The marpa_g_rule_null_high_set() method will
return failure
after the grammar has been precomputed.
If there is no other cause of failure,
the marpa_g_rule_null_high() method succeeds
on both precomputed and unprecomputed grammars.

Success return value:
The value
of the “null ranks high” flag after
the call.

Failure return value:
If rule_id is well-formed, but there is no such rule, -1.
On all other failures, -2.

11.8 Events

Allows the user to deactivate and reactivate symbol completion
events in the grammar.
When a recognizer is created,
the activation status of each of its events is initialized
to the activation status of that event in the base grammar.
If reactivate is zero, the event is deactivated
in the grammar.
If reactivate is one, the event is activated
in the grammar.

Symbol completion events are active by default if
the symbol was set up for completion events in the
grammar.
If a symbol was not set up for completion events in
the grammar, symbol completion events are inactive
by default and any attempt to change that is a fatal error.

The activation status of a completion event in the grammar
can only be changed if the symbol is marked as a completion event symbol
in the grammar,
and before
the grammar is precomputed.
However, if a symbol is marked as a completion event symbol in
the recognizer,
the completion event can be deactivated
and reactivated in the recognizer.

Success cases:
On success,
the method returns the value of reactivate.
The method succeeds trivially
if the symbol is already set as indicated by
reactivate.

Failure cases:
If the active status
of the
completion event for sym_id
cannot be set as indicated by reactivate,
the method fails.
On failure, -2 is returned.

Allows the user to deactivate and reactivate symbol nulled
events in the grammar.
When a recognizer is created,
the activation status of each of its events is initialized
to the activation status of that event in the base grammar.
If reactivate is zero, the event is deactivated
in the grammar.
If reactivate is one, the event is activated
in the grammar.

Symbol nulled events are active by default if
the symbol was set up for nulled events in the
grammar.
If a symbol was not set up for nulled events in
the grammar, symbol nulled events are inactive
by default and any attempt to change that is a fatal error.

The activation status of a nulled event in the grammar
can only be changed if the symbol is marked as a nulled event symbol
in the grammar,
and before
the grammar is precomputed.
However, if a symbol is marked as a nulled event symbol in
the recognizer,
the nulled event can be deactivated
and reactivated in the recognizer.

Success cases:
On success,
the method returns the value of reactivate.
The method succeeds trivially
if the symbol is already set as indicated by
reactivate.

Failure cases:
If the active status
of the
nulled event for sym_id
cannot be set as indicated by reactivate,
the method fails.
On failure, -2 is returned.

Allows the user to deactivate and reactivate symbol prediction
events in the grammar.
When a recognizer is created,
the activation status of each of its events is initialized
to the activation status of that event in the base grammar.
If reactivate is zero, the event is deactivated
in the grammar.
If reactivate is one, the event is activated
in the grammar.

Symbol prediction events are active by default if
the symbol was set up for prediction events in the
grammar.
If a symbol was not set up for prediction events in
the grammar, symbol prediction events are inactive
by default and any attempt to change that is a fatal error.

The activation status of a prediction event in the grammar
can only be changed if the symbol is marked as a prediction event symbol
in the grammar,
and before
the grammar is precomputed.
However, if a symbol is marked as a prediction event symbol in
the recognizer,
the prediction event can be deactivated
and reactivated in the recognizer.

Success cases:
On success,
the method returns the value of reactivate.
The method succeeds trivially
if the symbol is already set as indicated by
reactivate.

Failure cases:
If the active status
of the
prediction event for sym_id
cannot be set as indicated by reactivate,
the method fails.
On failure, -2 is returned.

Libmarpa can be set up to generate an
MARPA_EVENT_SYMBOL_COMPLETED
event whenever the symbol is completed.
A symbol is said to be completed
when a non-nulling rule with
that symbol on its LHS is completed.

For completion events to occur, the symbol must be marked
as a completion event symbol.
The marpa_g_symbol_is_completion_event_set() function
marks symbol sym_id as a completion event symbol
if value is 1,
and unmarks it
it as a completion event symbol if value is 0.
The marpa_g_symbol_is_completion_event() method
returns the current value of the completion event marking
for symbol sym_id.

Marking a completion event sets its activation status to on.
Unmarking a completion event sets its activation status to off.
The completion event marking cannot be changed once
the grammar is precomputed.

If a completion event is marked,
its activation status can be changed using the
marpa_g_completion_symbol_activate() method.
Note that, if a symbol is marked as a completion event symbol
in the recognizer,
its completion event can be deactivated
and reactivated in the recognizer.

Nulled rules and symbols will never cause completion events.
Nullable symbols
may be marked as completion event symbols,
but this will have an effect only when the symbol
is not nulled.
Nulling symbols
may be marked as completion event symbols,
but no completion events will ever be generated
for a nulling symbol.
Note that this implies at no completion event will
ever be generated at earleme 0,
the start of parsing.

Success: On success, 1 if symbol sym_id
is a completion event symbol after the
call, 0 otherwise.

Failures:
If sym_id is well-formed, but there is no
such symbol, -1.
If the grammar g is precomputed;
or on other failure, -2.

Libmarpa can set up to generate
an MARPA_EVENT_SYMBOL_NULLED
event whenever the symbol is nulled.
A symbol is said to be nulled
when a zero length instance of that symbol
is recognized.

For nulled events to occur, the symbol must be marked
as a nulled event symbol.
The marpa_g_symbol_is_nulled_event_set() function
marks symbol sym_id as a nulled event symbol
if value is 1,
and unmarks it
it as a nulled event symbol if value is 0.
The marpa_g_symbol_is_nulled_event() method
returns the current value of the nulled event marking
for symbol sym_id.

Marking a nulled event sets its activation status to on.
Unmarking a nulled event sets its activation status to off.
The nulled event marking cannot be changed once
the grammar is precomputed.

If a nulled event is marked,
its activation status can be changed using the
marpa_g_nulled_symbol_activate() method.
Note that, if a symbol is marked as a nulled event symbol
in the recognizer,
its nulled event can be deactivated
and reactivated in the recognizer.

As a reminder, a symbol instance is a symbol
at a specific location in the input,
and with a specific length.
Also, whenever a nulled symbol instance is
recognized at a location, it is acceptable
at that location,
and vice versa.

When a symbol instance is recognized at a location,
it will generate a nulled event or a prediction event,
but never both.
A symbol instance of zero length,
when recognized at a location,
generates a nulled event at that location,
and does not generate a completion event.
A symbol instance of non-zero length,
when acceptable at a location,
generates a completion event at that location,
and does not generate a nulled event.

When a symbol instance is acceptable at a location,
it will generate a nulled event or a prediction event,
but never both.
A symbol instance of zero length,
when acceptable at a location,
generates a nulled event at that location,
and does not generate a prediction event.
A symbol instance of non-zero length,
when acceptable at a location,
generates a prediction event at that location,
and does not generate a nulled event.

While it is not possible for
a symbol instance
to generate both a nulled event and
a completion event at a location,
it is quite possible that
a symbol might generate
both kinds of event at that location.
This is because multiple instances
of the same symbol may be recognized at
a given location,
and these instances will have
different lengths.
If one instance is recognized at a given location
as zero length and
a second, non-zero-length,
instance is recognized at the same location,
the first will generate only nulled events,
while the second will generate only completion events.
For similar reasons,
while a symbol instance will never
generate both a null event and a prediction
event at a location,
multiple instances of the same symbol
may do so.

Zero length derivations can be ambiguous.
When a zero length symbol is recognized,
all of its zero-length derivations are also considered to be
recognized.

The marpa_g_symbol_is_nulled_event_set() method will
mark a symbol as a nulled event symbol,
even if the symbol is non-nullable.
This is convenient, for example,
for automatically generated grammars.
Applications which wish to treat
it as a failure if there is an
attempt to
mark a non-nullable symbol
as a nulled event symbol,
can check for this case using
the marpa_g_symbol_is_nullable()
method.

Success: On success, 1 if symbol sym_id
is a nulled event symbol after the
call, 0 otherwise.

Failures:
If sym_id is well-formed, but there is no
such symbol, -1.
If the grammar g is precomputed;
or on other failure, -2.

Libmarpa can be set up
to generate a
MARPA_EVENT_SYMBOL_PREDICTED
event when a non-nulled symbol is predicted.
A non-nulled symbol is said to be predicted
when a instance of it
is acceptable at the current
earleme according to the grammar.
Nulled symbols do not generate predictions.

For predicted events to occur, the symbol must be marked
as a predicted event symbol.
The marpa_g_symbol_is_predicted_event_set() function
marks symbol sym_id as a predicted event symbol
if value is 1,
and unmarks it
it as a predicted event symbol if value is 0.
The marpa_g_symbol_is_predicted_event() method
returns the current value of the predicted event marking
for symbol sym_id.

Marking a prediction event sets its activation status to on.
Unmarking a prediction event sets its activation status to off.
The prediction event marking cannot be changed once
the grammar is precomputed.

If a prediction event is marked,
its activation status can be changed using the
marpa_g_prediction_symbol_activate() method.
Note that, if a symbol is marked as a prediction event symbol
in the recognizer,
its prediction event can be deactivated
and reactivated in the recognizer.

Success: On success, 1 if symbol sym_id
is a predicted event symbol after the
call, 0 otherwise.

Failures:
If sym_id is well-formed, but there is no
such symbol, -1.
If the grammar g is precomputed;
or on other failure, -2.

11.9 Precomputing the Grammar

Function: intmarpa_g_precompute(Marpa_Grammar g)

Precomputation is necessary for a recognizer to be generated
from a grammar.
On success, marpa_g_precompute returns a non-negative
number to indicate that it precomputed the grammar without
issues.
On failure, marpa_g_precompute returns -2.

Precomputation may return one or more events,
which may be queried using the
marpa_g_event() method.
At this point events only occur when failure is reported,
and events always report issues.
But application writers should expect future versions
to have events which are reported on success,
as well as events which do not represent issues.

A MARPA_EVENT_LOOP_RULES event occurs
when there are infinite loop rules (cycles)
in the grammar.
The presence of one or more of these will cause failure
to be reported,
but will not prevent the grammar from being precomputed.

Each MARPA_EVENT_COUNTED_NULLABLE event is a symbol
which is a nullable on the right hand side of a sequence
rule — a “counted” symbol.
The presence of one or more of these will cause failure
to be reported,
and will prevent the grammar from being precomputed.
So that the programmer can fix several at once,
these failures are delayed until events are created
for all of the counted nullables.

Each MARPA_EVENT_NULLING_TERMINAL event is a nulling
symbol which is also flagged as a terminal.
Since terminals cannot be of zero length, this is a logical
impossibility.
The presence of one or more of these will cause failure
to be reported,
and will prevent the grammar from being precomputed.
So that the programmer can fix several at once,
the failure is delayed until events are created
for all of the counted nullables.

Precomputation involves freezing
and then thoroughly checking the grammar.
Among the reasons for precomputation to fail
are the following:

MARPA_ERR_NO_RULES: The grammar has no rules.

MARPA_ERR_NO_START_SYMBOL: No start symbol was specified.

MARPA_ERR_INVALID_START_SYMBOL: A start symbol ID was specified, but it
is not the ID of a valid symbol.

MARPA_ERR_START_NOT_LHS: The start symbol is not on the LHS of any rule.

MARPA_ERR_UNPRODUCTIVE_START: The start symbol is not productive.

MARPA_ERR_COUNTED_NULLABLE: A symbol on the RHS of a sequence rule is
nullable.
Libmarpa does not allow this.

MARPA_ERR_NULLING_TERMINAL: A terminal is also a nulling symbol.
Libmarpa does not allow this.

More details of these can be found under the
description of the appropriate code.
See External error codes.

marpa_g_precompute() is unusual in that it
is possible to treat one of its failures as “advisory”,
and to proceed with parsing.
If
marpa_g_precompute() fails with an error code
of MARPA_ERR_GRAMMAR_HAS_CYCLE,
parsing can proceed, just as it typically would for
success.
The grammar will have been precomputed, as
calling the marpa_g_is_precomputed() method
will confirm.

Most applications, however,
will want to simply treat failure
with MARPA_ERR_GRAMMAR_HAS_CYCLE,
as simply another failure,
and fix the cycles before parsing.
Cycles make a grammar infinitely ambiguous,
and are considered useless in current
practice.
Cycles make processing the grammar less
efficient, sometimes considerably so.
Detection of cycles is returned as failure
because that is by far the convenient thing to do
for the vast majority of applications.

This function allows the application to determine if grammar
g has a cycle.
As mentioned, most applications will want to treat these
as fatal errors.
To determine which rules are in the cycle,
marpa_g_rule_is_loop() can be used.

Return value: On success, 1 if the grammar has a cycle,
0 otherwise.
On failure, -2.

12 Recognizer methods

12.1 Overview

An archetypal application uses a recognizer to read input.
To create a recognizer, use the marpa_r_new() method.
When a recognizer is no longer in use, its memory can be freed
using the
marpa_r_unref() method.

To make a recognizer ready for input,
use the marpa_r_start_input() method.

The recognizer starts with its current earleme
at location 0.
To read a token at the current earleme,
use the marpa_r_alternative() call.

To complete the processing of the current earleme,
and move forward to a new one,
use the marpa_r_earleme_complete() call.

12.2 Creating a new recognizer

Function: Marpa_Recognizermarpa_r_new( Marpa_Grammar g )

Creates a new recognizer.
The reference count of the recognizer will be 1.
The reference count of g,
the base grammar,
will be incremented by one.

Return value: On success, the newly created recognizer.
If g is not precomputed, or on other failure, NULL.

12.3 Keeping the reference count of a recognizer

Function: Marpa_Recognizermarpa_r_ref(Marpa_Recognizer r)

Increases the reference count by 1.
Not needed by most applications.

Return value:
On success, the recognizer object, r.
On failure, NULL.

Function: voidmarpa_r_unref(Marpa_Recognizer r)

Decreases the reference count by 1,
destroying r once the reference count reaches
zero.
When r is destroyed, the reference count
of its base grammar is decreased by one.
If this takes the reference count of the base grammar
to zero, it too is destroyed.

12.4 Life cycle mutators

Function: intmarpa_r_start_input(Marpa_Recognizer r)

Makes r ready to accept input.
The first Earley set,
the one at earleme 0,
will be completed during this call.

Because the call to marpa_r_start_input()
completes an Earley set,
it may generate events.
For details about
the events that may be generated during
Earley set completion,
see the description of the
marpa_r_earleme_complete() method.

Reads a token into r.
The token will start at the current earleme.
Libmarpa allows tokens to be ambiguous, to be of
variable length and to overlap.
token_id is the symbol ID of the token,
which must be a terminal.
length is the length of the token.

value is an
integer that represents the value of the
token.
In applications where the token’s actual value is not an integer, it is
expected that the application will use this value as
a “virtual” value,
perhaps finding the actual value by using value
to index an array.
value is not used inside Libmarpa — it is simply
stored to be returned by the valuator
as a convenience for the application.
Some applications may prefer to track token values on
their own, perhaps based on
the earleme location and token_id,
instead of using Libmarpa’s token values.

A value of 0 is reserved for a now-deprecated feature.
Do not use it.
For more details on that feature, see the
section Valued and unvalued symbols.

When marpa_r_alternative()
is successful,
the value of the furthest earleme is set to
the greater of its value before the call,
and current+length,
where current is the value of the current earleme.
The values of the current and latest earlemes
are unchanged by
calls to marpa_r_alternative().

Several error codes leave the recognizer in a fully
recoverable state, allowing the application to
retry the marpa_r_alternative() method.
Retry is efficient, and quite useable as a parsing
technique.
The error code
of primary interest from this point of view
is MARPA_ERR_UNEXPECTED_TOKEN_ID,
which indicates that the token was not accepted
because of its token ID.
Retry after this condition is used in several
applications,
and is called “the Ruby Slippers technique”.

The error codes
MARPA_ERR_DUPLICATE_TOKEN,
MARPA_ERR_NO_TOKEN_EXPECTED_HERE
and MARPA_ERR_INACCESSIBLE_TOKEN
also leave the recognizer in a fully recoverable
state, and may also be useable for the
Ruby Slippers or similar techniques.
At this writing,
the author knows of no applications which
attempt to recover from these errors.

Return value: On success, MARPA_ERR_NONE.
On failure, some other error code.

Function: intmarpa_r_earleme_complete(Marpa_Recognizer r)

This method does the final processing for the current earleme.
It then advances the current earleme by one.
Note that marpa_r_earleme_complete() may be called
even when no tokens have been read at the current earleme —
in the character-per-earleme input model, for example, tokens
can span many characters and, if the input is unambiguous over that
span, there will be no other tokens that start inside it.

As mentioned,
marpa_r_earleme_complete() always advances the current earleme,
incrementing its value by 1.
This means that value of the current earleme after the call
will be the one plus the value of the earleme processed by the call
to marpa_r_earleme_complete().
If any token was accepted at the earleme being processed,
marpa_r_earleme_complete() creates a new Earley set
which will be the latest Earley set
and, after the call, the latest
earleme will be equal to the new current earleme.
If no token was accepted at the
earleme being processed,
no Earley set is created,
and the value of the latest earleme remains unchanged.
The value of the furthest earleme is never changed by
a call to marpa_r_earleme_complete().

During this method, one or more events may occur.
On success, this function returns the number of events
generated,
but it is important to note that events may be
created whether earleme completion fails or succeeds.
When this method fails,
the application must call marpa_g_event()
if it wants to determine if any events occurred.
Since the reason for failure to complete an earleme is often
detailed in the events, applications that fail will often
be at least as interested in the events as those
that succeed.

The MARPA_EVENT_EARLEY_ITEM_THRESHOLD event
indicates that an application-settable threshold
on the number of Earley items has been reached or exceeded.
What this means depends on the application,
but when the default threshold is exceeded,
it means that it is very likely
that the time and space resources consumed by
the parse will prove excessive.

A parse is “exhausted” when it can accept no more
input.
This can happen both on success and on failure.
Note that the failure due to parse exhaustion
only means failure
at the current earleme.
There may be successful parses
at earlier earlemes.

If a parse is exhausted, but successful,
an event with the event code
MARPA_EVENT_EXHAUSTED occurs.
Because the parse is exhausted,
no input will be accepted at later earlemes.
It is quite common for a parse to become exhausted
when it succeeds.
Many practical grammars are designed so that a
successful parse cannot be extended.

An exhausted parse may cause a failure,
in which case
marpa_r_earleme_complete() returns an
error
whose error code is MARPA_ERR_PARSE_EXHAUSTED.
For a parse to fail at an earleme due to exhaustion,
it must be the case that no alternatives were
accepted at that earleme.
In fact,
in the standard input model,
a failure due to parse exhaustion
occurs if and only if
no alternatives
were accepted at the current earleme.

The circumstances under which
failure due to parse exhaustion occurs
are slightly more complicated
when variable length tokens are in use.
Informally,
a parse will never fail due to exhaustion as long
as it is possible that a token
ending at some future
earleme will continue the parse.
More precisely,
a call to
marpa_r_earleme_complete() fails
due to parse exhaustion
if and only if, first,
no alternatives were added at the current earleme
and, second,
that call left the current
earleme equal to the furthest earleme.

Return value: On success, the number of events generated.
On failure, -2.

12.5 Location accessors

Function: Marpa_Earlememarpa_r_current_earleme(Marpa_Recognizer r)

Return value: If input has started, the current earleme.
If input has not started, -1.
Always succeeds.

In the default, token-stream model, Earley set ID and earleme
are always equal, but this is not the case in other input
models.
(The ID of an Earley set ID is also called its ordinal.)
If there is no Earley set whose ID is
set_id,
marpa_r_earleme() fails.
If set_id was negative,
the error code is set to
MARPA_ERR_INVALID_LOCATION.
If set_id is greater than the ordinal
of the latest Earley set,
the error code is set to
MARPA_ERR_NO_EARLEY_SET_AT_LOCATION.

At this writing, there is no method for
the inverse operation (conversion of an earleme to an Earley set
ID).
One consideration in writing
such a method is that not all earlemes correspond to Earley sets.
Applications that want to map earlemes
to Earley sets will have no trouble if they
are using the standard input model —
the Earley set
ID is always exactly equal to the earleme in that model.
For other applications
that want an earleme-to-ID mapping,
the most general method is create an ID-to-earleme
array using the marpa_r_earleme() method
and invert it.

Return value:
On success,
the earleme corresponding to Earley
set set_id.
On failure, -2.

If p_value is non-zero,
sets the location pointed to by
p_value to the integer value of the Earley set.
Similarly, if p_pvalue is non-zero,
sets the location pointed to by
p_pvalue to the pointer value of the Earley set.

The “value” and “pointer” of an Earley set are
an arbitrary integer
and an arbitrary pointer
that the application
can use for its own purposes.
In character-per-earleme input models, for example,
the integer can be the codepoint of the current character.
In a traditional token-per-earleme input model,
they could be used to indicate the string value of the token –
the pointer could point to the start of the string,
and the integer could indicate its length.

The Earley set value and pointer can be set using
the
marpa_r_latest_earley_set_values_set() method.
The Earley set integer value defaults to -1,
and the pointer value defaults to NULL.

12.6 Other parse status methods

Allows the user to deactivate and reactivate symbol completion
events in the recognizer.
If reactivate is zero, the event is deactivated.
If reactivate is one, the event is activated.

Symbol completion events are active by default if
the symbol was set up for completion events in the
grammar.
If a symbol was not set up for completion events in
the grammar, symbol completion events are inactive
by default and any attempt to change that is a fatal error.

Success cases:
On success,
the method returns the value of reactivate.
The method succeeds trivially
if the symbol is already set as indicated by
reactivate.

Failure cases:
If the active status
of the
completion event for sym_id
cannot be set as indicated by reactivate,
the method fails.
On failure, -2 is returned.

These methods, respectively, set and query
the Earley item warning threshold.
The Earley item warning threshold
is a number that is compared with
the count of Earley items in each Earley set.
When it is matched or exceeded,
a MARPA_EVENT_EARLEY_ITEM_THRESHOLD event is created.

If threshold is zero or less,
an unlimited number of Earley items
will be allowed without warning.
This will rarely be what the user wants.

By default, Libmarpa calculates a value based on the grammar.
The formula Libmarpa uses is the result of some experience,
and most applications will
be happy with it.

Return value:
The value that the Earley item warning threshold has
after the method call is finished.
Always succeeds.

Sets the “expected symbol event bit” for symbol_id to value.
A recognizer event is created whenever
symbol symbol_id is expected at the current earleme.
if and only if the expected symbol event bit
for symbol_id is 1.
The “expected symbol event bit” must be 1 or 0.

In this context, “expected” means “expected as a terminal”.
Even if a symbol is predicted at the current earleme,
if it is not acceptable as a terminal,
it does not trigger an
“expected symbol event”.

By default, the “expected symbol event bit” is 0.
It is an error to attempt to set the
“expected symbol event bit” to 1 for a nulling symbol,
an inaccessible symbol,
or an unproductive symbol.

Return value:
The value of the event bit after the method call is finished.
-2 if symbol_id is not the ID of a valid symbol;
if it is the ID of an nulling, inaccessible for unproductive symbol;
or on other failure.

Function: intmarpa_r_is_exhausted(Marpa_Recognizer r)

A parser is “exhausted” if it cannot accept any more input.
Both successful and failed parses can be exhausted.
In many grammars,
the parse is always exhausted as soon as it succeeds.
Good parses may also exist at earlemes prior to the
current one.

Allows the user to deactivate and reactivate symbol nulled
events in the recognizer.
If boolean is zero, the event is deactivated.
If boolean is one, the event is activated.

Symbol nulled events are active by default if
the symbol was set up for nulled events in the
grammar.
If a symbol was not set up for nulled events in
the grammar, symbol nulled events are inactive
by default and any attempt to change that is a fatal error.

Success cases:
On success,
the method returns the value of boolean.
The method succeeds trivially
if the symbol is already set as indicated by
boolean.

Failure cases:
If the active status
of the
nulled event for sym_id
cannot be set as indicated by boolean,
the method fails.
On failure, -2 is returned.

Allows the user to deactivate and reactivate symbol prediction
events in the recognizer.
If boolean is zero, the event is deactivated.
If boolean is one, the event is activated.

Symbol prediction events are active by default if
the symbol was set up for prediction events in the
grammar.
If a symbol was not set up for prediction events in
the grammar, symbol prediction events are inactive
by default and any attempt to change that is a fatal error.

Success cases:
On success,
the method returns the value of boolean.
The method succeeds trivially
if the symbol is already set as indicated by
boolean.

Failure cases:
If the active status
of the
prediction event for sym_id
cannot be set as indicated by boolean,
the method fails.
On failure, -2 is returned.

Returns a list of the ID’s of the symbols
that are acceptable as tokens
at the current earleme.
buffer is expected to be large enough to hold
the result.
This is guaranteed to be the case if the buffer
is large enough to hold a number of
Marpa_Symbol_ID’s that
is greater than or equal to the number of symbols
in the grammar.

Return value: On success, the number of Marpa_Symbol_ID’s
in buffer.
On failure, -2.

Return values on success:
If symbol_id is the
ID of a valid terminal symbol that is expected at
the current earleme,
a number greater than zero.
If symbol_id is the
ID of a valid terminal symbol that is
not expected at
the current earleme, or
if symbol_id is the ID of a valid symbol
that is not a terminal, zero.

Failure cases:
Returns -2 on failure.
It is a failure
if symbol_id is not the ID of a valid
symbol.

13 Progress reports

An important advantage of the Marpa algorithm is the ability
to easily get full information about the state of the parse.

To start a progress report,
use the marpa_r_progress_report_start() command.
Only one progress report can be in use at any one time.

To get the information in a progress report,
it is necessary to step through the progress report
items.
To get the data for the current progress report item,
and advance to the next one,
use the marpa_r_progress_item() method.

To destroy a progress report,
freeing the memory it uses,
call the marpa_r_progress_report_finish() method.

Function: intmarpa_r_progress_report_reset( Marpa_Recognizer r)

Resets the progress report.
Assumes a report of the progress has already been initialized
at some Earley set
for recognizer r,
with
marpa_r_progress_report_start().
The reset progress report will
be positioned before its first item.

Initializes a report of the progress at Earley set set_id
for recognizer r.
If a progress report already exists, it is destroyed and its
memory is freed.
Initially,
the progress report is positioned before its first item.

If no Earley set with ID
set_id exists,
marpa_r_progress_report_start() fails.
The error code is MARPA_ERR_INVALID_LOCATION if set_id
is negative.
The error code is MARPA_ERR_NO_EARLEY_SET_AT_LOCATION
if set_id is greater than the ID of
the latest Earley set.

Return value: On success, the number of report items available.
If the recognizer has not been started;
if set_id does not exist;
or on other failure, -2.

Function: intmarpa_r_progress_report_finish( Marpa_Recognizer r )

Destroys the report of the progress at Earley set set_id
for recognizer r,
freeing the memory and other resources.
It is often not necessary to call this method.
Any previously existing progress report
is destroyed automatically
whenever a new progress report is started,
and when the recognizer is destroyed.

Return value: -2 if no progress report has been started,
or on other failure.
On success, a non-negative value.

This method allows access to the data
for the next item of a
progress report.
If there are no more progress report items,
it returns -1 as a termination indicator
and sets the error code to MARPA_ERR_PROGRESS_REPORT_EXHAUSTED.
Either the termination indicator,
or the item count returned by
marpa_r_progress_report_start(),
can be used to determine when the last
item has been seen.

On success,
the dot position is returned in the location
pointed to by the position argument,
and the origin is returned in the location
pointed to by the origin argument.
On failure, the locations pointed to by
the position and origin
arguments are unchanged.

Return value: On success, the rule ID of
the next progress report item.
If there are no more progress report items, -1.
If either the position or the origin
argument is NULL,
or on other failure, -2.

14 Bocage methods

14.1 Overview

A bocage is structure containing the full set of parses found
by processing the input according to the grammar.
The bocage structure is new with Libmarpa, but is very similar
in purpose to the more familar parse forests.

To create a bocage, use the marpa_b_new() method.

When a bocage is no longer in use, its memory can be freed
using the
marpa_b_unref() method.

14.2 Creating a new bocage

Creates a new bocage object, with a reference count of 1.
The reference count of its parent recognizer object, r,
is increased by 1.
If earley_set_ID is -1,
the Earley set at the current earleme is used,
if there is one.

If earley_set_ID is -1
and there is no Earley set at the current earleme;
or if earley_set_ID is -1
and there is no parse ending at Earley set earley_set_ID,
marpa_b_new() fails
and the error code is set to
MARPA_ERR_NO_PARSE.

Success return value: On success, the new bocage object.
On failure, NULL.

14.3 Reference counting

Function: Marpa_Bocagemarpa_b_ref(Marpa_Bocage b)

Increases the reference count by 1.
Not needed by most applications.

Return value:
On success, b.
On failure, NULL.

Function: voidmarpa_b_unref(Marpa_Bocage b)

Decreases the reference count by 1,
destroying b once the reference count reaches
zero.
When b is destroyed, the reference count
of its parent recognizer is decreased by 1.
If this takes the reference count of the parent recognizer
to zero, it too is destroyed.
If the parent recognizer is destroyed, the reference count
of its base grammar is decreased by 1.
If this takes the reference count of the base grammar
to zero, it too is destroyed.

14.4 Accessors

Function: intmarpa_b_ambiguity_metric(Marpa_Bocage b)

Returns an ambiguity metric.
The metric is 1 is the parse is unambiguous.
If the metric is 2 or greater,
the parse is ambiguous.
It was originally intended to have values greater
than 2 be an cheaply computed
estimate of the degree of ambiguity,
but a satisfactory scheme for this
has yet to be implemented.

Return value on success:
1 if the bocage is not for an ambiguous parse;
2 or greater if the bocage is for an ambiguous parse.

Failures: On failure, -2.

Function: intmarpa_b_is_null(Marpa_Bocage b)

Return value on success:
A number greater than or equal to 1 if the bocage is for a null parse;
otherwise, 0.

Failures: On failure, -2.

15 Ordering methods

15.1 Overview

Before iterating the parses in the bocage,
they must be ordered.
To create an ordering, use the marpa_o_new() method.
When an ordering is no longer in use, its memory can be freed
using the
marpa_o_unref() method.

An ordering is frozen once the first
tree iterator is created
using it.
A frozen ordering cannot be changed.

As of this writing, the only methods to order parses
are internal and undocumented.
This is expected to change.

15.2 Creating an ordering

Function: Marpa_Ordermarpa_o_new( Marpa_Bocage b)

Creates a new ordering object, with a reference count of 1.
The reference count of its parent bocage object, b,
is increased by 1.

Return value: On success, the new ordering object.
On failure, NULL.

15.3 Reference counting

Function: Marpa_Ordermarpa_o_ref( Marpa_Order o)

Increases the reference count by 1.
Not needed by most applications.

Return value:
On success, o.
On failure, NULL.

Function: voidmarpa_o_unref( Marpa_Order o)

Decreases the reference count by 1,
destroying o once the reference count reaches
zero.
Beginning with o’s parent bocage,
Libmarpa then proceeds up the chain of parent objects.
Every time a child is destroyed, the
reference count of its parent is decreased by 1.
Every time the reference count of an object
is decreased by 1,
if that reference count is now zero,
that object is destroyed.
Libmarpa follows this chain of decrements
and destructions as required,
all the way back to the
base grammar, if necessary.

15.4 Accessors

Function: intmarpa_o_ambiguity_metric(Marpa_Order o)

Returns an ambiguity metric.
The metric is 1 is the parse is unambiguous.
If the metric is 2 or greater,
the parse is ambiguous.
It was originally intended to have values greater
than 2 be an cheaply computed
estimate of the degree of ambiguity,
but a satisfactory scheme for this
has yet to be implemented.

If the ordering is not already frozen,
it will be frozen on return from
marpa_o_ambiguity_metric().
marpa_o_ambiguity_metric()
is considered an “accessor”,
because it is assumed that the ordering is frozen
when marpa_o_ambiguity_metric()
is called.

Return value on success:
1 if the ordering is not for an ambiguous parse;
2 or greater if the ordering is for an ambiguous parse.

Failures: On failure, -2.

Function: intmarpa_o_is_null(Marpa_Order o)

Return value on success:
A number greater than or equal to 1 if the ordering is for a null parse;
otherwise, 0.

Failures: On failure, -2.

15.5 Non-default ordering

Function: intmarpa_o_high_rank_only_set( Marpa_Order o, int flag)

Function: intmarpa_o_high_rank_only( Marpa_Order o)

These methods, respectively, set and query
the “high rank only” flag of ordering o.
A flag of 1 indicates that, when ranking,
all choices should be discarded except those of the
highest rank.
A flag of 0 indicates that
no choices should be discarded on the
basis of their rank.

A value of 1 is the default.
The value of the “high rank only” flag has no effect
unless ranking has been turned on using the
marpa_o_rank() method.

Return value: On success, the value of the
“high rank only” flag after
the call.
On failure, -2.

Function: intmarpa_o_rank( Marpa_Order o )

By default, the ordering of parse trees is arbitrary.
This method causes the ordering to be ranked
according to the ranks of symbols and rules,
the “null ranks high” flags of the rules,
and the “high rank only” flag of the ordering.
Once this method returns, the ordering is frozen.

Return value: On success, a non-negative value.
On failure, -2.

16 Tree methods

16.1 Overview

Once the bocage has an ordering, the parses trees can be iterated.
Marpa’s parse tree iterators iterate the parse trees contained
in a bocage object.
In Libmarpa,
“parse tree iterators” are usually just called trees.

To create a tree, use the marpa_t_new() method.
A newly created tree iterator is positioned before the first parse tree.
When a tree iterator is no longer in use, its memory can be freed
using the
marpa_t_unref() method.

To position a newly created tree iterator at the first parse tree,
use the marpa_t_next() method.
Once the tree iterator is positioned at a parse tree,
the same marpa_t_next() method is used
to position it to the next parse tree.

16.2 Creating a new tree iterator

Function: Marpa_Treemarpa_t_new(Marpa_Order o)

Creates a new tree iterator, with a reference count of 1.
The reference count of its parent ordering object, o,
is increased by 1.

When initialized, a tree iterator is positioned
before the first parse tree.
To position the tree iterator to the first parse,
the application must call marpa_t_next().

Return value: On success, a newly created tree.
On failure, NULL.

16.3 Reference counting

Function: Marpa_Treemarpa_t_ref(Marpa_Tree t)

Increases the reference count by 1.
Not needed by most applications.

Return value:
On success, t.
On failure, NULL.

Function: voidmarpa_t_unref(Marpa_Tree t)

Decreases the reference count by 1,
destroying t once the reference count reaches
zero.
Beginning with t’s parent ordering,
Libmarpa then proceeds up the chain of parent objects.
Every time a child is destroyed, the
reference count of its parent is decreased by 1.
Every time the reference count of an object
is decreased by 1,
if that reference count is now zero,
that object is destroyed.
Libmarpa follows this chain of decrements
and destructions as required,
all the way back to the
base grammar, if necessary.

16.4 Iterating through the trees

Function: intmarpa_t_next( Marpa_Tree t)

Positions t at the next parse tree
in the iteration.
Tree iterators are initialized to the position
before the first parse tree,
so this method must be called before creating a valuator
from a tree.

If a tree iterator is positioned after the last parse,
the tree is said to be “exhausted”.
A tree iterator for a bocage with no parse trees
is considered to be “exhausted” when initialized.
If the tree iterator is exhausted, marpa_t_next()
returns -1 as a termination indicator,
and sets the error code to
MARPA_ERR_TREE_EXHAUSTED.

Return value: On success, a non-negative value.
If the tree iterator is exhausted, -1.
On failure, -2.

Function: intmarpa_t_parse_count( Marpa_Tree t)

The parse counter counts the number of parse trees
traversed so far.
The count includes the current iteration of the
tree, so that a value of 0 indicates that the tree iterator
is at its initialized position,
before the first parse tree.

Return value: The number of parses traversed so far.
Always succeeds.

17 Value methods

17.1 Overview

The archetypal application needs
a value object (or valuator) to produce
the value of the parse.
To create a valuator, use the marpa_v_new() method.
When a valuator is no longer in use, its memory can be freed
using the
marpa_v_unref() method.

The application is required to maintain the stack,
and the application is also required to implement
most of the semantics, including the evaluation
of rules.
Libmarpa’s valuator provides instructions to
the application on how to manipulate the stack.
To iterate through this series of instructions,
use the marpa_v_step() method.

17.2 How to use the valuator

Libmarpa’s valuator provides the application with
“steps”, which are
instructions for stack manipulation.
Libmarpa itself does not maintain a stack.
This leaves the upper layer in total control of the
stack and the values which are placed on it.

As example may make this clearer.
Suppose the evalution is at a place in the parse tree
where an addition is being performed.
Libmarpa does not know that the operation
is an addition.
It will tell the application that rule number R
is to be applied to the arguments at stack locations
N and N+1, and that the result is to placed in
stack location N.

In this system
the application keeps track of the semantics for all
rules, so it looks up rule R and determines that it
is an addition.
The application can do this by using R as an index
into an array of callbacks, or by any other method
it chooses.
Let’s assume a callback implements the semantics
for rule R.
Libmarpa has told the application that two arguments
are available for this operation, and that they are
at locations N and N+1 in the stack.
They might be the numbers 42 and 711.
So the callback is called with its two arguments,
and produces a return value, let’s say, 753.
Libmarpa has told the application that the result
belongs at location N in the stack,
so the application writes 753 to location N.

Since Libmarpa knows nothing about the semantics,
the operation for rule R could be string concatenation
instead of addition.
Or, if it is addition, it could allow for its arguments
to be floating point or complex numbers.
Since the application maintains the stack, it is up
to the application whether the stack contains integers,
strings, complex numbers, or polymorphic objects which are
capable of being any of these things and more.

17.3 Advantages of step-driven valuation

Step-driven valuation
hides Libmarpa’s grammar rewrites from the application,
and is quite efficient.
Libmarpa knows which rules are sequences.
Libmarpa optimizes stack manipulations based on this knowledge.
Long sequences
are very common in practical grammars.
For these,
the stack manipulations suggested by Libmarpa’s
step-driven valuator
will be significantly faster than the
traditional stack evaluation algorithm.

Step-driven evalution has another advantage.
To illustrate this,
consider what is a very common case:
The semantics are implemented in a higher-level language,
using callbacks.
If Libmarpa did not use step-driven valuation,
it would need to provide for this case.
But for generality,
Libmarpa would have to deal in C callbacks.
Therefore, a middle layer would have to create C language wrappers
for the callbacks in the higher level language.

The implementation that results is this:
The higher level language would need to wrap each callback in C.
When calling Libmarpa, it would pass the wrappered callback.
Libmarpa would then need to call the C language “wrappered” callback.
Next, the wrapper would call the higher-level language callback.
The return value,
which would be data native to the higher-level language,
would need to be passed to the C language wrapper,
which will need to make arrangements for it to be based
back to the higher-level language when appropriate.

A setup like this is not terribly efficient.
And exception handling across language boundaries would be
very tricky.
But neither of these is the worst problem.

Callbacks are hard to debug.
Wrappered callbacks are even worse.
Calls made across language boundaries
are harder yet to debug.
In the system described above,
by the time a return value is finally consumed,
a language boundary will have been crossed four times.

How do
programmers deal with difficulties like this?
Usually, by
doing the absolute minimum possible in the callbacks.
A horrific debugging enviroment can become a manageable
one if there is next to no code to be debugged.
And this can be accomplished by
doing as much as possible in pre- and post-processing.

In essence, callbacks force applications to do most
of the programming
via side effects.
One need not be a functional programming purist to find
this a very undesirable style of design to force on
an application.
But the ability to debug can make the difference between
code that does work and code that does not.
Unfairly or not,
code is rarely considered well-designed when it does
not work.

So, while step-driven valuation seems
a roundabout approach,
it is simpler and more direct than
the likely alternatives.
And there is something to be said for pushing
semantics up to the higher levels —
they can be expected to know more about it.

These advantages of step-driven valuation
are strictly in
the context of a low-level interface.
The author is under no illusion
that direct use of Libmarpa’s valuator will be found
satisfactory by most application programmers,
even those using the C language.
The author certainly avoids using step-driven valuation directly.
Libmarpa’s valuator is intended
to be used via an upper layer,
one which does know about semantics.

17.4 Maintaining the stack

This section discusses in detail the requirements
for maintaining the stack.
In some cases,
such as implementation using a Perl array,
fulfilling these requirements is trivial.
Perl auto-extends its arrays,
and initializes the element values,
on every read or write.
For the C programmer,
things are not quite so easy.

In this section,
we will assume a C90 or C99 standard-conformant
C application.
This assumption is convenient on two grounds.
First, this will be the intended use
for many readers.
Second, standard-conformant C is
a “worst case”.
Any issue faced by a programmer of another environment
is likely to also be one that must be solved
by the C programmer.

Libmarpa often
optimizes away unnecessary stack writes
to stack locations.
When it does so, it will not
necessarily optimize away all reads
to that stack location.
This means that a location’s first access,
as suggested by the Libmarpa step instructions,
may be a read.
This possibility
requires a special awareness from the C
programmer,
as discussed in
the sections
Sizing the stack and
Initializing locations in the stack.

In the discussions in this document,
stack locations are non-negative integers.
The bottom of the stack is location 0.
In moving from the bottom of the stack to the top,
the numbers increase.
Stack location Y is said to be “greater”
than stack location X if stack location
Y is closer to the top of stack than location X,
and therefore stack locations are considered greater or
lesser if the integers that represent them are
greater or lesser.
Another way to state that
a stack location Y is greater (lesser)
than stack location X is to say that
a stack location Y is later (earlier)
than stack location X.

17.4.1 Sizing the stack

If an implementation applies Libmarpa’s step
instructions literally, using a physical stack,
it must make sure the stack is large enough.
Specifically, the application must do the following

Ensure location 0 exists — in other
words that the stack is at least length 1.

For MARPA_STEP_RULE steps,
ensure that stack locations from marpa_v_arg_0(v)
to marpa_v_arg_n(v) exist.

Three aspects of these requirements deserve special mention.
First,
note that the requirement for a
MARPA_STEP_RULE is that the application
size the stack to include the arguments to be
read.
Because stack writes may be optimized away,
an application,
when reading,
cannot assume
that the stack was
sized appropriately by a prior write.
The first access to a new stack location may be
a read.

Second,
note that there is no explicit requirement that
the application size the stack to include the
location for the result of the
MARPA_STEP_RULE step.
An application is allowed to assume that
result will go into one of the locations
that were read.

Third, special note should be made of the requirement
that location 0 exist.
By convention, the parse result resides
in location 0 of the stack.
Because of potential optimizations,
an application cannot assume that it
will receive a Libmarpa step instruction that
either reads from or writes to location 0.

17.4.2 Initializing locations in the stack

Write optimizations also creates issues for implementations
which require data to be initialized before reading.
Every fully standard-conforming C application is such an
implementation.
Both C90 and C99 allow “trap values”,
and therefore conforming applications must be
prepared for
an uninitialized location to contain one of those.
Reading a trap value may cause an abend.
(It is safe, in standard-conforming C, to write to a location
containing a trap value.)

The requirement that locations be initialized before
reading occurs in other implementations.
Any implementation that has a “universe” of “safe” values,
may require special precautions.
The required precautions may amount to a need to initialize
“uninitialized” values.
A practical example might be an implementation that expects
all locations to contain a pointer which it can safely indirect
from.
In such implementations,
just as in standard-conformant C,
every stack location
needs to be initialized before being read.

Due to write optimizations, an application
cannot rely on Libmarpa’s step instructions to
initialize every stack location before its first read.
One way to safely deal with the
initialization of stack locations,
is to do all of the following:

When starting evaluation, ensure that the stack contains at least location 0.

Also, when starting evaluation, initialize every location in the stack.

Whenever the stack is extended,
initialize every stack location added.

Applications which try to optimize out some of
these initializations
need to be aware that
an application can never assume that activity in
the stack is safely “beyond” an uninitialized location.
Libmarpa steps often revisit earlier sections of the stack,
and these revisits may include reads of previously
unvisited stack locations.

17.5 Creating a new valuator

Function: Marpa_Valuemarpa_v_new( Marpa_Tree t )

Creates a new valuator.
The parent object of the new valuator
will be the tree iterator t,
and the reference count of the new valuator will be 1.
The reference count of t is increased by 1.

The parent tree iterator is “paused”,
so that the tree iterator
cannot move on to a new parse tree
until the valuator is destroyed.
Many valuators of the same parse tree
can exist at once.
A tree iterator is “unpaused” when
all of the valuators of that tree iterator are destroyed.

Return value: On success, the newly created valuator.
On failure, NULL.

17.6 Reference counting

Function: Marpa_Valuemarpa_v_ref(Marpa_Value v)

Increases the reference count by 1.
Not needed by most applications.

Return value:
On success, v.
On failure, NULL.

Function: voidmarpa_v_unref( Marpa_Value v)

Decreases the reference count by 1,
destroying v once the reference count reaches
zero.
Beginning with v’s parent tree,
Libmarpa then proceeds up the chain of parent objects.
Every time a child is destroyed, the
reference count of its parent is decreased by 1.
Every time the reference count of an object
is decreased by 1,
if that reference count is now zero,
that object is destroyed.
Libmarpa follows this chain of decrements
and destructions as required,
all the way back to the
base grammar, if necessary.

17.7 Stepping through the valuator

Function: Marpa_Step_Typemarpa_v_step( Marpa_Value v)

This method “steps through” the valuator.
The return value is a Marpa_Step_Type,
an integer which indicates the type of step.
How the application is expected to act on
each step is described below.
See Valuator steps by type.
When the iteration through the steps is finished,
marpa_v_step returns MARPA_STEP_INACTIVE.

Return value: On success, the type of the step
to be performed.
This will always be a non-negative number.
On failure, -2.

17.8 Valuator steps by type

Macro: Marpa_Step_TypeMARPA_STEP_RULE

The semantics of a rule should be performed.
The application can find the value of the rule’s
children in the stack locations from
marpa_v_arg_0(v)
to marpa_v_arg_n(v).
The semantics for the rule whose ID is
marpa_v_rule(v) should be executed
on these child values,
and the result placed in
marpa_v_result(v).
In the case of a MARPA_STEP_RULE step,
the stack location of
marpa_v_result(v) is guaranteed to
be equal to marpa_v_arg_0(v).

Macro: Marpa_Step_TypeMARPA_STEP_TOKEN

The semantics of a non-null token should be performed.
The application’s value for the token whose ID is
marpa_v_token(v) should be
placed in
stack location marpa_v_result(v).
Its value according to Libmarpa will be in
marpa_v_token_value(v).

Macro: Marpa_Step_TypeMARPA_STEP_NULLING_SYMBOL

The semantics for a nulling symbol should be performed.
The ID of the symbol is
marpa_v_symbol(v) and its value should
be placed in
stack location marpa_v_result(v).

Macro: Marpa_Step_TypeMARPA_STEP_INACTIVE

The valuator has gone through all of its steps
and is now inactive.
The value of the parse will be in stack location 0.
Because of optimizations,
it is possible for valuator to immediately
became inactive — MARPA_STEP_INACTIVE could
be both the first and last step.

Macro: Marpa_Step_TypeMARPA_STEP_INITIAL

The valuator is new and has
yet to go through any steps.

Macro: Marpa_Step_TypeMARPA_STEP_INTERNAL1

Macro: Marpa_Step_TypeMARPA_STEP_INTERNAL2

Macro: Marpa_Step_TypeMARPA_STEP_TRACE

These step types are reserved for internal purposes.

17.9 Basic step accessors

The basic step accessors are so called because their information
is basic to the stack manipulation.
The basic step accessors are implemented as macros.
They always succeed.

Macro: intmarpa_v_arg_0(Marpa_Value v)

For a MARPA_STEP_RULE step,
returns the stack location where the value of first child
can be found.

Macro: intmarpa_v_arg_n(Marpa_Value v)

For a MARPA_STEP_RULE step,
returns the stack location where the value of the last child
can be found.

Macro: intmarpa_v_result(Marpa_Value v)

For MARPA_STEP_RULE,
MARPA_STEP_TOKEN,
and MARPA_STEP_NULLING_SYMBOL steps,
returns the stack location where the result of the semantics
should be placed.

Macro: Marpa_Rule_IDmarpa_v_rule(Marpa_Value v)

For the
MARPA_STEP_RULE step,
returns the ID of the rule.

Macro: Marpa_Step_Typemarpa_v_step_type(Marpa_Value v)

Returns the current step type:
MARPA_STEP_TOKEN,
MARPA_STEP_RULE,
etc.
Usually not needed since this is also the return value of
marpa_v_step().

Macro: Marpa_Symbol_IDmarpa_v_symbol(Marpa_Value v)

For the MARPA_STEP_NULLING_SYMBOL step,
returns the ID of the symbol.
The value returned is the same as that
returned by the marpa_v_token()
macro.

Macro: Marpa_Symbol_IDmarpa_v_token(Marpa_Value v)

For the MARPA_STEP_TOKEN step,
returns the ID of the token.
The value returned is the same as that
returned by the marpa_v_symbol()
macro.

Macro: intmarpa_v_token_value(Marpa_Value v)

For the MARPA_STEP_TOKEN step,
returns the integer which is
(or which represents)
the value of the token.

17.10 Other step accessors

This section contains the step accessors that
are not basic to stack manipulation, but which provide
other useful information about the parse.
These step accessors are implemented as macros.

All of these accessors always succeed, but
if called when they are irrelevant
they return an unspecified value.
In this context, an “unspecified value” is a value
that is either -1 or the ID of a valid Earley set,
but which is otherwise unpredictable.

Macro: Marpa_Earley_Set_IDmarpa_v_es_id(Marpa_Value v)

Return value:
If the current step type is MARPA_STEP_RULE,
the Earley Set ordinal where the rule ends.
If the current step type is
MARPA_STEP_TOKEN
or MARPA_STEP_NULLING_SYMBOL,
the Earley Set ordinal where the symbol ends.
If the current step type is anything else, an unspecified value.

Macro: Marpa_Earley_Set_IDmarpa_v_rule_start_es_id(Marpa_Value v)

Return value:
If the current step type is MARPA_STEP_RULE,
the Earley Set ordinal where the rule begins.
If the current step type is anything else, an unspecified value.

Macro: Marpa_Earley_Set_IDmarpa_v_token_start_es_id(Marpa_Value v)

Return value:
If the current step type is MARPA_STEP_TOKEN
or MARPA_STEP_NULLING_SYMBOL,
the Earley Set ordinal where the token begins.
If the current step type is anything else, an unspecified value.

18 Events

18.1 Overview

Events are generated by the
marpa_g_precompute(),
marpa_r_earleme_complete(),
and
marpa_r_start_input() methods.
The methods are called event-active.
Event-active methods always clear all previous events,
so that after an event-active method the only events
available
will be those generated by that method.

Events are volatile,
and it is expected that events will be queried
immediately after the method that generated them.
Note especially
that multiple recognizers using the same base grammar
overwrite each other’s events.

To find out how many events were generated by the last
event-active method,
use the marpa_g_event_count method.

To query a specific event,
use the marpa_g_event and
marpa_g_event_value methods.

One or more rules are loop rules — rules
that are part of a cycle.
Cycles are pathological cases of recursion,
in which the same symbol string derives itself
a potentially infinite number of times.
Nonetheless, Marpa parses in the presence of these,
and it is up to the application to treat these
as fatal errors,
something they almost always will wish to do.
Event value: The count of loop rules.
Suggested message: "Grammar contains a infinite loop".

Macro: intMARPA_EVENT_NULLING_TERMINAL

A nulling symbol is also a terminal.
Event value: The ID of the symbol.
Suggested message: "This symbol is a nulling terminal".

Macro: intMARPA_EVENT_SYMBOL_COMPLETED

The recognizer can be set to generate an event
a symbol is completed
using its marpa_g_symbol_is_completion_event_set()
method.
(A symbol is "completed" if and only if any rule with that symbol
as its LHS is completed.)
This event code indicates that one of those events
occurred.
Event value: The ID of the completed symbol.
Suggested message: "Completed symbol".

Macro: intMARPA_EVENT_SYMBOL_EXPECTED

The recognizer can be set to generate an event when a
symbol is expected as a terminal,
using its marpa_r_expected_symbol_event_set()
method.
Note that this event only triggers if the symbol is
expected as a terminal.
Predicted symbols which are not expected as terminals
do not trigger this event.
This event code indicates that one of those events
occurred.
Event value: The ID of the expected symbol.
Suggested message: "Expecting symbol".

Macro: intMARPA_EVENT_SYMBOL_NULLED

The recognizer can be set to generate an event when a
symbol is nulled – that is, recognized as a
zero-length symbol.
To set an nulled symbol event,
use the recognizer’s marpa_r_nulled_symbol_event_set()
method.
This event code indicates that a nulled symbol event
occurred.
Event value: The ID of the nulled symbol.
Suggested message: "Symbol was nulled".

Macro: intMARPA_EVENT_SYMBOL_PREDICTED

The recognizer can be set to generate an event when a
symbol is predicted.
To set an predicted symbol event,
use the recognizer’s marpa_g_symbol_is_prediction_event_set()
method.
Unlike the
MARPA_EVENT_SYMBOL_EXPECTED event,
the MARPA_EVENT_SYMBOL_PREDICTED event
triggers for predictions of both
non-terminals and terminals.
This event code indicates that a predicted symbol event
occurred.
Event value: The ID of the predicted symbol.
Suggested message: "Symbol was predicted".

19 Error methods, macros and codes

19.1 Error methods

When a method fails,
this method allows the application to read
the error code.
p_error_string is reserved for use by
the internals.
Applications should set it to NULL.

Return value: The last error code from a Libmarpa method.
Always succeeds.

Function: Marpa_Error_Codemarpa_g_error_clear( Marpa_Grammar g )

Sets the error code
to MARPA_ERR_NONE.
Not often used,
but now and then it can be useful
to force the error code to a known state.

Return value: MARPA_ERR_NONE.
Always succeeds.

19.2 Error Macros

Macro: intMARPA_ERRCODE_COUNT

The number of error codes.
All error codes, whether internal or external,
will be integers, non-negative but
strictly less than MARPA_ERRCODE_COUNT.

19.3 External error codes

This section lists the external error codes.
These are the only error codes that users
of the Libmarpa external interface should ever see.
Internal error codes are in their own section.
See Internal error codes.

Macro: intMARPA_ERR_NONE

No error condition.
The error code is initialized to this value.
Methods which do not result in failure
sometimes reset the error code to MARPA_ERR_NONE.
Numeric value: 0.
Suggested message: "No error".

Macro: intMARPA_ERR_BAD_SEPARATOR

A separator was specified for a sequence rule,
but its ID was not that
of a valid symbol.
Numeric value: 6.
Suggested message: "Separator has invalid symbol ID".

Macro: intMARPA_ERR_BEFORE_FIRST_TREE

A tree iterator is positioned before the first tree,
and it was specified in a context where that is not
allowed.
A newly created tree is positioned before the first
tree.
To position a newly created tree iterator to the first tree
use the marpa_t_next() method.
Numeric value: 91.
Suggested message: "Tree iterator is before first tree".

Macro: intMARPA_ERR_COUNTED_NULLABLE

A “counted” symbol was found
that is also a nullable symbol.
A “counted” symbol is one that appears on the RHS
of a sequence rule.
If a symbol is nullable,
counting its occurrences becomes difficult.
Questions of definition and
problems of implementation arise.
At a minimum, a sequence with counted nullables
would be wildly
ambigious.

Sequence rules are simply an optimized shorthand
for rules that can also be written in ordinary BNF.
If the equivalent of a sequence of nullables is
really what your application needs,
nothing in Libmarpa prevents you from specifying
that sequence
with ordinary BNF rules.

This error indicates an attempt to add a BNF rule which
is a duplicate of a BNF rule already in the grammar.
Two BNF rules are considered duplicates if

Both rules have the same left hand symbol, and

Both rules have the same right hand symbols in the same order.

Duplication of sequence rules,
and duplication between BNF rules and sequence rules,
is dealt with by requiring that the LHS of a sequence rule
not be the LHS of any other rule.

Numeric value: 11.
Suggested message: "Duplicate rule".

Macro: intMARPA_ERR_DUPLICATE_TOKEN

This error indicates an attempt to add a duplicate token.
A token is a duplicate if one already read at the same
earleme has the same symbol ID and the same length.
Numeric value: 12.
Suggested message: "Duplicate token".

Macro: intMARPA_ERR_YIM_COUNT

This error code indicates that
an implementation-defined limit on the
number of Earley items per Earley set
was exceedeed.
This limit is different from
the Earley item warning threshold,
an optional limit on the number
of Earley items in an Earley set,
which can be set by the application.

The implementation defined-limit is very large,
at least 500,000,000 earlemes.
An application is unlikely ever to see this
error.
Libmarpa’s use of memory
would almost certainly exceed the implementation’s
limits before it occurred.
Numeric value: 13.
Suggested message: "Maximum number of Earley items exceeded".

An non-negative event index was specified,
but there is no event at that index.
Since the events are in sequence, this means it
was too large.
Numeric value: 16.
Suggested message: "No event at that index".

Macro: intMARPA_ERR_GRAMMAR_HAS_CYCLE

The grammar has a cycle — one or more loop
rules.
This is a recoverable error,
although most applications will want to treat
it as fatal.
For more see the description of marpa_g_precompute.
Numeric value: 17.
Suggested message: "Grammar has cycle".

Macro: intMARPA_ERR_HEADERS_DO_NOT_MATCH

This is an internal error, and indicates that
Libmarpa was wrongly built.
Libmarpa was compiled with headers which do not
match the rest of the code.
The solution is to find a correctly built
Libmarpa.
Numeric value: 98.
Suggested message: "Internal error: Libmarpa was built incorrectly"

Macro: intMARPA_ERR_I_AM_NOT_OK

The Libmarpa base grammar is in a "not ok"
state.
Currently, the only way this can happen
is if Libmarpa memory is being overwritten.
Numeric value: 29.
Suggested message: "Marpa is in a not OK state".

Macro: intMARPA_ERR_INACCESSIBLE_TOKEN

This error code indicates that
the token symbol is an inaccessible symbol — one which
cannot be reached from the start symbol.

Since the inaccessibility of a symbol is a property of the grammar,
this error code typically indicates an application error.
Nevertheless, a retry at this location, using another token ID,
may succeed.
At this writing,
the author knows of no uses of this technique.

Numeric value: 18.
Suggested message: "Token symbol is inaccessible".

Macro: intMARPA_ERR_INVALID_BOOLEAN

A function was called that takes a boolean argument,
but the value of that argument was not either 0 or 1.
Numeric value: 22.
Suggested message: "Argument is not boolean".

Macro: intMARPA_ERR_INVALID_LOCATION

The location (Earley set ID) is not valid.
It may be invalid for one of two reasons:

It is negative,
and it is being used as the argument to a method
for which that negative value does not have a special meaning.

It is after the latest Earley set.

For users of input models other than the standard one,
the term “location”, as used in association
with this error code,
means Earley set ID or Earley set ordinal.
In the standard input model, this will always
be identical with Libmarpa’s other idea of
location, the earleme.

Numeric value: 25.
Suggested message: "Location is not valid".

Macro: intMARPA_ERR_INVALID_START_SYMBOL

A start symbol was specified,
but its symbol ID is not that of a valid symbol.
Numeric value: 27.
Suggested message: "Specified start symbol is not valid".

Macro: intMARPA_ERR_INVALID_ASSERTION_ID

A method was called with an invalid assertion ID.
This is a assertion ID which not only does
not exist, but cannot exist.
Currently that means its value is less than zero.
Numeric value: 96.
Suggested message: "Assertion ID is malformed".

Macro: intMARPA_ERR_INVALID_RULE_ID

A method was called with an invalid rule ID.
This is a rule ID which not only does
not exist, but cannot exist.
Currently that means its value is less than zero.
Numeric value: 26.
Suggested message: "Rule ID is malformed".

Macro: intMARPA_ERR_INVALID_SYMBOL_ID

A method was called with an invalid symbol ID.
This is a symbol ID which not only does
not exist, but cannot exist.
Currently that means its value is less than zero.
Numeric value: 28.
Suggested message: "Symbol ID is malformed".

Macro: intMARPA_ERR_MAJOR_VERSION_MISMATCH

There was a mismatch in the major version number
between the requested version
of libmarpa, and the actual one.
Numeric value: 30.
Suggested message: "Libmarpa major version number is a mismatch".

Macro: intMARPA_ERR_MICRO_VERSION_MISMATCH

There was a mismatch in the micro version number
between the requested version
of libmarpa, and the actual one.
Numeric value: 31.
Suggested message: "Libmarpa micro version number is a mismatch".

Macro: intMARPA_ERR_MINOR_VERSION_MISMATCH

There was a mismatch in the minor version number
between the requested version
of libmarpa, and the actual one.
Numeric value: 32.
Suggested message: "Libmarpa minor version number is a mismatch".

Macro: intMARPA_ERR_NO_EARLEY_SET_AT_LOCATION

A non-negative Earley set ID (also called an Earley set ordinal)
was specified,
but there is no corresponding Earley set.
Since the Earley set ordinals are in sequence,
this means that the specified ID is greater
than that of the latest Earley set.
Numeric value: 39.
Suggested message: "Earley set ID is after latest Earley set".

Macro: intMARPA_ERR_NOT_PRECOMPUTED

The grammar is not precomputed,
and attempt was made to do something with it
that is not allowed for unprecomputed
grammars.
For example, a recognizer cannot be
created from a grammar until it is precomputed.
Numeric value: 34.
Suggested message: "This grammar is not precomputed".

Macro: intMARPA_ERR_NO_PARSE

The application attempted to create a bocage
from a recognizer without a parse.
Applications will often want to treat this as
a soft error.
Numeric value: 41.
Suggested message: "No parse".

Macro: intMARPA_ERR_NO_RULES

A grammar which has no rules is being used
in a way that is not allowed.
Usually the problem is that the user is
trying to precompute the grammar.
Numeric value: 42.
Suggested message: "This grammar does not have any rules".

Macro: intMARPA_ERR_NO_START_SYMBOL

The grammar has no start symbol,
and an attempt was made to perform an
operation which requires one.
Usually the problem is that the user is
trying to precompute the grammar.
Numeric value: 43.
Suggested message: "This grammar has no start symbol".

Macro: intMARPA_ERR_NO_SUCH_ASSERTION_ID

A method was called with an assertion ID which is well-formed,
but the assertion does not exist.
Numeric value: 97.
Suggested message: "No assertion with this ID exists".

Macro: intMARPA_ERR_NO_SUCH_RULE_ID

A method was called with a rule ID which is well-formed,
but the rule does not exist.
Numeric value: 89.
Suggested message: "No rule with this ID exists".

Macro: intMARPA_ERR_NO_SUCH_SYMBOL_ID

A method was called with a symbol ID which is well-formed,
but the symbol does not exist.
Numeric value: 90.
Suggested message: "No symbol with this ID exists".

Macro: intMARPA_ERR_NO_TOKEN_EXPECTED_HERE

This error code indicates that
no tokens at all were expected at this earleme
location.
This can only happen in alternative input models.

Typically, this indicates an application programming
error.
Retrying input at this location will always fail.
But if the application is able to leave this
earleme empty, a retry at a later location,
using this or another token,
may succeed.
At this writing,
the author knows of no uses of this technique.

This error occurs in situations where a rule
is required to be a sequence,
and indicates that the rule of interest is,
in fact,
not a sequence.

Numeric value: 99.
Suggested message: "Rule is not a sequence".

Macro: intMARPA_ERR_NULLING_TERMINAL

Marpa does not allow a symbol to be both nulling
and a terminal.
Numeric value: 49.
Suggested message: "A symbol is both terminal and nulling".

Macro: intMARPA_ERR_ORDER_FROZEN

The Marpa order object has been frozen.
If a Marpa order object is frozen, it cannot be
changed.

Multiple tree iterators can share a Marpa order object,
but that order object is frozen after the first tree
iterator is created from it.
Applications can order an bocage in many ways,
but they must do so by creating multiple order objects.

Numeric value: 50.
Suggested message: "The ordering is frozen".

Macro: intMARPA_ERR_PARSE_EXHAUSTED

The parse is exhausted.
Numeric value: 53.
Suggested message: "The parse is exhausted".

Macro: intMARPA_ERR_PARSE_TOO_LONG

The parse is too long.
The limit on the length of a parse is implementation
dependent, but it is very large,
at least 500,000,000 earlemes.

This error code is unlikely in the standard input model.
Almost certainly memory would be exceeded
before it could occur.
If an application sees this error,
it almost certainly using one of the non-standard
input models.

Most often this messsage will occur because
of a request to add a single extremely long token,
perhaps as a result of an application error.
But it is also possible this error condition will
occur after the input of a large number
of long tokens.

In a method which takes pointers as arguments,
one of the pointer arguments is NULL,
in a case where that is not allowed.
One such method is marpa_r_progress_item().
Numeric value: 56.
Suggested message: "An argument is null when it should not be".

Macro: intMARPA_ERR_PRECOMPUTED

An attempt was made to use a precomputed grammar
in a way that is not allowed.
Often this is an attempt to change the grammar.
Nearly every change to a grammar after
precomputation invalidates the precomputation,
and is therefore not allowed.
Numeric value: 57.
Suggested message: "This grammar is precomputed".

Macro: intMARPA_ERR_PROGRESS_REPORT_NOT_STARTED

No recognizer progress report is currently active,
and an action has been attempted which
is inconsistent with that.
One such action would be a
marpa_r_progress_item() call.
Numeric value: 59.
Suggested message: "No progress report has been started".

Macro: intMARPA_ERR_PROGRESS_REPORT_EXHAUSTED

The progress report is “exhausted” — all its
items have been iterated through.
Numeric value: 58.
Suggested message: "The progress report is exhausted".

Macro: intMARPA_ERR_RANK_TOO_LOW

A symbol or rule rank was specified which
was less than an implementation-defined minimum.
Implementations will always allow at least those
ranks in the range between
-134,217,727 and 134,217,727.
Numeric value: 85.
Suggested message: "Rule or symbol rank too low".

Macro: intMARPA_ERR_RANK_TOO_HIGH

A symbol or rule rank was specified which
was greater than an implementation-defined maximum.
Implementations will always allow at least those
ranks in the range between
-134,217,727 and 134,217,727.
Numeric value: 86.
Suggested message: "Rule or symbol rank too high".

Macro: intMARPA_ERR_RECCE_IS_INCONSISTENT

The recognizer is “inconsistent”,
usually because the user has rejected one or
more rules or terminals,
and has not yet called
the marpa_r_consistent() method.
Numeric value: 95.
Suggested message: "The recognizer is inconsistent.

Macro: intMARPA_ERR_RECCE_NOT_ACCEPTING_INPUT

The recognizer is not accepting input,
and the application has attempted something that
is inconsistent with that fact.
Numeric value: 60.
Suggested message: "The recognizer is not accepting input".

Macro: intMARPA_ERR_RECCE_NOT_STARTED

The recognizer has not been started.
and the application has attempted something that
is inconsistent with that fact.
Numeric value: 61.
Suggested message: "The recognizer has not been started".

Macro: intMARPA_ERR_RECCE_STARTED

The recognizer has been started.
and the application has attempted something that
is inconsistent with that fact.
Numeric value: 62.
Suggested message: "The recognizer has been started".

Macro: intMARPA_ERR_RHS_IX_NEGATIVE

The index of a RHS symbol was specified,
and it was negative.
That is not allowed.
Numeric value: 63.
Suggested message: "RHS index cannot be negative".

Macro: intMARPA_ERR_RHS_IX_OOB

A non-negative index of RHS symbol was specified,
but there is no symbol at that index.
Since the indexes are in sequence, this means the
index was greater than or equal to the rule length.
Numeric value: 64.
Suggested message: "RHS index must be less than rule length".

Macro: intMARPA_ERR_RHS_TOO_LONG

An attempt was made to add a rule with too many
right hand side symbols.
The limit on the RHS symbol count is implementation
dependent, but it is very large,
at least 500,000,000 symbols.
This is
far beyond what is required in any current practical grammar.
An application with rules of this length is almost
certain to run into memory and other limits.
Numeric value: 65.
Suggested message: "The RHS is too long".

Macro: intMARPA_ERR_SEQUENCE_LHS_NOT_UNIQUE

The LHS of a
sequence rule cannot be the LHS of any other rule,
whether a sequence rule or a BNF rule.
An attempt was made to violate this restriction.
Numeric value: 66.
Suggested message: "LHS of sequence rule would not be unique".

Macro: intMARPA_ERR_START_NOT_LHS

The start symbol is not on the LHS on
any rule.
That means it could never match any possible input,
not even the null string.
Presumably, an error in writing the grammar.
Numeric value: 73.
Suggested message: "Start symbol not on LHS of any rule".

Macro: intMARPA_ERR_SYMBOL_IS_NOT_COMPLETION_EVENT

An attempt was made to use a symbol in
a way that requires it to be
set up for completion events,
but the symbol was not set
set up for completion events.
The archetypal case is an attempt to activate completion
events for the symbol in the recognizer.
The archetypal case is an attempt to activate a completion
event in the recognizer for
a symbol that is not set up as a completion event.
Numeric value: 92.
Suggested message: "Symbol is not set up for completion events".

Macro: intMARPA_ERR_SYMBOL_IS_NOT_NULLED_EVENT

An attempt was made to use a symbol in
a way that requires it to be
set up for nulled events,
but the symbol was not set
set up for nulled events.
The archetypal case is an attempt to activate a nulled
events in the recognizer for
a symbol that is not set up as a nulled event.
Numeric value: 93.
Suggested message: "Symbol is not set up for nulled events".

Macro: intMARPA_ERR_SYMBOL_IS_NOT_PREDICTION_EVENT

An attempt was made to use a symbol in
a way that requires it to be
set up for predictino events,
but the symbol was not set
set up for predictino events.
The archetypal case is an attempt to activate a prediction
event in the recognizer for
a symbol that is not set up as a prediction event.
Numeric value: 94.
Suggested message: "Symbol is not set up for prediction events".

Macro: intMARPA_ERR_SYMBOL_VALUED_CONFLICT

Unvalued symbols are a deprecated Marpa feature,
which may be avoided with
the marpa_g_force_valued method.
An unvalued symbol may take on any value,
and therefore a symbol which is unvalued at some points
cannot safely to be used to contain a value at
others.
This error indicates that such an unsafe use is
being attempted.
Numeric value: 74.
Suggested message: "Symbol is treated both as valued and unvalued".

Macro: intMARPA_ERR_TERMINAL_IS_LOCKED

An attempt was made to change the terminal status
of a symbol to a different value
after it was locked.
Numeric value: 75.
Suggested message: "The terminal status of the symbol is locked".

Macro: intMARPA_ERR_TOKEN_IS_NOT_TERMINAL

A token was specified whose symbol ID is not
a terminal.
Numeric value: 76.
Suggested message: "Token symbol must be a terminal".

Macro: intMARPA_ERR_TOKEN_LENGTH_LE_ZERO

A token length was specified which is less than
or equal to zero.
Zero-length tokens are not allowed in Libmarpa.
Numeric value: 77.
Suggested message: "Token length must greater than zero".

Macro: intMARPA_ERR_TOKEN_TOO_LONG

The token length is too long.
The limit on the length of a token
is implementation dependent, but it
is at least 500,000,000 earlemes.
An application using a token that long
is almost certain to run into some other
limit.
Numeric value: 78.
Suggested message: "Token is too long".

Macro: intMARPA_ERR_TREE_EXHAUSTED

A Libmarpa parse tree iterator
is “exhausted”, that is,
it has no more parses.
Numeric value: 79.
Suggested message: "Tree iterator is exhausted".

Macro: intMARPA_ERR_TREE_PAUSED

A Libmarpa tree is “paused”
and an operation was attempted which
is inconsistent with that fact.
Typically, this operation will be
a call of the marpa_t_next() method.
Numeric value: 80.
Suggested message: "Tree iterator is paused".

Macro: intMARPA_ERR_UNEXPECTED_TOKEN_ID

An attempt was made to read a token
where a token with that symbol ID is not
expected.
This message can also occur when an
attempt is made to read a token
at a location where no token is expected.
Numeric value: 81.
Suggested message: "Unexpected token".

Macro: intMARPA_ERR_UNPRODUCTIVE_START

The start symbol is unproductive.
That means it could never match any possible input,
not even the null string.
Presumably, an error in writing the grammar.
Numeric value: 82.
Suggested message: "Unproductive start symbol".

Macro: intMARPA_ERR_VALUATOR_INACTIVE

The valuator is inactive in a context where that
should not be the case.
Numeric value: 83.
Suggested message: "Valuator inactive".

Macro: intMARPA_ERR_VALUED_IS_LOCKED

Unvalued symbols are a deprecated Marpa feature,
which may be avoided with
the marpa_g_force_valued method.
This error code
indicates that the valued status of a symbol is locked,
and an attempt was made
to change it to a status different from the
current one.
Numeric value: 84.
Suggested message: "The valued status of the symbol is locked".

Macro: intMARPA_ERR_SYMBOL_IS_NULLING

An attempt was made to do something with a nulling
symbol that is not allowed.
For example,
the ID of a nulling symbol cannot be an argument
to marpa_r_expected_symbol_event_set —
because it is not possible to create an “expected symbol” event
for a nulling symbol.
Numeric value: 87.
Suggested message: "Symbol is nulling".

Macro: intMARPA_ERR_SYMBOL_IS_UNUSED

An attempt was made to do something with an unused symbol that is not allowed.
An “unused” symbol is a inaccessible or unproductive symbol.
For example,
the ID of a unused symbol cannot be an argument
to marpa_r_expected_symbol_event_set —
because it is not possible to create an “expected symbol” event
for an unused symbol.
Numeric value: 88.
Suggested message: "Symbol is not used".

19.4 Internal error codes

An internal error code may be one of two things:
First,
it can be an error code which
arises from an internal Libmarpa programming issue
(in other words, something happening in the code
that was not supposed to be able to happen.)
Second, it can be an error code which only occurs
when a method from Libmarpa’s internal interface
is used.
Both kinds of internal error message share one common
trait — users of the Libmarpa’s external interface
should never see them.

Internal error messages
require someone with knowledge of the Libmarpa internals
to follow up on them.
They usually do not have descriptions or suggested messages.

Macro: intMARPA_ERR_AHFA_IX_NEGATIVE

Numeric value: 1.

Macro: intMARPA_ERR_AHFA_IX_OOB

Numeric value: 2.

Macro: intMARPA_ERR_ANDID_NEGATIVE

Numeric value: 3.

Macro: intMARPA_ERR_ANDID_NOT_IN_OR

Numeric value: 4.

Macro: intMARPA_ERR_ANDIX_NEGATIVE

Numeric value: 5.

Macro: intMARPA_ERR_BOCAGE_ITERATION_EXHAUSTED

Numeric value: 7.

Macro: intMARPA_ERR_DEVELOPMENT

"Development" errors were used heavily during
Libmarpa’s development,
when it was not yet clear how precisely
to classify every error condition.
Unless they are using a developer’s version,
users of the external interface
should never
see development errors.

Development errors have an error string
associated with them.
The error string is a
short 7-bit ASCII error string
which describes the error.
Numeric value: 9.
Suggested message: "Development error, see string".

Macro: intMARPA_ERR_DUPLICATE_AND_NODE

Numeric value: 10.

Macro: intMARPA_ERR_YIM_ID_INVALID

Numeric value: 14.

Macro: intMARPA_ERR_INTERNAL

A “catchall” internal error.
Numeric value: 19.

Macro: intMARPA_ERR_INVALID_AHFA_ID

The AHFA ID was invalid. There are no AHFAs
any more, so this message should not occur.
Numeric value: 20.

Macro: intMARPA_ERR_INVALID_AIMID

The AHM ID was invalid. The term “AIMID”
is a legacy of earlier implementations and must
be kept for backward compatibility.
Numeric value: 21.

Macro: intMARPA_ERR_INVALID_IRLID

Numeric value: 23.

Macro: intMARPA_ERR_INVALID_NSYID

Numeric value: 24.

Macro: intMARPA_ERR_NOOKID_NEGATIVE

Numeric value: 33.

Macro: intMARPA_ERR_NOT_TRACING_COMPLETION_LINKS

Numeric value: 35.

Macro: intMARPA_ERR_NOT_TRACING_LEO_LINKS

Numeric value: 36.

Macro: intMARPA_ERR_NOT_TRACING_TOKEN_LINKS

Numeric value: 37.

Macro: intMARPA_ERR_NO_AND_NODES

Numeric value: 38.

Macro: intMARPA_ERR_NO_OR_NODES

Numeric value: 40.

Macro: intMARPA_ERR_NO_TRACE_YS

Numeric value: 46.

Macro: intMARPA_ERR_NO_TRACE_PIM

Numeric value: 47.

Macro: intMARPA_ERR_NO_TRACE_YIM

Numeric value: 45.

Macro: intMARPA_ERR_NO_TRACE_SRCL

Numeric value: 48.

Macro: intMARPA_ERR_ORID_NEGATIVE

Numeric value: 51.

Macro: intMARPA_ERR_OR_ALREADY_ORDERED

Numeric value: 52.

Macro: intMARPA_ERR_PIM_IS_NOT_LIM

Numeric value: 55.

Macro: intMARPA_ERR_SOURCE_TYPE_IS_NONE

Numeric value: 70.

Macro: intMARPA_ERR_SOURCE_TYPE_IS_TOKEN

Numeric value: 71.

Macro: intMARPA_ERR_SOURCE_TYPE_IS_COMPLETION

Numeric value: 68.

Macro: intMARPA_ERR_SOURCE_TYPE_IS_LEO

Numeric value: 69.

Macro: intMARPA_ERR_SOURCE_TYPE_IS_AMBIGUOUS

Numeric value: 67.

Macro: intMARPA_ERR_SOURCE_TYPE_IS_UNKNOWN

Numeric value: 72.

20 Design notes

Design apologetics are
throughout this document,
dispersed among sections where it is
hoped that they motivate the description or discussion.
The notes in this section
did not find a home elsewhere.

20.1 Why so many time objects?

Marpa is an aggressively multi-pass algorithm.
Marpa achieves its efficiency,
not in spite of making multiple
passes over the data, but because of it.
Marpa regularly substitutes
two fast O(n) passes for a single
O(n log n) pass.
Marpa’s proliferation of time objects is in
keeping with its multi-pass approach.

Bocage objects come at no cost,
even for unambiguous parses,
because the same pass which creates the bocage
also deals with other issues which are of major
significance for unambiguous parses.
It is the post-processing of the bocage pass
that enables Marpa to do both left-
and right-recursion in linear time.

Of the various objects, the best
case for elimination is of the
ordering object.
In many cases, the ordering is trivial.
Either the parse is unambiguous, or the
application does not care about the order in
which parses are returned.
But while it would be easy to add an option
to bypass creation of an ordering object,
there is little to be gained from it.
When the ordering is trivial,
its overhead is very small —
essentially a handful of subroutine calls.
Many orderings accomplish nothing,
but these cost next to nothing.

Tree objects come at minimal cost to unambiguous grammars,
because the same pass that allows iteration through multiple
parse trees does the tree traversal.
This eliminates much of the work that otherwise would
need to be done in
the valuation time object.
In the current implement, the valuation time object
needs only to step through a sequence already determined
in the tree iterator.

20.2 Numbered objects

As the name suggests,
the choice was made to implement
numbered objects as integers,
and not as
pointers.
In standard-conformant C,
integers can be safely checked for validity,
while pointers cannot.

There are efficiency tradeoffs between pointers and
integers but they are complicated,
and they go both ways.
Pointers can be faster, but integers can be used
as indexes into more than one data structure.
Which is actually faster depends on the design.
Integers allow for a more flexible design,
so that once the choice is settled on,
careful programming can make them a win,
possibly a very big one.

The approach taken in Libmarpa was to settle,
from the outset,
on integers as the implementation for numbered
objects,
and to optimize on that basis.
The author concedes that it is
possible that
others redoing Libmarpa from scratch
might find that pointers are faster.
But the author is confident
that they will also discover,
on modern architectures,
that the lack of safe validity checking
is far too high a price to pay
for the difference in speed.

20.3 LHS terminals

Marpa’s idea
in losing the sharp division between terminals
and non-terminals is that the distinction,
while helpful for proving theorems,
is not essential in practice.
LHS symbols
in the input might be useful for
“short circuiting” the rules in which they occur.
This may
prove helpful in debugging, or have other applications.

However,
it also can be useful,
for checking input validity as well as for efficiency,
to follow tradition and distinguish
non-terminals from terminals.
For this reason,
the traditional behavior is the default
in Libmarpa.

21 Work in Progress

21.1 Untested methods

The methods of this section are not in the external interface,
because they have not been adequately tested.
Their fate is uncertain.
Users should regard these methods
as unsupported.

21.1.1 Ranking methods

These methods, respectively, set
and query the default rank of the grammar.
When a grammar is created, the default rank is 0.
When rules and symbols are created, their rank is
the default rank of the grammar.

Changing the grammar’s default rank does not affect
those
rules and symbols already created,
only those that will be created.
This means that
the grammar’s default rank can be used to,
in effect, assign ranks to groups of rules and symbols.
Applications may find this behavior useful.

Return value: On success, returns
the rank after
the call,
and sets the error code to
MARPA_ERR_NONE.
On failure, returns -2,
and sets the error code to an appropriate
value, which will never be
MARPA_ERR_NONE.
Note that when the rank is -2,
the error code is the only way to distinguish
success from failure.
The error code can be determined by using the
marpa_g_error() call.

These methods, respectively, set
and query the rank of a symbol sym_id.
When sym_id is created, its rank
initialized to the default rank of the grammar.

Return value: On success, returns
the rank after
the call,
and sets the error code to
MARPA_ERR_NONE.
On failure, returns -2,
and sets the error code to an appropriate
value, which will never be
MARPA_ERR_NONE.
Note that when the rank is -2,
the error code is the only way to distinguish
success from failure.
The error code can be determined by using the
marpa_g_error() call.

21.1.3 Methods for revising parses

Marpa allows an application to “change its mind” about a parse,
rejected rule previously recognized or predicted,
and terminals previously scanned.
The methods in this section provide that capability.

Function: Marpa_Earlememarpa_r_clean( Marpa_Recognizer r)

22 Deprecated techniques and methods

22.1 Valued and unvalued symbols

22.1.1 What unvalued symbols were

Libmarpa symbols can have values,
which is the traditional way of doing semantics.
Libmarpa also allows symbols to be unvalued.
An unvalued symbol is one whose value
is unpredictable from instance to instance.
If a symbol is unvalued, we sometimes say that it
has “whatever” semantics.

Situations where the semantics can tolerate unvalued symbols
are surprisingly frequent.
For example, the top-level of many languages is a series
of major units, all of whose semantics are typically accomplished
via side effects.
The compiler is typically indifferent to the actual value produced
by these major units, and tracking them is a waste of time.
Similarly, the value of the separators in a list is typically
ignored.

Rules are unvalued if and only if their LHS symbols
are unvalued.
When rules and symbols are unvalued,
Libmarpa optimizes their evaluation.

It is in principle unsafe to check the value
of a symbol if it can be unvalued.
For this reason,
once a symbol has been treated as valued,
Libmarpa marks it as valued.
Similarly,
once a symbol has been treated as unvalued,
Libmarpa marks it as unvalued.
Once marked, a symbol’s valued status is
locked and cannot be changed later.

The valued status of terminals is marked the first
time they are read.
The valued status of LHS symbols must be explicitly
marked by the application when initializing the
valuator — this is Libmarpa’s equivalent of
registering a callback.

LHS terminals are disabled by default.
If allowed, the user should be aware that the valued
status of a LHS terminal
will be locked in the recognizer
if it is used as a terminal,
and the symbol’s use as a rule LHS
in the valuator must be
consistent with the recognizer’s marking.

Marpa reports an error when a symbol’s use
conflicts with its locked valued status.
Doing so usually saves the programmer
some tricky debugging further down the road.

22.1.2 Grammar methods dealing with unvalued symbols

These methods, respectively, set
and query the “valued status” of a symbol.
Once set to a value with the
marpa_g_symbol_is_valued_set() method,
the valued status of a symbol is “locked” at that value.
It cannot thereafter be changed.
Subsequent calls to
marpa_g_symbol_is_valued_set()
for the same sym_id will fail,
leaving sym_id’s valued status unchanged,
unless value is the same as the locked-in value.

Return value: On success, 1 if the symbol symbol_id
is valued after the call, 0 if not.
If the valued status is locked and value
is different from the current status, -2.
If value is not 0 or 1;
or on other failure, -2.

22.1.3 Registering semantics in the valuator

By default, Libmarpa’s valuator objects
assume that
non-terminal symbols have
no semantics.
The archetypal application will need to register
symbols that contain semantics.
The primary method for doing this is
marpa_v_symbol_is_valued().
Applications will typically register semantics by rule,
and these applications will find
the marpa_v_rule_is_valued() method more convenient.

These methods, respectively,
set and query
the valued status of symbol sym_id.
marpa_v_symbol_is_valued_set() will set
the valued status to
the value of
its status argument.
A valued status of 1 indicates that the symbol is valued.
A valued status of 0 indicates that the symbol is unvalued.
If the valued status is locked,
an attempt to change to a status different from the
current one will fail
(error code MARPA_ERR_VALUED_IS_LOCKED).

Return value: On success, the valued status after
the call.
If value is not either 0 or 1,
or on other failure, -2.

These methods, respectively,
set and query
the valued status
for the LHS symbol of rule rule_id.
marpa_v_rule_is_valued_set() sets
the valued status to the value
of its status argument.

A valued status of 1 indicates that the symbol is valued.
A valued status of 0 indicates that the symbol is unvalued.
If the valued status is locked,
an attempt to change to a status different from the
current one will fail
(error code MARPA_ERR_VALUED_IS_LOCKED).

Rules have no valued status of their own.
The valued status of a rule
is always that of its LHS symbol.
These methods are conveniences — they
save the application the trouble of looking
up the rule’s LHS.

Return value: On success, the valued status of the
rule rule_id’s LHS symbol after
the call.
If value is not either 0 or 1,
or on other failure, -2.

Function: intmarpa_v_valued_force( Marpa_Value v)

This methods locks the valued status of all symbols
to 1, indicated that the symbol is valued.
If this is not possible, for example because one of
the grammar’s symbols already is locked at a valued
status of 0,
failure is returned.

Return value: On success, a non-negative number.
On failure, returns -2,
and sets the error code to an appropriate
value, which will never be
MARPA_ERR_NONE.