Message processing is the extraction of information about key events
described in brief narratives concerning a narrow domain. The task is
well suited to natural language understanding, since the amount of
world knowledge required is limited. However, the messages are often
ill-formed, so the grammar that parses them must be quite forgiving,
and a forgiving grammar tends to produce a proliferation of parses.
The problem is compounded by the inability to construct a domain
model complete enough to resolve all semantic ambiguity. Selecting
the correct parse therefore becomes an important goal for such
systems.

Structural preference is a disambiguation technique that assigns a
higher preference to certain syntactic structures. Statistical parsing
grew out of the desire to base such preferences on empirical
observation rather than ad hoc judgement. In statistical parsing,
every production of the grammar is assigned a priority computed from
a statistical analysis of a corpus.
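As an illustration only, the sketch below assumes one common way of turning corpus statistics into priorities: each production's priority is its relative frequency among all productions with the same left-hand side in a set of training parses (maximum-likelihood estimation for a probabilistic context-free grammar). The exact formula is not specified here; the function name `estimate_priorities` and the representation of a parse as a list of `(lhs, rhs)` productions are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: a production is (left-hand side, right-hand side);
# a parse is the list of productions used in its derivation.
Production = Tuple[str, Tuple[str, ...]]
Parse = List[Production]


def estimate_priorities(parses: List[Parse]) -> Dict[Production, float]:
    """Assumed scheme: priority = P(rhs | lhs), estimated by relative frequency."""
    rule_counts: Counter = Counter()
    lhs_counts: Counter = Counter()
    for parse in parses:
        for lhs, rhs in parse:
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    # Normalize each rule count by the total count of its left-hand side.
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


if __name__ == "__main__":
    corpus = [
        [("S", ("NP", "VP")), ("VP", ("V", "NP"))],
        [("S", ("NP", "VP")), ("VP", ("V",))],
        [("S", ("NP", "VP")), ("VP", ("V", "NP"))],
    ]
    for rule, p in sorted(estimate_priorities(corpus).items()):
        print(rule, round(p, 3))
```

In this toy corpus `VP -> V NP` receives priority 2/3 and `VP -> V` receives 1/3, since they are the only two VP expansions and occur two times and one time respectively; `S -> NP VP` is the only S expansion and receives 1.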

Two distinct methods can be used to assign these priorities. In
Supervised Training, only the correct parses are used to train the
grammar. Unsupervised Training, by contrast, uses parses regardless
of their semantic validity. Once the priorities are assigned, the
parser searches for parses in best-first order as dictated by these
priorities.
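The two training regimes and the best-first search described above can be sketched together under loudly stated assumptions. Training parses carry a hypothetical `correct` flag: Supervised Training counts only the flagged parses, Unsupervised Training counts all of them. Priorities are assumed to be relative frequencies, and the search is assumed to be a top-down leftmost-derivation search ordered by the sum of negative log priorities, which is a uniform-cost variant of best-first search. The toy grammar, sentence, and function names (`train`, `best_first_parses`) are hypothetical and do not reproduce the actual PROTEUS parser.

```python
import heapq
import math
from collections import Counter
from typing import Dict, Iterator, List, Tuple

# Hypothetical representation: a production is (lhs, rhs); a training parse is
# the list of productions it uses, paired with a flag saying whether it is correct.
Production = Tuple[str, Tuple[str, ...]]
TrainingParse = Tuple[List[Production], bool]


def train(parses: List[TrainingParse], supervised: bool) -> Dict[Production, float]:
    """Relative-frequency priorities; supervised mode counts only parses marked correct."""
    used = [prods for prods, correct in parses if correct or not supervised]
    rule_counts = Counter(rule for prods in used for rule in prods)
    lhs_counts = Counter(lhs for prods in used for lhs, _ in prods)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


def best_first_parses(
    words: List[str], prio: Dict[Production, float], start: str = "S"
) -> Iterator[Tuple[float, List[Production]]]:
    """Yield complete leftmost derivations of `words`, most probable first.

    Cost is the sum of -log(priority), so costs never decrease along a path and
    complete derivations are popped cheapest first. Assumes no empty productions.
    """
    nonterminals = {lhs for lhs, _ in prio}
    # Agenda entry: (cost, tie-breaker, input position, symbols to expand, derivation).
    agenda = [(0.0, 0, 0, (start,), ())]
    tie = 1
    while agenda:
        cost, _, pos, stack, deriv = heapq.heappop(agenda)
        if not stack:
            if pos == len(words):
                yield math.exp(-cost), list(deriv)
            continue
        head, rest = stack[0], stack[1:]
        if head not in nonterminals:
            # Terminal symbol: it must match the next input word.
            if pos < len(words) and words[pos] == head:
                heapq.heappush(agenda, (cost, tie, pos + 1, rest, deriv))
                tie += 1
            continue
        for (lhs, rhs), p in prio.items():
            if lhs != head:
                continue
            new_stack = rhs + rest
            # Prune: without empty productions, every symbol covers at least one word.
            if len(new_stack) > len(words) - pos:
                continue
            heapq.heappush(agenda, (cost - math.log(p), tie, pos, new_stack, deriv + ((lhs, rhs),)))
            tie += 1


if __name__ == "__main__":
    # Hypothetical training parses of "ship sighted contact on radar".
    shared = [("S", ("NP", "VP")), ("NP", ("N",)), ("N", ("ship",)), ("V", ("sighted",)),
              ("NP", ("N",)), ("N", ("contact",)), ("PP", ("P", "NP")), ("P", ("on",)),
              ("NP", ("N",)), ("N", ("radar",))]
    vp_attach = shared + [("VP", ("V", "NP", "PP"))]  # PP modifies the verb (marked correct)
    np_attach = shared + [("VP", ("V", "NP")), ("NP", ("NP", "PP"))]  # PP modifies "contact"
    data = [(vp_attach, True), (np_attach, False)]
    sentence = "ship sighted contact on radar".split()
    for supervised in (True, False):
        print("supervised" if supervised else "unsupervised")
        for prob, deriv in best_first_parses(sentence, train(data, supervised)):
            vp = next(rhs for lhs, rhs in deriv if lhs == "VP")
            print(f"  p={prob:.4f}  VP -> {' '.join(vp)}")
```

Under the supervised setting the grammar never sees `VP -> V NP` or `NP -> NP PP`, so only the verb-attachment parse is found. Under the unsupervised setting both attachments are found, verb attachment first: it is exactly seven times as probable, because the noun-attachment derivation needs the extra `NP -> NP PP` step, whose priority is 1/7. Because the search is lazy, a caller can stop after the first parse, which loosely mirrors how rarely used productions are seldom explored.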

When this scheme was incorporated into the PROTEUS message
understanding system and applied to OPREP (U.S. Navy Operational)
messages, two advantages were observed. First, parsing became faster,
because rare productions tended not to be used at all. Second,
because parses were generated in best-first order, those generated
early tended to be more likely and more semantically acceptable.

The performance of the modified parsing algorithm was evaluated with
and without several refinements, such as context-sensitive statistics
and heuristic penalties. The relative performance of grammars trained
by Supervised Training and by Unsupervised Training was also compared.