Parsing in the SML Kit

Nick Rothwell

Introduction: This document describes the parsing scheme
of the SML Kit. We introduce the grammar datatype, or abstract
syntax, and the grammar specification itself, explaining the
conventions used. There are various mechanisms employed to resolve
the ambiguities and parsing difficulties of SML, and we describe
these together with the mechanisms which deal with infix operators,
derived forms, and so on.

We use the MLLex and MLYacc packages. Both have
been modified to produce portable code running under the SML Library, rather than
SML/NJ-specific code, so that the resulting parser can be compiled
under other SML systems. In fact, these versions of MLLex
and MLYacc are themselves portable, so that all Kit
development work can take place using a different SML compiler.

The generated lexing and parsing functors are supported by a
number of utility functors for dealing with infixes, derived-form
elimination, and so on. There are also utility functors for lexing,
providing positional information for error messages, and so on. All
these components are gathered together in the functor
Parse.

Section 2 describes the abstract syntax and its
parameterisation. Section 3 covers lexing issues, and Section 4
overviews the general structure of the parser. We cover the grammar
specification in Section 5, and the functor structure and linkage
in Section 6. The constituent files of the parsing system are
described in Section 7.

At this stage we should probably mention the main thing that the
parsing phase does not do. The Definition states a number of
syntactic restrictions (Section 2.9), mainly concerned with
repeated identifiers of various kinds. In fact, some of these
restrictions are not syntactic since they are impossible to
enforce at parse time, and require elaboration. Specifically,
repeated identifiers in value bindings and patterns cannot be
detected at parse time since it is legal to repeat value and
exception constructors, and these cannot be determined before
elaboration. We have chosen to enforce none of the rules
about repeated identifiers, for the sake of simplicity; it is
cleaner to deal with these errors during elaboration. The one
remaining syntactic restriction involving val rec is truly
syntactic and can be dealt with in the grammar.