This is not a reference manual for the Standard ML language.
If you need a reference manual or a tutorial, the Introduction
lists several sources of information, both on-line and in hard
copy.

Interacting with SML/NJ

When you start the SML/NJ system, it loads and responds with a
message giving the current version number and then a prompt for
user input. The prompt is a single dash ("-").

When prompted, you can type in a top-level declaration.
There are several kinds of top-level declarations in SML. For
example, the following is a declaration of a function called inc
that increments its integer argument. (In these examples, the
dash ("-") is the SML/NJ prompt, the text up to and including
the semicolon is the user input, and the text after it is the
output from the SML/NJ system. Each line of user input is ended
with a carriage return on Unix-based systems or the Enter key on
the PC and Macintosh systems.)

- fun inc x = x + 1;
val inc = fn : int -> int

The text "fun inc x = x + 1"
is the declaration for the inc
function. The semicolon (";")
is a marker that indicates to the SML/NJ system that it should
perform the following actions: elaborate (that is,
perform typechecking and other static analyses), compile (to
obtain executable machine code), execute, and finally print
the result of this declaration. After all of this, the
system then prompts for new input and the whole process starts
again. This is the so-called "top-level loop". To exit
from the SML/NJ system, simply type an end-of-file character
(Control-d) to the prompt.

In the example above, the printed result shows that inc is
a function that takes an integer argument and yields an integer
result. Actually, it is important for you to know that, in SML,
functions are "first-class" values, fundamentally no
different than other values such as integers. So, to be more
precise, it is better to say that the identifier inc has been bound to
a value (which happens to be a function, as denoted by the fn
keyword above) of type int -> int.
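
Because functions are ordinary values, inc can be passed to other functions, bound to other identifiers, and combined into new functions. The following is a minimal sketch; the names bumped, inc', incTwice, and seven are hypothetical and chosen only for illustration.

```sml
fun inc x = x + 1

(* Pass inc as an argument to the list function map. *)
val bumped = map inc [1, 2, 3]

(* Bind the same function value to a second identifier. *)
val inc' = inc

(* Build a new function from inc with the composition operator o. *)
val incTwice = inc o inc
val seven = incTwice 5
```

Here bumped, inc', and incTwice are bound in exactly the same way that an integer would be; only their types differ.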

If we had left out the semicolon, then the elaboration,
compilation, execution, and printing would have been deferred and
a different prompt (this time an equal sign, "=") would have been
given, for either a continuation of the declaration of inc or
another top-level declaration. When a semicolon is finally entered
(perhaps after several more top-level declarations), all of the
declarations since the last semicolon are processed in sequence. For
example:
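
The transcript for this example is not shown here. A minimal sketch that is consistent with the results quoted later (f (2+4) yields 35) is given below; the body of f is an assumption made for illustration, not necessarily the original definition.

```sml
(* Typed at the prompt as:
     - fun inc x = x + 1
     = fun f n = (inc n) * 5;
   The "=" at the start of the second line is the continuation
   prompt printed by the system, not part of the user input. *)
fun inc x = x + 1
fun f n = (inc n) * 5;
```

Both declarations are processed only when the semicolon is reached, and the system then reports the bindings of inc and f in order.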

In this example, we have defined the inc
function as well as a function f
that uses inc.

In the interactive top-level loop, the simplest form of input
is an expression. For example, after typing in the declarations
for inc and f above, we can now call f by typing in:

- f (2+4);
val it = 35 : int

Notice that since no identifier is given to bind to the value,
the interactive system has chosen the identifier it and
bound it to the result of compiling and executing the expression f (2+4).

You might have experience with other languages whose
implementations support a similar kind of interactive top-level
loop. For example, most implementations of the Lisp, Scheme, and
Basic languages support top-level loops. If you have experience
with any of these languages, then you might expect that
re-defining a function will change the binding of the function
name, as well as any other functions that call that function.
However, in the SML/NJ system, this is not the case. For example,
suppose we wish to change the definition of the inc function, so that it
increments by two instead of one:

- fun inc x = x + 2;
val inc = fn : int -> int

In typical Lisp and Scheme systems, such a re-definition would
cause the function f to
change as well, since f
calls inc. But in the
SML/NJ system, f's
binding does not change, so in fact referring to f now still yields the
original function:

- f (2+4);
val it = 35 : int

To understand why the SML/NJ system behaves in this way,
consider what would happen if we re-defined inc
so that it had a type different than int -> int, for
example:

- fun inc x = (x mod 2 = 0);
val inc = fn : int -> bool

Here, inc has been
changed to a function that returns true
if and only if its integer argument is even. Now, if f should also be changed to
reflect this re-definition (as it would be in Lisp and Scheme
systems), it would fail to typecheck. This is not necessarily a
bad thing, but at any rate the SML/NJ system does not bother to
go back to earlier top-level declarations and re-elaborate them;
hence, f's binding is
left unchanged.

If you are already familiar with the SML language, then you
can think of the sequence of top-level declarations typed into an
SML/NJ interactive top-level loop as being in nested
let-bindings:
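
As a sketch of this reading, the session described above corresponds roughly to the nested expression below. The body of f is an assumed definition chosen to agree with the result 35, and result is a hypothetical name used so that the value can be inspected.

```sml
val result =
  let
    fun inc x = x + 1
  in
    let
      fun f n = (inc n) * 5      (* assumed body; refers to the inc above *)
    in
      let
        fun inc x = x + 2        (* shadows inc for later code only *)
      in
        f (2 + 4)                (* f still calls the first inc *)
      end
    end
  end
```

The innermost inc is a new binding that hides the first one; it does not modify it, which is why f continues to behave as before.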

Using Files and the Standard Basis

Instead of typing your program into the interactive top-level,
it is more productive to put your program into a file (or set of
files) and then load it (them) into the SML/NJ system. The
simplest way to do this is to use the built-in function use. For example:

- use "myprog.sml";
[opening myprog.sml]
...
val it = () : unit

The use function
takes the name of the file (of type string) to load. If
the file exists, it is opened and read, with each top-level
declaration in the file processed in turn (and the results
printed on the standard output). The "result" of the use function is the unit value
("()").

As your programs get larger and the code becomes spread over
many modules, you can find it extremely difficult to remember
exactly the right order in which to "use"
the files. In order to alleviate this problem, the SML/NJ system
has a built-in feature called the Compilation Manager, or simply
CM, which I highly recommend that you use. (Actually, you might
have to start the SML/NJ system by invoking the
"sml-cm" binary, instead of simply "sml".) CM
is a complex system with documentation available on-line at http://www.cs.princeton.edu/~blume/cm-manual.ps.
For most uses the simplest interface is sufficient: simply create
a file in the current directory called sources.cm
which contains the names of all of your SML source files, listed
one per line in any order. Once this file is created, then you
can use the function CM.make
to load, compile, and execute your system. For example, suppose
you have three source files, a.sig,
b.sml, and c.sml. Then you can create a
file called sources.cm
with the following contents:

Group is
a.sig
b.sml
c.sml

Note that it does not matter in what order the file names
occur. Once this file has been created, typing the following to
the SML/NJ system will do whatever is necessary in order to load
your program:

- CM.make();

The CM.make function
will scan all of your source files and calculate the
dependencies among them so as to compile and load them in the
right order. If CM.make
has already been used before to compile and load your program,
then it looks to see which files have changed since the last
"make", and then compiles and loads the minimal number
of files necessary in order to bring the system up-to-date. After
running CM.make, you
might notice a new directory in your source file directory. This
new directory is used by CM to "remember" the results
of the dependency calculation, as well as to store the results of
compiling your files so that they don't have to be compiled again
(unless, of course, they have been changed).

There is an extensive set of pre-defined values and functions
in the SML/NJ system. This is referred to as the standard
basis, or sometimes the pervasive environment. As
with CM, there is also extensive documentation available on-line
for the standard basis at http://cm.bell-labs.com/cm/cs/what/smlnj/basis/index.html.
(A book on the standard basis will be published soon.) For
dealing with files, the following function is often useful:

OS.FileSys.chDir : string -> unit

This function implements the standard "cd" Unix
command, which changes the current working directory to the
directory specified in the string argument. This is useful if you
have started the SML/NJ system in a directory different from the
one containing your source files.
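
A short sketch of how this might be used follows. It also calls OS.FileSys.getDir, which is not mentioned above but belongs to the same basis structure and returns the current working directory; the names start and parent are hypothetical.

```sml
val start = OS.FileSys.getDir ()     (* remember where we began *)
val () = OS.FileSys.chDir ".."        (* move to the parent directory *)
val parent = OS.FileSys.getDir ()
val () = OS.FileSys.chDir start       (* and go back again *)
```

In practice the argument to chDir would be the path of the directory that holds your source files, after which use or CM.make can find them by their short names.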

Another set of values is useful for controlling the
output produced by the SML/NJ system:

Compiler.Control.Print.printDepth : int ref
Compiler.Control.Print.printLength : int ref

These variables control the maximum depth and length to which
lists, tuples, and other data structures are to be printed. When
a data structure is deeper than printDepth
or longer than printLength,
the remaining portion of the structure is printed as an ellipsis
("...").

To change the value of one of these variables, an assignment
can be used. For example:

- Compiler.Control.Print.printDepth := 10;

changes the maximum print depth to ten.

The standard basis contains many modules and functions for
manipulating values of all of the basic types, including
booleans, integers, reals, characters, strings, arrays, and
lists. Unfortunately, the SML/NJ system does not provide any kind
of browser, so you need either to refer to the written
documentation for the standard basis or to use a bit of
a hack in order to see the complete set of basis functions
currently supplied by SML/NJ for these types. For example,
type the following to the interactive top-level:

- signature S = INTEGER;

Each set of standard basis functions is encapsulated in an SML
module, and each such module has a signature, or
"interface", whose name is written entirely in
uppercase and refers to the type of values for which the module
provides functionality. (Note that SML is case sensitive.) For
the integer functions, the signature is called INTEGER. So, the above
declaration simply binds the identifier S
to the signature INTEGER, which causes the
SML/NJ system to respond with a listing of the entire INTEGER interface. (We
could have used any name besides S.)
Other useful signatures include BOOL, REAL, CHAR, STRING, ARRAY, and LIST. For functions that
interface to the operating system (such as OS.FileSys.chDir
above), see the signature OS (and POSIX, if provided). There
are many other useful modules in the standard basis as well.
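
Once a signature has shown you what is available, the functions are called through the name of the module that implements it. The following sketch uses a handful of functions from the Int, String, List, and Bool structures; the identifiers on the left-hand sides are hypothetical.

```sml
val s       = Int.toString 42          (* integer to string *)
val n       = Int.fromString "17"      (* string to int option *)
val bigger  = Int.max (3, 8)           (* larger of two integers *)
val len     = String.size "hello"      (* number of characters *)
val flipped = List.rev [1, 2, 3]       (* reversed list *)
val flag    = Bool.toString true       (* boolean to string *)
```

Note that Int.fromString returns an option, since not every string denotes an integer.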

Editing Files Using Emacs

I recommend using Emacs to edit your SML programs and also to
manage interaction with the SML/NJ system. To do this, you should
incorporate the "sml mode" into your emacs startup
file. The relevant emacs lisp files can be found in the same
directory tree as the SML/NJ system itself. For example, from
Unix machines in the Computer Science Department, you can simply
add the line

(load "/usr/local/lib/sml/sml-mode/sml-site")

to your .emacs file
so that the next time you start Emacs, the sml mode will be
present. From the Andrew network, you can find the emacs lisp files in the
15-411 course directory.

With the sml mode, a special editing mode will be invoked any
time you edit a file with an appropriate extension (such as
".sml"; other
extensions can be specified in the init.el
file). As in other special editing modes, using the Tab key or
Control-j will cause emacs to attempt to indent your code in a
pleasing way. Control-c followed by Tab will indent the current
region. Since SML's syntax is rather complex, the sml mode
indentation can be rather haphazard at times. Still, many people
find it to be quite useful. A particularly useful key combination
is "Meta" along with a vertical bar ("|");
this creates a template for an arm of a case expression or clause
of a function.

To run SML/NJ from Emacs, make sure that the emacs variable sml-program-name is set to
"sml" (which
is the default), and then type M-x sml
(that is, "Meta" along with "x",
followed by "sml").
This will start up the SML/NJ system as an inferior shell
process. There are several useful emacs commands for interacting
with the inferior sml shell. You can find documentation for them
by hitting Control-h m. Some of the most basic commands are

C-c C-l
    save the current buffer and then "use" the file

C-c C-r
    send the current region to the sml shell

C-c `
    find the next error message and position the cursor
    on the corresponding line in the source file

Making Sense of Error Messages

As with most compilers, the SML/NJ system often produces error
messages that can be hard to decipher. The problem is compounded
by the fact that SML supports polymorphic type inference, which
makes it very difficult for the compiler to figure out precisely
the real source of a type error. On the other hand, once all of
the compile-time type errors are removed, it is often the case
that the bulk of the bugs have already been stamped out. In
practice, SML programs often work the first time, once all of the
type errors reported by the compiler have been removed!

Type mismatches

The most common kind of error is the simple type mismatch. For
example, suppose we have the following code in a file called myprog.sml:

fun inc x = x + 1
fun f n = inc true

Notice that a semicolon is not needed here, since the
end-of-file marker will serve the same purpose. Now, if we load
this file, we get the following error message:

The error message indicates that the expression inc true, on line 2, between
columns 11 and 18, is guilty of a type mismatch. The function inc is being applied to an
argument of type bool in
this expression, but its domain (argument type) is int.

If we are using the sml mode in Emacs, then typing C-c C-l in an edit buffer
containing the program would cause the SML/NJ system to load the
file, and then typing C-c `
would move the edit cursor to the exact point in the program
corresponding to this error message.

Unresolved overloading

Some of the arithmetic operators, such as +,
*, -, and so on, are
"overloaded", in the sense that they can be used with
either integer arguments or real arguments. This overloading
feature is a possible source of confusion for the novice SML
programmer. Consider, for example, the following declaration of a
function for squaring numbers:

Because there is not enough information in this program to
determine whether the *
is for integers or for reals, an error message is generated to
complain about the inability to "resolve" the
overloading.

The simple fix for this kind of error is simply to declare the
type of one of the arguments to (or the result of) the arithmetic
operation. For example, here are three versions that work:
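
The listing itself is not reproduced here. The sketch below matches the description that follows it, under the assumption that the annotations use int (real annotations would serve equally well).

```sml
(* Annotate the second operand of the multiplication. *)
fun square' x = x * (x : int)

(* Annotate the function's parameter. *)
fun square'' (x : int) = x * x

(* Annotate the function's result. *)
fun square''' x : int = x * x
```

In each case a single annotation is enough, because the two operands and the result of the multiplication must all have the same type.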

The first version explicitly declares the type of the second
argument to the *
operator. The second version declares the type of the argument.
Finally, the third version declares the type of the result of the
square''' function. All
three versions allow the SML type inference mechanism to infer
the types of the identifiers in the declarations.

It is not uncommon to spend quite a long time tracking down
the source of a type error. (Actually, the time spent doing this
is almost always much less than the time it takes to track down
the same error without the benefit of static typechecking!) A
common way to narrow down the possibilities, and also to improve
the precision of the error messages produced by the compiler, is
to annotate the program with explicit types, in the way that we
have done above. It is particularly helpful to annotate the types
of function parameters, as we have done in square''
above. This is similar to the declaration of parameter types in
languages such as C and Pascal. Of course, in those languages the
declarations are required; in SML they are optional.

The value restriction

One of the most fundamental changes in the 1997 revision of
the SML language is that it now enforces something called the value
restriction. Essentially, this restricts polymorphism to
expressions that clearly are values, specifically single
identifiers and functions. When this restriction is violated, the
error message, "nongeneric type variable," is given.
For example, the following program results in this error:
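
A sketch of such a program, reconstructed from the names id and f used in the discussion, appears below with the offending line left in a comment. The eta-expanded version f' is a common remedy that is not taken from the original text; f', ints, and strs are hypothetical names.

```sml
fun id x = x

(* Rejected, or at least flagged, under the value restriction,
   depending on the compiler version: map id is a function
   application, not a syntactic value.

     val f = map id
*)

(* Writing the declaration as a function makes the right-hand
   side a value, so f' remains polymorphic. *)
fun f' xs = map id xs

val ints = f' [1, 2, 3]
val strs = f' ["a", "b"]
```

The fact that f' can be applied to both an int list and a string list shows that it has kept its polymorphic type.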

The error indicates that the expression map
id is polymorphic, but not syntactically a value
(that is, not an identifier or lambda expression), and hence the
attempt to use it as a polymorphic value (by binding f to it) violates the value
restriction. The reasons for this restriction are beyond the
scope of this document, but are explained in several papers as
well as the textbook by Paulson.

Syntax errors

Because the syntax of SML is rather complex, there are several
common errors that novices tend to make. One of the most common
has to do with the syntax of patterns in clausal-form function
declarations and case expressions. Consider the following code:

The problem here is that Leaf and Node are patterns that are
syntactically separate from, respectively, the (v) and (l,r)
patterns. The (admittedly strange) syntax of SML requires extra
parenthesization:
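
The original listings are not shown here. As a sketch, assume a hypothetical binary tree datatype with the constructors Leaf and Node named above and a hypothetical function sum over it; the ill-formed clauses are kept in a comment for contrast.

```sml
datatype tree = Leaf of int | Node of tree * tree

(* Ill-formed: Leaf and (v) are read as two separate patterns.

     fun sum Leaf (v) = v
       | sum Node (l, r) = sum l + sum r
*)

(* Well-formed: each constructor application is parenthesized. *)
fun sum (Leaf v) = v
  | sum (Node (l, r)) = sum l + sum r

val total = sum (Node (Leaf 1, Node (Leaf 2, Leaf 3)))
```

With the parentheses in place, each clause of sum takes exactly one argument, a tree.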

This is true in all contexts where patterns are used,
including clausal-form function declarations, case expressions,
and exception handlers.

Another rather confusing part of the syntax has to do with the
interaction between case expressions, exception handlers, and
clausal-form function declarations. Consider the following
function, taken in slightly modified form from the SML/NJ library
(which is described later):

In this example, the local function filterP
is defined in two clauses, the first handling the case of a
non-empty list argument, and the second handling the empty list.
In the first clause, a case expression is used. The syntactic
ambiguity arises from the fact that it takes too much
"lookahead" to figure out whether or not the second clause of filterP is actually the third
arm of the case expression. This leads to the following rather
cryptic error message:
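
Whatever the exact wording of the message, the usual remedy is to wrap the case expression in parentheses so that its last arm cannot absorb the next clause. The sketch below is reconstructed from the description above rather than copied from the SML/NJ library; the outer name filterSome and the accumulator name acc are hypothetical.

```sml
fun filterSome pred l =
  let
    (* The parentheses end the case expression before the
       second clause of filterP begins. *)
    fun filterP (x :: r, acc) =
          (case pred x of
               SOME y => filterP (r, y :: acc)
             | NONE   => filterP (r, acc))
      | filterP ([], acc) = rev acc
  in
    filterP (l, [])
  end

val doubledBig =
  filterSome (fn x => if x > 1 then SOME (x * 2) else NONE) [1, 2, 3]
```

The same precaution applies to a handle clause or an fn expression whose final arm is followed by a vertical bar belonging to an enclosing construct.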

Exporting Heaps

The SML language encourages modularity, and in practice
separate modules tend to be placed into separate files. While
this is useful during development, it becomes highly inconvenient
when you finally "ship" your finished program to your
users. The standard way to ship a program, then, is to save an
image of the system heap after all of your files have been
loaded. This is referred to as "exporting" the heap,
and results in a single file that contains the state of your SML
world at the time you performed the export operation.

You can export a heap with the function exportML.
For example, to save the heap image in a file called mysml, the following should be
typed to the SML/NJ prompt:

- SMLofNJ.exportML "mysml";

This will save the current state of the SML/NJ system into the
file mysml. This can
then be executed later by running the sml system with the
command-line option, "@SMLload=mysml".
This will restart the SML/NJ system at the same point at which
the exportML took place.
(Note that exportML is
not supported for the Macintosh System 7 version.)

There is also a function called exportFn,
which saves an SML state as a function that takes in the shell
command-line arguments when restarted. The type of exportFn is

SMLofNJ.exportFn : (string * ((string * string list) -> OS.Process.status)) -> unit

The first argument is the name of the file to contain the
exported heap image. The second argument is a function that takes
the command name and the command-line arguments (as strings) and
returns a process-status value (usually OS.Process.success
or OS.Process.failure).
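
A minimal sketch of a function suitable for passing to exportFn follows. The name main, its message, and the file name myprog are hypothetical; the export call is left in a comment because exportFn ends the SML/NJ session once the heap image has been written.

```sml
(* Entry point: receives the command name and its arguments,
   prints a short message, and reports success. *)
fun main (cmdName : string, args : string list) : OS.Process.status =
  (print (cmdName ^ " called with "
          ^ Int.toString (length args) ^ " argument(s)\n");
   OS.Process.success)

(* At the SML/NJ prompt, after loading your program:

     SMLofNJ.exportFn ("myprog", main);
*)
```

When the exported image is later loaded with the @SMLload option, execution begins in main rather than at the interactive prompt.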

Tools

In addition to the standard basis, the SML/NJ system comes
with several tools and libraries. The ml-lex and ml-yacc programs
perform automatic generation of lexical analyzers and LALR(1)
parsers, respectively. Documentation for these
and other useful tools can be found at the SML/NJ documentation page.