This part of the manual is a tutorial introduction to the Objective
Caml language. A good familiarity with programming in a conventional
language (say, Pascal or C) is assumed, but no prior exposure to
functional languages is required. The present chapter introduces the
core language. Chapter 2 deals with the module system, and chapter 3
with the object-oriented features.

For this overview of Caml, we use the interactive system, which
is started by running ocaml from the Unix shell, or by launching the
OCamlwin.exe application under Windows. This tutorial is presented
as the transcript of a session with the interactive system:
lines starting with # represent user input; the system responses are
printed below, without a leading #.

Under the interactive system, the user types Caml phrases, terminated
by ;;, in response to the # prompt, and the system compiles them
on the fly, executes them, and prints the outcome of evaluation.
Phrases are either simple expressions, or let definitions of
identifiers (either values or functions).
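As a minimal sketch of such phrases (the identifiers below are chosen for illustration), here are one value definition and two function definitions. Under the interactive system, each would be typed at the # prompt and terminated by ;;.

```ocaml
(* A simple value definition. *)
let pi = 4.0 *. atan 1.0

(* A function definition; the parameter type is inferred to be float
   because x is used with floating-point multiplication. *)
let square x = x *. x

(* A recursive function must be introduced with let rec. *)
let rec fib n = if n < 2 then 1 else fib (n - 1) + fib (n - 2)
```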

The Caml system computes both the value and the type for
each phrase. Even function parameters need no explicit type declaration:
the system infers their types from their usage in the
function. Notice also that integers and floating-point numbers are
distinct types, with distinct operators: + and * operate on
integers, but +. and *. operate on floats.

# 1.0 * 2;;
This expression has type float but is here used with type int
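To mix the two kinds of numbers, one operand has to be converted explicitly. The sketch below uses the standard conversion function float_of_int; the identifier names are illustrative.

```ocaml
(* Integer arithmetic uses the plain operators. *)
let int_product = 1 * 2

(* Floating-point arithmetic uses the dotted operators. *)
let float_product = 1.0 *. 2.0

(* Mixing an int with a float requires an explicit conversion. *)
let mixed_product = 1.0 *. float_of_int 2
```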

Predefined data structures include tuples, arrays, and lists. General
mechanisms for defining your own data structures are also provided.
They will be covered in more detail later; for now, we concentrate on lists.
Lists are either given in extension as a bracketed list of
semicolon-separated elements, or built from the empty list []
(pronounced ``nil'') by adding elements in front using the ::
(``cons'') operator.
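Both notations can be sketched as follows (the list contents are illustrative):

```ocaml
(* A list given in extension. *)
let l = ["is"; "a"; "tale"; "told"; "etc."]

(* Adding an element in front with the cons operator. *)
let l' = "Life" :: l

(* The bracketed notation is shorthand for repeated cons onto nil. *)
let same_list = (1 :: 2 :: 3 :: []) = [1; 2; 3]
```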

As with all other Caml data structures, lists do not need to be
explicitly allocated and deallocated from memory: all memory
management is entirely automatic in Caml. Similarly, there is no
explicit handling of pointers: the Caml compiler silently introduces
pointers where necessary.

As with most Caml data structures, inspecting and destructuring lists
is performed by pattern-matching. List patterns have the exact same
shape as list expressions, with identifiers representing unspecified
parts of the list. As an example, here is insertion sort on a list:
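A minimal sketch of such an insertion sort could read as follows (the helper name insert is an assumption of this sketch):

```ocaml
(* sort: sort the tail recursively, then insert the head at its place. *)
let rec sort lst =
  match lst with
  | [] -> []
  | head :: tail -> insert head (sort tail)

(* insert: put elt before the first element that is not smaller than it. *)
and insert elt lst =
  match lst with
  | [] -> [elt]
  | head :: tail ->
      if elt <= head then elt :: lst else head :: insert elt tail
```

Since the comparison <= works on any type, the same function can be applied to, say, [6; 2; 5; 3] or [3.14; 2.718].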

The type inferred for sort, 'a list -> 'a list, means that sort
can actually apply to lists of any type, and returns a list of the
same type. The type 'a is a type variable, and stands for any
given type. The reason why sort can apply to lists of any type is
that the comparisons (=, <=, etc.) are polymorphic in Caml:
they operate between any two values of the same type. This makes
sort itself polymorphic over all list types.

The sort function above does not modify its input list: it builds
and returns a new list containing the same elements as the input list,
in ascending order. There is actually no way in Caml to modify
a list in place once it is built: we say that lists are immutable
data structures. Most Caml data structures are immutable, but a few
(most notably arrays) are mutable, meaning that they can be
modified in place at any time.

Caml is a functional language: functions in the full mathematical
sense are supported and can be passed around freely just as any other
piece of data. For instance, here is a deriv function that takes any
float function as argument and returns an approximation of its
derivative function:
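One possible sketch of deriv uses a forward difference quotient; the step parameter dx and the name sin' are assumptions of this example.

```ocaml
(* deriv f dx returns a function approximating the derivative of f,
   using the forward difference quotient with step dx. *)
let deriv f dx = fun x -> (f (x +. dx) -. f x) /. dx

(* An approximation of the derivative of sin, i.e. roughly cos. *)
let sin' = deriv sin 1e-6
```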

Functions that take other functions as arguments are called
``functionals'', or ``higher-order functions''. Functionals are
especially useful to provide iterators or similar generic operations
over a data structure. For instance, the standard Caml library
provides a List.map functional that applies a given function to each
element of a list, and returns the list of the results:
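As an illustration, the sketch below first calls List.map from the standard library, then shows how such a functional could be written by hand (the local name map is illustrative and is not the library's own source).

```ocaml
(* Apply an anonymous function to each element of a list. *)
let doubled_plus_one = List.map (fun n -> n * 2 + 1) [0; 1; 2; 3; 4]

(* A hand-written functional with the same behaviour. *)
let rec map f l =
  match l with
  | [] -> []
  | hd :: tl -> f hd :: map f tl
```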

The declaration of a variant type lists all possible shapes for values
of that type. Each case is identified by a name, called a constructor,
which serves both for constructing values of the variant type and
inspecting them by pattern-matching. Constructor names are capitalized
to distinguish them from variable names (which must start with a
lowercase letter). For instance, here is a variant
type for doing mixed arithmetic (integers and floats):
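A sketch of such a type, together with an addition function defined by pattern-matching, might look like this. The constructor and function names are assumptions of the example, and integer overflow is deliberately ignored.

```ocaml
type number = Int of int | Float of float | Error

(* Add two numbers, promoting integers to floats when the operands
   are of different kinds. Integer overflow is not checked. *)
let add_num n1 n2 =
  match (n1, n2) with
  | (Int i1, Int i2) -> Int (i1 + i2)
  | (Int i1, Float f2) -> Float (float_of_int i1 +. f2)
  | (Float f1, Int i2) -> Float (f1 +. float_of_int i2)
  | (Float f1, Float f2) -> Float (f1 +. f2)
  | (Error, _) | (_, Error) -> Error
```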

This definition reads as follows: a binary tree containing values of
type 'a (an arbitrary type) is either empty, or is a node containing
one value of type 'a and two subtrees also containing values of type
'a, that is, two 'a btree.

Operations on binary trees are naturally expressed as recursive functions
following the same structure as the type definition itself. For
instance, here are functions performing lookup and insertion in
ordered binary trees (elements increase from left to right):
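The sketch below gives a type declaration matching the description of 'a btree, followed by one way of writing lookup and insertion. The constructor names Empty and Node and the function names member and insert are assumptions of this example.

```ocaml
type 'a btree = Empty | Node of 'a * 'a btree * 'a btree

(* Lookup: descend left or right depending on the comparison. *)
let rec member x btree =
  match btree with
  | Empty -> false
  | Node (y, left, right) ->
      if x = y then true
      else if x < y then member x left
      else member x right

(* Insertion: rebuild the path from the root to the new leaf;
   the original tree is left unchanged. *)
let rec insert x btree =
  match btree with
  | Empty -> Node (x, Empty, Empty)
  | Node (y, left, right) ->
      if x <= y then Node (y, insert x left, right)
      else Node (y, left, insert x right)
```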

Though all examples so far were written in purely applicative style,
Caml is also equipped with full imperative features. This includes the
usual while and for loops, as well as mutable data structures such
as arrays. Arrays are either given in extension between [| and |]
brackets, or allocated and initialized with the Array.create
function, then filled up later by assignments. For instance, the
function below sums two vectors (represented as float arrays) componentwise.
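A possible version of this function is sketched here. It allocates the result with Array.make, the name that later releases of the standard library prefer over Array.create; the function and variable names are illustrative.

```ocaml
(* Componentwise sum of two float arrays. The result has the length
   of the shorter argument. *)
let add_vect v1 v2 =
  let len = min (Array.length v1) (Array.length v2) in
  let res = Array.make len 0.0 in
  for i = 0 to len - 1 do
    res.(i) <- v1.(i) +. v2.(i)
  done;
  res
```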

Caml has no built-in notion of variable, that is, of an identifier whose
current value can be changed by assignment. (The let binding is not an
assignment; it introduces a new identifier with a new scope.)
However, the standard library provides references, which are mutable
indirection cells (or one-element arrays), with operators ! to fetch
the current contents of the reference and := to assign the contents.
Variables can then be emulated by let-binding a reference. For
instance, here is an in-place insertion sort over arrays:
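A minimal sketch of such a sort, with the reference j playing the role of a loop variable (all names are illustrative):

```ocaml
(* Sort the array a in place, in ascending order. *)
let insertion_sort a =
  for i = 1 to Array.length a - 1 do
    let val_i = a.(i) in
    (* j is a reference: it is read with ! and updated with := *)
    let j = ref i in
    while !j > 0 && val_i < a.(!j - 1) do
      a.(!j) <- a.(!j - 1);
      j := !j - 1
    done;
    a.(!j) <- val_i
  done
```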

References are also useful to write functions that maintain a current
state between two calls to the function. For instance, the following
pseudo-random number generator keeps the last returned number in a
reference:
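Such a generator could be sketched as follows. The multiplier and increment are arbitrary constants chosen for illustration, not a recommendation for real use.

```ocaml
(* The state kept between calls: the last number returned. *)
let current_rand = ref 0

(* Each call updates the state and returns the new value. *)
let random () =
  current_rand := !current_rand * 25713 + 1345;
  !current_rand
```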

In some special cases, you may need to store a polymorphic function in
a data structure, keeping its polymorphism. Without user-provided
type annotations, this is not allowed, as polymorphism is only
introduced on a global level. However, you can give explicitly
polymorphic types to record fields.
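As a sketch, the record type below (its name idref and field id are assumptions of the example) stores a function whose type is explicitly quantified, so that it can be used at two different types after being read back from the record.

```ocaml
(* The field id is explicitly polymorphic: 'a. 'a -> 'a *)
type idref = { id : 'a. 'a -> 'a }

let r = { id = (fun x -> x) }

(* The stored function is applied to an int and to a bool. *)
let g s = (s.id 1, s.id true)
```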

Caml provides exceptions for signalling and handling exceptional
conditions. Exceptions can also be used as a general-purpose non-local
control structure. Exceptions are declared with the exception
construct, and signalled with the raise operator. For instance, the
function below for taking the head of a list uses an exception to
signal the case where an empty list is given.
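A minimal sketch of such a function (the exception name Empty_list is an assumption of the example):

```ocaml
exception Empty_list

(* Return the first element of a list, or raise Empty_list. *)
let head l =
  match l with
  | [] -> raise Empty_list
  | hd :: _ -> hd
```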

Exceptions are used throughout the standard library to signal cases
where the library functions cannot complete normally. For instance,
the List.assoc function, which returns the data associated with a
given key in a list of (key, data) pairs, raises the predefined
exception Not_found when the key does not appear in the list:
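A small sketch of this behaviour, with the exception trapped by a try...with construct (the function name and the association list are illustrative):

```ocaml
(* Look up a digit in an association list; fall back to a default
   string when List.assoc raises Not_found. *)
let name_of_binary_digit digit =
  try List.assoc digit [ (0, "zero"); (1, "one") ]
  with Not_found -> "not a binary digit"
```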

The with part is actually a regular pattern-matching on the
exception value. Thus, several exceptions can be caught by one
try...with construct. Also, finalization can be performed by
trapping all exceptions, performing the finalization, then re-raising
the exception:
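This pattern can be sketched with a function that sets a reference for the duration of a call and restores it afterwards, whether the call returns normally or raises. The function and parameter names are illustrative.

```ocaml
(* Run funct () with r temporarily set to newval. The old contents
   are restored on normal return and when an exception escapes. *)
let temporarily_set_reference r newval funct =
  let oldval = !r in
  try
    r := newval;
    let res = funct () in
    r := oldval;
    res
  with x ->
    (* Finalization, then the same exception is raised again. *)
    r := oldval;
    raise x
```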

We finish this introduction with a more complete example
representative of the use of Caml for symbolic processing: formal
manipulations of arithmetic expressions containing variables. The
following variant type describes the expressions we shall manipulate:
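A sketch of such a type is given here, along with a small evaluator over an environment of variable bindings. The constructor names, the exception Unbound_variable and the function eval are assumptions of this example.

```ocaml
type expression =
  | Const of float
  | Var of string
  | Sum of expression * expression   (* sum of two expressions *)
  | Diff of expression * expression  (* difference *)
  | Prod of expression * expression  (* product *)
  | Quot of expression * expression  (* quotient *)

exception Unbound_variable of string

(* Evaluate an expression; env maps variable names to float values. *)
let rec eval env exp =
  match exp with
  | Const c -> c
  | Var v ->
      (try List.assoc v env with Not_found -> raise (Unbound_variable v))
  | Sum (f, g) -> eval env f +. eval env g
  | Diff (f, g) -> eval env f -. eval env g
  | Prod (f, g) -> eval env f *. eval env g
  | Quot (f, g) -> eval env f /. eval env g
```

For instance, the expression (x + 2) * y is written Prod (Sum (Var "x", Const 2.0), Var "y") in this representation.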

As shown in the examples above, the internal representation (also
called abstract syntax) of expressions quickly becomes hard to
read and write as the expressions get larger. We need a printer and a
parser to go back and forth between the abstract syntax and the
concrete syntax, which in the case of expressions is the familiar
algebraic notation (e.g. 2*x+1).

For the printing function, we take into account the usual precedence
rules (i.e. * binds tighter than +) to avoid printing unnecessary
parentheses. To this end, we maintain the current operator precedence
and print parentheses around an operator only if its precedence is
less than the current precedence.
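The precedence scheme can be sketched as follows. To keep the example easy to check, this version builds a string instead of printing directly; the type declaration is repeated so that the block stands alone, and the names string_of_expr and wrap as well as the numeric precedence levels are assumptions of the sketch.

```ocaml
type expression =
  | Const of float
  | Var of string
  | Sum of expression * expression
  | Diff of expression * expression
  | Prod of expression * expression
  | Quot of expression * expression

let string_of_expr exp =
  (* Parenthesize s only if the operator precedence op_prec is less
     than the current precedence prec. *)
  let wrap prec op_prec s = if prec > op_prec then "(" ^ s ^ ")" else s in
  let rec to_string prec exp =
    match exp with
    | Const c -> string_of_float c
    | Var v -> v
    | Sum (f, g) -> wrap prec 0 (to_string 0 f ^ " + " ^ to_string 0 g)
    | Diff (f, g) -> wrap prec 0 (to_string 0 f ^ " - " ^ to_string 1 g)
    | Prod (f, g) -> wrap prec 2 (to_string 2 f ^ " * " ^ to_string 2 g)
    | Quot (f, g) -> wrap prec 2 (to_string 2 f ^ " / " ^ to_string 3 g)
  in
  to_string 0 exp
```

The right operand of - and / is rendered at a slightly higher precedence than the left one, so that x - (y - z) keeps its parentheses while (x - y) - z does not.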

Parsing (transforming concrete syntax into abstract syntax) is usually
more delicate. Caml offers several tools to help write parsers:
on the one hand, Caml versions of the lexer generator Lex and the
parser generator Yacc (see chapter 12), which handle
LALR(1) languages using push-down automata; on the other hand, a
predefined type of streams (of characters or tokens) and
pattern-matching over streams, which facilitate the writing of
recursive-descent parsers for LL(1) languages. An example using
ocamllex and ocamlyacc is given in
chapter 12. Here, we will use stream parsers.
The syntactic support for stream parsers is provided by the Camlp4
preprocessor, which can be loaded into the interactive toplevel via
the #load directive below.

For the lexical analysis phase (transformation of the input text into
a stream of tokens), we use a ``generic'' lexer provided in the
standard library module Genlex. The make_lexer function takes a
list of keywords and returns a lexing function that ``tokenizes'' an
input stream of characters. Tokens are either identifiers, keywords,
or literals (integers, floats, characters, strings). Whitespace and
comments are skipped.

The parser itself operates by pattern-matching on the stream of
tokens. As usual with recursive descent parsers, we use several
intermediate parsing functions to reflect the precedence and
associativity of operators. Pattern-matching over streams is more
powerful than on regular data structures, as it allows recursive calls
to parsing functions inside the patterns, for matching sub-components of
the input stream. See the Camlp4 documentation for more details.

Answer: the generic lexer provided by Genlex recognizes negative
integer literals as one integer token. Hence, x-1 is read as
the token Ident "x" followed by the token Int(-1); this sequence
does not match any of the parser rules. On the other hand,
the second space in x - 1 causes the lexer to return the three
expected tokens: Ident "x", then Kwd "-", then Int(1).

All examples given so far were executed under the interactive system.
Caml code can also be compiled separately and executed
non-interactively using the batch compilers ocamlc or ocamlopt.
The source code must be put in a file with extension .ml. It
consists of a sequence of phrases, which will be evaluated at runtime
in their order of appearance in the source file. Unlike in interactive
mode, types and values are not printed automatically; the program must
call printing functions explicitly to produce some output. Here is a
sample standalone program to print Fibonacci numbers:
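One way to write such a program is sketched below. The file name fib.ml, the check on the number of arguments and the use of int_of_string_opt are assumptions of this sketch.

```ocaml
(* File fib.ml (the file name is an assumption of this sketch). *)
let rec fib n = if n < 2 then 1 else fib (n - 1) + fib (n - 2)

let main () =
  (* Do nothing when no command-line parameter is given. *)
  if Array.length Sys.argv > 1 then
    match int_of_string_opt Sys.argv.(1) with
    | Some n ->
        print_int (fib n);
        print_newline ()
    | None -> prerr_endline "usage: fib <integer>"

(* Phrases are evaluated in order, so main runs when the program starts. *)
let () = main ()
```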

Sys.argv is an array of strings containing the command-line
parameters. Sys.argv.(1) is thus the first command-line parameter.
The program above is compiled and executed with the following shell
commands: