Chapter 2. Variables and Functions

Variables and functions are fundamental ideas that show up in virtually all programming
languages. OCaml has a different take on these concepts than most languages you're likely to
have encountered, so this chapter will cover OCaml's approach to variables and functions in some
detail, starting with the basics of how to define a variable, and ending with the intricacies of
functions with labeled and optional arguments.

Don't be discouraged if you find yourself overwhelmed by some of the details, especially
toward the end of the chapter. The concepts here are important, but if they don't connect for
you on your first read, you should return to this chapter after you've gotten a better sense for
the rest of the language.

Variables

At its simplest, a variable is an identifier whose meaning is bound
to a particular value. In OCaml these bindings are often introduced using
the let keyword. We can type a
so-called top-level let binding with the following syntax. Note that
variable names must start with a lowercase letter or an
underscore:
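
As a minimal sketch of the syntax just described (the names x, y, and _scratch are illustrative and not taken from the text), top-level bindings might look like this:

```ocaml
(* General shape: let <variable> = <expression> *)
let x = 3
let y = x + 4

(* A name may also begin with an underscore. *)
let _scratch = y * 2
```

Each binding can refer to the ones introduced before it.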

Every variable binding has a scope, which is
the portion of the code that can refer to that binding. When using
utop, the scope of a top-level
let binding is everything that follows it in the
session. When it shows up in a module, the scope is the remainder of that
module.

This time, in the inner scope we called the list of strings languages instead of language_list, thus hiding the original
definition of languages. But once the
definition of dashed_languages is
complete, the inner scope has closed and the original definition of
languages reappears:
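
The following sketch illustrates this kind of shadowing. It assumes a comma-separated string as the original value of languages, and it uses the standard library's String.split_on_char and String.concat rather than the Core functions used elsewhere in the text:

```ocaml
let languages = "OCaml,Perl,C++,C"

let dashed_languages =
  (* Inner scope: languages is rebound to a list of strings. *)
  let languages = String.split_on_char ',' languages in
  String.concat "-" languages

(* Outside the inner scope, languages is the original string again. *)
let still_a_string = languages
```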

It's important not to confuse a sequence of let
bindings with the modification of a mutable variable. For example,
consider how area_of_ring would work if
we had instead written this purposefully confusing bit of code:

Here, we redefined pi to be zero after the definition
of area_of_circle. You might think that this would mean
that the result of the computation would now be zero, but in fact, the behavior of the
function is unchanged. That's because the original definition of pi wasn't changed; it was just shadowed, which means that any subsequent
reference to pi would see the new definition of pi as 0, but earlier references would be
unchanged. But there is no later use of pi, so binding
pi to the float literal 0. made no
difference. This explains the warning produced by the toplevel telling us that there is an
unused definition of pi.
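
A small top-level sketch of the same effect follows. The approximate value of pi and the name area_after_shadowing are assumptions made for illustration:

```ocaml
let pi = 3.14159
let area_of_circle r = pi *. r *. r

(* This shadows pi for code that follows; area_of_circle is unaffected. *)
let pi = 0.

let area_after_shadowing = area_of_circle 1.
```

Here area_of_circle still uses the first binding of pi, while any code written after the second binding sees 0.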

In OCaml, let bindings are immutable. There are
many kinds of mutable values in OCaml, which we'll discuss in Chapter 8, Imperative Programming, but there are no mutable
variables.

Why Don't Variables Vary?

One source of confusion for people new to OCaml is the fact that
variables are immutable. This seems pretty surprising even on linguistic
terms. Isn't the whole point of a variable that it can vary?

The answer to this is that variables in OCaml (and generally in functional languages)
are really more like variables in an equation than variables in an imperative language. If
you think about the mathematical identity x(y + z) = xy +
xz, there's no notion of mutating the variables x, y, and z. They vary in the sense that you can instantiate this equation with different
numbers for those variables, and it still holds.

The same is true in a functional language. A function can be
applied to different inputs, and thus its variables will take on
different values, even without mutation.

Pattern Matching and let

Another useful feature of let bindings is that
they support the use of patterns on the lefthand
side. Consider the following code, which uses List.unzip, a function for converting a list
of pairs into a pair of lists:

Here, (ints,strings) is a
pattern, and the let binding assigns
values to both of the identifiers that show up in that pattern. A
pattern is essentially a description of the shape of a data structure,
where some components are identifiers to be bound. As we saw in the section called “Tuples, Lists, Options, and Pattern Matching”, OCaml has
patterns for a variety of different data types.
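
As a hedged illustration of a tuple pattern on the left-hand side of a let, the sketch below uses List.split, which plays the role in the standard library that List.unzip plays in Core. The sample data is an assumption:

```ocaml
(* Destructure the resulting pair of lists directly in the binding. *)
let (ints, strings) = List.split [ (1, "one"); (2, "two"); (3, "three") ]
```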

Using a pattern in a let binding makes the most sense for a pattern
that is irrefutable, i.e., where any value of the
type in question is guaranteed to match the pattern. Tuple and record patterns are
irrefutable, but list patterns are not. Consider the following code that implements a
function for uppercasing the first element of a comma-separated list:

This case can't really come up in practice, because String.split always returns a list with at
least one element. But the compiler doesn't know this, and so it emits
the warning. It's generally better to use a match
expression to handle such cases explicitly:

Note that this is our first use of assert, which is useful for marking cases that
should be impossible. We'll discuss assert in more detail in Chapter 7, Error Handling.
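
One possible version of such a function is sketched below. It relies on the standard library's String.split_on_char and String.uppercase_ascii instead of Core's String.split, so treat it as an approximation of the approach rather than the exact code under discussion:

```ocaml
let upcase_first_entry line =
  match String.split_on_char ',' line with
  | [] -> assert false (* split_on_char never returns an empty list *)
  | first :: rest -> String.concat "," (String.uppercase_ascii first :: rest)
```

The match handles both list shapes explicitly, so the compiler has no missing case to warn about.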

Functions

Given that OCaml is a functional language, it's no surprise that
functions are important and pervasive. Indeed, functions have come up in
almost every example we've done so far. This section will go into more
depth, explaining the details of how OCaml's functions work. As you'll
see, functions in OCaml differ in a variety of ways from what you'll find
in most mainstream languages.

Anonymous Functions

We'll start by looking at the most basic style of function declaration in OCaml: the
anonymous function. An anonymous function is a function that is
declared without being named. These can be declared using the fun keyword, as shown here:

It's worth stopping for a moment to puzzle this example out, since this kind of
higher-order use of functions can be a bit obscure at first. Notice that (fun g -> g 5) is a function that takes a function as an
argument, and then applies that function to the number 5.
The invocation of List.map applies (fun g -> g 5) to the elements of the increments list (which are themselves functions) and returns the
list containing the results of these function applications.
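
The sketch below shows the idea with the standard library's List.map (which takes the function positionally, unlike Core's labeled version). The contents of increments are assumed for illustration:

```ocaml
(* A list whose elements are themselves functions. *)
let increments = [ (fun x -> x + 1); (fun x -> x + 2) ]

(* Apply each function in the list to 5. *)
let results = List.map (fun g -> g 5) increments
```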

The key thing to understand is that functions are ordinary values
in OCaml, and you can do everything with them that you'd do with an
ordinary value, including passing them to and returning them from other
functions and storing them in data structures. We even name functions in
the same way that we name other values, by using a
let binding:

This is the most common and convenient way to declare a function, but syntactic niceties
aside, the two styles of function definition are equivalent.
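
For instance, the two definitions in this sketch (the names are illustrative) behave identically:

```ocaml
(* Naming an anonymous function with let. *)
let plusone = fun x -> x + 1

(* The more common shorthand for the same thing. *)
let plusone' x = x + 1
```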

let and fun

Functions and let bindings have a lot to do
with each other. In some sense, you can think of the parameter of a
function as a variable being bound to the value passed by the caller.
Indeed, the following two expressions are nearly equivalent:

This rewrite makes it explicit that abs_diff is actually a function of one
argument that returns another function of one argument, which itself
returns the final result. Because the functions are nested, the inner
expression abs (x - y) has access to
both x, which was bound by the outer
function application, and y, which
was bound by the inner one.
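
A sketch of the two equivalent forms, assuming abs_diff computes the absolute difference of two integers (the name abs_diff_explicit is introduced here only to let both versions coexist):

```ocaml
(* The usual multiargument syntax. *)
let abs_diff x y = abs (x - y)

(* The same function with the nesting written out. *)
let abs_diff_explicit = fun x -> (fun y -> abs (x - y))
```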

This style of function is called a curried
function. (Currying is named after Haskell Curry, a logician who had a
significant impact on the design and theory of programming languages.)
The key to interpreting the type signature of a curried function is the
observation that -> is
right-associative. The type signature of abs_diff can therefore be parenthesized as
follows:

The parentheses don't change the meaning of the signature, but
they make it easier to see the currying.

Currying is more than just a theoretical curiosity. You can make
use of currying to specialize a function by feeding in some of the
arguments. Here's an example where we create a specialized version of
abs_diff that measures the distance
of a given number from 3:
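
A minimal sketch, assuming the definition of abs_diff given in the comment below. The type annotation spells out the parenthesized signature:

```ocaml
(* int -> (int -> int) is the same type as int -> int -> int. *)
let abs_diff : int -> (int -> int) = fun x y -> abs (x - y)

(* Supplying only the first argument yields a function int -> int. *)
let dist_from_3 = abs_diff 3
```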

You might worry that curried functions are terribly expensive, but
this is not the case. In OCaml, there is no penalty for calling a
curried function with all of its arguments. (Partial application,
unsurprisingly, does have a small extra cost.)

Currying is not the only way of writing a multiargument function
in OCaml. It's also possible to use the different parts of a tuple as
different arguments. So, we could write:

OCaml handles this calling convention efficiently as well. In
particular it does not generally have to allocate a tuple just for the
purpose of sending arguments to a tuple-style function. You can't,
however, use partial application for this style of function.
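
A sketch of the tuple style, reusing the name abs_diff for illustration:

```ocaml
(* Takes a single argument that is a pair: int * int -> int. *)
let abs_diff (x, y) = abs (x - y)

let one = abs_diff (3, 4)
```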

There are small trade-offs between these two approaches, but most
of the time, one should stick to currying, since it's the default style
in the OCaml world.

Recursive Functions

A function is recursive if it refers to
itself in its definition. Recursion is important in any programming
language, but is particularly important in functional languages, because
it is the way that you build looping constructs. (As will be discussed
in more detail in Chapter 8, Imperative Programming, OCaml
also supports imperative looping constructs like for and while, but these are only useful when using
OCaml's imperative features.)

In order to define a recursive function, you need to mark the
let binding as recursive with the rec keyword, as shown in this function for
finding the first sequentially repeated element in a list:

Note that in the code, the pattern | [] |
[_] is what's called an or-pattern, which
is a disjunction of two patterns, meaning that it will be considered a
match if either pattern matches. In this case, [] matches the empty list, and [_] matches any single element list. The
_ is there so we don't have to put an
explicit name on that single element.
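
A function of the kind described might be sketched as follows. The name find_first_stutter and the use of an option return type are assumptions:

```ocaml
let rec find_first_stutter list =
  match list with
  | [] | [ _ ] -> None (* fewer than two elements: no repeat possible *)
  | x :: y :: tl ->
    if x = y then Some x else find_first_stutter (y :: tl)
```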

We can also define multiple mutually recursive values by using
let rec combined with the and keyword. Here's a (gratuitously
inefficient) example:
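
One way to sketch this is with a pair of parity checks. The definitions are illustrative and only terminate for non-negative inputs:

```ocaml
(* Each function is defined in terms of the other. *)
let rec is_even x = if x = 0 then true else is_odd (x - 1)
and is_odd x = if x = 0 then false else is_even (x - 1)
```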

OCaml distinguishes between nonrecursive definitions (using
let) and recursive definitions (using
let rec) largely for technical
reasons: the type-inference algorithm needs to know when a set of
function definitions are mutually recursive, and for reasons that don't
apply to a pure language like Haskell, these have to be marked
explicitly by the programmer.

But this decision has some good effects. For one thing, recursive
(and especially mutually recursive) definitions are harder to reason
about than nonrecursive ones. It's therefore useful that, in the absence
of an explicit rec, you can assume
that a let binding is nonrecursive, and so can only
build upon previous bindings.

In addition, having a nonrecursive form makes it easier to create
a new definition that extends and supersedes an existing one by
shadowing it.

Prefix and Infix Operators

So far, we've seen examples of functions used in both prefix and
infix style:

You might not have thought of the second example as an ordinary
function, but it very much is. Infix operators like + really only differ syntactically from other
functions. In fact, if you put parentheses around an infix operator, you
can use it as an ordinary prefix function:

In the second expression, we've partially applied (+) to create a function that increments its
single argument by 3.
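
The following sketch shows both uses. The names seven, add_three, and bumped are introduced only for illustration:

```ocaml
(* Full application of (+) in prefix position. *)
let seven = ( + ) 3 4

(* Partial application: a function that adds 3 to its argument. *)
let add_three = ( + ) 3
let bumped = List.map add_three [ 4; 5; 6 ]
```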

A function is treated syntactically as an operator if the name of
that function is chosen from one of a specialized set of identifiers.
This set includes identifiers that are sequences of characters from the
following set:

The syntactic role of an operator is typically determined by its
first character or two, though there are a few exceptions. Table 2.1, “Precedence and associativity” breaks the different operators and other syntactic
forms into groups from highest to lowest precedence, explaining how each
behaves syntactically. We write !...
to indicate the class of operators beginning with !.

Table 2.1. Precedence and associativity

Operator prefix                                     Associativity
--------------------------------------------------  -----------------
!..., ?..., ~...                                    Prefix
., .(, .[                                           -
function application, constructor, assert, lazy     Left associative
-, -.                                               Prefix
**..., lsl, lsr, asr                                Right associative
*..., /..., %..., mod, land, lor, lxor              Left associative
+..., -...                                          Left associative
::                                                  Right associative
@..., ^...                                          Right associative
=..., <..., >..., |..., &..., $...                  Left associative
&, &&                                               Right associative
or, ||                                              Right associative
,                                                   -
<-, :=                                              Right associative
if                                                  -
;                                                   Right associative
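
To make the role of the first character concrete, here is a sketch that defines a hypothetical operator +! for adding pairs of integers. Because its name begins with +, it falls in the +... row and is left associative:

```ocaml
(* A hypothetical vector-addition operator on int pairs. *)
let ( +! ) (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)

let sum = (3, 2) +! (-2, 4)

(* Chains group from the left: ((1, 1) +! (2, 2)) +! (3, 3). *)
let total = (1, 1) +! (2, 2) +! (3, 3)
```

The spaces inside the parentheses in the definition are a defensive habit: for operators beginning with *, writing the parentheses without spaces would start a comment.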

There's one important special case: - and -.,
which are the integer and floating-point subtraction operators, and can
act as both prefix operators (for negation) and infix operators (for
subtraction). So, both -x and
x - y are meaningful expressions.
Another thing to remember about negation is that it has lower precedence
than function application, which means that if you want to pass a
negative value, you need to wrap it in parentheses, as you can see in
this code:

# Int.max 3 (-4);;
- : int = 3
# Int.max 3 -4;;
Characters -1-9:
Error: This expression has type int -> int
but an expression was expected of type int

It's not quite obvious at first what the purpose of the |> operator
is: it just takes a value and a function and applies the function to the
value. Despite that bland-sounding description, it has the useful role
of a sequencing operator, similar in spirit to using the pipe character
in the UNIX shell. Consider, for example, the following code for
printing out the unique elements of your PATH. Note that List.dedup, used in the code that follows,
removes duplicates from a list by sorting the list using the provided comparison
function:
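
A standard-library approximation of such a pipeline is sketched below. It substitutes List.sort_uniq for Core's List.dedup and String.split_on_char for Core's String.split, and the function names are hypothetical:

```ocaml
(* Split on ':' and then sort and deduplicate the entries. *)
let unique_path_entries path =
  String.split_on_char ':' path
  |> List.sort_uniq String.compare

(* Print the unique entries of PATH, if the variable is set. *)
let print_unique_path () =
  match Sys.getenv_opt "PATH" with
  | None -> ()
  | Some path -> unique_path_entries path |> List.iter print_endline
```

Each |> feeds the value on its left to the function on its right, so the steps read top to bottom in the order they run. Calling print_unique_path () prints one entry per line.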

An important part of what's happening here is partial application.
For example, List.iter normally takes
two arguments: a function to be called on each element of the list, and
the list to iterate over. We can call List.iter with all its arguments:

The type error is a little bewildering at first glance. What's
going on is that, because ^> is
right associative, the operator is trying to feed the value List.dedup ~compare:String.compare to the
function List.iter ~f:print_endline.
But List.iter ~f:print_endline
expects a list of strings as its input, not a function.

The type error aside, this example highlights the importance of
choosing the operator you use with care, particularly with respect to
associativity.

Declaring Functions with function

Another way to define a function is using the function keyword. Instead of having syntactic
support for declaring multiargument (curried) functions, function has built-in pattern matching. Here's
an example:

Also, note the use of partial application to generate the function
passed to List.map. In other words,
some_or_default 100 is a function
that was created by feeding just the first argument to some_or_default.
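
A sketch of how some_or_default might be written with function, using the standard library's positional List.map. The sample list is an assumption:

```ocaml
(* The second argument is matched directly by the function keyword. *)
let some_or_default default = function
  | Some x -> x
  | None -> default

(* some_or_default 100 is a partially applied function. *)
let filled = List.map (some_or_default 100) [ Some 3; None; Some 4 ]
```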

Labeled Arguments

Up until now, the functions we've defined have specified their arguments positionally,
i.e., by the order in which the arguments are passed to the function.
OCaml also supports labeled arguments, which let you identify a function argument by name.
Indeed, we've already encountered functions from Core like List.map that use labeled arguments. Labeled arguments are marked by a leading
tilde, and a label (followed by a colon) is put in front of the variable to be labeled.
Here's an example:

OCaml also supports label punning, meaning
that you get to drop the text after the : if the name of the label and the name of the
variable being used are the same. We were actually already using label
punning when defining ratio. The
following shows how punning can be used when invoking a
function:
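
The sketch below assumes ratio divides a numerator by a denominator. It shows labeled arguments passed in either order, and punning at the call site:

```ocaml
(* ~num and ~denom are punned: label and variable share a name. *)
let ratio ~num ~denom = float num /. float denom

(* Labeled arguments may be supplied in any order. *)
let a = ratio ~num:3 ~denom:10
let b = ratio ~denom:10 ~num:3

(* Punning when invoking: variables named like the labels. *)
let c =
  let num = 3 in
  let denom = 4 in
  ratio ~num ~denom
```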

When defining a function with lots of arguments. Beyond a
certain number, arguments are easier to remember by name than by
position.

When the meaning of a particular argument is unclear from the
type alone. Consider a function for creating a hash table whose
first argument is the initial size of the array backing the hash
table, and the second is a Boolean flag, which indicates whether
that array will ever shrink when elements are removed:

Choosing label names well is especially important for Boolean
values, since it's often easy to get confused about whether a value
being true is meant to enable or disable a given feature.

When defining functions that have multiple arguments that
might get confused with each other. This is most at issue when the
arguments are of the same type. For example, consider this signature
for a function that extracts a substring:

This improves the readability of both the signature and of
client code that makes use of substring and makes it harder to
accidentally swap the position and the length.

When you want flexibility on the order in which arguments are
passed. Consider a function like List.iter, which takes two arguments: a
function and a list of elements to call that function on. A common
pattern is to partially apply List.iter by giving it just the function,
as in the following example from earlier in the chapter:

This requires that we put the function argument first. In
other cases, you want to put the function argument second. One
common reason is readability. In particular, a multiline function
passed as an argument to another function is easiest to read when it
is the final argument to that function.
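
As a small illustration of the substring case above, here is a hypothetical labeled wrapper around the standard library's String.sub. The labels pos and len are assumed names:

```ocaml
(* val substring : string -> pos:int -> len:int -> string *)
let substring s ~pos ~len = String.sub s pos len

(* The labels make it hard to swap the two ints by accident. *)
let a = substring "functional" ~pos:0 ~len:3
let b = substring "functional" ~len:3 ~pos:0
```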

Higher-order functions and labels

One surprising gotcha with labeled arguments is that while order doesn't matter when
calling a function with labeled arguments, it does matter in a higher-order context,
e.g., when passing a function with labeled arguments to another
function. Here's an example:

Here, the definition of apply_to_tuple sets up the expectation that
its first argument is a function with two labeled arguments, first and second, listed in that order. We could have
defined apply_to_tuple differently
to change the order in which the labeled arguments were listed:

As a result, when passing labeled functions as arguments, you
need to take care to be consistent in your ordering of labeled
arguments.
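
A sketch of the situation, with a hypothetical divide function whose labels are listed in the order apply_to_tuple expects:

```ocaml
(* Inferred: (first:'a -> second:'b -> 'c) -> 'a * 'b -> 'c *)
let apply_to_tuple f (first, second) = f ~first ~second

(* Labels declared in the same order, so this function is accepted. *)
let divide ~first ~second = first / second

let result = apply_to_tuple divide (12, 4)
```

A function declared with ~second before ~first would have a different type and would be rejected as an argument to this version of apply_to_tuple.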

Optional Arguments

An optional argument is like a labeled argument that the caller
can choose whether or not to provide. Optional arguments are passed in
using the same syntax as labeled arguments, and, like labeled arguments,
can be provided in any order.

Here's an example of a string concatenation function with an
optional separator. This function uses the ^ operator for pairwise string
concatenation:

Here, ? is used in the
definition of the function to mark sep as optional. And while the caller can pass
a value of type string for sep, internally to the function, sep is seen as a string option, with None appearing when sep is not provided by the caller.

The preceding example needed a bit of boilerplate to choose a
default separator when none was provided. This is a common enough
pattern that there's an explicit syntax for providing a default value,
which allows us to write concat more
concisely:
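
Both styles are sketched here, assuming an empty string as the default separator. The primed name is used only so that the two versions can sit side by side:

```ocaml
(* sep arrives as a string option inside the function. *)
let concat ?sep x y =
  let sep = match sep with None -> "" | Some s -> s in
  x ^ sep ^ y

(* The same behavior using the default-value syntax. *)
let concat' ?(sep = "") x y = x ^ sep ^ y
```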

Optional arguments are very useful, but they're also easy to abuse. The key advantage of
optional arguments is that they let you write functions with multiple arguments that users
can ignore most of the time, only worrying about them when they specifically want to invoke
those options. They also allow you to extend an API with new functionality without changing
existing code.

The downside is that the caller may be unaware that there is a
choice to be made, and so may unknowingly (and wrongly) pick the default
behavior. Optional arguments really only make sense when the extra
concision of omitting the argument outweighs the corresponding loss of
explicitness.

This means that rarely used functions should not have optional arguments. A good rule of
thumb is to avoid optional arguments for functions internal to a module,
i.e., functions that are not included in the module's interface, or
mli file. We'll learn more about mlis in Chapter 4, Files, Modules, and Programs.

Explicit passing of an optional argument

Under the covers, a function with an optional argument receives
None when the caller doesn't
provide the argument, and Some when
it does. But the Some and None are normally not explicitly passed in
by the caller.

But sometimes, passing in Some or None explicitly is exactly what you want.
OCaml lets you do this by using ?
instead of ~ to mark the argument.
Thus, the following two lines are equivalent ways of specifying the
sep argument to
concat:

One use case for this is when you want to define a wrapper
function that mimics the optional arguments of the function it's
wrapping. For example, imagine we wanted to create a function called
uppercase_concat, which is the same
as concat except that it converts
the first string that it's passed to uppercase. We could write the
function as follows:

In the way we've written it, we've been forced to separately
make the decision as to what the default separator is. Thus, if we
later change concat's default
behavior, we'll need to remember to change uppercase_concat to match it.

Instead, we can have uppercase_concat simply pass through the
optional argument to concat using
the ? syntax:

Now, if someone calls uppercase_concat without an argument, an
explicit None will be passed to
concat, leaving concat to decide what the default behavior
should be.
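
A sketch of the pass-through approach, assuming concat defaults its separator to the empty string and using String.uppercase_ascii from the standard library:

```ocaml
let concat ?(sep = "") x y = x ^ sep ^ y

(* ?sep forwards the option as is, whether it is Some or None. *)
let uppercase_concat ?sep a b = concat ?sep (String.uppercase_ascii a) b

(* Passing the option explicitly is equivalent to using ~sep. *)
let explicit = concat ?sep:(Some ":") "foo" "bar"
let usual = concat ~sep:":" "foo" "bar"
```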

Inference of labeled and optional arguments

One subtle aspect of labeled and optional arguments is how they
are inferred by the type system. Consider the following example for
computing numerical derivatives of a function of two real variables.
The function takes an argument delta, which determines the scale at which
to compute the derivative; values x
and y, which determine at which
point to compute the derivative; and the function f, whose derivative is being computed. The
function f itself takes two labeled
arguments, x and y. Note that you can use an apostrophe as
part of a variable name, so x' and
y' are just ordinary
variables:

In principle, it's not obvious how the order of the arguments to
f should be chosen. Since labeled
arguments can be passed in arbitrary order, it seems like it could as
well be y:float -> x:float ->
float as it is x:float ->
y:float -> float.

Even worse, it would be perfectly consistent for f to take an optional argument instead of a
labeled one, which could lead to this type signature for numeric_deriv:

Since there are multiple plausible types to choose from, OCaml
needs some heuristic for choosing between them. The heuristic the
compiler uses is to prefer labels to options and to choose the order
of arguments that shows up in the source code.

Note that these heuristics might at different points in the
source suggest different types. Here's a version of numeric_deriv where different invocations of
f list the arguments in different
orders:

As suggested by the error message, we can get OCaml to accept
the fact that f is used with
different argument orders if we provide explicit type information.
Thus, the following code compiles without error, due to the type
annotation on f:
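
One way this could look is sketched below. The body is an assumption based on the description of numeric_deriv earlier in this section, and it calls f with its labels in two different orders:

```ocaml
let numeric_deriv ~delta ~x ~y ~(f : x:float -> y:float -> float) =
  let x' = x +. delta in
  let y' = y +. delta in
  let base = f ~x ~y in
  (* Different label orders are fine once the type of f is known. *)
  let dx = (f ~y ~x:x' -. base) /. delta in
  let dy = (f ~x ~y:y' -. base) /. delta in
  (dx, dy)
```

For example, numeric_deriv ~delta:1e-3 ~x:1. ~y:2. ~f:(fun ~x ~y -> x *. y) yields a pair close to (2., 1.).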

The rule is: an optional argument is erased as soon as the first
positional (i.e., neither labeled nor optional) argument defined
after the optional argument is passed in. That
explains the behavior of prepend_pound. But if we had instead defined
concat with the optional argument
in the second position:

However, if all arguments to a function are presented at once,
then erasure of optional arguments isn't applied until all of the
arguments are passed in. This preserves our ability to pass in
optional arguments anywhere on the argument list. Thus, we can
write:
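
The sketch below walks through the three cases, assuming the default-argument version of concat. The names concat_mid and prepend_pound' are hypothetical and exist only so that both argument orders can be shown together:

```ocaml
(* Optional argument first: erased once x is supplied. *)
let concat ?(sep = "") x y = x ^ sep ^ y
let prepend_pound = concat "# " (* string -> string *)

(* Optional argument second: still available after x is supplied. *)
let concat_mid x ?(sep = "") y = x ^ sep ^ y
let prepend_pound' = concat_mid "# " (* ?sep:string -> string -> string *)

(* All arguments at once: sep may appear anywhere in the list. *)
let joined = concat "a" "b" ~sep:"="
```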

As you can see, OCaml's support for labeled and optional
arguments is not without its complexities. But don't let these
complexities obscure the usefulness of these features. Labels and
optional arguments are very effective tools for making your APIs both
more convenient and safer, and it's worth the effort of learning how
to use them effectively.