Chapter 5. Records

One of OCaml's best features is its concise and expressive system for
declaring new data types, and records are a key element of that system. We
discussed records briefly in Chapter 1, A Guided Tour, but this
chapter will go into more depth, covering the details of how records work,
as well as advice on how to use them effectively in your software
designs.

A record represents a collection of values stored together as one,
where each component is identified by a different field name. The basic
syntax for a record type declaration is as follows:
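The comment below gives the placeholder form of that syntax. It is followed by a concrete declaration of a host_info record that summarizes a computer. This is a stdlib-only sketch: the timestamp field is a float (seconds since the epoch) rather than Core's Time.t, and the exact set of fields is illustrative.

```ocaml
(* General shape of a record type declaration (schematic only;
   the <...> placeholders are not OCaml syntax):

     type <record-name> =
       { <field> : <type>;
         <field> : <type>;
         ... }
*)

(* A concrete instance: a record summarizing a computer. The timestamp
   is a float (seconds since the epoch), standing in for Core's Time.t. *)
type host_info = {
  hostname  : string;
  os_name   : string;
  cpu_arch  : string;
  timestamp : float;
}
```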

We can construct a host_info just
as easily. The following code uses the Shell module from Core_extended to dispatch commands to the shell to
extract the information we need about the computer we're running on. It also
uses the Time.now call from Core's
Time module:
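The sketch below constructs a host_info without Core_extended or Core, so that it is self-contained. It takes os_name from the stdlib's Sys.os_type, and it uses hard-coded placeholder values for the hostname, architecture, and timestamp instead of querying the shell.

```ocaml
type host_info = {
  hostname  : string;
  os_name   : string;
  cpu_arch  : string;
  timestamp : float;  (* stand-in for Core's Time.t *)
}

(* Every field must be supplied when a record is created.
   Sys.os_type is a real stdlib value ("Unix", "Win32" or "Cygwin");
   the other values are placeholders, not queried from the system. *)
let my_host =
  { hostname  = "example-host";   (* hypothetical *)
    os_name   = Sys.os_type;
    cpu_arch  = "x86_64";         (* hypothetical *)
    timestamp = 1_700_000_000.0;  (* hypothetical fixed time *)
  }
```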

You might wonder how the compiler inferred that my_host is of type host_info. The hook that the compiler uses in this
case to figure out the type is the record field name. Later in the chapter,
we'll talk about what happens when there is more than one record type in
scope with the same field name.

Once we have a record value in hand, we can extract individual fields
from it using dot notation:
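Here is a short sketch of dot notation. It reuses the illustrative host_info type (with a float timestamp) and placeholder values. The describe function is a hypothetical helper.

```ocaml
type host_info = {
  hostname  : string;
  os_name   : string;
  cpu_arch  : string;
  timestamp : float;
}

let my_host =
  { hostname = "example-host"; os_name = "Unix";
    cpu_arch = "x86_64"; timestamp = 0.0 }

(* record.field reads a single field *)
let arch = my_host.cpu_arch

(* Dot notation on a function argument also drives type inference. *)
let describe h = h.hostname ^ " (" ^ h.os_name ^ ")"
```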

When declaring an OCaml type, you always have the option of
parameterizing it by a polymorphic type. Records are no different in this
regard. So, for example, here's a type one might use to timestamp arbitrary
items:
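The block below is a minimal sketch of such a parameterized record, with a float standing in for a real time type. It also includes a polymorphic function over that type. first_timestamped is a hypothetical helper that returns the earliest item in a list.

```ocaml
(* 'a is the type of the item being timestamped *)
type 'a timestamped = { item : 'a; time : float }

(* Polymorphic in 'a: works for lists of any timestamped item.
   Returns the element with the smallest time, or None for []. *)
let first_timestamped (list : 'a timestamped list) : 'a timestamped option =
  match list with
  | [] -> None
  | x :: xs ->
    Some (List.fold_left
            (fun acc y -> if y.time < acc.time then y else acc) x xs)
```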

A pattern match on a record needs only a single case, rather than
several cases separated by |'s. One case is enough because record patterns are
irrefutable, meaning that a record pattern match will
never fail at runtime. This makes sense, because the set of fields
available in a record is always the same. In general, patterns for types
with a fixed structure, like records and tuples, are irrefutable, unlike
types with variable structures like lists and variants.
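As an illustration, the function below destructures a host_info with a single, irrefutable pattern and formats it as a string. This is a stdlib-only sketch: Printf.sprintf formats the float timestamp, which stands in for Core's Time.t.

```ocaml
type host_info = {
  hostname  : string;
  os_name   : string;
  cpu_arch  : string;
  timestamp : float;
}

(* One case covers every possible host_info, so the match can't fail. *)
let host_info_to_string
    { hostname = h; os_name = os; cpu_arch = c; timestamp = ts } =
  Printf.sprintf "%s (%s / %s, at %.0f)" h os c ts
```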

Another important characteristic of record patterns is that they
don't need to be complete; a pattern can mention only a subset of the
fields in the record. This can be convenient, but it can also be error
prone. In particular, this means that when new fields are added to the
record, code that should be updated to react to the presence of those new
fields will not be flagged by the compiler.

As an example, imagine that we wanted to add a new field to our
host_info record called os_release:
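In the sketch below, os_release has been added to host_info, but host_info_to_string is unchanged. Its pattern simply omits the new field, so the function still compiles. Warning 9 is off by default in the compiler. The [@@@warning "-9"] attribute only makes that explicit, so that the example also builds under stricter configurations that enable it.

```ocaml
(* Keep warning 9 off explicitly, matching the compiler's default. *)
[@@@warning "-9"]

type host_info = {
  hostname   : string;
  os_name    : string;
  cpu_arch   : string;
  os_release : string;   (* the newly added field *)
  timestamp  : float;
}

(* Unchanged: os_release isn't mentioned, yet this compiles. *)
let host_info_to_string
    { hostname = h; os_name = os; cpu_arch = c; timestamp = ts } =
  Printf.sprintf "%s (%s / %s, at %.0f)" h os c ts
```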

The code for host_info_to_string
would continue to compile without change. In this particular case, it's
pretty clear that you might want to update host_info_to_string in order to include os_release, and it would be nice if the type
system would give you a warning about the change.

Happily, OCaml does offer an optional warning for missing fields in
a record pattern. With that warning turned on (which you can do in the
toplevel by typing #warnings "+9"), the
compiler will warn about the missing field:

Characters 24-139:
Warning 9: the following labels are not bound in this record pattern:
os_release
Either bind these labels explicitly or add '; _' to the pattern.
val host_info_to_string : host_info -> string = <fun>
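The warning suggests two fixes. Both are sketched below, with warning 9 enabled for the whole file through the [@@@warning "+9"] attribute, an in-source alternative to the toplevel #warnings directive. The first fix binds os_release explicitly. The second ends the pattern with ; _ to say that the remaining fields are ignored on purpose. host_summary is a hypothetical helper added only to show the second form.

```ocaml
[@@@warning "+9"]

type host_info = {
  hostname   : string;
  os_name    : string;
  cpu_arch   : string;
  os_release : string;
  timestamp  : float;
}

(* Option 1: bind every field, including the new one. *)
let host_info_to_string
    { hostname = h; os_name = os; cpu_arch = c; os_release = r; timestamp = ts } =
  Printf.sprintf "%s (%s %s / %s, at %.0f)" h os r c ts

(* Option 2: "; _" marks the omitted fields as deliberately ignored,
   which silences warning 9 for this pattern. *)
let host_summary { hostname = h; os_name = os; _ } =
  h ^ " (" ^ os ^ ")"
```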

You should think of OCaml's warnings as a powerful set of optional
static analysis tools, and you should eagerly enable them in your build
environment. You don't typically enable all warnings, but the defaults
that ship with the compiler are pretty good.

The warnings used for building the examples in this book are
specified with the following flag: -w
@A-4-33-41-42-43-34-44.

The syntax of this flag is described in the output of ocaml -help; this
particular invocation turns on all warnings and treats them as errors,
disabling only the warnings whose numbers are listed explicitly after the A.

Treating warnings as errors (i.e., making OCaml fail to compile
any code that triggers a warning) is good practice, since without it,
warnings are too often ignored during development. When preparing a
package for distribution, however, this is a bad idea, since the list of
warnings may grow from one release of the compiler to another, which may
cause your package to fail to compile on newer compiler releases.

Field Punning

When the name of a variable coincides with the name of a record
field, OCaml provides some handy syntactic shortcuts. For example, the
pattern in the following function binds all of the fields in question to
variables of the same name. This is called field
punning:
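The sketch below shows field punning on the illustrative host_info type (with a float timestamp). Punning works both in patterns and when constructing records, and it combines naturally with label punning for labeled arguments. make_host is a hypothetical constructor added for illustration.

```ocaml
type host_info = {
  hostname  : string;
  os_name   : string;
  cpu_arch  : string;
  timestamp : float;
}

(* Pattern punning: { hostname; os_name; ... } is shorthand for
   { hostname = hostname; os_name = os_name; ... }. *)
let host_info_to_string { hostname; os_name; cpu_arch; timestamp } =
  Printf.sprintf "%s (%s / %s, at %.0f)" hostname os_name cpu_arch timestamp

(* Punning when building a record, combined with label punning
   (~hostname is shorthand for ~hostname:hostname): the same name
   flows from argument to variable to field. *)
let make_host ~hostname ~os_name ~cpu_arch ~timestamp =
  { hostname; os_name; cpu_arch; timestamp }
```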

Together, labeled arguments, field names, and field and label
punning encourage a style where you propagate the same names throughout
your codebase. This is generally good practice, since it encourages
consistent naming, which makes it easier to navigate the source.

Reusing Field Names

Defining records with the same field names can be problematic. Let's
consider a simple example: building types to represent the protocol used
for a logging server.

We'll describe three message types: log_entry, heartbeat, and logon. The log_entry message is used to deliver a log entry
to the server; the logon message is
sent to initiate a connection and includes the identity of the user
connecting and credentials used for authentication; and the heartbeat message is periodically sent by the
client to demonstrate to the server that the client is alive and
connected. All of these messages include a session ID and the time the
message was generated:
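A stdlib-only sketch of the three message types follows. Time is represented as a float rather than Core's Time.t, and the message-specific fields (important, message, status_message, user, credentials) are illustrative. session_id and time are declared in all three types, which is exactly what creates the ambiguity discussed next.

```ocaml
(* Delivers a log entry to the server. *)
type log_entry = {
  session_id : string;
  time       : float;   (* stand-in for Core's Time.t *)
  important  : bool;
  message    : string;
}

(* Sent periodically by the client to show that it is alive. *)
type heartbeat = {
  session_id     : string;
  time           : float;
  status_message : string;
}

(* Sent to initiate a connection. *)
type logon = {
  session_id  : string;
  time        : float;
  user        : string;
  credentials : string;
}
```

When nothing else pins down the type, an unqualified label refers to the most recently declared type that has that label. With this ordering, a bare t.session_id is therefore taken to belong to logon.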

While it's possible to resolve ambiguous field names using type
annotations, the ambiguity can be a bit confusing. Consider the following
functions for grabbing the session ID and status from a heartbeat:

# let status_and_session t = (t.status_message, t.session_id);;

val status_and_session : heartbeat -> string * string = <fun>

# let session_and_status t = (t.session_id, t.status_message);;

Characters 44-58:
Error: The record type logon has no field status_message

Why did the first definition succeed without a type annotation and
the second one fail? The difference is that in the first case, the
type-checker considered the status_message field first and thus concluded
that the record was a heartbeat. When
the order was switched, the session_id
field was considered first, which led the type-checker to conclude that
the record was a logon, at which point t.status_message no longer made sense.

We can avoid this ambiguity altogether, either by using
nonoverlapping field names or, more generally, by minting a module for
each type. Packing types into modules is a broadly useful idiom (and one
used quite extensively by Core), providing for each type a namespace
within which to put related values. When using this style, it is standard
practice to name the type associated with the module t. Using this style we would write:
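The sketch below applies the module-per-type style to the same three messages, again with float in place of Core's Time.t. Each session_id now lives in its own module, so a qualified field access fixes the record type regardless of field order.

```ocaml
(* Each message type lives in its own module and is named t. *)
module Log_entry = struct
  type t = {
    session_id : string;
    time       : float;   (* stand-in for Core's Time.t *)
    important  : bool;
    message    : string;
  }
end

module Heartbeat = struct
  type t = {
    session_id     : string;
    time           : float;
    status_message : string;
  }
end

module Logon = struct
  type t = {
    session_id  : string;
    time        : float;
    user        : string;
    credentials : string;
  }
end

(* The field order no longer matters: Heartbeat.session_id can only
   belong to Heartbeat.t. *)
let session_and_status t =
  (t.Heartbeat.session_id, t.Heartbeat.status_message)
```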

The module name Log_entry is
required to qualify the fields, because this function is outside of the
Log_entry module where the record was
defined. OCaml only requires the module qualification for one record
field, however, so we can write this more concisely. Note that we are
allowed to insert whitespace between the module path and the field
name:
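The sketch below shows both forms. It assumes a Log_entry module whose type t has session_id, time, important, and message fields, and it defines that module here so the block stands alone. The caller passes the time explicitly, as a float, instead of calling Core's Time.now. create_log_entry' is a hypothetical name for the concise variant.

```ocaml
module Log_entry = struct
  type t = {
    session_id : string;
    time       : float;
    important  : bool;
    message    : string;
  }
end

(* Every field qualified (and punned: Log_entry.time means
   Log_entry.time = time). *)
let create_log_entry ~session_id ~time ~important message =
  { Log_entry.session_id; Log_entry.time;
    Log_entry.important; Log_entry.message }

(* Qualifying one field is enough. Whitespace after the dot is allowed,
   which lets the module path sit on its own. *)
let create_log_entry' ~session_id ~time ~important message =
  { Log_entry. session_id; time; important; message }
```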

The syntax here is a little surprising when you first encounter it.
The thing to keep in mind is that the dot is being used in two ways: the
first dot is a record field access, with everything to the right of the
dot being interpreted as a field name; the second dot is accessing the
contents of a module, referring to the record field important from within the module Log_entry. The fact that Log_entry is capitalized and so can't be a field
name is what disambiguates the two uses.
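Here is a sketch of that double-dot syntax, using a pared-down Log_entry (time omitted for brevity). The same one-field qualification also works in patterns. message_to_string is a hypothetical example of that.

```ocaml
module Log_entry = struct
  type t = { session_id : string; important : bool; message : string }
end

(* First dot: field access on t. Second dot: the label "important"
   is looked up inside module Log_entry. *)
let is_important t = t.Log_entry.important

(* Qualifying the first field of a pattern works the same way. *)
let message_to_string { Log_entry.important; message; _ } =
  if important then String.uppercase_ascii message else message
```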

For functions defined within the module where a given record is
defined, the module qualification goes away entirely.
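For instance, in the sketch below (again with a pared-down, illustrative Log_entry), create and is_important are defined inside the module. They refer to the fields without any prefix.

```ocaml
module Log_entry = struct
  type t = { session_id : string; important : bool; message : string }

  (* Inside the defining module the labels are in scope,
     so no Log_entry. prefix is needed. *)
  let create ~session_id ~important message =
    { session_id; important; message }

  let is_important t = t.important
end
```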

Functional Updates

Fairly often, you will find yourself wanting to create a new record
that differs from an existing record in only a subset of the fields. For
example, imagine our logging server had a record type for representing the
state of a given client, including when the last heartbeat was received
from that client. The following defines a type for representing this
information, as well as a function for updating the client information
when a new heartbeat arrives:
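The block below sketches such a type and update function. It assumes a module-per-type Heartbeat as in the previous section. The client_info fields (addr, port, user, credentials, last_heartbeat_time) are illustrative, addr is a plain string rather than a network address type, and times are floats.

```ocaml
module Heartbeat = struct
  type t = { session_id : string; time : float; status_message : string }
end

type client_info = {
  addr                : string;   (* illustrative; not a real address type *)
  port                : int;
  user                : string;
  credentials         : string;
  last_heartbeat_time : float;
}

(* Every unchanged field is copied over by hand. *)
let register_heartbeat t hb =
  { addr = t.addr;
    port = t.port;
    user = t.user;
    credentials = t.credentials;
    last_heartbeat_time = hb.Heartbeat.time;
  }
```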

This is fairly verbose, given that there's only one field that we
actually want to change, and all the others are just being copied over
from t. We can use OCaml's
functional update syntax to do this more tersely. The
syntax of a functional update is as follows:
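The comment below gives the schematic shape of a functional update. The code then applies it to an illustrative client_info record with float timestamps and a string address. Only the changed field needs to be written.

```ocaml
(* Functional update, schematically:
     { <record> with <field> = <value>;
                     <field> = <value>;
                     ... }
   It builds a new record; <record> itself is left unchanged. *)

module Heartbeat = struct
  type t = { session_id : string; time : float; status_message : string }
end

type client_info = {
  addr                : string;
  port                : int;
  user                : string;
  credentials         : string;
  last_heartbeat_time : float;
}

let register_heartbeat t hb =
  { t with last_heartbeat_time = hb.Heartbeat.time }
```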

Functional updates make your code independent of the identity of the
fields in the record that are not changing. This is often what you want,
but it has downsides as well. In particular, if you change the definition
of your record to have more fields, the type system will not prompt you to
reconsider whether your code needs to change to accommodate the new
fields. Consider what happens if we decide to add a field for the status
message received on the last heartbeat:
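The sketch below makes that change under the same illustrative assumptions. client_info gains last_heartbeat_status. The functional-update version of register_heartbeat compiles unchanged, but it silently carries over the old status.

```ocaml
module Heartbeat = struct
  type t = { session_id : string; time : float; status_message : string }
end

type client_info = {
  addr                  : string;
  port                  : int;
  user                  : string;
  credentials           : string;
  last_heartbeat_time   : float;
  last_heartbeat_status : string;   (* newly added *)
}

(* Still compiles, but last_heartbeat_status is copied from t
   instead of being taken from the new heartbeat. *)
let register_heartbeat t hb =
  { t with last_heartbeat_time = hb.Heartbeat.time }
```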

The original implementation of register_heartbeat would now be invalid, and
thus the compiler would effectively warn us to think about how to handle
this new field. But the version using a functional update continues to
compile as is, even though it incorrectly ignores the new field. The
correct thing to do would be to update the code as follows:
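The corrected functional update is sketched below with the same illustrative types. It sets both heartbeat-related fields.

```ocaml
module Heartbeat = struct
  type t = { session_id : string; time : float; status_message : string }
end

type client_info = {
  addr                  : string;
  port                  : int;
  user                  : string;
  credentials           : string;
  last_heartbeat_time   : float;
  last_heartbeat_status : string;
}

(* Both fields derived from the heartbeat are now updated. *)
let register_heartbeat t hb =
  { t with last_heartbeat_time   = hb.Heartbeat.time;
           last_heartbeat_status = hb.Heartbeat.status_message }
```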

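Record fields can also be declared mutable, in which case they can be
updated in place with the <- operator. Note that mutable assignment, and
thus the <- operator, is not needed for initialization, because all fields
of a record, including mutable ones, are specified when the record is
created.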
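The sketch below shows mutable fields on an illustrative client_info. The two heartbeat fields are declared mutable and updated in place with <-. Creating the record still supplies every field, mutable or not. Float times and a string address are simplifications.

```ocaml
module Heartbeat = struct
  type t = { session_id : string; time : float; status_message : string }
end

type client_info = {
  addr                          : string;
  port                          : int;
  user                          : string;
  credentials                   : string;
  mutable last_heartbeat_time   : float;
  mutable last_heartbeat_status : string;
}

(* Mutates t in place and returns unit; no new record is built. *)
let register_heartbeat t hb =
  t.last_heartbeat_time <- hb.Heartbeat.time;
  t.last_heartbeat_status <- hb.Heartbeat.status_message

(* Initialization: mutable fields get values like any other field. *)
let client =
  { addr = "127.0.0.1"; port = 8080; user = "alice"; credentials = "pw";
    last_heartbeat_time = 0.0; last_heartbeat_status = "" }
```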

OCaml's policy of immutable-by-default is a good one, but imperative
programming is an important part of programming in OCaml. We go into more
depth about how (and when) to use OCaml's imperative features in the section called “Imperative Programming”.

First-Class Fields

Consider the following function for extracting the usernames from a
list of Logon messages:
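Here is a stdlib-only sketch of that function. It assumes the Logon module layout from the section on reusing field names, with a float time. The standard library's List.sort_uniq sorts the names and drops duplicates.

```ocaml
module Logon = struct
  type t = {
    session_id  : string;
    time        : float;
    user        : string;
    credentials : string;
  }
end

(* Extract each user name, then sort and remove duplicates. *)
let get_users (logons : Logon.t list) : string list =
  List.sort_uniq compare (List.map (fun x -> x.Logon.user) logons)
```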

Here, we wrote a small function (fun x -> x.Logon.user) to access the
user field. This kind of accessor function is a
common enough pattern that it would be convenient to generate it
automatically. The fieldslib syntax
extension that ships with Core does just that.

The with fields annotation at the
end of the declaration of a record type will cause the extension to be
applied to a given type declaration. So, for example, we could have
defined Logon as follows:

Note that this will generate a lot of output
because fieldslib generates a large
collection of helper functions for working with record fields. We'll only
discuss a few of these; you can learn about the remainder from the
documentation that comes with fieldslib.

One of the functions we obtain is Logon.user, which we can use to extract the user
field from a logon message:
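Without the syntax extension, the effect can be approximated by hand. In the hypothetical sketch below, Logon defines one plain accessor function per field. This is not fieldslib's actual generated code. It only illustrates what an accessor like Logon.user amounts to.

```ocaml
module Logon = struct
  type t = {
    session_id  : string;
    time        : float;
    user        : string;
    credentials : string;
  }

  (* Hand-written accessors, one per field (hypothetical stand-ins
     for the functions fieldslib would generate). *)
  let session_id t = t.session_id
  let time t = t.time
  let user t = t.user
  let credentials t = t.credentials
end

(* The accessor replaces the anonymous (fun x -> x.Logon.user). *)
let get_users logons = List.sort_uniq compare (List.map Logon.user logons)
```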

In addition to generating field accessor functions, fieldslib also creates a submodule called
Fields that contains a first-class
representative of each field, in the form of a value of type Field.t. The Field module provides the following
functions:

Field.name

Returns the name of a field

Field.get

Returns the content of a field

Field.fset

Does a functional update of a field

Field.setter

Returns None if the field
is not mutable or Some f if it
is, where f is a function for
mutating that field

A Field.t has two type
parameters: the first for the type of the record, and the second for the
type of the field in question. Thus, the type of Logon.Fields.session_id is (Logon.t, string) Field.t, whereas the type of
Logon.Fields.time is (Logon.t, Time.t) Field.t. So, if you call
Field.get on Logon.Fields.user, you'll get a function for
extracting the user field from a
Logon.t:
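To make the two type parameters concrete, here is a hypothetical, stdlib-only stand-in for Field.t: a record of functions parameterized by the record type and the field type. It models name, get, and fset only. It is not fieldslib's definition, and it leaves out setter and the access-control markers. For brevity, only three of Logon's four fields get handles.

```ocaml
(* ('record, 'field) field: a first-class handle on one field.
   Hypothetical simplification of fieldslib's Field.t. *)
type ('record, 'field) field = {
  name : string;
  get  : 'record -> 'field;
  fset : 'record -> 'field -> 'record;   (* functional update *)
}

module Logon = struct
  type t = {
    session_id  : string;
    time        : float;   (* stand-in for Core's Time.t *)
    user        : string;
    credentials : string;
  }

  module Fields = struct
    let session_id : (t, string) field =
      { name = "session_id"; get = (fun t -> t.session_id);
        fset = (fun t v -> { t with session_id = v }) }

    let time : (t, float) field =
      { name = "time"; get = (fun t -> t.time);
        fset = (fun t v -> { t with time = v }) }

    let user : (t, string) field =
      { name = "user"; get = (fun t -> t.user);
        fset = (fun t v -> { t with user = v }) }
  end
end

(* Analogous to Field.get on Logon.Fields.user. *)
let get_user : Logon.t -> string = Logon.Fields.user.get
```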

The type is Field.t_with_perm
rather than Field.t because fields have
a notion of access control that comes up in some special cases where we
expose the ability to read a field from a record, but not the ability to
create new records, and so we can't expose functional updates.

We can use first-class fields to do things like write a generic
function for displaying a record field:
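The sketch below writes such a function over a simplified, hypothetical field handle that has just name and get (not fieldslib's Field.t). Fun.id from the standard library (OCaml 4.08 and later) plays the role of Core's Fn.id.

```ocaml
(* Hypothetical, simplified field handle. *)
type ('record, 'field) field = { name : string; get : 'record -> 'field }

module Logon = struct
  type t = {
    session_id  : string;
    time        : float;
    user        : string;
    credentials : string;
  }

  module Fields = struct
    let user : (t, string) field = { name = "user"; get = (fun t -> t.user) }
    let time : (t, float) field = { name = "time"; get = (fun t -> t.time) }
  end
end

(* Generic over both the record type and the field type. *)
let show_field (field : ('r, 'f) field) (to_string : 'f -> string) (record : 'r) =
  field.name ^ ": " ^ to_string (field.get record)
```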

As a side note, the preceding example is our first use of the
Fn module (short for "function"), which
provides a collection of useful primitives for dealing with functions.
Fn.id is the identity function.

fieldslib also provides
higher-level operators, like Fields.fold and Fields.iter, which let you walk over the fields
of a record. So, for example, in the case of Logon.t, the field iterator has the following
type:

This is a bit daunting to look at, largely because of the access
control markers, but the structure is actually pretty simple. Each labeled
argument is a function that takes a first-class field of the necessary
type as an argument. Note that iter
passes each of these callbacks the Field.t, not the contents of the specific record
field. The contents of the field, though, can be looked up using the
combination of the record and the Field.t.

Now, let's use Logon.Fields.iter
and show_field to print out all the
fields of a Logon record:
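The following stdlib-only sketch expresses the same idea with hand-written, hypothetical field handles. Fields.iter takes one labeled callback per field and passes each callback the field handle, not the field's contents. print_logon then combines each handle with the record to print one line per field. Floats stand in for Core's Time.t, and string_of_float stands in for its time formatting.

```ocaml
type ('record, 'field) field = { name : string; get : 'record -> 'field }

module Logon = struct
  type t = {
    session_id  : string;
    time        : float;
    user        : string;
    credentials : string;
  }

  module Fields = struct
    let session_id : (t, string) field =
      { name = "session_id"; get = (fun t -> t.session_id) }
    let time : (t, float) field = { name = "time"; get = (fun t -> t.time) }
    let user : (t, string) field = { name = "user"; get = (fun t -> t.user) }
    let credentials : (t, string) field =
      { name = "credentials"; get = (fun t -> t.credentials) }

    (* One labeled callback per field, called in declaration order. *)
    let iter
        ~session_id:(f1 : (t, string) field -> unit)
        ~time:(f2 : (t, float) field -> unit)
        ~user:(f3 : (t, string) field -> unit)
        ~credentials:(f4 : (t, string) field -> unit) =
      f1 session_id; f2 time; f3 user; f4 credentials
  end
end

let show_field field to_string record =
  field.name ^ ": " ^ to_string (field.get record)

(* Look up each field's contents using the record plus its handle. *)
let print_logon logon =
  let print to_string field =
    print_endline (show_field field to_string logon) in
  Logon.Fields.iter
    ~session_id:(print Fun.id)
    ~time:(print string_of_float)
    ~user:(print Fun.id)
    ~credentials:(print Fun.id)
```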