Type-directed Programming

One of the principal advantages of programming in a strongly typed
programming language like OCaml is that the types of function
arguments and results can help guide the way you construct
programs.

In this class, whenever you write a function in
OCaml, you should do so by following the type-directed programming
methodology. This programming methodology has the following steps:

Write down the name of the function and the types of its arguments and
results.

Write a comment that explains its purpose and any preconditions.

Write several examples of what your function does.

Write the body of the function. (This is the hard part!)

Turn your examples into tests.

The place where types really help is in the hard part, writing the body
of the function, because function bodies involve two conceptual
activities:

Deconstruct (i.e., tear apart or analyze) the input values.

Construct (i.e., build) the output values.

Types help because each type comes with a
clearly defined
set of values. When we know the values associated with a type,
we know exactly what inputs we have to analyze -- there is little
chance we will miss a case, and, indeed, if you program using good style,
the OCaml type checker will often warn you if you do accidentally miss
a case. We also know the range of possibilities available when building
a function's outputs. In both situations, types help us search for
and identify complete solutions to the programming problem at hand.

For example, the type bool comes with exactly
two values, true and false. When writing a
function with an argument of type bool, one must consider
what to do when supplied with the input true and
one must consider what to do when supplied with the input
false -- there are never any other possibilities to
consider. Analogously, when writing a function
with a result type bool, there are only two possible results
you can construct -- the value true and the value
false.

In the following, we consider a number of the built-in OCaml types.
For each type, we'll discuss the set of values for that type
as well as how to deconstruct those values (for inputs) and construct
those values (for outputs).
An ml file associated with this lecture
may be found here.

Booleans

You already know the set of values that make up the boolean type:
true and false -- and that's it.
Given an input of type bool, we may determine which of the
two values we have using a match statement.
In general, match statements have the following form.
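
As a minimal sketch of that shape: the comment gives the general form, and the function under it is one concrete instance. The name describe and the integer patterns are illustrative assumptions, not taken from the lecture.

```ocaml
(* General shape:

     match expression with
     | pattern1 -> result1
     | pattern2 -> result2
     ...

   describe is a hypothetical example of that shape. *)
let describe (n : int) : string =
  match n with
  | 0 -> "zero"
  | 1 -> "one"
  | _ -> "many"
```
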

The code above evaluates the expression and then
checks to see whether the resulting value matches one of the
patterns. The patterns are checked against the computed value in order
and the result associated with the first pattern that matches is
executed. The kinds of patterns available depend upon the type
of the expression. When the expression
has type bool, there are two patterns,
true and false, because, of course,
those are the only two values that
can have type bool. Hence, a boolean pattern matching
expression looks like this:

match expression with
| true -> result1
| false -> result2

Example.
Let's define a function that converts a boolean into an
integer. According to our methodology, we will first
define the function name and types, write a comment
to explain what it does and then write down some examples:
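
A sketch of how these steps might look for this function is given here. The name bool_to_int and the mapping of true to 1 and false to 0 both appear later in these notes; the wording of the comment is an assumption. The body follows directly from the two boolean patterns.

```ocaml
(* bool_to_int b converts a boolean to an integer:
   true becomes 1 and false becomes 0.

   Examples:
     bool_to_int true  = 1
     bool_to_int false = 0
*)
let bool_to_int (b : bool) : int =
  match b with
  | true -> 1
  | false -> 0
```
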

Finally, we will convert our examples into tests.
Below I've created a series of tests from our examples using assert
statements. An assertion is simply an
expression with boolean type wrapped in the keyword assert.
Assertions have the benefit that they may be turned off in production
code by using the compiler option -noassert. Hence, you can put them
in your code for testing purposes but suffer no performance penalty when
deploying your final product.
See the OCaml manual for more.
Here is our final code with our tests.
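
A plausible version of that final code, written as a sketch rather than the exact listing from the lecture:

```ocaml
let bool_to_int (b : bool) : int =
  match b with
  | true -> 1
  | false -> 0

(* Tests derived from the examples; each assertion has type unit. *)
let _ = assert (bool_to_int true = 1)
let _ = assert (bool_to_int false = 0)
```
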

Notice that we deleted the portion of the comment with the examples.
The examples were a good intermediate step for our own thinking process,
but the final code is almost identical to the comment so the comment is
redundant. The code itself is simple and clear enough that
additional comments just get in the way. It is better style
to omit them and "let the code do the talking" in this case.
(As an aside, notice that any function application
"f arg"
has higher precedence than an operator such as "=" so you do not
need parens around bool_to_int true.)

If you compile your file (using ocamlbuild) and run it,
and an assertion fails, the Assert_failure exception is raised with the source
file name and the location of the boolean expression as arguments.

By the way, a synonym for a match statement on booleans like the one above
is an if-then-else statement.
Hence, a completely equivalent piece of code is as
follows. Notice, of course, that our tests do not change when we change
how we write our function. Creating durable tests for a function helps keep
code correct as it evolves.
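
As a sketch, the if-then-else version could be written like this, with the same two tests as before:

```ocaml
let bool_to_int (b : bool) : int =
  if b then 1 else 0

(* The tests are unchanged from the pattern-matching version. *)
let _ = assert (bool_to_int true = 1)
let _ = assert (bool_to_int false = 0)
```
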

Most programmers will write functions on a single boolean using an
if statement. We introduced the idea of analyzing a boolean value
using pattern matching because pattern matching is the general paradigm
that programmers use to deconstruct data.
An if statement is a special case that only really exists for
historical reasons and because programmers coming from other kinds of
languages feel comfortable with it.

Ok, that's booleans,
and, by the way, if you were thinking "oh my god, I can't believe he spent
so much time on such a simple function." Well, you are right. It was
pretty easy -- a proficient OCaml programmer can write that function
in 5 seconds. Onwards and upwards!

Tuples

The type t1 * t2 represents pairs of values where the first
component of the pair has type t1 and the second component
has type t2. The type t1 * t2 * t3 represents
triples of values where the first
component of the triple has type t1, the second component
has type t2 and the third component has type t3.
An n-tuple has type t1 * t2 * ... * tn and has n such
components.

We create pair, triple, or n-tuple values by writing down a series
of expressions separated by commas and enclosed by parentheses. For
example:
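
The values and names here (pair, person, and the data inside them) are illustrative assumptions. The triple holds a first name, a last name, and an age.

```ocaml
(* A pair of an int and a string. *)
let pair : int * string = (3, "three")

(* A triple: first name, last name, age (hypothetical data). *)
let person : string * string * int = ("Ada", "Lovelace", 36)
```
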

Example.
To analyze a pair or any other kind of tuple, we may again use pattern
matching. Let's write a function to extract the string components of
a triple like the one above and return a string. According to our
methodology, we write the name and types first along with a comment.
Then we add some examples.
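
A sketch of such a function follows. The name full_name, the choice to join the two strings with a space, and the sample data are assumptions made for illustration.

```ocaml
(* full_name name returns the first and last names stored in the
   triple, separated by a single space.

   Example:
     full_name ("Ada", "Lovelace", 36) = "Ada Lovelace"
*)
let full_name (name : string * string * int) : string =
  match name with
  | (first, last, _) -> first ^ " " ^ last

let _ = assert (full_name ("Ada", "Lovelace", 36) = "Ada Lovelace")
```
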

Above, the pattern (first, last, _) matches any
triple; the variable first is bound to the first
value in the triple and the variable last is bound
to the second value in the triple. The "_" is a pattern that matches
any value. It informs the reader of the code that "I don't care about
this value." In this function, we don't use the contents of
the age component, so the underscore pattern is appropriate.

Whenever a match statement contains just one pattern, a programmer
may replace the match statement with a let statement. For instance,
the following is a bit more compact.
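
Two compact variants are sketched here, assuming a hypothetical function full_name that joins the first two components of a (first, last, age) triple. The first uses a let binding; the second moves the pattern into the argument position.

```ocaml
(* Single-pattern match replaced by a let binding. *)
let full_name (name : string * string * int) : string =
  let (first, last, _) = name in
  first ^ " " ^ last

(* The same pattern written directly in argument position.
   full_name' is a hypothetical name for this variant. *)
let full_name' ((first, last, _) : string * string * int) : string =
  first ^ " " ^ last
```
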

While these latter two examples are more compact, it is important to
understand that pair patterns are just like boolean patterns or integer
patterns or any other kind of pattern in that they may be used within
a match expression. In more complicated examples, we may use several
different patterns in conjunction to analyze and extract information
from an input.

Example.
Define a function that computes the disjunction of a pair of booleans.
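
A first sketch enumerates every combination of the two booleans. The name or_pair is an assumption used for illustration.

```ocaml
(* or_pair bs is true when at least one component of bs is true.
   or_pair is an illustrative name, not taken from the notes. *)
let or_pair (bs : bool * bool) : bool =
  match bs with
  | (true, true) -> true
  | (true, false) -> true
  | (false, true) -> true
  | (false, false) -> false
```
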

Now, since the input contains a pair of booleans and each boolean contains
two different values, true and false, it is natural that we would start
out writing our function using 4 cases (2*2 = 4). However, we might
now observe that these 4 cases may be written as two using a wildcard
pattern:
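
One way to write the two-case version, again using the hypothetical name or_pair: only the pair (false, false) yields false, and the wildcard covers the three remaining cases.

```ocaml
let or_pair (bs : bool * bool) : bool =
  match bs with
  | (false, false) -> false
  | _ -> true
```

Because patterns are tried in order, the wildcard branch is reached only when the first pattern fails to match.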

... but that is clearly going to get way out of hand and so many
cases are going to
be hard to read and verify. How can we reduce the number of cases?
Well, "counting" a single boolean is easy -- it involves just two
cases (as all functions on single booleans do!) -- if the boolean is
true, it returns 1 and if it is false, it returns 0. We'll just
use that function 5 times and sum the results. Moreover,
we've already written a function to "count" a single boolean --
it is called bool_to_int!
How lucky is that? (Even if we hadn't written it already, writing
it now would take 5 seconds and be far easier and clearer than
writing the 2^5 patterns we would have had to write if we followed
the naive approach.) Here is the code.
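
A sketch of that approach, assuming the input is a quintuple of booleans. The name count_true is hypothetical, and bool_to_int is repeated so the block stands alone.

```ocaml
let bool_to_int (b : bool) : int =
  if b then 1 else 0

(* count_true bs returns how many of the five components are true. *)
let count_true (bs : bool * bool * bool * bool * bool) : int =
  let (b1, b2, b3, b4, b5) = bs in
  bool_to_int b1 + bool_to_int b2 + bool_to_int b3
  + bool_to_int b4 + bool_to_int b5

let _ = assert (count_true (true, false, true, true, false) = 3)
```
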

Unit Type

We have talked about pairs, which are tuples with 2 fields. We have
talked about triples, which are tuples with 3 fields. We have talked
about quintuples, which are tuples with 5 fields. Ever consider what
a tuple with 0 fields looks like? It looks like this: (). In
OCaml, this value is referred to colloquially as "unit" and its type is also
called unit.

Surprisingly, even though the unit value has no information content,
it is quite heavily used! Whenever an expression has an effect
on the outside world, but returns no interesting data, unit is its type.
For instance, expressions that do nothing but print data to stdout will
typically have type unit. The following is an example of an expression
with type unit:

print_string "hello world\n\n"

Assertions are also expressions with type unit. Why? Because
when an assertion succeeds, it does nothing, returning the unit value.

It is also possible for a function to have no interesting input -- such
functions already contain all the data they need to execute. In such
cases, unit is a reasonable argument type. Like other types, one can
pattern match on expressions with unit type -- the pattern is ().
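
As a sketch, a function with a unit argument can match on that argument explicitly. The parameter name u is an assumption.

```ocaml
(* hello_world takes the unit value and prints a greeting. *)
let hello_world (u : unit) : unit =
  match u with
  | () -> print_string "hello world\n"
```
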

However, as with other kinds of tuples, since there is only one branch of
the match expression, we can (and should) shift the pattern into the argument
position as follows (note that we omit the type of the argument in
this revision since the argument type unit is fully determined by the
pattern ()).

Example (Better Style).

let hello_world () : unit =
  print_string "hello world\n"

Example.
Sometimes, we need to execute several unit-valued expressions in a row.
We could use successive pattern matching, but that is overly verbose.
Instead of successive pattern matching,
use a semi-colon to separate one unit-valued expression from the next.
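
Both styles are sketched here for comparison. The function names greet_verbose and greet are hypothetical.

```ocaml
(* Verbose: match on the unit result of the first print
   before running the second. *)
let greet_verbose () : unit =
  match print_string "hello " with
  | () -> print_string "world\n"

(* Preferred: e1; e2 evaluates e1, discards its unit result,
   and then evaluates e2. *)
let greet () : unit =
  print_string "hello ";
  print_string "world\n"
```
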

Example.
Unit-valued functions
We can also use assertions within functions as part of our
testing apparatus. An effective way to test your functions
is often to compute the same answer in two different ways.
For instance, we know that disjunction should be symmetric.
If we find it isn't, we must have an error. Here is some
simple code to test for symmetry.
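
A sketch of such a test, assuming a disjunction function named or_pair (a hypothetical name, repeated here so the block stands alone) and a hypothetical helper test_symmetric:

```ocaml
let or_pair (bs : bool * bool) : bool =
  match bs with
  | (false, false) -> false
  | _ -> true

(* test_symmetric b1 b2 raises Assert_failure if swapping the
   components changes the result; otherwise it returns (). *)
let test_symmetric (b1 : bool) (b2 : bool) : unit =
  assert (or_pair (b1, b2) = or_pair (b2, b1))

(* Run the check on all four combinations of inputs. *)
let () =
  test_symmetric true true;
  test_symmetric true false;
  test_symmetric false true;
  test_symmetric false false
```
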

Option Types

An option type, written t option, contains two sorts of values,
the value None and the value Some v where v is a
value with type t.

Example. A point is a pair of integer coordinates.
Write a function that finds the slope of a line between two points.
Return None if the line is vertical and the slope is
undefined. Return Some slope otherwise.

To start out, it is useful to define a type abbreviation for points:

type point = int * int

When defining a type abbreviation, the new type name (i.e., point) is
in every way identical to its definition (i.e., int * int).
Hence we may now use point and int * int
interchangeably in our code.
However, using point (where the data in question does in
fact represent a point) makes the code easier to read. It
is good documentation and good style. Now, on to computing the slope
of a line between two points.
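
A sketch of the slope function follows. The names xd and yd match the discussion in these notes; the remaining names and the conversion to float are assumptions.

```ocaml
type point = int * int

(* slope p1 p2 returns Some s, where s is the slope of the line
   through p1 and p2, or None if that line is vertical. *)
let slope (p1 : point) (p2 : point) : float option =
  let (x1, y1) = p1 in
  let (x2, y2) = p2 in
  let xd = x2 - x1 in
  let yd = y2 - y1 in
  if xd <> 0 then
    Some (float_of_int yd /. float_of_int xd)
  else
    None
```
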

Notice that we used pattern matching on points to extract their components.
This is perfectly legal since a point is a pair
and pattern matching on pairs is legal.
Before dividing yd by xd, we tested xd for zero. If it is not zero,
we divide and return a Some. If it is zero, we return None.

When testing floating point results, we would like to test that the
results are within an acceptable range as opposed to being exactly equal
to some constant we write down in our file
because of the imprecision of floating point arithmetic. Hence, to
facilitate testing, we will write another function, inrange, to help
us. Of course, whenever we write code to help us test our program
functionality, it is possible the testing code is incorrect. Nevertheless,
it usually helps us detect errors in our work because writing a computation
two different ways typically helps weed out errors. (Sometimes we'll find
an error in our test when the function being tested is correct. That's
ok, we can quickly fix the test.) Here's a testing function:
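
A minimal sketch of such a helper follows. The signature of inrange (a float option plus a lower and an upper bound) is an assumption, as are the bounds used in the tests. The slope function is repeated so the block stands alone.

```ocaml
type point = int * int

let slope (p1 : point) (p2 : point) : float option =
  let (x1, y1) = p1 in
  let (x2, y2) = p2 in
  let xd = x2 - x1 in
  let yd = y2 - y1 in
  if xd <> 0 then Some (float_of_int yd /. float_of_int xd)
  else None

(* inrange v low high is true when v is Some f and
   low <= f <= high; it is false when v is None. *)
let inrange (v : float option) (low : float) (high : float) : bool =
  match v with
  | None -> false
  | Some f -> low <= f && f <= high

let _ = assert (inrange (slope (0, 0) (2, 1)) 0.49 0.51)
let _ = assert (not (inrange (slope (1, 0) (1, 5)) 0.0 1.0))
```
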

Summary

Strong, precise type systems help
guide the construction of functions.
Typically, we analyze the inputs to our functions
according to their type and build outputs for our function,
again, according to their type. The following table
summarizes the types we have looked at, the shape of the patterns
for analyzing values of those types and the common
deconstruction patterns for that type.
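
As a compact recap in code form (a sketch, not the table itself), each pair of definitions here constructs a value of one type and then deconstructs it. All names are illustrative assumptions.

```ocaml
(* bool: values true and false; deconstruct with match or if. *)
let b : bool = true
let from_bool : int = match b with true -> 1 | false -> 0

(* t1 * t2: values (v1, v2); deconstruct with a tuple pattern. *)
let p : int * string = (1, "one")
let (n, s) = p

(* unit: the single value (); deconstruct with the pattern (). *)
let u : unit = ()
let () = u

(* t option: values None and Some v; deconstruct with match. *)
let o : int option = Some 3
let from_option : int = match o with None -> 0 | Some v -> v
```
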