Haskell's type system

“Haskell has a strong, static type system with
inference.” For a short sentence, this one gives us a
lot for us to pore over. Because Haskell is quite different
from mainstream programming languages in how it treats types,
let's not assume too much shared understanding as we talk about
types.

Every value and expression in Haskell has a
type. The type of a value indicates that
it shares certain properties with other values that have the
same type. (You'll see us refer to a value “having the
type X”, or “being of type
X”. The two phrases mean the same thing.)
All values that have the type Integer have the
ability to be added to other values of type
Integer, and so on.

Haskell has a “strong” type system, in which
every expression and value has exactly one type. Another aspect
of Haskell's view of strong typing is that it will not
automatically convert values from one type to another, a feature
present in some other languages.

Note

Conversations about type systems can lead to many a
misunderstanding among programmers. Some people say that C
has a strong type system; others claim that Python does.
Haskell programmers think strong typing is still something
else.

Since there's no universally agreed upon meaning for the
phrase “strong type”, it's best not to assume
that someone else shares the same notion of it as you do. (We
also suggest that putting forth one definition as better than
another doesn't often lead to enlightened discussion.)

Having a “static” type system means that the
compiler knows the type of every value and expression at compile
time, before any code is ever executed. A Haskell compiler or
interpreter will detect when we try to use types inconsistently
in our code, and reject the code with an error message.

Some common basic types

We've already encountered a few types in the section called “First steps with types”. Haskell has a number of
built-in types that we'll use over and over. Here are some of
the language's basic types, types that aren't composed of other
types

The Char type represents a character. The
values of Char are drawn from the Unicode
character set, which covers most of the world's written
languages.

The Bool type represents a value in Boolean
logic. The possible values of Bool are
True and False.

The Int type represents a signed,
fixed-width integer. The exact range of values represented
by Int depends on the system's longest
“native” integer: on a 32-bit machine, an
Int is usually 32 bits wide, while on a 64-bit
machine, it is usually 64 bits wide.

The Integer type represents a signed
integer of arbitrary size. Integers are not
used as often as Ints, because they're a lot
more expensive to work with.

The Double type is the type usually used to
represent floating point numbers. It is typically 64 bits
wide, and uses the machine's native floating point
representation. (A narrower type, Float, also
exists, but its use is discouraged; Haskell compiler writers
concentrate more on making Double
efficient.)

Useful composite data types: lists and tuples

The most common composite data types in Haskell are the list
and tuple.

We've already seen the list type mentioned in the section called “Text, strings and lists”, where we found that Haskell
represents a text string as a list of Char values,
and that the type “list of Char” is
written [Char].

More generally, we can write the type “list of
a” for any type a by enclosing a in square brackets, [a].
Lists are strongly typed: a list of one type has an identity of
its own, distinct from a list of another type. The type
[Int] is a list that can only contain values of
type Int, for example. We also call the list type
polymorphic, because it can contain any
type of value.

Lists are the “bread and butter” of Haskell
collections. Whereas in an imperative language, we might repeat
a task over many items by iterating through a loop, this is
something that we tend to do in Haskell by recursing over a
list. We'll be spending a lot more time discussing lists in
Chapter 4, Functional programming.

A tuple is a fixed-size collection of values, each of which
can be of any type. Unlike a list (the elements of which must
all have the same type), there's no need for the elements of a
tuple to have related types. We write a tuple by enclosing its
elements in parentheses and separating them with commas. We use
the same notation for writing its type.

We can construct a tuple with any number of elements
(although in practice, a tuple with more than a handful becomes
unwieldy). A tuple's type encodes the number and types of its
elements in its own type. This means that tuples containing
different numbers or types of elements have distinct types of
their own.

A two-tuple of an Int and a String
has a different type than a two-tuple of a Bool and
a Bool, for example, and a three-tuple has a
different type than a four-tuple.

Probably the most common use of tuples is to let us return
multiple values from a function. We can also use them in other
places where we want a fixed-size collection of values, for
which the kind of container we're using isn't of much
importance. An example of this might be to represent a row in
the result of a a database query.

Exercises

1.

What are the types of the following
expressions?

False

(["foo", "bar"], 'a')

[(True, []), (False, [['a']])]

Calling functions

Now that we've had our fill of data types for a while, let's
turn our attention to working with some of
the types we've seen. We've already seen how to perform
arithmetic in the section called “Simple arithmetic”, and it
looks quite as it does in other languages.

However, our discussion of lists and tuples mentioned how we
can construct them, but not how we do anything with them
afterwards. Haskell defines a large library of functions for
working with lists and (to a lesser extent) tuples, so let's
find out how to use a few of those functions.

To call a function in Haskell, we provide the name of the
function followed by its arguments, all separated by white space.
As an example, let's call the head
function, which returns the first element of a list.

ghci> head [1,2,3]
1
ghci> head [1,2,3]
1

Its counterpart, tail, returns all
but the head of a list.

ghci> tail [True, False]
[False]
ghci> tail [True, False]
[False]

A related pair of functions, take and
drop, take two arguments: given a number
and a list, they return either the first, or all but the first,
number of elements of the list. (Here, remember that a
String is a list of Char.)

If you're used to function call syntax in other languages,
this notation can take a little getting used to, but it's
undeniably simple and uniform. Here's what we mean by
uniform.

ghci> snd (1,2)
2
ghci> snd (1,2)
2

In some other languages, the call to
snd above might mean “call
snd with two arguments,
1 and 2”, but in
Haskell, it's a single-argument call, passing the tuple
(1,2).

By the way, snd has a companion
function, fst, which returns the first
element of a tuple.

ghci> fst ("you", True)
"you"
ghci> fst ("you", True)
"you"

Passing an expression to a function

Haskell evaluates an expression from left to right. If we
want to use the result of one expression as an argument to
another, we have to keep this in mind and use parentheses to
tell the parser what we really mean. Here's an
example.

ghci> head (drop 4 "azerty")
't'
ghci> head (drop 4 "azerty")
't'

We can read this as “pass the result of the
expression drop 4 "azerty" as the argument to
head”. If we were to leave out
the parentheses, Haskell would instead interpret the
expression as “pass drop as the
argument to head”. Compilation
would fail with a type error, as drop is
a function, not a list.

Understanding a function's type signature

A consequence of Haskell's strong typing is that
fst and snd only
accept two-tuples as arguments. Let's see what ghci tells us
about their types.

We can read the -> above as
“returns”. The entire signature thus tells us that
fst takes any two-tuple whose elements are
of types a and b (each of which can be of any type),
and returns a value with type a.
Sure enough, if we try calling fst with a
three-tuple, things don't go so well.

Notice that the type signatures above give us a strong hint
as to what these functions might actually
do. This is an incredibly valuable property
of types in a functional language. Since there aren't usually
any side effects for us to worry about, figuring what a function
does can often be a matter of reading its name and understanding
its type signature, with no “regular documentation”
required.

Just as we called the list type polymorphic because it can
contain values of any type, when a function has type variables
in its signature, indicating that some of its arguments can be
of any type, we call the function polymorphic, too.

So far, we haven't seen a signature for a function that
takes more than one argument. We've
already encountered a few such functions; let's look at
take.

It's pretty clear that there's something going on with an
Int and some lists, but why are there two
-> symbols in the signature? Haskell
parses this chain of arrows from right to left. If we introduce
parentheses, it makes it clearer how Haskell is interpreting
this type signature.

take :: Int -> ([a] -> [a])take :: Int -> ([a] -> [a])

From this, it looks like we ought to read the type signature
as a function that takes one argument, an Int, and
returns another function. That other function also takes one
argument, a list, and returns a list of the same type as its
argument.

Haskell source files, and writing simple functions

Now that we know how to call functions, it's time we turned
our attention to writing them. While we can write functions in
ghci, it's not a good environment for doing so, because it
limits any expression or definition to one line in length. So
instead, we'll finally break down and create a source
file.

Haskell source files are usually identified with a suffix of
.hs. Here's a simple function definition:
open up a file named add.hs, and add these
contents to it.

add a b = a + badd a b = a + b

On the left hand side of the = is the
name of the function, followed by the arguments to the function.
On the right hand side is the body of the function. With our
source file saved, we can load it into ghci, and use our new
add function straight away.

When we call add, the variables
a and b on the left hand
side of our definition are given (or “bound to”)
the values 1 and 2, then
the right hand side is evaluated, and the result
returned.

Haskell doesn't need a return statement;
the result of a function is the result of evaluating whatever
expression is in the function's body.

Just what is a variable, anyway?

If you're used to imperative programming languages, you're
likely to think of a variable as a way of identifying a
“box”. You can get the value that's currently in
a box, and you can change the value in the box using
assignment.

In Haskell, a variable is the name of a
value. This is critically different from
the name of a box: we're used to being able to change what's
inside a box from our dealings with other languages, but we
can't change the “twoness” of the value
2.

Creating a variable in Haskell just gives a convenient
name to a value; we can't change the value associated with a
variable. This bears a strong similarity to the way we think
about variables in mathematics.

Conditional evaluation

Like other languages, Haskell has an if
expression. Let's see it in action, then we'll explain what's
going on. As an example, we'll write our own version of the
standard take function. Before we begin,
let's probe a little bit of how take
behaves, so we can replicate its behaviour.

From the above, it seems that take
returns an empty list if the number to remove is greater than
the number of elements, and that it treats negative numbers as
zero. Here's a myTake function that has
the same behaviour, and uses Haskell's if
expression to decide what to do.

Now that we've seen myTake in action,
let's return to the source code and look at a few of the
novelties we've introduced.

First is the if keyword itself. It takes an
expression of type Bool. If that evaluates to
True, it evaluates the expression on the
then branch. Otherwise, it evaluates the
expression on the else branch.

The combination of if, then,
else and the expression after each keyword
combine to make up a single expression, as far as Haskell is
concerned. Whichever branch is evaluated is the result of the
if. Because an expression can only have one
type, the expressions in the then and
else branches must have the same type. If they
don't, an if expression won't typecheck.

Whereas it can make sense in an imperative language to
omit the else branch from an if,
this would be nonsensical in Haskell. An if
expression that was missing an else wouldn't have
a value if the condition after the if evaluated
to False, so it couldn't typecheck.

The second novelty is almost trivial: the
null function, which we use in the
Boolean portion of the if, indicates whether a
list is empty.

Third is that our function calls itself recursively. This
is an early example of how, in Haskell, we use recursion where
in an imperative language we'd probably use a loop.

Finally, our if expression spans several
lines. We line the then and else
branches up under the if for neatness, but this
is not mandatory. We could put all of them on a single line,
for example, but we'd end up with less readable code.

Reporting errors

Haskell provides a standard function, error ::
Char -> a, that we can call when something has
gone terribly wrong in our code. We give it a string
parameter, which is the error message to display. Its type
signature looks peculiar; how can it produce a value of any
type a given just a string? The
answer is that it doesn't, because error
is special.

It has that type so that we can call it anywhere and it
will always have the right type. However, instead of
returning a value like a normal function, it immediately
terminates evaluation, and prints the error message we give
it. Here's an example. The mySecond
function returns the second element of its input list, but
fails if its input list isn't long enough.

Notice that in the third case above, where we're trying to
use the result of the call to mySecond as
the argument to another function, evaluation still terminates
and drops us back to the ghci prompt. This is both the
strength and weakness calling error: it
doesn't let us distinguish between recoverable errors and
problems so severe that they really should terminate a
program. We'll see one way around this problem in
the section called “How to represent a complicated result”.

Exercises

1.

Haskell provides a standard function, last ::
[a] -> a, that returns the last element of a
list. Write a function lastButOne,
that returns the element before
the last.

2.

Load your lastButOne function
into ghci, and try it out on lists of different
lengths. What happens when you pass it a list that's
too short?

Defining a new data type

Although lists and tuples are useful, we'll still often want
to construct new data types of our own. We define a new data
type using the data keyword.

The MyType after the data keyword
is the name of our new type. (As we've already mentioned, a
type name must start with a capital letter.) The string
MyConstructor is the name of the
constructor we'll call to create a value of
this type. (As with a type name, a constructor name must start
with a capital letter.) Finally, the Int and
String are the components of
the type. A component serves the same purpose in Haskell as a
field in a structure or class would in another language.

Note

We'll explain the full meaning of deriving
(Show) later, in
the section called “Show”. For now,
it's enough to know that we need to tack this onto a type
declaration so that ghci will automatically know how to
print a value of this type.

We can create a new value of type MyType by
treating MyConstructor as a function, and
calling it with arguments of types Int and
String.

From Haskell's perspective, then, a constructor is just
another function, one that happens to return a value of the type
we want to construct.

Algebraic data types

The Bool type that we introduced earlier is the
simplest example of a sort of type called an algebraic
data type. An algebraic data type has a fixed set
of possible values, each of which is identified by a distinct
constructor.

In the case of Bool, the type has two
constructors, True and False. Each
constructor is separated by a | character,
which we can read as “or”. These are usually
referred to as alternatives or cases.

data Bool = False | Truedata Bool = False | True

Each constructor can take zero or more arguments; the
numbers and types of the arguments accepted by each constructor
are independent. For example, here's one way we might represent
versions of the Windows operating system, where old releases
were monolithic, and newer releases have “service pack
levels” denoting major updates after their initial
releases.

The alternatives that represent older releases don't need
arguments, but those for the newer releases need an
Int to represent the patch level.

Analogues to algebraic data types in other
languages

Algebraic data types provide a single structuring
mechanism in instances where other languages have several
different building blocks. Here are some analogues from C and
C++, which might make it clearer what we can do with algebraic
data types.

With just one constructor, an algebraic data type groups
related values into one, and gives that value an identity
distinct from other types. It corresponds to a
struct in C or C++, and its components to the
fields of a struct.

If an algebraic data type has multiple alternatives, we
can think of it as similar to a union in C or
C++. A big difference between the two is that a union doesn't
tell us which alternative is actually present; we have to
record which alternative we're using ourselves, usually in
another field of a struct. This means that unions can
sometimes be sources of bugs, where our notion of which
alternative we should be using is incorrect.

With an algebraic data type, Haskell stores the
constructor that we use in the value that we create, so we
don't need to manually sock it away somewhere else.

Algebraic data types also serve where we'd use an
enum in C or C++, to represent a range of
discrete symbolic values.

A few final notes

From reading the preceding sections, it should be clear
that all of the data types that we define
with the data keyword are algebraic data types.
Some may have just one alternative; others have several; but
they're all using the same organising machinery.

Another useful thing to know is that it's perfectly okay
for the name of an algebraic type to have the same name as one
of its constructors to be the same. It's always obvious from
context whether we're using a name to refer to a type or a
constructor, so this doesn't introduce any ambiguity.

Giving a type and its constructor the same name is in fact
something that we do frequently when the type has just one
constructor.

Here, the variable a is not a
regular variable; it's called a type
variable, because it indicates that our
Wrapper type takes another type as its parameter
(hence calling it a parameterised type). What this lets us do is
use Wrapper on values of any type.

Wrapper is a “generic” container
type (albeit a fairly useless one); we can construct a
Wrapper from a value of any type. It is also
strongly typed; the type of whatever it contains is encoded in
its own type.

To once again extend an analogy to more familiar languages,
this gives us a facility that bears some resemblance to
templates in C++, and to generics in Java. (In fact, Java's
generics facility was inspired by several aspects of Haskell's
type system.)

We can nest uses of parameterised types inside each other,
but when we do, we may need to use parentheses to tell the
Haskell compiler what we mean.

Back to writing functions: local variables

Let's take a break from writing about types for a few
moments. Within the body of a function, we can introduce new
local variables whenever we need them, using a let
expression. As an example, let's write a function that
calculates the real-valued roots of the quadratic equation
a * (x ** 2) + b * x + c == 0.

The keywords to look out for here are let,
which starts a block of variable declarations, and
in, which ends it. Each line introduces a new
variable. The name is on the left of the =,
and its value on the right. We can use these variables both
within our block of variable declarations and in the expression
that follows the in keyword.

There's no problem with a variable earlier in a
let block referring to a later one, or even with
them referring to each other. (In some functional languages,
this sort of flexible let is named
letrec.)

We can have multiple let blocks within an
expression. There's also another mechanism we can use to
introduce local variables, called a where block.
The definitions in a where block apply to the code
that precedes it. Let's illustrate what we
mean with another example.

Here, the roots function returns the
real roots when they're defined, and the complex roots
otherwise. (We left out the divide-by-zero case for simplicity.)
While a “where” clause initially looks very weird
to non-Haskell programmers, it's a great way to put the
“important” code early, followed by the auxiliary
definitions that support it. After a while, you'll find
yourself missing where clauses in languages that
lack them!

The main difference between let and
where is one of scope. The scope of a
let only extends to the expression after the
in keyword, while the variables introduced by a
where clause are visible upwards to the beginning
of the block that it “belongs” to. Also,
let is always paired with an expression, but
where is paired with a block of equations.

It's probably obvious from context above, but
(:+) is the constructor for a complex
number, taking the real part on the left and the imaginary
part on the right.

Also, Complex is parameterised over the type
of complex number it should represent. In practice, it only
makes much sense to use Complex Double, since
GHC implements Double more efficiently than
Float.

How to represent a complicated result

We sneaked two previously unseen standard types,
Maybe and Either, into our
root-finding examples. We use Maybe when it might
not make sense to return a normal result, for example because a
function's result is undefined for some inputs. We use
Just to say “we have a
result”, and the argument to Just
is that result. When we can't give a result, we use
Nothing, which takes no arguments.

Compared to error, which we saw in
the section called “Reporting errors”, Maybe has
the huge advantage that it's a normal value, which we return to
our caller to deal with. Calling error, by
contrast, is more akin to pulling on the program's ejector seat
handle as it disintegrates around us; something terrible has
happened, and we need to give up right now.

The Either a b type gives us even more
flexibility than Maybe, as it's got two type
parameters. We can wrap a value of any type a with Left, or a value of
an unrelated type b with
Right. Our roots function
uses this to return a Double when the real roots
are defined, and a Complex Double when they're
not.

The offside rule, and white space in a function
body

In our definition of realRoots, the
left margin of our text wandered around quite a bit. This was
not an accident: in Haskell, white space has meaning.

Haskell uses indentation as a cue to parse sections of code.
This use of layout to convey structure is sometimes called the
offside rule. At the top level, the first declaration or
definition can start in any column, and the Haskell compiler or
interpreter remembers that indentation level. Every subsequent
top-level declaration must have the same indentation.

Here's an illustration of the top-level indentation rule.
Our first file, GoodIndent.hs, is well
behaved.

-- This is the leftmost column.
-- Our first declaration is in column 4.
firstBadIndentation = 1
-- Our second is left of the first, which is illegal!
secondBadIndentation = 2-- This is the leftmost column.
-- Our first declaration is in column 4.
firstBadIndentation = 1
-- Our second is left of the first, which is illegal!
secondBadIndentation = 2

An empty line is treated as a continuation of the current
item, as is a line indented to the right of the current current
item.

The rules for let expressions and
where clauses are similar. After a
let or where keyword, the Haskell
compiler or interpreter remembers the indentation of the next
token it sees. If the next line is empty, or its indentation is
further to the right than the previous line, this counts as
continuing the previous line. On the other hand, if the
indentation is the same as the previous line, this is treated as
beginning a new item in the same block.

Here are nested uses of let and
where.

bar = let b = 2
in let a = b
in abar = let b = 2
in let a = b
in a

In the body of bar, the variable
a is only visible within the let
expression that defines it. It's not visible to the
let expression that defines b;
only the result of the inner
let expression is visible.

foo = a
where a = b
where b = 2foo = a
where a = b
where b = 2

Similarly, the scope of the first where clause
is the definition of foo, but the scope of
the second is just the first where clause.

The indentation we use for the let and
where clauses makes our intentions easy to figure
out.

A note about tabs versus spaces

If you are using a Haskell-aware text editor (e.g. Emacs),
it is probably already configured to use space characters for
all white space within a line. If your editor is
not Haskell-aware, you should configure
it to only use space characters.

The reason for this is simple portability. In an editor
that uses a fixed-width font, tab stops are by default placed
at different intervals on Unix-like systems (every eight
characters) than on Windows (every four characters). This
means that no matter what your personal beliefs are about
where tabs belong, you can't rely on someone else's editor
honouring your preferences. Any indentation that uses tabs is
going to look broken under someone's
configuration; this could lead to compilation problems, too.
Using space characters instead avoids this problem
entirely.

The offside rule is not mandatory

We can use explicit structuring instead of layout to
indicate what we mean. To do so, we start a block of
equations with an opening curly brace; separate each item with
a semicolon; and finish the block with a closing curly brace.
The following two uses of let have the same
meanings.

When we use explicit structuring, the normal layout rules
don't apply, which is why we can get away with unusual
indentation in the second let expression.

We can use explicit structuring anywhere that we'd
normally use layout. It's valid for where
clauses, and even top-level declarations. Just remember that
although the facility exists, explicit structuring is hardly
ever actually used in Haskell
programs.

Recursive types

We call this a recursive type because
Tree, the type we're defining, appears both on the
left hand side and the right hand side of the definition: we
define the type in terms of itself.

A little more about lists

Now that we're getting familiar with some of the jargon
around types, we can revisit lists. Haskell's list type is a
parameterised type, because we can make lists of any other type.
It is also an algebraic data type, with two constructors. One
is the empty list, written [] (sometimes pronounced
“nil”, which is borrowed from Lisp).

ghci> []
[]
ghci> []
[]

The other is the (:) operator, often
pronounced “cons” (this is short for
“construct”, and also borrowed from Lisp). The
(:) operator takes an element and a list,
and constructs a new list.

Unit, the zero-element tuple

This type is only really used with parameterised data types,
to indicate that one of the type parameters isn't being used.
Since it doesn't encode any information, it's a rough equivalent
to void in C.

Here's an example of () in use. We can
generalise our earlier Tree type a little, so that
internal nodes contain values of type a, while leaves contain values of type
b.

If we wanted to create a ComplexTree where we
wanted to store Ints on the leaves, but don't care
about the internal nodes. We would write its type as
ComplexTree () Int.

Pattern matching

Although we introduced a handful of functions earlier that
can operate on lists, we've yet to see how we might generally
get values out of a constructed algebraic data type. Haskell
has a simple pattern matching facility that we can use to this
end.

A pattern lets us peer inside a compound value and bind
variables to the values it contains. In fact, when we define a
function, the parameters to that function are really patterns
that bind our variables to an entire value.

Here's an example of pattern matching in action on a list;
we're going to add all elements of the list together.

See that (x:xs) on the left of the first line?
The : means “match the head of a
list”; that's the familiar list constructor,
(:), in action in a new way. The variables
x and xs are given the
values of (“bound to”) the head and tail of the
list, respectively. The whole pattern is wrapped in parentheses
so Haskell won't parse it as three separate arguments.

What effect does pattern matching have? Haskell will only
evaluate the right hand side of an equation if it can match all
of the patterns on the left hand side. In the definition of
sumList above, the right hand side of the
first equation won't be evaluated if the input list is empty.
Instead, Haskell will “fall through” to the
equation on the following line, which does
have a pattern for the empty list, and it will evaluate
that.

It might initially look like we have two functions named
sumList here, but Haskell lets us define a
function as a series of equations; so in fact these two clauses
are defining the behaviour of one function, for different
inputs. (By the way, there's a standard function,
sum, that does this for us.)

The syntax for pattern matching on a tuple is similar to the
syntax for constructing a tuple. Here's a function that returns
the third element from a three-tuple.

third (a, b, c) = cthird (a, b, c) = c

There's no limit on how “deep” within a value a
pattern can look. Here's a definition that looks both inside a
tuple and inside a list within that tuple.

Notice that Haskell infers the type of the
unwrap function based on the constructor
we're using in our pattern. If we're trying to match a value
whose constructor is Wrapper, then the type
of that parameter must be Wrapper a.

Note

Haskell tests patterns for matches in the order in which
we list them in our code. It goes from top to bottom and
stops at the first match; it does not
check every pattern and use the best match.

If you're familiar with pattern matching from a logic
programming language like Prolog, Haskell's facility is
simpler and less powerful. It doesn't provide backtracking or
unification.

The don't-care, or wild card, pattern

When we're writing a pattern, we can specify that we don't
care what value a particular value within a structure has,
without actually binding that value to a name. The notation
for this is _ (called a wild card or “don't
care”), and we use it as follows. This function
tells us whether the result of the roots
function we defined earlier is real-valued or not.

Here, we don't care about the value of the result, just
about which constructor was used to create it. If it was
Left, the result must be a complex
number, otherwise it must be real. We can use a wild card for
the entire second pattern; there's no need to see if the
constructor is Right, because it
must be; Either only has two
constructors.

The case expression

We're not limited to using patterns in function
definitions. The case expression lets us match
patterns at any time. Here's what it looks like.

The case keyword is followed by an arbitrary
expression; the result of this expression is what we're
pattern matching on. The of keyword signifies
the end of the expression and the beginning of the block of
patterns and expressions.

Each item in the block consists of a pattern, followed by
an arrow ->, followed by an expression to
evaluate if that pattern matches. The result of the
case expression is the result of the expression
associated with the first pattern to match, taken from top to
bottom.

To express “here's the expression to evaluate if
none of the other patterns match”, we would just use
the wild card pattern _ as the last in our list
of patterns.

A flying visit back to the where clause

Now that we've seen that we can define a function as a
series of equations, the usefulness of the where
clause should be a bit more clear. Variables that we define
inside a where clause are visible across all of
the equations that precede it in a single block.

Limits to pattern matching

Every variable that appears within a pattern is a new
local variable that will be given a value if the pattern
matches, not a reference to some variable on
“outside” the pattern. Matching a pattern only
lets us do exact comparisons against constructors and simple
values. We can't use a pattern to match something like a
function. To see what we mean, take a look at the following
function.

A naive glance suggests that it's trying to check the
value of f to see if it's actually the
standard function head, but such an
interpretation would be quite wrong. Here's what is
really doing.

Because the first pattern in the case
expression is just a variable, this pattern will
always match, no matter what the value of
f is, and the value of f
will be given to the local variable head
when the right hand side is evaluated.

Note

A pattern that consists only of a variable will always
match, because it's not being compared against anything that
might cause the match to fail. We refer to patterns that
always match as “irrefutable”.

Finally, because the first pattern always matches, GHC
will always evaluate its right hand side and use that as the
result of the case expression. The second
pattern (the one with the wild card) will never actually be
reached. Because the two patterns will match the same values,
they are said to overlap. If GHC ever
complains to you about overlapping patterns, it's telling you
that one of your patterns is the same as another, and so will
never actually be matched.

Another limitation of patterns is that a variable can only
appear once in a pattern. For example, you can't put a
variable in multiple places within a pattern to express the
notion “this value and that should be
identical”.

The way around these restrictions of Haskell's patterns is
to use patterns in combination with a language facility called
guards, which we'll talk about
next.

Conditional evaluation with guards

We can further extend our expressive arsenal using
guards. A guard is an expression of type
Bool; if it evaluates to True,
the equation that follows it is evaluated. Otherwise, the next
guard in the series is evaluated, and so on. Here's an example
of guards in action.

Each guard is introduced by a | symbol,
followed by the guard expression, then an = symbol
(or -> if within a case
expression), then the expression to evaluate if the guard
succeeds.

The otherwise used in the second guard
has an obvious meaning: it's the expression to evaluate if
previous gaurds all evaluate to False. It's
not a special piece of syntax, though; it's just a variable
whose value is True.

We can use guards anywhere that we can use patterns. The
combination of writing a function as a series of equations,
pattern matching, and guards lets us write code that's clear and
easy to understand.

Here's a reformulation of that function using patterns and
guards. Instead of reasoning about what an if
expression is doing and which branch will be evaluated, the code
uses a series of equations with simple patterns and guards. This
makes it easier to understand the behaviour of the function
under different circumstances.

Let's return to one of the limitations of patterns that we
mentioned in the previous section: the fact that we can't use a
pattern to check two variables within the pattern for equality.
We can express this quite easily using a guarded pattern.

Here, for good measure, we've illustrated guard syntax in a
case expression.

Infix functions

Usually, when we define or call a function in Haskell, we
write the name of the function, followed by its arguments; this
is called prefix notation, because the name of the function
comes before its arguments. For a function that takes two
arguments, we have the option of using it in
infix form, between its first and second
arguments. This allows us to write expressions using functions
as if they were infix operators.

The syntax for defining or calling a function in infix form
is to enclose the name of the function in backtick characters
(sometimes known as backquotes). Here's a simple infix
definition.

a `plus` b = a + ba `plus` b = a + b

Defining a function in infix form doesn't change anything
about the behaviour of the function. We can call the function
using infix or prefix notation, as we prefer.

Infix notation is useful for more than just our own
functions. For example, Haskell's standard
Data.List module defines a function,
isPrefixOf, that indicates whether all
elements of its first argument are equal to the first elements
of its second argument.

There's no hard-and-fast rule that dictates when you ought
to use infix versus prefix notation, although prefix notation is
far more common. It's best to choose whichever makes your code
more readable in a specific situation.

Note

The backtick notation is not a general mechanism: it's a
piece of special syntax that applies only to names. For
example, we can't put backticks around an expression that
returns a function, and then treat that as an infix
function.

Conclusion

In this chapter, we've had a whirlwind overview of Haskell's
type system and much of its syntax. We've read about basic
types, compound types, and how to write our own algebraic data
types. We've seen how to write functions, and how to declare
local variables within them. We've read about Haskell's offside
rule for laying out functions, and how we can avoid it if we
need to. We've seen conditional evaluation, pattern matching
and guards. We've discussed error handling.

This all amounts to a lot of information to absorb. In
Chapter 4, Functional programming, we'll build on this basic knowledge to
understand how we can write, and think about, code in
Haskell.