Chapter 3 Expressions

In this chapter, we describe the syntax and informal semantics of Haskell expressions, including their translations
into the Haskell kernel, where appropriate. Except in the case of let expressions, these translations preserve both the
static and dynamic semantics. Free variables and constructors used in these translations always refer to entities
defined by the Prelude. For example, “concatMap” used in the translation of list comprehensions
(Section 3.11 ) means the concatMap defined by the Prelude, regardless of whether or not the identifier
“concatMap” is in scope where the list comprehension is used, and (if it is in scope) what it is bound
to.

exp       →  infixexp :: [context =>] type        (expression type signature)
          |  infixexp

infixexp  →  lexp qop infixexp                    (infix operator application)
          |  - infixexp                           (prefix negation)
          |  lexp

lexp      →  \ apat1 … apatn -> exp               (lambda abstraction, n ≥ 1)
          |  let decls in exp                     (let expression)
          |  if exp [;] then exp [;] else exp     (conditional)
          |  case exp of { alts }                 (case expression)
          |  do { stmts }                         (do expression)
          |  fexp

fexp      →  [fexp] aexp                          (function application)

aexp      →  qvar                                 (variable)
          |  gcon                                 (general constructor)
          |  literal
          |  ( exp )                              (parenthesized expression)
          |  ( exp1 , … , expk )                  (tuple, k ≥ 2)
          |  [ exp1 , … , expk ]                  (list, k ≥ 1)
          |  [ exp1 [, exp2] .. [exp3] ]          (arithmetic sequence)
          |  [ exp | qual1 , … , qualn ]          (list comprehension, n ≥ 1)
          |  ( infixexp qop )                     (left section)
          |  ( qop⟨-⟩ infixexp )                  (right section)
          |  qcon { fbind1 , … , fbindn }         (labeled construction, n ≥ 0)
          |  aexp⟨qcon⟩ { fbind1 , … , fbindn }   (labeled update, n ≥ 1)

Expressions involving infix operators are disambiguated by the operator’s fixity (see Section 4.4.2 ). Consecutive
unparenthesized operators with the same precedence must both be either left or right associative to avoid a syntax
error. Given an unparenthesized expression “x qop(a,i) y qop(b,j) z” (where qop(a,i) means an operator with
associativity a and precedence i), parentheses must be added around either “x qop(a,i) y” or “y qop(b,j) z” when
i = j unless a = b = l or a = b = r.

An example algorithm for resolving expressions involving infix operators is given in Section 10.6 .

Negation is the only prefix operator in Haskell; it has the same precedence as the infix - operator defined in the
Prelude (see Section 4.4.2 , Figure 4.1 ).

The grammar is ambiguous regarding the extent of lambda abstractions, let expressions, and conditionals.
The ambiguity is resolved by the meta-rule that each of these constructs extends as far to the right as
possible.

Sample parses are shown below.

This                          Parses as
f x + g y                     (f x) + (g y)
- f x + y                     (- (f x)) + y
let { ... } in x + y          let { ... } in (x + y)
z + let { ... } in x + y      z + (let { ... } in (x + y))
f x y :: Int                  (f x y) :: Int
\ x -> a+b :: Int             \ x -> ((a+b) :: Int)

For the sake of clarity, the rest of this section will assume that expressions involving infix operators have been
resolved according to the fixities of the operators.
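The parsing rules above can be observed on concrete values. The following is a minimal sketch; the helper functions f and g are hypothetical and exist only to make the groupings visible.

```haskell
-- Hypothetical helper functions, used only to make the parses observable.
f :: Int -> Int
f = (* 2)

g :: Int -> Int
g = (+ 1)

-- Function application binds tighter than any infix operator.
applicationFirst :: Bool
applicationFirst = (f 3 + g 4) == ((f 3) + (g 4))

-- Prefix negation covers "f 3" only, not "f 3 + 10".
negationExtent :: Bool
negationExtent = (- f 3 + 10) == ((- (f 3)) + 10)

-- A let expression extends as far to the right as possible,
-- so this reads as 1 + (let n = 10 in (n + 5)).
letExtent :: Int
letExtent = 1 + let n = 10 in n + 5
```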

3.1 Errors

Errors during expression evaluation, denoted by ⊥ (“bottom”), are indistinguishable by a Haskell program from
non-termination. Since Haskell is a non-strict language, all Haskell types include ⊥. That is, a value of any type may
be bound to a computation that, when demanded, results in an error. When evaluated, errors cause immediate program
termination and cannot be caught by the user. The Prelude provides two functions to directly cause such errors:

error :: String -> a
undefined :: a

A call to error terminates execution of the program and returns an appropriate error indication to the operating
system. It should also display the string in some system-dependent manner. When undefined is used, the error
message is created by the compiler.

Translations of Haskell expressions use error and undefined to explicitly indicate where execution time errors
may occur. The actual program behavior when an error occurs is up to the implementation. The messages passed to
the error function in these translations are only suggestions; implementations may choose to display more or less
information when an error occurs.
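Because Haskell is non-strict, an error value is harmless until it is demanded. A small sketch of this behaviour, using only error and undefined from the Prelude (the names lazyPair, safeFirst and spineOnly are illustrative):

```haskell
-- A pair whose second component is an error value.
-- Nothing goes wrong until that component is demanded.
lazyPair :: (Int, Int)
lazyPair = (1, error "second component demanded")

-- fst never forces the second component.
safeFirst :: Int
safeFirst = fst lazyPair

-- length inspects only the spine of the list, never the elements.
spineOnly :: Int
spineOnly = length [undefined, undefined, undefined :: Int]
```

Evaluating snd lazyPair, by contrast, would terminate the program with the given message.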

3.2 Variables, Constructors, Operators, and Literals

aexp      →  qvar                      (variable)
          |  gcon                      (general constructor)
          |  literal

gcon      →  ()
          |  []
          |  (,{,})
          |  qcon

var       →  varid | ( varsym )        (variable)
qvar      →  qvarid | ( qvarsym )      (qualified variable)
con       →  conid | ( consym )        (constructor)
qcon      →  qconid | ( gconsym )      (qualified constructor)
varop     →  varsym | ` varid `        (variable operator)
qvarop    →  qvarsym | ` qvarid `      (qualified variable operator)
conop     →  consym | ` conid `        (constructor operator)
qconop    →  gconsym | ` qconid `      (qualified constructor operator)
op        →  varop | conop             (operator)
qop       →  qvarop | qconop           (qualified operator)
gconsym   →  : | qconsym

Haskell provides special syntax to support infix notation. An operator is a function that can be applied using infix
syntax (Section 3.4 ), or partially applied using a section (Section 3.5 ).

An operator is either an operator symbol, such as + or $$, or is an ordinary identifier enclosed in grave accents
(backquotes), such as `op`. For example, instead of writing the prefix application op x y, one can write the infix
application x `op` y. If no fixity declaration is given for `op` then it defaults to highest precedence and left
associativity (see Section 4.4.2 ).

Dually, an operator symbol can be converted to an ordinary identifier by enclosing it in parentheses. For example,
(+) x y is equivalent to x + y, and foldr (*) 1 xs is equivalent to foldr (\x y -> x*y) 1 xs.

Special syntax is used to name some constructors for some of the built-in types, as found in the production for gcon
and literal. These are described in Section 6.1 .

An integer literal represents the application of the function fromInteger to the appropriate value of type
Integer. Similarly, a floating point literal stands for an application of fromRational to a value of type
Rational (that is, Ratio Integer).

Translation:
The integer literal i is equivalent to fromInteger i, where fromInteger is a method in class Num (see
Section 6.4.1 ).

The floating point literal f is equivalent to fromRational (n Ratio.% d), where fromRational is a
method in class Fractional and Ratio.% constructs a rational from two integers, as defined in the Ratio
library. The integers n and d are chosen so that n/d = f.
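The literal translations can be checked directly. This sketch assumes the % operator is imported from Data.Ratio (the hierarchical name of the Ratio library in current implementations); the binding names are illustrative.

```haskell
import Data.Ratio ((%))

-- The literal 42 at type Double is fromInteger applied to the Integer 42.
intLiteral :: Bool
intLiteral = (42 :: Double) == fromInteger (42 :: Integer)

-- The literal 2.5 is fromRational applied to the rational 5 % 2.
fracLiteral :: Bool
fracLiteral = (2.5 :: Double) == fromRational (5 % 2)

-- The same floating point literal can be used at type Rational.
asRational :: Rational
asRational = 0.75
```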

3.3 Curried Applications and Lambda Abstractions

fexp      →  [fexp] aexp                   (function application)
lexp      →  \ apat1 … apatn -> exp        (lambda abstraction, n ≥ 1)

Function application is written e1 e2. Application associates to the left, so the parentheses may be
omitted in (f x) y. Because e1 could be a data constructor, partial applications of data constructors are
allowed.

Lambda abstractions are written \ p1 … pn -> e, where the pi are patterns. An expression such as \x:xs->x
is syntactically incorrect; it may legally be written as \(x:xs)->x.

The set of patterns must be linear—no variable may appear more than once in the set.

Translation:
The following identity holds:

\ p1 … pn -> e   =   \ x1 … xn -> case (x1, …, xn) of (p1, …, pn) -> e

where the xi are new identifiers.

Given this translation combined with the semantics of case expressions and pattern matching described in
Section 3.17.3 , if the pattern fails to match, then the result is ⊥.
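As an illustration of this identity, the sketch below writes two lambda abstractions and their hand-translated counterparts. The names are illustrative; for a single pattern the one-element tuple in the translation is simply the variable itself.

```haskell
-- A lambda with a constructor pattern; the parentheses are required.
headOf :: [Int] -> Int
headOf = \(x:_) -> x

-- Hand translation: bind a fresh variable, then match with case.
headOf' :: [Int] -> Int
headOf' = \x1 -> case x1 of (x:_) -> x

-- A lambda with two patterns.
addPair :: (Int, Int) -> Int -> Int
addPair = \(a, b) c -> a + b + c

-- Hand translation: match the tuple of fresh variables
-- against the tuple of patterns.
addPair' :: (Int, Int) -> Int -> Int
addPair' = \x1 x2 -> case (x1, x2) of ((a, b), c) -> a + b + c
```

Applying headOf or headOf' to the empty list fails to match, so the result is ⊥ in both cases.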

3.4 Operator Applications

infixexp  →  lexp qop infixexp
          |  - infixexp                (prefix negation)
          |  lexp
qop       →  qvarop | qconop           (qualified operator)

The form e1 qop e2 is the infix application of binary operator qop to expressions e1 and e2.

The special form -e denotes prefix negation, the only prefix operator in Haskell, and is syntax for negate (e). The
binary - operator does not necessarily refer to the definition of - in the Prelude; it may be rebound by the module
system. However, unary - will always refer to the negate function defined in the Prelude. There is no link between
the local meaning of the - operator and unary negation.

Prefix negation has the same precedence as the infix operator - defined in the Prelude (see Table 4.1 ). Because
e1-e2 parses as an infix application of the binary operator -, one must write e1(-e2) for the alternative
parsing. Similarly, (-) is syntax for (\ x y -> x-y), as with any infix operator, and does not denote
(\ x -> -x)—one must use negate for that.

Translation:
The following identities hold:

e1 op e2   =   (op) e1 e2
-e         =   negate (e)
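A brief sketch of these identities on Int values (the binding names are illustrative):

```haskell
-- An identifier in backquotes is applied infix.
backquoted :: Bool
backquoted = (7 `div` 2) == div 7 (2 :: Int)

-- An operator symbol in parentheses is applied prefix.
parenthesized :: Bool
parenthesized = (3 + 4) == (+) 3 (4 :: Int)

-- Prefix negation is syntax for negate.
negation :: Bool
negation = (- 5) == negate (5 :: Int)

-- (-) is always the binary operator, never negation.
binaryMinus :: Int
binaryMinus = (-) 10 3
```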

3.5 Sections

aexp      →  ( infixexp qop )          (left section)
          |  ( qop⟨-⟩ infixexp )       (right section)

Sections are written as (op e) or (e op), where op is a binary operator and e is an expression. Sections are a
convenient syntax for partial application of binary operators.

Syntactic precedence rules apply to sections as follows. (op e) is legal if and only if (x op e) parses in the same
way as (x op (e)); and similarly for (e op). For example, (*a+b) is syntactically invalid, but
(+a*b) and (*(a+b)) are valid. Because (+) is left associative, (a+b+) is syntactically correct, but
(+a+b) is not; the latter may legally be written as (+(a+b)). As another example, the expression

(let n = 10 in n +)

is invalid because, by the let/lambda meta-rule (Section 3 ), the expression

(let n = 10 in n + x)

parses as

(let n = 10 in (n + x))

rather than

((let n = 10 in n) + x)

Because - is treated specially in the grammar, (- exp) is not a section, but an application of prefix negation, as
described in the preceding section. However, there is a subtract function defined in the Prelude such that
(subtract exp) is equivalent to the disallowed section. The expression (+ (- exp)) can serve the same
purpose.

Translation:
The following identities hold:

(op e)   =   \ x -> x op e
(e op)   =   \ x -> e op x

where op is a binary operator, e is an expression, and x is a variable that does not occur free in e.
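The two section forms, and the two workarounds for the disallowed subtraction section, can be sketched as follows (the function names are illustrative):

```haskell
-- (op e) supplies the right operand: \x -> x / 2
halve :: Double -> Double
halve = (/ 2)

-- (e op) supplies the left operand: \x -> 2 / x
twoOver :: Double -> Double
twoOver = (2 /)

-- (- 1) would be the number negative one, not a section,
-- so subtract is used instead.
decrement :: Int -> Int
decrement = subtract 1

-- The alternative spelling mentioned above.
decrement' :: Int -> Int
decrement' = (+ (- 1))
```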

3.6 Conditionals

lexp      →  if exp [;] then exp [;] else exp

A conditional expression has the form if e1 then e2 else e3 and returns the value of e2 if the value of e1 is
True, e3 if e1 is False, and ⊥ otherwise.

Translation:
The following identity holds:

if e1 then e2 else e3   =   case e1 of { True -> e2 ; False -> e3 }

where True and False are the two nullary constructors from the type Bool, as defined in the Prelude. The
type of e1 must be Bool; e2 and e3 must have the same type, which is also the type of the entire conditional
expression.

3.7 Lists

infixexp  →  exp1 qop exp2
aexp      →  [ exp1 , … , expk ]       (k ≥ 1)
          |  gcon
gcon      →  []
          |  qcon
qcon      →  ( gconsym )
qop       →  qconop
qconop    →  gconsym
gconsym   →  :

Lists are written [e1,…, ek], where k ≥ 1. The list constructor is :, and the empty list is denoted
[]. Standard operations on lists are given in the Prelude (see Section 6.1.3 , and Chapter 9 , notably
Section 9.1 ).

Translation:
The following identity holds:

[e1, …, ek]   =   e1 : (e2 : ( … (ek : [])))

where : and [] are constructors for lists, as defined in the Prelude (see Section 6.1.3 ). The types of
e1 through ek must all be the same (call it t), and the type of the overall expression is [t] (see
Section 4.1.2 ).

The constructor “:” is reserved solely for list construction; like [], it is considered part of the language
syntax, and cannot be hidden or redefined. It is a right-associative operator, with precedence level 5
(Section 4.4.2 ).
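A minimal sketch of the list translation on a three-element list (the binding names are illustrative):

```haskell
-- Bracket notation.
sugared :: [Int]
sugared = [1, 2, 3]

-- The same list written with the constructors : and [].
desugared :: [Int]
desugared = 1 : (2 : (3 : []))

-- Because : is right associative, the parentheses may be dropped.
noParens :: [Int]
noParens = 1 : 2 : 3 : []
```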

3.8 Tuples

aexp      →  ( exp1 , … , expk )       (k ≥ 2)
          |  qcon
qcon      →  (,{,})

Tuples are written (e1,…, ek), and may be of arbitrary length k ≥ 2. The constructor for an n-tuple is denoted by
(,…,), where there are n − 1 commas. Thus (a,b,c) and (,,) a b c denote the same value. Standard
operations on tuples are given in the Prelude (see Section 6.1.4 and Chapter 9 ).

Translation: (e1,…, ek) for k ≥ 2 is an instance of a k-tuple as defined in the Prelude, and requires no translation. If t1
through tk are the types of e1 through ek, respectively, then the type of the resulting tuple is (t1,…, tk) (see
Section 4.1.2 ).

3.9 Unit Expressions and Parenthesized Expressions

aexp      →  gcon
          |  ( exp )
gcon      →  ()

The form (e) is simply a parenthesized expression, and is equivalent to e. The unit expression () has type () (see
Section 4.1.2 ). It is the only member of that type apart from ⊥, and can be thought of as the “nullary tuple” (see
Section 6.1.5 ).

Translation: (e) is equivalent to e.

3.10 Arithmetic Sequences

aexp      →  [ exp1 [, exp2] .. [exp3] ]

The arithmetic sequence [e1, e2 .. e3] denotes a list of values of type t, where each of the ei has type t, and t is
an instance of class Enum.

Translation:
Arithmetic sequences satisfy these identities:

[ e1 .. ]          =   enumFrom e1
[ e1, e2 .. ]      =   enumFromThen e1 e2
[ e1 .. e3 ]       =   enumFromTo e1 e3
[ e1, e2 .. e3 ]   =   enumFromThenTo e1 e2 e3

where enumFrom, enumFromThen, enumFromTo, and enumFromThenTo are class methods in the class
Enum as defined in the Prelude (see Figure 6.1 ).

The semantics of arithmetic sequences therefore depends entirely on the instance declaration for the type t. See
Section 6.3.4 for more details of which Prelude types are in Enum and their semantics.
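The identities can be exercised with the Prelude instances for Int and Char. A short sketch (the binding names are illustrative):

```haskell
-- [e1 ..] is enumFrom e1; take keeps the infinite list finite here.
fromOnly :: Bool
fromOnly = take 3 [1 ..] == take 3 (enumFrom (1 :: Int))

-- [e1, e2 .. e3] is enumFromThenTo e1 e2 e3.
fromThenTo :: Bool
fromThenTo = [1, 3 .. 9] == enumFromThenTo 1 3 (9 :: Int)

-- Char is also an instance of Enum.
charRange :: String
charRange = ['a' .. 'e']

-- A decreasing sequence needs an explicit second element.
countdown :: [Int]
countdown = [10, 8 .. 1]
```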

3.11 List Comprehensions

aexp      →  [ exp | qual1 , … , qualn ]   (list comprehension, n ≥ 1)
qual      →  pat <- exp                    (generator)
          |  let decls                     (local declaration)
          |  exp                           (boolean guard)

A list comprehension has the form [ e | q1, …, qn ], n ≥ 1, where the qi qualifiers are either

generators of the form p <- e, where p is a pattern (see Section 3.17 ) of type t and e is an expression
of type [t]

local bindings that provide new definitions for use in the generated expression e or subsequent boolean
guards and generators

boolean guards, which are arbitrary expressions of type Bool.

Such a list comprehension returns the list of elements produced by evaluating e in the successive environments
created by the nested, depth-first evaluation of the generators in the qualifier list. Binding of variables occurs
according to the normal pattern matching rules (see Section 3.17 ), and if a match fails then that element of the list is
simply skipped over. Thus:

[ x | xs <- [ [(1,2),(3,4)], [(5,4),(3,2)] ],
      (3,x) <- xs ]

yields the list [4,2]. If a qualifier is a boolean guard, it must evaluate to True for the previous pattern
match to succeed. As usual, bindings in list comprehensions can shadow those in outer scopes; for
example:

[ x | x <- x, x <- x ]   =   [ z | y <- x, z <- y ]

Translation:
List comprehensions satisfy these identities, which may be used as a translation into the kernel:

[ e | True ]           =   [ e ]
[ e | q ]              =   [ e | q, True ]
[ e | b, Q ]           =   if b then [ e | Q ] else []
[ e | p <- l, Q ]      =   let ok p = [ e | Q ]
                               ok _ = []
                           in concatMap ok l
[ e | let decls, Q ]   =   let decls in [ e | Q ]

where e ranges over expressions, p over patterns, l over list-valued expressions, b over boolean expressions,
decls over declaration lists, q over qualifiers, and Q over sequences of qualifiers. ok is a fresh variable. The
function concatMap, and boolean value True, are defined in the Prelude.

As indicated by the translation of list comprehensions, variables bound by let have fully polymorphic types while
those defined by <- are lambda bound and are thus monomorphic (see Section 4.5.4 ).
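To make the translation concrete, the sketch below applies it by hand to the comprehension that yields [4,2]. The names input, ok and ok' are illustrative. The redundant clause "ok _ = []" is omitted for the outer generator, because a variable pattern cannot fail.

```haskell
input :: [[(Int, Int)]]
input = [ [(1, 2), (3, 4)], [(5, 4), (3, 2)] ]

-- The comprehension as written in the source text.
viaComprehension :: [Int]
viaComprehension = [ x | xs <- input, (3, x) <- xs ]

-- The same list built by following the generator rule twice.
viaConcatMap :: [Int]
viaConcatMap =
  let ok xs =
        let ok' (3, x) = [x]   -- the pattern matched: emit the element
            ok' _      = []    -- the pattern failed: skip the element
        in concatMap ok' xs
  in concatMap ok input
```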

3.12 Let Expressions

lexp      →  let decls in exp

Let expressions have the general form let { d1;…; dn} in e, and introduce a nested, lexically-scoped,
mutually-recursive list of declarations (let is often called letrec in other languages). The scope of the
declarations is the expression e and the right hand side of the declarations. Declarations are described
in Chapter 4 . Pattern bindings are matched lazily; an implicit ~ makes these patterns irrefutable. For
example,

let (x,y) = undefined in e

does not cause an execution-time error until x or y is evaluated.

Translation:
The dynamic semantics of the expression let { d1;…; dn} in e0 are captured by this
translation: After removing all type signatures, each declaration di is translated into an equation of
the form pi = ei, where pi and ei are patterns and expressions respectively, using the translation
in Section 4.4.3 . Once done, these identities hold, which may be used as a translation into the
kernel:

let {p1 = e1; ... ; pn = en} in e0   =   let (~p1, ... , ~pn) = (e1, ... , en) in e0
let p = e1 in e0                     =   case e1 of ~p -> e0
                                         where no variable in p appears free in e1
let p = e1 in e0                     =   let p = fix ( \ ~p -> e1) in e0

where fix is the least fixpoint operator. Note the use of the irrefutable patterns ~p. This translation does
not preserve the static semantics because the use of case precludes a fully polymorphic typing
of the bound variables. The static semantics of the bindings in a let expression are described in
Section 4.4.3 .
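The following sketch illustrates lazy pattern bindings, mutual recursion, and the role of fix. The definition of fix given here is one conventional way to write a least fixpoint operator and is an assumption of this example, as are the other names.

```haskell
-- The pattern binding is matched lazily: x is passed along but never
-- evaluated, so the undefined right-hand side is never demanded.
lazyLet :: Int
lazyLet = let (x, y) = undefined :: (Int, Int) in const 42 x

-- Declarations in a let are mutually recursive.
isEvenNat :: Int -> Bool
isEvenNat n =
  let isEven 0 = True
      isEven k = isOdd (k - 1)
      isOdd 0  = False
      isOdd k  = isEven (k - 1)
  in isEven n

-- One conventional definition of a least fixpoint operator.
fix :: (a -> a) -> a
fix f = let x = f x in x

-- A recursive binding expressed through fix with an irrefutable pattern.
ones :: [Int]
ones = fix (\ ~ys -> 1 : ys)
```

isEvenNat is intended for non-negative arguments only; it does not terminate on negative ones.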

3.13 Case Expressions

lexp      →  case exp of { alts }
alts      →  alt1 ; … ; altn               (n ≥ 1)
alt       →  pat -> exp [where decls]
          |  pat gdpat [where decls]
          |                                (empty alternative)
gdpat     →  guards -> exp [ gdpat ]
guards    →  | guard1 , … , guardn         (n ≥ 1)
guard     →  pat <- infixexp               (pattern guard)
          |  let decls                     (local declaration)
          |  infixexp                      (boolean guard)

A case expression has the general form

case e of { p1 match1 ; … ; pn matchn }

where each matchi is of the general form

| gsi1 -> ei1
…
| gsimi -> eimi
where declsi

(Notice that in the syntax rule for guards, the “|” is a terminal symbol, not the syntactic metasymbol for
alternation.) Each alternative pi matchi consists of a pattern pi and its matches, matchi. Each match in turn consists
of a sequence of pairs of guards gsij and bodies eij (expressions), followed by optional bindings (declsi) that scope
over all of the guards and expressions of the alternative.

A guard has one of the following forms:

pattern guards are of the form p <- e, where p is a pattern (see Section 3.17 ) of type t and e is an
expression of type t. They succeed if the expression e matches the pattern p, and introduce the bindings
of the pattern to the environment.

local bindings are of the form let decls. They always succeed, and they introduce the names defined
in decls to the environment.

boolean guards are arbitrary expressions of type Bool. They succeed if the expression evaluates to
True, and they do not introduce new names to the environment. A boolean guard, g, is semantically
equivalent to the pattern guard True <- g.

An alternative of the form

pat -> exp where decls

is treated as shorthand for:

pat | True -> exp
where decls

A case expression must have at least one alternative and each alternative must have at least one body. Each body
must have the same type, and the type of the whole expression is that type.

A case expression is evaluated by pattern matching the expression e against the individual alternatives. The
alternatives are tried sequentially, from top to bottom. If e matches the pattern of an alternative, then the guarded
expressions for that alternative are tried sequentially from top to bottom in the environment of the case expression
extended first by the bindings created during the matching of the pattern, and then by the declsi in the where clause
associated with that alternative.

For each guarded expression, the comma-separated guards are tried sequentially from left to right. If all of
them succeed, then the corresponding expression is evaluated in the environment extended with the
bindings introduced by the guards. That is, the bindings that are introduced by a guard (either by using a
let clause or a pattern guard) are in scope in the following guards and the corresponding expression.
If any of the guards fail, then this guarded expression fails and the next guarded expression is tried.

If none of the guarded expressions for a given alternative succeed, then matching continues with the next alternative.
If no alternative succeeds, then the result is ⊥. Pattern matching is described in Section 3.17 , with the formal
semantics of case expressions in Section 3.17.3 .
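The evaluation order described above can be seen in a small sketch that combines all three guard forms. The function describe and its environment argument are hypothetical; lookup is the Prelude function.

```haskell
describe :: [(String, Int)] -> (String, String) -> String
describe env pair = case pair of
  (a, b)
    | Just x <- lookup a env      -- pattern guard: binds x on success
    , Just y <- lookup b env      -- x is in scope here and below
    , let total = x + y           -- local binding: always succeeds
    , total > 10                  -- boolean guard
        -> "large: " ++ show total
    | Just x <- lookup a env      -- tried only if the guards above fail
        -> "first: " ++ show x
  _ -> "neither"                  -- reached when every guarded expression fails
```

When no guarded expression of the first alternative succeeds, matching falls through to the wildcard alternative rather than failing.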

A note about parsing. The expression

case x of { (a,_) | let b = not a in b :: Bool -> a }

is tricky to parse correctly. It has a single unambiguous parse, namely

case x of { (a,_) | (let b = not a in b :: Bool) -> a }

However, the phrase Bool -> a is syntactically valid as a type, and parsers with limited lookahead may
incorrectly commit to this choice, and hence reject the program. Programmers are advised, therefore, to
avoid guards that end with a type signature — indeed that is why a guard contains an infixexp not an
exp.

3.14 Do Expressions

lexp      →  do { stmts }                  (do expression)
stmts     →  stmt1 … stmtn exp [;]         (n ≥ 0)
stmt      →  exp ;
          |  pat <- exp ;
          |  let decls ;
          |  ;                             (empty statement)

A do expression provides a more conventional syntax for monadic programming. It allows an expression such as

putStr "x: " >>
getLine >>= \l ->
return (words l)

to be written in a more traditional way as:

do putStr "x: "
   l <- getLine
   return (words l)

Translation:
Do expressions satisfy these identities, which may be used as a translation into the kernel, after eliminating
empty stmts:

do {e}                 =   e
do {e; stmts}          =   e >> do {stmts}
do {p <- e; stmts}     =   let ok p = do {stmts}
                               ok _ = fail "..."
                           in e >>= ok
do {let decls; stmts}  =   let decls in do {stmts}

The ellipsis "..." stands for a compiler-generated error message, passed to fail, preferably giving some
indication of the location of the pattern-match failure; the functions >>, >>=, and fail are operations in the
class Monad, as defined in the Prelude; and ok is a fresh identifier.
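A sketch of the translation in the Maybe monad, where fail yields Nothing. The function names and the message string are illustrative. For the variable pattern line, the ok function of the translation is simplified to a lambda because that match cannot fail. In recent GHC releases fail belongs to a separate MonadFail class instead of Monad; the code below does not depend on which class provides it.

```haskell
-- Written with do notation; the pattern (w:_) can fail to match.
firstWordDo :: Maybe String -> Maybe String
firstWordDo m = do
  line  <- m
  (w:_) <- return (words line)
  return w

-- Written by hand, following the translation of "p <- e; stmts".
firstWordBind :: Maybe String -> Maybe String
firstWordBind m =
  m >>= \line ->
    let ok (w:_) = return w
        ok _     = fail "pattern match failure"
    in return (words line) >>= ok
```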

As indicated by the translation of do, variables bound by let have fully polymorphic types while those defined by
<- are lambda bound and are thus monomorphic.

3.15 Datatypes with Field Labels

A datatype declaration may optionally define field labels (see Section 4.2.1 ). These field labels can be used to
construct, select from, and update fields in a manner that is independent of the overall structure of the
datatype.

Different datatypes cannot share common field labels in the same scope. A field label can be used
at most once in a constructor. Within a datatype, however, a field label can be used in more than one
constructor provided the field has the same typing in all constructors. To illustrate the last point, consider:
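A sketch of declarations that fit this description follows. The constructor names S1, S2, T1 and T2 are assumptions, and the illegal declaration is left as a comment so that the block still compiles.

```haskell
-- Legal: both constructors give the field x the same type.
data S = S1 { x :: Int } | S2 { x :: Int }

-- Illegal, so left as a comment: y would be Int in T1 but Bool in T2.
-- data T = T1 { y :: Int } | T2 { y :: Bool }

-- The shared label works as a selector on either constructor.
xOf :: S -> Int
xOf = x
```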

Here S is legal but T is not, because y is given inconsistent typings in the latter.

3.15.1 Field Selection

aexp      →  qvar

Field labels are used as selector functions. When used as a variable, a field label serves as a function that extracts the
field from an object. Selectors are top level bindings and so they may be shadowed by local variables but cannot
conflict with other top level bindings of the same name. This shadowing only affects selector functions; in record
construction (Section 3.15.2 ) and update (Section 3.15.3 ), field labels cannot be confused with ordinary
variables.

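Translation:
A field label f introduces a selector function defined as:

f x = case x of { C1 p11 … p1k -> e1 ; … ; Cn pn1 … pnk -> en }

where C1 … Cn are all the constructors of the datatype containing a field labeled with f, pij is y when f labels
the jth component of Ci or _ otherwise, and ei is y when some field in Ci has a label of f or undefined
otherwise.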
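As a sketch of how a selector behaves, the hypothetical datatype Shape below declares a field width in only one of its constructors. The function width' is a hand-written equivalent of the selector: it returns the labeled component where the field exists and undefined elsewhere.

```haskell
-- Hypothetical datatype used only for illustration.
data Shape = Circle { radius :: Double }
           | Rect   { width :: Double, height :: Double }

-- Hand-written equivalent of the selector function width.
width' :: Shape -> Double
width' s = case s of
  Circle _ -> undefined   -- Circle has no field labeled width
  Rect y _ -> y           -- width labels the first component of Rect
```

Applying width (or width') to a Circle value therefore results in ⊥ when the result is demanded.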

3.15.2 Construction Using Field Labels

aexp      →  qcon { fbind1 , … , fbindn }   (labeled construction, n ≥ 0)
fbind     →  qvar = exp

A constructor with labeled fields may be used to construct a value in which the components are specified by name
rather than by position. Unlike the braces used in declaration lists, these are not subject to layout; the { and }
characters must be explicit. (This is also true of field updates and field patterns.) Construction using field labels is
subject to the following constraints:

Only field labels declared with the specified constructor may be mentioned.

A field label may not be mentioned more than once.

Fields not mentioned are initialized to ⊥.

A compile-time error occurs when any strict fields (fields whose declared types are prefixed by !) are
omitted during construction. Strict fields are discussed in Section 4.2.1 .

The expression F {}, where F is a data constructor, is legal whether or not F was declared with record syntax
(provided F has no strict fields; see the fourth bullet above); it denotes F ⊥1 … ⊥n, where n is the arity of
F.

Translation:
In the binding f = v, the field f labels v.

C { bs }   =   C (pick_1^C bs undefined) … (pick_k^C bs undefined)

where k is the arity of C.

The auxiliary function pick_i^C bs d is defined as follows:

If the ith component of a constructor C has the field label f, and if f = v appears in the
binding list bs, then pick_i^C bs d is v. Otherwise, pick_i^C bs d is the default value d.

3.15.3 Updates Using Field Labels

aexp      →  aexp⟨qcon⟩ { fbind1 , … , fbindn }   (labeled update, n ≥ 1)

Values belonging to a datatype with field labels may be non-destructively updated. This creates a new value in which
the specified field values replace those in the existing value. Updates are restricted in the following
ways:

All labels must be taken from the same datatype.

At least one constructor must define all of the labels mentioned in the update.

No label may be mentioned more than once.

An execution error occurs when the value being updated does not contain all of the specified labels.

Translation:
Using the prior definition of pick,

e { bs }   =   case e of
                 C1 v1 … vk1 -> C1 (pick_1^C1 bs v1) … (pick_k1^C1 bs vk1)
                 ...
                 Cj v1 … vkj -> Cj (pick_1^Cj bs v1) … (pick_kj^Cj bs vkj)
                 _ -> error "Update error"

where {C1, …, Cj} is the set of constructors containing all labels in bs, and ki is the arity of Ci.

Here are some examples using labeled fields:

data T = C1 {f1,f2 :: Int}
       | C2 {f1 :: Int,
             f3,f4 :: Char}

Expression                         Translation
C1 {f1 = 3}                        C1 3 undefined
C2 {f1 = 1, f4 = 'A', f3 = 'B'}    C2 1 'B' 'A'
x {f1 = 1}                         case x of C1 _ f2    -> C1 1 f2
                                             C2 _ f3 f4 -> C2 1 f3 f4

The field f1 is common to both constructors in T. This example translates expressions using constructors
in field-label notation into equivalent expressions using the same constructors without field labels. A
compile-time error will result if no single constructor defines the set of field labels used in an update, such as
x {f2 = 1, f3 = 'x'}.
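The examples in the table can be run as written. The sketch below repeats the declaration of T, adding derived Eq and Show instances (an assumption made here so that results can be compared), and places the update next to its hand translation.

```haskell
data T = C1 { f1, f2 :: Int }
       | C2 { f1 :: Int, f3, f4 :: Char }
  deriving (Eq, Show)

-- Labeled construction; the fields may be given in any order.
built :: T
built = C2 { f1 = 1, f4 = 'A', f3 = 'B' }

-- Labeled update of the field shared by both constructors.
updated :: T -> T
updated x = x { f1 = 1 }

-- Hand translation of the update, as in the table.
updated' :: T -> T
updated' x = case x of
  C1 _ a   -> C1 1 a
  C2 _ a b -> C2 1 a b
```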

3.16 Expression Type-Signatures

exp       →  exp :: [context =>] type

Expression type-signatures have the form e :: t, where e is an expression and t is a type (Section 4.1.2 );
they are used to type an expression explicitly and may be used to resolve ambiguous typings due to
overloading (see Section 4.3.4 ). The value of the expression is just that of exp. As with normal type
signatures (see Section 4.4.1 ), the declared type may be more specific than the principal type derivable from
exp, but it is an error to give a type that is more general than, or not comparable to, the principal type.

Translation:

e :: t   =   let { v :: t; v = e } in v
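A short sketch of an expression type-signature and its translation, using read from the Prelude (the binding names are illustrative):

```haskell
-- The signature fixes the result type of the overloaded read.
parsed :: Int
parsed = (read "42" :: Int) + 1

-- The same expression after applying the translation by hand.
viaLet :: Int
viaLet = (let { v :: Int; v = read "42" } in v) + 1

-- Without the signature, show (read "42") would be ambiguous.
rendered :: String
rendered = show (read "42" :: Int)
```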

3.17 Pattern Matching

Patterns appear in lambda abstractions, function definitions, pattern bindings, list comprehensions, do expressions,
and case expressions. However, the first five of these ultimately translate into case expressions, so defining the
semantics of pattern matching for case expressions is sufficient.

3.17.1 Patterns

Patterns have this syntax:

pat       →  lpat qconop pat               (infix constructor)
          |  lpat
lpat      →  apat
          |  - (integer | float)           (negative literal)
          |  gcon apat1 … apatk            (arity gcon = k, k ≥ 1)
apat      →  var [ @ apat ]                (as pattern)
          |  gcon                          (arity gcon = 0)
          |  qcon { fpat1 , … , fpatk }    (labeled pattern, k ≥ 0)
          |  literal
          |  _                             (wildcard)
          |  ( pat )                       (parenthesized pattern)
          |  ( pat1 , … , patk )           (tuple pattern, k ≥ 2)
          |  [ pat1 , … , patk ]           (list pattern, k ≥ 1)
          |  ~ apat                        (irrefutable pattern)
fpat      →  qvar = pat

The arity of a constructor must match the number of sub-patterns associated with it; one cannot match against a
partially-applied constructor.

All patterns must be linear—no variable may appear more than once. For example, this definition is illegal:

f (x,x) = x -- ILLEGAL; x used twice in pattern

Patterns of the form var@pat are called as-patterns, and allow one to use var as a name for the value being matched
by pat.
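A minimal sketch of an as-pattern; the function name is illustrative. The name xs refers to the whole list, while x and rest refer to its parts.

```haskell
-- Drops a leading zero, otherwise returns the list unchanged.
dropLeadingZero :: [Int] -> [Int]
dropLeadingZero e = case e of
  xs@(x:rest) -> if x == 0 then rest else xs
  []          -> []
```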

Patterns of the form _ are wildcards and are useful when some part of a pattern is not referenced on the
right-hand-side. It is as if an identifier not used elsewhere were put in its place. For example,

case e of { [x,_,_] -> if x==0 then True else False }

is equivalent to:

case e of { [x,y,z] -> if x==0 then True else False }

3.17.2 Informal Semantics of Pattern Matching

Patterns are matched against values. Attempting to match a pattern can have one of three results: it may fail; it may
succeed, returning a binding for each variable in the pattern; or it may diverge (i.e. return ⊥). Pattern matching
proceeds from left to right, and outside to inside, according to the following rules:

Matching the pattern var against a value v always succeeds and binds var to v.

Matching the pattern ~apat against a value v always succeeds. The free variables in apat are bound to
the appropriate values if matching apat against v would otherwise succeed, and to ⊥ if matching apat
against v fails or diverges. (Binding does not imply evaluation.)

Operationally, this means that no matching is done on a ~apat pattern until one of the variables in
apat is used. At that point the entire pattern is matched against the value, and if the match fails or
diverges, so does the overall computation.

Matching the wildcard pattern _ against any value always succeeds, and no binding is done.

Matching the pattern con pat against a value, where con is a constructor defined by newtype, depends on the
value:

If the value is of the form con v, then pat is matched against v.

If the value is ⊥, then pat is matched against ⊥.

That is, constructors associated with newtype serve only to change the type of a value.

Matching the pattern con pat1… patn against a value, where con is a constructor defined by data, depends
on the value:

If the value is of the form con v1… vn, sub-patterns are matched left-to-right against the
components of the data value; if all matches succeed, the overall match succeeds; the first to fail
or diverge causes the overall match to fail or diverge, respectively.

If the value is of the form con′ v1… vm, where con is a different constructor to con′, the match
fails.

If the value is ⊥, the match diverges.

Matching against a constructor using labeled fields is the same as matching ordinary constructor patterns except
that the fields are matched in the order they are named in the field list. All fields listed must be declared by the
constructor; fields may not be named more than once. Fields not named by the pattern are ignored (matched
against _).

Matching a numeric, character, or string literal pattern k against a value v succeeds if v == k, where == is
overloaded based on the type of the pattern. The match diverges if this test diverges.

The interpretation of numeric literals is exactly as described in Section 3.2 ; that is, the overloaded function
fromInteger or fromRational is applied to an Integer or Rational literal (resp) to convert it to the
appropriate type.

Matching an as-pattern var@apat against a value v is the result of matching apat against v, augmented with
the binding of var to v. If the match of apat against v fails or diverges, then so does the overall
match.

Aside from the obvious static type constraints (for example, it is a static error to match a character against a
boolean), the following static class constraints hold:

An integer literal pattern can only be matched against a value in the class Num.

A floating literal pattern can only be matched against a value in the class Fractional.

It is sometimes helpful to distinguish two kinds of patterns. Matching an irrefutable pattern is non-strict: the pattern
matches even if the value to be matched is ⊥. Matching a refutable pattern is strict: if the value to be matched is ⊥
the match diverges. The irrefutable patterns are as follows: a variable, a wildcard, N apat where N
is a constructor defined by newtype and apat is irrefutable (see Section 4.2.3 ), var@apat where
apat is irrefutable, or of the form ~apat (whether or not apat is irrefutable). All other patterns are
refutable.

Here are some examples:

If the pattern ['a','b'] is matched against ['x',⊥], then 'a' fails to match against 'x', and
the result is a failed match. But if ['a','b'] is matched against [⊥,'x'], then attempting to
match 'a' against ⊥ causes the match to diverge.

These examples demonstrate refutable vs. irrefutable matching:

(\ ~(x,y) -> 0) ⊥  ⇒  0
(\ (x,y) -> 0) ⊥  ⇒  ⊥
(\ ~[x] -> 0) []  ⇒  0
(\ ~[x] -> x) []  ⇒  ⊥
(\ ~[x,~(a,b)] -> x) [(0,1),⊥]  ⇒  (0,1)
(\ ~[x, (a,b)] -> x) [(0,1),⊥]  ⇒  ⊥
(\ (x:xs) -> x:x:xs) ⊥  ⇒  ⊥
(\ ~(x:xs) -> x:x:xs) ⊥  ⇒  ⊥:⊥:⊥
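Several of these cases can be observed in a running program. The sketch below uses undefined as a stand-in for ⊥ and a hypothetical helper, diverges, that reports whether forcing a value raises an exception; genuine non-termination could not be detected this way. All function names are assumptions made for the example.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Irrefutable: the argument is never inspected.
lazyPair :: (a, b) -> Int
lazyPair ~(_, _) = 0

-- Refutable: the argument is evaluated to expose the tuple constructor.
strictPair :: (a, b) -> Int
strictPair (_, _) = 0

-- The lazy pattern always matches; the failure surfaces only when x is used.
lazySingleton :: [a] -> a
lazySingleton ~[x] = x

-- Produces two cons cells without inspecting the argument.
dupLazy :: [a] -> [a]
dupLazy ~(x:xs) = x : x : xs

-- True if forcing the value to weak head normal form raises an exception.
diverges :: a -> IO Bool
diverges v = fmap isLeft' (try (evaluate v))
  where
    isLeft' :: Either SomeException b -> Bool
    isLeft' (Left _)  = True
    isLeft' (Right _) = False
```

Here lazyPair undefined evaluates to 0, whereas forcing strictPair undefined or lazySingleton [] raises an exception. The list dupLazy undefined has at least two cells, so length (take 2 (dupLazy undefined)) is 2 even though its elements are ⊥.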

Consider the following declarations:

newtype N = N Bool
data D = D !Bool

These examples illustrate the difference in pattern matching between types defined by data and
newtype:
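A minimal runnable sketch of that difference is given here, repeating the two declarations above. It uses undefined as a stand-in for ⊥; the function names and the helper diverges are hypothetical and serve only this illustration.

```haskell
import Control.Exception (SomeException, evaluate, try)

newtype N = N Bool
data D = D !Bool

-- N _ is irrefutable: matching a newtype constructor inspects nothing.
ignoreN :: N -> Int
ignoreN (N _) = 0

-- D _ is refutable: the argument is evaluated to expose the constructor D.
ignoreD :: D -> Int
ignoreD (D _) = 0

-- Matching against True forces the Bool wrapped by N.
trueN :: N -> Bool
trueN (N True) = True
trueN _        = False

-- A lazy pattern defers the match; the body never demands it.
lazyD :: D -> Bool
lazyD ~(D True) = True

-- True if forcing the value to weak head normal form raises an exception.
diverges :: a -> IO Bool
diverges v = fmap isLeft' (try (evaluate v))
  where
    isLeft' :: Either SomeException b -> Bool
    isLeft' (Left _)  = True
    isLeft' (Right _) = False
```

In this sketch ignoreN undefined is 0 and lazyD undefined is True, while ignoreD undefined, ignoreD (D undefined) (because of the strictness flag on the field), and trueN undefined all diverge.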

Top level patterns in case expressions and the set of top level patterns in function or pattern bindings may have zero
or more associated guards. See Section 3.13 for the syntax and semantics of guards.

The guard semantics have an influence on the strictness characteristics of a function or case expression.
In particular, an otherwise irrefutable pattern may be evaluated because of a guard. For example, in

f :: (Int,Int,Int) -> [Int] -> Int
f ~(x,y,z) [a] | (a == y) = 1

both a and y will be evaluated by == in the guard.
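The effect can be observed with the sketch below, which repeats f and adds a fallback clause so that the function is total. The fallback clause and the helper diverges are assumptions made for this example, and undefined stands in for ⊥.

```haskell
import Control.Exception (SomeException, evaluate, try)

f :: (Int, Int, Int) -> [Int] -> Int
f ~(x, y, z) [a] | (a == y) = 1
f _          _              = 0   -- hypothetical fallback, not in the text above

-- True if forcing the value to weak head normal form raises an exception.
diverges :: a -> IO Bool
diverges v = fmap isLeft' (try (evaluate v))
  where
    isLeft' :: Either SomeException b -> Bool
    isLeft' (Left _)  = True
    isLeft' (Right _) = False
```

With these definitions f undefined [] is 0, because the list pattern fails before any guard is tried, and f (1, 2, undefined) [2] is 1, because z is never demanded. Forcing f undefined [1] diverges: the guard evaluates y, which in turn forces the lazily matched tuple.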

3.17.3 Formal Semantics of Pattern Matching

The semantics of all pattern matching constructs other than case expressions are defined by giving identities that
relate those constructs to case expressions. The semantics of case expressions themselves are in turn
given as a series of identities, in Figures 3.1–3.3. Any implementation should behave so that these
identities hold; it is not expected that it will use them directly, since that would generate rather inefficient
code.

(a)  case e of { alts } = (\v -> case v of { alts }) e
     where v is a new variable

(b)  case v of { p1 match1; … ; pn matchn }
     = case v of { p1 match1 ;
                   _ -> … case v of {
                            pn matchn ;
                            _ -> error "No match" } … }
     where each matchi has the form:
       | gsi,1 -> ei,1 ; … ; | gsi,mi -> ei,mi where { declsi }

(c)  case v of { p | gs1 -> e1 ; …
                   | gsn -> en where { decls }
                 _ -> e′ }
     = case e′ of { y ->
         case v of {
           p -> let { decls } in
                  case () of {
                    () | gs1 -> e1;
                    _ -> … case () of {
                             () | gsn -> en;
                             _ -> y } … }
           _ -> y }}
     where y is a new variable

(d)  case v of { ~p -> e; _ -> e′ }
     = (\x1 … xn -> e) (case v of { p -> x1 }) … (case v of { p -> xn })
     where x1, …, xn are all the variables in p

(e)  case v of { x@p -> e; _ -> e′ }
     = case v of { p -> ( \x -> e ) v ; _ -> e′ }

(f)  case v of { _ -> e; _ -> e′ } = e

Figure 3.1: Semantics of Case Expressions, Part 1
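As a hedged illustration of rule (d), the sketch below writes one lazy-pattern case expression and its translation side by side. The function names are hypothetical, and the second definition is a hand translation for this particular pattern, not the output of any implementation.

```haskell
-- Source form: a case expression with an irrefutable pattern.
swapLazy :: (a, b) -> (b, a)
swapLazy v = case v of { ~(a, b) -> (b, a) }

-- Rule (d) applied by hand: one projection per variable of the pattern,
-- each passed as a lazy argument to a lambda.
swapDesugared :: (a, b) -> (b, a)
swapDesugared v =
  (\a b -> (b, a)) (case v of { (a, b) -> a }) (case v of { (a, b) -> b })
```

Both functions return a pair constructor even when applied to ⊥, since neither projection is evaluated until a component of the result is demanded.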

(g)  case v of { K p1 … pn -> e; _ -> e′ }
     = case v of {
         K x1 … xn -> case x1 of {
                        p1 -> … case xn of { pn -> e ; _ -> e′ } …
                        _ -> e′ }
         _ -> e′ }
     at least one of p1, …, pn is not a variable; x1, …, xn are new variables

(h)  case v of { k -> e; _ -> e′ } = if (v==k) then e else e′
     where k is a numeric, character, or string literal

(i)  case v of { x -> e; _ -> e′ } = case v of { x -> e }

(j)  case v of { x -> e } = ( \x -> e ) v

(k)  case N v of { N p -> e; _ -> e′ }
     = case v of { p -> e; _ -> e′ }
     where N is a newtype constructor

(l)  case ⊥ of { N p -> e; _ -> e′ } = case ⊥ of { p -> e }
     where N is a newtype constructor

(m)  case v of { K { f1 = p1 , f2 = p2 , … } -> e; _ -> e′ }
     = case e′ of {
         y ->
           case v of {
             K { f1 = p1 } ->
               case v of { K { f2 = p2 , … } -> e; _ -> y };
             _ -> y }}
     where f1, f2, … are fields of constructor K; y is a new variable

(n)  case v of { K { f = p } -> e; _ -> e′ }
     = case v of {
         K p1 … pn -> e; _ -> e′ }
     where pi is p if f labels the ith component of K, _ otherwise

(o)  case v of { K {} -> e; _ -> e′ }
     = case v of {
         K _ … _ -> e; _ -> e′ }

(p)  case (K′ e1 … em) of { K x1 … xn -> e; _ -> e′ } = e′
     where K and K′ are distinct data constructors of arity n and m, respectively

(q)  case (K e1 … en) of { K x1 … xn -> e; _ -> e′ }
     = (\x1 … xn -> e) e1 … en
     where K is a data constructor of arity n

(r)  case ⊥ of { K x1 … xn -> e; _ -> e′ } = ⊥
     where K is a data constructor of arity n

Figure 3.2: Semantics of Case Expressions, Part 2
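Rule (g) can be sketched on a small nested pattern. The two functions below are hypothetical; the second is a hand translation in which the nested pattern Just n : _ is split into one case expression per constructor argument.

```haskell
-- Source form: a constructor pattern with a non-variable sub-pattern.
headJust :: [Maybe Int] -> Int
headJust v = case v of { (Just n : _) -> n; _ -> 0 }

-- Rule (g) applied by hand: bind the arguments of (:) to fresh variables
-- x1 and x2, then match each against its sub-pattern in turn.
headJustFlat :: [Maybe Int] -> Int
headJustFlat v =
  case v of
    x1 : x2 -> case x1 of
      Just n -> case x2 of
        _ -> n          -- the wildcard match simplifies further by rule (f)
      _ -> 0
    _ -> 0
```

The two definitions agree on every argument, including lists whose tail is ⊥, because the wildcard sub-pattern never forces x2.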

(s)  case () of { () | g1, …, gn -> e; _ -> e′ }
     = case () of {
         () | g1 -> … case () of {
                        () | gn -> e;
                        _ -> e′ } …
         _ -> e′ }
     where y is a new variable

(t)  case () of { () | p <- e0 -> e; _ -> e′ }
     = case e0 of { p -> e; _ -> e′ }

(u)  case () of { () | let decls -> e; _ -> e′ }
     = let decls in e

(v)  case () of { () | e0 -> e; _ -> e′ }
     = if e0 then e else e′

Figure 3.3: Semantics of Case Expressions, Part 3
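The guard rules can be followed on a concrete alternative. In the sketch below, which uses hypothetical function names and a hypothetical association list, one alternative carries a pattern guard, a let guard, and a Boolean guard; the second definition applies rules (s) to (v) by hand.

```haskell
-- Source form: three guards on a single alternative.
doubledBig :: [(String, Int)] -> Int
doubledBig env =
  case () of
    () | Just a <- lookup "a" env, let b = a * 2, b > 10 -> b
    _ -> 0

-- Hand translation: rule (s) nests the guards, then each guard is
-- eliminated by the rule for its form.
doubledBigFlat :: [(String, Int)] -> Int
doubledBigFlat env =
  case lookup "a" env of              -- rule (t): pattern guard becomes a case
    Just a ->
      let b = a * 2                   -- rule (u): let guard becomes a let
      in if b > 10 then b else 0      -- rule (v): Boolean guard becomes an if
    _ -> 0
```

For example, both functions return 12 for [("a", 6)] and 0 for [("a", 3)] or for a list with no "a" entry.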

In Figures 3.1–3.3: e, e′ and ei are expressions; gi and gsi are guards and sequences of guards respectively; p and pi
are patterns; v, x, and xi are variables; K and K′ are algebraic datatype (data) constructors (including tuple
constructors); and N is a newtype constructor.

Rule (b) matches a general source-language case expression, regardless of whether it actually includes guards—if
no guards are written, then True is substituted for the guards gsi,j in the matchi forms. Subsequent identities
manipulate the resulting case expression into simpler and simpler forms.

Rule (h) in Figure 3.2 involves the overloaded operator ==; it is this rule that defines the meaning of pattern
matching against overloaded constants.

These identities all preserve the static semantics. Rules (d), (e), (j), and (q) use a lambda rather than a let; this
indicates that variables bound by case are monomorphically typed (Section 4.1.4).
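The difference between the two binding forms can be sketched as follows; the names viaLet and viaCase are hypothetical. A let-bound identity function is generalized and may be used at two types, while a case-bound one may be used at one type only.

```haskell
-- let-bound f is generalized, so it can be applied to an Int and to a Bool.
viaLet :: (Int, Bool)
viaLet = let f = \x -> x in (f 1, f True)

-- case-bound f is monomorphic: every use must be at the same type.
viaCase :: (Int, Int)
viaCase = case (\x -> x) of f -> (f 1, f 2)

-- The following is rejected by the type checker, because the case-bound f
-- would be needed at two different types:
-- bad = case (\x -> x) of f -> (f 1, f True)
```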