Haskell/Denotational semantics

New readers: Please report stumbling blocks! While the material on this page is intended to explain clearly, there are always mental traps that innocent readers new to the subject fall in but that the authors are not aware of. Please report any tricky passages to the Talk page or the #haskell IRC channel so that the style of exposition can be improved.

This chapter explains how to formalize the meaning of Haskell programs, the denotational semantics. It may seem to be nit-picking to formally specify that the program square x = x*x means the same as the mathematical square function that maps each number to its square, but what about the meaning of a program like f x = f (x+1) that loops forever? In the following, we will exemplify the approach first taken by Scott and Strachey to this question and obtain a foundation to reason about the correctness of functional programs in general and recursive definitions in particular. Of course, we will concentrate on those topics needed to understand Haskell programs.[1]

Another aim of this chapter is to illustrate the notions strict and lazy that capture the idea that a function needs or needs not to evaluate its argument. This is a basic ingredient to predict the course of evaluation of Haskell programs and hence of primary interest to the programmer. Interestingly, these notions can be formulated concisely with denotational semantics alone, no reference to an execution model is necessary. They will be put to good use in Graph Reduction, but it is this chapter that will familiarize the reader with the denotational definition and involved notions such as ⊥ ("Bottom"). The reader only interested in strictness may wish to poke around in section Bottom and Partial Functions and quickly head over to Strict and Non-Strict Semantics.

What does a Haskell program mean? This question is answered by the denotational semantics of Haskell. In general, the denotational semantics of a programming language map each of its programs to a mathematical object (denotation), that represents the meaning of the program in question. As an example, the mathematical object for the Haskell programs 10, 9+1, 2*5 and sum [1..4] can be represented by the integer 10. We say that all those programs denote the integer 10. The collection of such mathematical objects is called the semantic domain.

The mapping from program code to a semantic domain is commonly written down with double square brackets ("Oxford brackets") around program code. For example,

Denotations are compositional, i.e. the meaning of a program like 1+9 only depends on the meaning of its constituents:

The same notation is used for types, i.e.

For simplicity however, we will silently identify expressions with their semantic objects in subsequent chapters and use this notation only when clarification is needed.

It is one of the key properties of purely functional languages like Haskell that a direct mathematical interpretation like "1+9 denotes 10" carries over to functions, too: in essence, the denotation of a program of type Integer -> Integer is a mathematical function between integers. While we will see that this expression needs refinement generally, to include non-termination, the situation for imperative languages is clearly worse: a procedure with that type denotes something that changes the state of a machine in possibly unintended ways. Imperative languages are tightly tied to operational semantics which describes their way of execution on a machine. It is possible to define a denotational semantics for imperative programs and to use it to reason about such programs, but the semantics often has operational nature and sometimes must be extended in comparison to the denotational semantics for functional languages.[2] In contrast, the meaning of purely functional languages is by default completely independent from their way of execution. The Haskell98 standard even goes as far as to specify only Haskell's non-strict denotational semantics, leaving open how to implement them.

In the end, denotational semantics enables us to develop formal proofs that programs indeed do what we want them to do mathematically. Ironically, for proving program properties in day-to-day Haskell, one can use Equational reasoning, which transforms programs into equivalent ones without seeing much of the underlying mathematical objects we are concentrating on in this chapter. But the denotational semantics actually show up whenever we have to reason about non-terminating programs, for instance in Infinite Lists.

Of course, because they only state what a program is, denotational semantics cannot answer questions about how long a program takes or how much memory it eats; this is governed by the evaluation strategy which dictates how the computer calculates the normal form of an expression. On the other hand, the implementation has to respect the semantics, and to a certain extent, it is the semantics that determine how Haskell programs must be evaluated on a machine. We will elaborate on this in Strict and Non-Strict Semantics.

We are now looking for suitable mathematical objects that we can attribute to every Haskell program. In case of the example 10, 2*5 and sum [1..4], it is clear that all expressions should denote the integer 10. Generalizing, every value x of type Integer is likely to denote an element of the set . The same can be done with values of type Bool. For functions like f :: Integer -> Integer, we can appeal to the mathematical definition of "function" as a set of (argument,value)-pairs, its graph.

But interpreting functions as their graph was too quick, because it does not work well with recursive definitions. Consider the definition

We can think of 0,1 and 2 as being male persons with long beards and the question is who shaves whom. Person 1 shaves himself, but 2 gets shaved by the barber 0 because evaluating the third equation yields 0 `shaves` 2 == True. In general, the third line says that the barber 0 shaves all persons that do not shave themselves.

What about the barber himself, is 0 `shaves` 0 true or not? If it is, then the third equation says that it is not. If it is not, then the third equation says that it is. Puzzled, we see that we just cannot attribute True or False to 0 `shaves` 0, the graph we use as interpretation for the function shaves must have an empty spot. We realize that our semantic objects must be able to incorporate partial functions, functions that are undefined for some arguments.

It is well known that this famous example gave rise to serious foundational problems in set theory. It's an example of an impredicative definition, a definition which uses itself, a logical circle. Unfortunately for recursive definitions, the circle is not the problem but the feature.

To define partial functions, we introduce a special value ⊥, named bottom and commonly written _|_ in typewriter font. We say that ⊥ is the completely "undefined" value or function. Every basic data type like Integer or () contains one ⊥ besides their usual elements. So the possible values of type Integer are

Adding ⊥ to the set of values is also called lifting. This is often depicted by a subscript like in . While this is the correct notation for the mathematical set "lifted integers", we prefer to talk about "values of type Integer". We do this because suggests that there are "real" integers , but inside Haskell, the "integers" are Integer.

As another example, the type () with only one element actually has two inhabitants:

For now, we will stick to programming with Integers. Arbitrary algebraic data types will be treated in section Algebraic Data Types as strict and non-strict languages diverge on how these include ⊥.

In Haskell, the expression undefined denotes ⊥. With its help, one can indeed verify some semantic properties of actual Haskell programs. undefined has the polymorphic type forall a . a which of course can be specialized to undefined :: Integer, undefined :: (), undefined :: Integer -> Integer and so on. In the Haskell Prelude, it is defined as

Here, yields well defined values for and but gives for all other . Note that the type is universal, as has no value: the function :: Integer -> Integer is given by

for all

where the on the right hand side denotes a value of type Integer.

To formalize, partial functions say, of type Integer -> Integer are at least mathematical mappings from the lifted integers to the lifted integers. But this is not enough, since it does not acknowledge the special role of . For example, the definition

looks counterintuitive, and, in fact, is wrong. Why does yield a defined value whereas is undefined? The intuition is that every partial function should yield more defined answers for more defined arguments. To formalize, we can say that every concrete number is more defined than :

Here, denotes that is more defined than . Likewise, will denote that either is more defined than or both are equal (and so have the same definedness). is also called the semantic approximation order because we can approximate defined values by less defined ones thus interpreting "more defined" as "approximating better". Of course, is designed to be the least element of a data type, we always have that for all , except the case when happens to denote itself:

As no number is more defined than another, the mathematical relation is false for any pair of numbers:

does not hold.

neither nor hold.

This is contrasted to ordinary order predicate , which can compare any two numbers. A quick way to remember this is the sentence: " and are different in terms of information content but are equal in terms of information quantity". That's another reason why we use a different symbol: .

neither nor hold,

but holds.

One says that specifies a partial order and that the values of type Integer form a partially ordered set (poset for short). A partial order is characterized by the following three laws

Reflexivity, everything is just as defined as itself: for all

Transitivity: if and , then

Antisymmetry: if both and hold, then and must be equal: .

Exercises

Do the integers form a poset with respect to the order ?

We can depict the order on the values of type Integer by the following graph

where every link between two nodes specifies that the one above is more defined than the one below. Because there is only one level (excluding ), one says that Integer is a flat domain. The picture also explains the name of : it's called bottom because it always sits at the bottom.

Our intuition about partial functions now can be formulated as following: every partial function is a monotone mapping between partially ordered sets. More defined arguments will yield more defined values:

In particular, a function with is constant: for all . Note that here it is crucial that etc. don't hold.

Translated to Haskell, monotonicity means that we cannot use as a condition, i.e. we cannot pattern match on , or its equivalent undefined. Otherwise, the example from above could be expressed as a Haskell program. As we shall see later, also denotes non-terminating programs, so that the inability to observe inside Haskell is related to the halting problem.

Of course, the notion of more defined than can be extended to partial functions by saying that a function is more defined than another if it is so at every possible argument:

Thus, the partial functions also form a poset, with the undefined function being the least element.

Now that we have a means to describe partial functions, we can give an interpretation to recursive definitions. Lets take the prominent example of the factorial function whose recursive definition is

Although we saw that interpreting this recursive function directly as a set description may lead to problems, we intuitively know that in order to calculate for every given we have to iterate the right hand side. This iteration can be formalized as follows: we calculate a sequence of functions with the property that each one consists of the right hand side applied to the previous one, that is

We start with the undefined function , and the resulting sequence of partial functions reads:

and so on. Clearly,

and we expect that the sequence converges to the factorial function.

The iteration follows the well known scheme of a fixed point iteration

In our case, is a function and is a functional, a mapping between functions. We have

and

If we start with , the iteration will yield increasingly defined approximations to the factorial function

(Proof that the sequence increases: The first inequality follows from the fact that is less defined than anything else. The second inequality follows from the first one by applying to both sides and noting that is monotone. The third follows from the second in the same fashion and so on.)

It is very illustrative to formulate this iteration scheme in Haskell. As functionals are just ordinary higher order functions, we have

To the mathematician, the question whether this sequence of approximations converges is still to be answered. For that, we say that a poset is a directed complete partial order (dcpo) iff every monotone sequence (also called chain) has a least upper bound (supremum)

.

If that's the case for the semantic approximation order, we clearly can be sure that monotone sequence of functions approximating the factorial function indeed has a limit. For our denotational semantics, we will only meet dcpos which have a least element which are called complete partial orders (cpo).

The Integers clearly form a (d)cpo, because the monotone sequences consisting of more than one element must be of the form

where is an ordinary number. Thus, is already the least upper bound.

For functions Integer -> Integer, this argument fails because monotone sequences may be of infinite length. But because Integer is a (d)cpo, we know that for every point , there is a least upper bound

.

As the semantic approximation order is defined point-wise, the function is the supremum we looked for.

These have been the last touches for our aim to transform the impredicative definition of the factorial function into a well defined construction. Of course, it remains to be shown that actually yields a defined value for every , but this is not hard and far more reasonable than a completely ill-formed definition.

It is instructive to try our newly gained insight into recursive definitions on an example that does not terminate:

The approximating sequence reads

and consists only of . Clearly, the resulting limit is again. From an operational point of view, a machine executing this program will loop indefinitely. We thus see that may also denote a non-terminating function or value. Hence, given the halting problem, pattern matching on in Haskell is impossible.

Earlier, we called the approximating sequence an example of the well known "fixed point iteration" scheme. And of course, the definition of the factorial function can also be thought as the specification of a fixed point of the functional :

However, there might be multiple fixed points. For instance, there are several which fulfill the specification

,

Of course, when executing such a program, the machine will loop forever on or and thus not produce any valuable information about the value of . This corresponds to choosing the least defined fixed point as semantic object and this is indeed a canonical choice. Thus, we say that

,

defines the least fixed point of . Clearly, least is with respect to our semantic approximation order .

The existence of a least fixed point is guaranteed by our iterative construction if we add the condition that must be continuous (sometimes also called "chain continuous"). That simply means that respects suprema of monotone sequences:

We can then argue that with

we have

and the iteration limit is indeed a fixed point of . You may also want to convince yourself that the fixed point iteration yields the least fixed point possible.

Exercises

Prove that the fixed point obtained by fixed point iteration starting with is also the least one, that it is smaller than any other fixed point. (Hint: is the least element of our cpo and is monotone)

By the way, how do we know that each Haskell function we write down indeed is continuous? Just as with monotonicity, this has to be enforced by the programming language. Admittedly, these properties can somewhat be enforced or broken at will, so the question feels a bit void. But intuitively, monotonicity is guaranteed by not allowing pattern matches on . For continuity, we note that for an arbitrary type a, every simple function a -> Integer is automatically continuous because the monotone sequences of Integers are of finite length. Any infinite chain of values of type a gets mapped to a finite chain of Integers and respect for suprema becomes a consequence of monotonicity. Thus, all functions of the special case Integer -> Integer must be continuous. For functionals like ::(Integer -> Integer) -> (Integer -> Integer), the continuity then materializes due to currying, as the type is isomorphic to ::((Integer -> Integer), Integer) -> Integer and we can take a=((Integer -> Integer), Integer).

In Haskell, the fixed interpretation of the factorial function can be coded as

factorial = fix g

with the help of the fixed point combinator

fix :: (a -> a) -> a.

We can define it by

fix f = let x = f x in x

which leaves us somewhat puzzled because when expanding , the result is not anything different from how we would have defined the factorial function in Haskell in the first place. But of course, the construction this whole section was about is not at all present when running a real Haskell program. It's just a means to put the mathematical interpretation of Haskell programs on a firm ground. Yet it is very nice that we can explore these semantics in Haskell itself with the help of undefined.

and there is nothing unexpected about them. But why are they strict? It is instructive to prove that these functions are indeed strict. For id, this follows from the definition. For succ, we have to ponder whether ⊥ + 1 is ⊥ or not. If it was not, then we should for example have ⊥ + 1 = 2 or more general ⊥ + 1 = k for some concrete number k. We remember that every function is monotone, so we should have for example

2 = ⊥ + 1 ⊑ 4 + 1 = 5

as ⊥ ⊑ 4. But neither of 2 ⊑ 5, 2 = 5 nor 2 ⊒ 5 is valid so that k cannot be 2. In general, we obtain the contradiction

k = ⊥ + 1 ⊑ k + 1 = k + 1.

and thus the only possible choice is

succ ⊥ = ⊥ + 1 = ⊥

and succ is strict.

Exercises

Prove that power2 is strict. While one can base the proof on the "obvious" fact that power2 n is , the latter is preferably proven using fixed point iteration.

Searching for non-strict functions, it happens that there is only one prototype of a non-strict function of type Integer -> Integer:

one x = 1

Its variants are constk x = k for every concrete number k. Why are these the only ones possible? Remember that one n can be no less defined than one ⊥. As Integer is a flat domain, both must be equal.

Why is one non-strict? To see that it is, we use a Haskell interpreter and try

> one (undefined :: Integer)
1

which is not ⊥. This is reasonable as one completely ignores its argument. When interpreting ⊥ in an operational sense as "non-termination", one may say that the non-strictness of one means that it does not force its argument to be evaluated and therefore avoids the infinite loop when evaluating the argument ⊥. But one might as well say that every function must evaluate its arguments before computing the result which means that one ⊥ should be ⊥, too. That is, if the program computing the argument does not halt, one should not halt as well.[3] It shows up that one can choose freely this or the other design for a functional programming language. One says that the language is strict or non-strict depending on whether functions are strict or non-strict by default. The choice for Haskell is non-strict. In contrast, the functional languages ML and Lisp choose strict semantics.

The notion of strictness extends to functions with several variables. For example, a function f of two arguments is strict in the second argument if and only if

f x ⊥ = ⊥

for every x. But for multiple arguments, mixed forms where the strictness depends on the given value of the other arguments, are much more common. An example is the conditional

cond b x y = if b then x else y

We see that it is strict in y depending on whether the test b is True or False:

cond True x ⊥ = x
cond False x ⊥ = ⊥

and likewise for x. Apparently, cond is certainly ⊥ if both x and y are, but not necessarily when at least one of them is defined. This behavior is called joint strictness.

Clearly, cond behaves like the if-then-else statement where it is crucial not to evaluate both the then and the else branches:

if null xs then 'a' else head xs
if n == 0 then 1 else 5 / n

Here, the else part is ⊥ when the condition is met. Thus, in a non-strict language, we have the possibility to wrap primitive control statements such as if-then-else into functions like cond. This way, we can define our own control operators. In a strict language, this is not possible as both branches will be evaluated when calling cond which makes it rather useless. This is a glimpse of the general observation that non-strictness offers more flexibility for code reuse than strictness. See the chapter Laziness[4] for more on this subject.

After treating the motivation case of partial functions between Integers, we now want to extend the scope of denotational semantics to arbitrary algebraic data types in Haskell.

A word about nomenclature: the collection of semantic objects for a particular type is usually called a domain. This term is more a generic name than a particular definition and we decide that our domains are cpos (complete partial orders), that is sets of values together with a relation more defined that obeys some conditions to allow fixed point iteration. Usually, one adds additional conditions to the cpos that ensure that the values of our domains can be represented in some finite way on a computer and thereby avoiding to ponder the twisted ways of uncountable infinite sets. But as we are not going to prove general domain theoretic theorems, the conditions will just happen to hold by construction.

Here, True, False and Nothing are nullary constructors whereas Just is a unary constructor. The inhabitants of Bool form the following domain:

Remember that ⊥ is added as least element to the set of values True and False, we say that the type is lifted[5]. A domain whose poset diagram consists of only one level is called a flat domain. We already know that is a flat domain as well, it's just that the level above ⊥ has an infinite number of elements.

What are the possible inhabitants of Maybe Bool? They are

⊥, Nothing, Just ⊥, Just True, Just False

So the general rule is to insert all possible values into the unary (binary, ternary, ...) constructors as usual but without forgetting ⊥. Concerning the partial order, we remember the condition that the constructors should be monotone just as any other functions. Hence, the partial order looks as follows

But there is something to ponder: why isn't Just ⊥ = ⊥? I mean "Just undefined" is as undefined as "undefined"! The answer is that this depends on whether the language is strict or non-strict. In a strict language, all constructors are strict by default, i.e. Just ⊥ = ⊥ and the diagram would reduce to

As a consequence, all domains of a strict language are flat.

But in a non-strict language like Haskell, constructors are non-strict by default and Just ⊥ is a new element different from ⊥, because we can write a function that reacts differently to them:

f (Just _) = 4
f Nothing = 7

As f ignores the contents of the Just constructor, f (Just ⊥) is 4 but f ⊥ is ⊥ (intuitively, if f is passed ⊥, it will not be possible to tell whether to take the Just branch or the Nothing branch, and so ⊥ will be returned).

This gives rise to non-flat domains as depicted in the former graph. What should these be of use for? In the context of Graph Reduction, we may also think of ⊥ as an unevaluated expression. Thus, a value x = Just ⊥ may tell us that a computation (say a lookup) succeeded and is not Nothing, but that the true value has not been evaluated yet. If we are only interested in whether x succeeded or not, this actually saves us from the unnecessary work to calculate whether x is Just True or Just False as would be the case in a flat domain. The full impact of non-flat domains will be explored in the chapter Laziness, but one prominent example are infinite lists treated in section Recursive Data Types and Infinite Lists.

In the section Strict Functions, we proved that some functions are strict by inspecting their results on different inputs and insisting on monotonicity. However, in the light of algebraic data types, there can only be one source of strictness in real life Haskell: pattern matching, i.e. case expressions. The general rule is that pattern matching on a constructor of a data-type will force the function to be strict, i.e. matching ⊥ against a constructor always gives ⊥. For illustration, consider

const1 _ = 1

const1' True = 1
const1' False = 1

The first function const1 is non-strict whereas the const1' is strict because it decides whether the argument is True or False although its result doesn't depend on that. Pattern matching in function arguments is equivalent to case-expressions

const1' x = case x of
True -> 1
False -> 1

which similarly impose strictness on x: if the argument to the case expression denotes ⊥ the whole case will denote ⊥, too. However, the argument for case expressions may be more involved as in

and it can be difficult to track what this means for the strictness of foo.

An example for multiple pattern matches in the equational style is the logical or:

or True _ = True
or _ True = True
or _ _ = False

Note that equations are matched from top to bottom. The first equation for or matches the first argument against True, so or is strict in its first argument. The same equation also tells us that or True x is non-strict in x. If the first argument is False, then the second will be matched against True and or False x is strict in x. Note that while wildcards are a general sign of non-strictness, this depends on their position with respect to the pattern matches against constructors.

Exercises

Give an equivalent discussion for the logical and

Can the logical "excluded or" (xor) be non-strict in one of its arguments if we know the other?

There is another form of pattern matching, namely irrefutable patterns marked with a tilde ~. Their use is demonstrated by

f ~(Just x) = 1
f Nothing = 2

An irrefutable pattern always succeeds (hence the name) resulting in f ⊥ = 1. But when changing the definition of f to

f ~(Just x) = x + 1
f Nothing = 2 -- this line may as well be left away

we have

f ⊥ = ⊥ + 1 = ⊥
f (Just 1) = 1 + 1 = 2

If the argument matches the pattern, x will be bound to the corresponding value. Otherwise, any variable like x will be bound to ⊥.

This function is another example of joint strictness, but a much sharper one: the result is only ⊥ if both arguments are (at least when we restrict the arguments to True and ⊥). Can such a function be implemented in Haskell?

The case of recursive data structures is not very different from the base case. Consider a list of unit values

data List = [] | () : List

Though this seems like a simple type, there is a surprisingly complicated number of ways you can fit in here and there, and therefore the corresponding graph is complicated. The bottom of this graph is shown below. An ellipsis indicates that the graph continues along this direction. A red ellipse behind an element indicates that this is the end of a chain; the element is in normal form.

and so on. But now, there are also chains of infinite length like

⊥ ():⊥ ():():⊥ ...

This causes us some trouble as we noted in section Convergence that every monotone sequence must have a least upper bound. This is only possible if we allow for infinite lists. Infinite lists (sometimes also called streams) turn out to be very useful and their manifold use cases are treated in full detail in chapter Laziness. Here, we will show what their denotational semantics should be and how to reason about them. Note that while the following discussion is restricted to lists only, it easily generalizes to arbitrary recursive data structures like trees.

In the following, we will switch back to the standard list type

data [a] = [] | a : [a]

to close the syntactic gap to practical programming with infinite lists in Haskell.

Exercises

Draw the non-flat domain corresponding [Bool].

How is the graphic to be changed for [Integer]?

Calculating with infinite lists is best shown by example. For that, we need an infinite list

ones :: [Integer]
ones = 1 : ones

When applying the fixed point iteration to this recursive definition, we see that ones ought to be the supremum of

⊥ 1:⊥ 1:1:⊥ 1:1:1:⊥ ...,

that is an infinite list of 1. Let's try to understand what take 2 ones should be. With the definition of take

We see that take 2 (1:1:1:⊥) and so on must be the same as take 2 (1:1:⊥) = 1:1:[] because 1:1:[] is fully defined. Taking the supremum on both the sequence of input lists and the resulting sequence of output lists, we can conclude

take 2 ones = 1:1:[]

Thus, taking the first two elements of ones behaves exactly as expected.

Generalizing from the example, we see that reasoning about infinite lists involves considering the approximating sequence and passing to the supremum, the truly infinite list. Still, we did not give it a firm ground. The solution is to identify the infinite list with the whole chain itself and to formally add it as a new element to our domain: the infinite list is the sequence of its approximations. Of course, any infinite list like ones can be compactly depicted as

ones = 1 : 1 : 1 : 1 : ...

what simply means that

ones = (⊥ 1:⊥ 1:1:⊥ ...)

Exercises

Of course, there are more interesting infinite lists than ones. Can you write recursive definition in Haskell for

the natural numbers nats = 1:2:3:4:...

a cycle like cycle123 = 1:2:3: 1:2:3 : ...

Look at the Prelude functions repeat and iterate and try to solve the previous exercise with their help.

Use the example from the text to find the value the expression drop 3 nats denotes.

Assume that the work in a strict setting, i.e. that the domain of [Integer] is flat. What does the domain look like? What about infinite lists? What value does ones denote?

What about the puzzle of how a computer can calculate with infinite lists? It takes an infinite amount of time, after all? Well, this is true. But the trick is that the computer may well finish in a finite amount of time if it only considers a finite part of the infinite list. So, infinite lists should be thought of as potentially infinite lists. In general, intermediate results take the form of infinite lists whereas the final value is finite. It is one of the benefits of denotational semantics that one treat the intermediate infinite data structures as truly infinite when reasoning about program correctness.

Exercises

To demonstrate the use of infinite lists as intermediate results, show that

take 3 (map (+1) nats) = take 3 (tail nats)

by first calculating the infinite sequence corresponding to map (+1) nats.

Of course, we should give an example where the final result indeed takes an infinite time. So, what does

filter (< 5) nats

denote?

Sometimes, one can replace filter with takeWhile in the previous exercise. Why only sometimes and what happens if one does?

As a last note, the construction of a recursive domain can be done by a fixed point iteration similar to recursive definition for functions. Yet, the problem of infinite chains has to be tackled explicitly. See the literature in External Links for a formal construction.

Haskell offers a way to change the default non-strict behavior of data type constructors by strictness annotations. In a data declaration like

data Maybe' a = Just' !a | Nothing'

an exclamation point ! before an argument of the constructor specifies that it should be strict in this argument. Hence we have Just' ⊥ = ⊥ in our example. Further information may be found in chapter Strictness.

In some cases, one wants to rename a data type, like in

data Couldbe a = Couldbe (Maybe a)

However, Couldbe a contains both the elements ⊥ and Couldbe ⊥. With the help of a newtype definition

newtype Couldbe a = Couldbe (Maybe a)

we can arrange that Couldbe a is semantically equal to Maybe a, but different during type checking. In particular, the constructor Couldbe is strict. Yet, this definition is subtly different from

data Couldbe' a = Couldbe' !(Maybe a)

To explain how, consider the functions

f (Couldbe m) = 42
f' (Couldbe' m) = 42

Here, f' ⊥ will cause the pattern match on the constructor Couldbe' fail with the effect that f' ⊥ = ⊥. But for the newtype, the match on Couldbe will never fail, we get f ⊥ = 42. In a sense, the difference can be stated as:

for the strict case, Couldbe' ⊥ is a synonym for ⊥

for the newtype, ⊥ is a synonym for Couldbe ⊥

with the agreement that a pattern match on ⊥ fails and that a match on Constructor ⊥ does not.

Newtypes may also be used to define recursive types. An example is the alternate definition of the list type [a]

newtype List a = In (Maybe (a, List a))

Again, the point is that the constructor In does not introduce an additional lifting with ⊥.

As lazy evaluation means a constant computational overhead, a Haskell compiler may want to discover where inherent non-strictness is not needed at all which allows it to drop the overhead at these particular places. To that extent, the compiler performs strictness analysis just like we proved in some functions to be strict section Strict Functions. Of course, details of strictness depending on the exact values of arguments like in our example cond are out of scope (this is in general undecidable). But the compiler may try to find approximate strictness information and this works in many common cases like power2.

Now, abstract interpretation is a formidable idea to reason about strictness: ...

So far, we have introduced ⊥ and the semantic approximation order abstractly by specifying their properties. However, both as well as any inhabitants of a data type like Just ⊥ can be interpreted as ordinary sets. This is called the powerset construction. NOTE: i'm not sure whether this is really true. Someone how knows, please correct this.

The idea is to think of ⊥ as the set of all possible values and that a computation retrieves more information this by choosing a subset. In a sense, the denotation of a value starts its life as the set of all values which will be reduced by computations until there remains a set with a single element only.

As an example, consider Bool where the domain looks like

{True} {False}
\ /
\ /
⊥ = {True, False}

The values True and False are encoded as the singleton sets {True} and {False} and ⊥ is the set of all possible values.

In the section What to choose as Semantic Domain?, we argued that taking simple sets as denotation for types doesn't work well with partial functions. In the light of recursive data types, things become even worse as John C. Reynolds showed in his paper Polymorphism is not set-theoretic[7].

Reynolds actually considers the recursive type

newtype U = In ((U -> Bool) -> Bool)

Interpreting Bool as the set {True,False} and the function type A -> B as the set of functions from A to B, the type U cannot denote a set. This is because (A -> Bool) is the set of subsets (powerset) of A which, due to a diagonal argument analogous to Cantor's argument that there are "more" real numbers than natural ones, always has a bigger cardinality than A. Thus, (U -> Bool) -> Bool has an even bigger cardinality than U and there is no way for it to be isomorphic to U. Hence, the set U must not exist, a contradiction.

In our world of partial functions, this argument fails. Here, an element of U is given by a sequence of approximations taken from the sequence of domains

where ⊥ denotes the domain with the single inhabitant ⊥. While the author of this text admittedly has no clue on what such a thing should mean, the constructor gives a perfectly well defined object for U. We see that the type (U -> Bool) -> Bool merely consists of shifted approximating sequences which means that it is isomorphic to U.

As a last note, Reynolds actually constructs an equivalent of U in the second order polymorphic lambda calculus. There, it happens that all terms have a normal form, i.e. there are only total functions when we do not include a primitive recursion operator fix :: (a -> a) -> a. Thus, there is no true need for partial functions and ⊥, yet a naïve set theoretic semantics fails. We can only speculate that this has to do with the fact that not every mathematical function is computable. In particular, the set of computable functions A -> Bool should not have a bigger cardinality than A.

Notes

↑In fact, there are no written down and complete denotational semantics of Haskell. This would be a tedious task void of additional insight and we happily embrace the folklore and common sense semantics.

↑Monads are one of the most successful ways to give denotational semantics to imperative programs. See also Haskell/Advanced monads.

↑Strictness as premature evaluation of function arguments is elaborated in the chapter Graph Reduction.

↑The term Laziness comes from the fact that the prevalent implementation technique for non-strict languages is called lazy evaluation