Latest revision as of 12:03, 21 October 2012

One of Haskell's main features is non-strict semantics, which is implemented by lazy evaluation in all popular Haskell compilers.
However, many Haskell libraries found on Hackage are implemented as if Haskell were a strict language.
This leads to unnecessary inefficiencies, memory leaks and, we suspect, unintended semantics.
In this article we go through techniques for checking the laziness behaviour of functions, look at typical constructs that break laziness unnecessarily, and finally link to techniques that may achieve the same effect without laziness.

Consider a decoder with a signature like

decodeUTF8 :: [Word8] -> Either Message String

This function cannot be lazy: when you access the first character of the result, it must already be decided whether the result is a Left or a Right, and for this decision the complete input must be decoded.
A better type signature is

decodeUTF8 :: [Word8] -> (Maybe Message, String)

where the String contains as many characters as could be decoded and the Maybe Message gives the reason why decoding stopped: Nothing means the input was read completely, and Just msg means the decoding was aborted for the reason described in msg.

If you touch the first element of the pair, the complete decoding is triggered and laziness is broken.
This means you should first process the String and look at the Maybe Message only afterwards.
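To illustrate the lazy-friendly signature, here is a self-contained toy stand-in for decodeUTF8 (our own simplification: it decodes only ASCII bytes and stops at the first byte >= 128):

```haskell
import Data.Word (Word8)
import Data.Char (chr)

type Message = String

-- Toy stand-in for decodeUTF8 (ASCII only; names are our own):
-- the String part of the result is produced lazily, while the
-- Maybe Message part is only determined once decoding has finished
-- or failed.
decodeASCII :: [Word8] -> (Maybe Message, String)
decodeASCII [] = (Nothing, "")
decodeASCII (w:ws)
   | w < 128   = let (m, cs) = decodeASCII ws      -- lazy pair match
                 in  (m, chr (fromIntegral w) : cs)
   | otherwise = (Just ("invalid byte " ++ show w), "")
```

Because the pair pattern in the let is matched lazily, snd (decodeASCII bytes) can be consumed incrementally; for instance, take 1 (snd (decodeASCII (65 : undefined))) yields "A" without inspecting the rest of the input.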

Instead of the unspecific pair type, you should use the special type for asynchronous exceptions found in the explicit-exception package.

Especially in parsers you may find a function called Wadler's force function.
It works as follows:

force y =
   let Just x = y
   in  Just x

It looks like a complicated expression for y, with the added danger of failing unrecoverably when y is not a Just.
Its purpose is to use the lazy pattern matching of let and to tell the run-time system that we expect y to always be a Just.
Then the run-time system does not need to wait until it can determine the right constructor; it can proceed immediately.
This way, a function can be made lazy even if it returns a Maybe.
It can, however, fail if it later turns out that y is actually Nothing.
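The effect can be demonstrated directly with a small self-contained sketch (force as defined above):

```haskell
import Data.Maybe (isJust)

force :: Maybe a -> Maybe a
force y =
   let Just x = y
   in  Just x

-- isJust (force undefined) is True: the Just constructor is available
-- immediately, without looking at the argument at all.  The flip side:
-- isJust (force Nothing) is also True, and forcing the contained value
-- would then crash with a pattern-match failure.
```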

Using force-like functions is sometimes necessary, but it should be avoided for data types with more than one constructor.
It is better to use an interim data type with a single constructor and to lift to the multi-constructor data type when needed.

Consider parsers of type

StateT [Word8] Maybe a

Now consider the parser combinator

many :: StateT [Word8] Maybe a -> StateT [Word8] Maybe [a]

which parses as many elements of type a as possible.
It shall be lazy, and thus it must be infallible and must not use the failure possibility of the underlying Maybe.
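A sketch of such an infallible many (our own implementation, not code from any particular library): the outer Just is produced before any element is parsed, so the combinator as a whole cannot fail, and failure of the element parser merely ends the list.

```haskell
import Control.Monad.Trans.State (StateT (StateT), runStateT)
import Data.Word (Word8)

-- Infallible 'many': the result is wrapped in Just immediately,
-- before any element is parsed; a failing element parser just
-- terminates the list.
manyLazy :: StateT [Word8] Maybe a -> StateT [Word8] Maybe [a]
manyLazy p = StateT $ \ws -> Just (go ws)
   where
      go ws =
         case runStateT p ws of
            Nothing       -> ([], ws)
            Just (a, ws') ->
               let (as, rest) = go ws'
               in  (a:as, rest)
```

For example, with a parser that accepts one byte below 100, manyLazy consumes the matching prefix and leaves the rest of the input in the state.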

It is a common source of excessive strictness to make decisions too early and thus to duplicate code in the decision branches.
Intuitively speaking, the bad thing about code duplication (stylistic questions put aside) is that the run-time system cannot see that some things are equal in both branches and do them in common before the critical decision.
Actually, the compiler and run-time system could be "improved" to do so, but in order to keep things predictable, they do not.
Even more, this behaviour is required by theory, since by pushing decisions into the inner part of an expression you change the semantics of the expression.
So we return to the question of what the programmer actually wants.

Now, do you think this expression

if b then [x] else y:ys

is maximally lazy?
It seems so, but actually it is not: in both branches we create non-empty lists, but the run-time system cannot see this.
The expression

null (if undefined then [x] else y:ys)

is again undefined, but we would like it to evaluate to False.
Here we need the lazy pattern matching provided by let.

let z:zs = if b then [x] else y:ys
in  z:zs

This expression always returns the constructor (:), and thus null knows that the list is not empty.
However, this is a little bit unsafe, because the let z:zs may fail if one of the branches of the if yields an empty list.
This error can only be caught at run time, which is bad.
We can avoid it by using the single-constructor pair type.
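A sketch of the safe variant: move the decision inside a pair, whose single constructor matches lazily without any possibility of a pattern-match failure (the function name f is our own):

```haskell
-- The branches now produce head and tail separately; the pair has only
-- one constructor, so the lazy match cannot fail at run time, and the
-- (:) constructor is available before the condition is evaluated.
f :: Bool -> a -> a -> [a] -> [a]
f b x y ys =
   let (z, zs) = if b then (x, []) else (y, ys)
   in  z : zs
```

Now null (f undefined x y ys) is False without the condition ever being evaluated.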

I do not know whether the following example can be simplified.
In this form it occurred in a real application, namely the HTTP package.

Consider the following action of the

Control.Monad.RWS

kind, which fetches a certain number of elements from a list.

The state of the monad is the input list from which we fetch the elements.
The reader part provides an element that signals consumed input; it is returned as a singleton when the caller tries to read from a completely read input.
The writer allows logging some information; however, the considered action does not output anything to the log.

We learn from this example that in Haskell it is sometimes more efficient to call functions that are not needed under some circumstances.
Always remember that the do notation only looks imperative; it is not imperative.
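Since the original code is not shown here, the following is our own sketch of such an action, using Control.Monad.Trans.RWS from the transformers package (the name fetch and the concrete types are assumptions):

```haskell
import Control.Monad.Trans.RWS (RWS, runRWS, ask, get, put)

-- Fetch n elements from the input list (the state).  When the input is
-- exhausted, return the reader's end-of-input element as a singleton;
-- the writer part ([String] here) stays untouched.
fetch :: Int -> RWS a [String] [a] [a]
fetch 0 = return []
fetch n = do
   xs <- get
   case xs of
      []   -> fmap (: []) ask    -- end of input: singleton from the reader
      y:ys -> do
         put ys
         rest <- fetch (n - 1)
         return (y : rest)
```

For example, runRWS (fetch 4) 0 [1,2] yields the result [1,2,0], an empty final state and an empty log.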

E.g., endOfInput is only evaluated if the end of the input is really reached.
In general, producing output from lazily generated data is no problem, whereas lazily reading data requires a sort of hack and thus caution.
Consider the nice program

readFile "source" >>= writeFile "target"

which copies the file source to the file target with constant memory consumption, since readFile reads the data lazily and writeFile writes it as it comes in.

However, it fails badly when a file shall be updated in place:

readFile "text" >>= writeFile "text" . map toUpper

This would only work if readFile were strict, that is, if it read the file contents into memory before returning.
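If an in-place update is really needed, the contents must be forced before the write starts; a common sketch (the helper name is our own):

```haskell
import Data.Char (toUpper)

-- Force the whole contents (via length ... seq) so that the handle
-- opened by readFile is drained and closed before writeFile reopens
-- the same file.  Assumes the file fits into memory.
upperInPlace :: FilePath -> IO ()
upperInPlace name = do
   text <- readFile name
   length text `seq` writeFile name (map toUpper text)
```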
The function readFile needs certain hacks:

The function unsafeInterleaveIO is needed for deferring the calls to hGetChar until the characters are actually needed.

Exceptions that occur while reading the file are raised in the code that writes the result of processing the file content somewhere. That is, the exceptions produced by readFile can occur in code that has nothing to do with file reading, and there is no warning that they might occur there. Again, I want to advertise the explicit-exception package, which helps to make the reason for the end of the file read explicit. Exceptions must still be handled in code that does not read the file, but the fact that they are explicit helps you not to forget it.

The file must be closed after it is no longer needed. The documentation says that the file is put into a semi-closed state. Maybe this means that it uses a weak reference, which lets the garbage collector close the file once no reference to the file's data exists anymore. However, the garbage collector never works immediately; it works in phases. It may be that the file remains open for a long time, maybe until the program exits. The

Data.ByteString.Lazy.readFile

function explicitly closes the file after the last byte is read. The advantage is that the file is closed immediately. The disadvantage is that the file is not closed at all if not all bytes are read; e.g., if a parser encounters a parse error, it has to read the rest of the file anyway in order to get it closed.

A function that handles the closing of the file for you is System.IO.withFile.
You can use it like

withFile "source" ReadMode $ \h ->
   hGetLine h >>= putStrLn

After the actions inside the withFile call, the file is closed.

However, this is dangerous:
if you let lazily read contents leak out of withFile, the file is closed before the data is actually read.
Thus, although

withFile "source" ReadMode hGetContents

looks like readFile, it is very different: it does not work.

How can you implement a function like hGetContents yourself?
You need to call hGetChar in a lazy way.
This is achieved by unsafeInterleaveIO.
However, calling unsafeInterleaveIO hGetChar many times would not work, because the order of the calls must be preserved.
E.g., in

hGetContents h >>= putStrLn . drop 10

the first ten characters from the file are not needed, but hGetChar must still be called for the first ten characters anyway in order to advance the file position.
This is achieved by applying unsafeInterleaveIO not to the individual hGetChar calls but to the action that reads the whole rest of the contents.
(By the way, such a simple implementation does not even handle the end of the file.)
The advantage of not relying on some automatism that closes the file at some point is that you can close the file immediately after you have stopped processing its content.
The disadvantage is that you must not forget to close the file, and you must close it only once.
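A self-contained sketch of such an implementation (our own; unlike the minimal version discussed above, it does check for the end of the file):

```haskell
import System.IO (Handle, hGetChar, hIsEOF)
import System.IO.Unsafe (unsafeInterleaveIO)

-- unsafeInterleaveIO wraps the action that reads the entire rest of
-- the file, so all reads are deferred together and their order is
-- preserved.
lazyGetContents :: Handle -> IO String
lazyGetContents h =
   unsafeInterleaveIO $ do
      eof <- hIsEOF h
      if eof
         then return ""
         else do
            c  <- hGetChar h
            cs <- lazyGetContents h
            return (c : cs)
```

As discussed, you must still close the handle yourself, exactly once, after you have stopped consuming the result.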

So far we have only considered lazy reading.
It might also be necessary to trigger write actions when fetching data.
Consider a server-client interaction where data can only be read after a request has been sent.
It would be nice if the request were triggered by reading the result from the server.
Such interactions can be programmed using the lazyio package.

From the above issues you can see that laziness is a fragile thing.
Make one mistake, and a function carefully developed with laziness in mind is no longer lazy.
The type system will rarely help you hunt laziness breakers, and there is little support from debuggers.

Thus, detecting laziness breakers often requires understanding a large portion of code, which runs against the idea of modularity.

Maybe for your use case you will prefer a different idiom that achieves the same goals in a safer way; see e.g. the enumerator and iteratee pattern.