Preventing memoization in (AI) search problems

In lazy evaluation, an expression is not evaluated unless its result
is needed. Once an expression is evaluated, its result is
remembered, or memoized, in case the value of the same expression is
needed again. This on-demand evaluation and memoization happens
transparently and `for free', sparing us from explicitly programming
these features. Another great attraction of lazy evaluation is the
potential to improve algorithmic complexity: As Bird et al. wrote,
``for some problems at least -- lazy evaluators are strictly more
powerful than eager ones.''

The implicit memoization however is a trade-off. We trade space to
store expression's result for time gained in avoiding recomputations.
The trade-off is particularly worthy if the result takes much less
memory than needed for the closure (thunk) of the expression. This is
often the case with numeric code. Non-deterministic, probabilistic
programming and general AI search problems are the opposite case. A
non-deterministic expression is typically represented as a lazy search
tree, which is often huge even for small expressions. It becomes a
better trade-off to re-evaluate an expression rather than to fill all
memory with results.

Alas, GHC is designed for the opposite trade-off. Therefore, using
Haskell even for simple search problems is quite a challenge since
memoization gets in the way. Preventing the memoization is
surprisingly hard, since GHC is very good at finding the opportunities
for it, even within thunks. This article uses a typical example of
non-deterministic search to illustrate the problem posed by lazy
evaluation and to describe a few tricks to prevent memoization. Some
of them are unexpected.

Our running example computes and prints the first n
elements of the infinite stream of Pythagorean triples pyth,
using three infinite streams of integers from 1. As
typical for non-deterministic programs, the example generates candidate
solutions and rejects most of them.

The interleaving of three infinite streams precludes the List
monad (depth-first search), see below. Instead of a lazy list, we use a search
tree to represent the result of a non-deterministic computation:

data Tree1 a = Fail1 | Val1 a | Node1 (Tree1 a) (Tree1 a)

We rely on the non-strictness of Haskell to prevent evaluation of
tree nodes until we traverse to them: after all, the tree
may be infinite -- as is the case in our example. Tree1 is an
instance of Monad and MonadPlus; here are the most complex
parts of these instances (see the accompanying code for the rest):

Node1 e1 e2 >>= f = Node1 (e1 >>= f) (e2 >>= f)
mplus = Node1

To `run' the non-deterministic computation and produce the the stream
of triples, we traverse the Tree, extract the
successfully produced results from the Val leaves and
return them as a lazy list. Different tree traversals correspond to
different non-deterministic search strategies. Depth-first
traversal (DFS) is the most efficient, needing only O(d)
space to examine a node at depth d. Alas, an infinite
branch in the tree traps DFS. In our pyth tree, DFS will
get stuck chasing an infinite chain of Fail.
Breadth-first traversal (BFS) in contrast shall visit any node in a
tree, given time. BFS is a complete strategy: if a
solution (leaf Val) exists, BFS will find it. Alas, BFS
needs a lot of space to maintain the job queue, the frontier of the
search. At search depth d the frontier may
take O(2^d) space. Iterative deepening is a hybrid
method, complete as BFS yet needing as little of working space as
DFS. Iterative deepening explores the progressively long `prefix' of
tree with DFS. Each new exploration phase repeats all the work of the
previous explorations of shallower prefixes. Iterative deepening
clearly trades time for space. Despite its gross wastefulness, the
method is quite popular, for example, in automated theorem proving.
Its trade-off has proved worthwhile.

Here are the results of computing and printing the first n
Pythagorean triples. The code was compiled by GHC 7.0.4 with optimization -O2.

Mutator time, sec

CG time, sec

Memory in use, MB

Average residency, KB

BFS, n=30

13.0

5.0

3

465

Iter Deep, n=30

0.15

0.06

5

1506

Iter Deep, n=100

4.8

1.3

56

20832

We expected iterative deepening to take more time than BFS but
significantly less memory. We observed just the opposite.

Recall that iterative deepening keeps re-traversing the tree. Each
exploration cycle redoes all the previous explorations. Lazy
evaluation helps, it seems. When we first reach Node e1 e2,
we evaluate e1 and e2 that were
stored in the node unevaluated (otherwise, we would have diverged
constructing the tree, which is infinite). Lazy evaluation
replaces e1 and e2 with their results. When
iterative deepening comes across the same node in a new cycle, it gets
the results of e1 and e2 right away. That
seems like a good thing, until we look at the space. As iterative
deepening explores the Tree, it needs more and more memory to
store the explored prefix, which is about twice the size of the BFS
frontier. Lazy evaluation thus defeats the purpose of iterative
deepening, of recomputing the revisited tree nodes to avoid
storing them. Lazy evaluation does exactly the wrong thing.

In a strict language, we would have used thunks to represent infinite trees.
If tree nodes store thunks, lazy evaluation would memoize thunks --
which evaluate to themselves rather than to trees. It seems therefore the
following modification should stop lazy evaluation's meddling in
iterative deepening.

Every time we need traverse through a Node, we have to
force the thunks and re-compute the branches. At least, in theory.
Here is the practice.

Mutator time, sec

CG time, sec

Memory in use, MB

Average residency, KB

BFS, n=30

13.0

5.0

3

509

Iter Deep, n=30

0.3

0.1

8

2964

Iter Deep, n=100

10.6

1.7

96

39244

The fix that was supposed to help iterative deepening has made it worse.

Such an unexpected result was quite a puzzle. It seems GHC is
just too smart. Apparently it notices that a thunk (\() -> e) can only be applied to the same argument.
Therefore, the first time the thunk is forced by applying it
to (), the result can justifiably be memoized: the next
time around the thunk will be applied to the same (), and
hence, will give the same result anyway.

The new fix is to deliberately confuse GHC. We obfuscate the
tree-construction operations (>>=) and mplus
with auxiliary functions app and app1.

Finally we see the agreement with our expectations. Iterative
deepening now needs only 2MB (down from 56MB in the first try) to compute
the first 100 Pythagorean triples. We may now try bigger searches.

We have seen that lazy evaluation is a trade-off, which may be
hurtful in some cases, in particular, in search problems over huge
data structures, where it is often beneficial to recompute the
result than to store it. Preventing lazy evaluation
is possible but surprisingly tricky.

Breaking referential transparency with unsafeInterleaveST

Among unsafe functions, unsafeInterleaveST (and its close relative,
or specialization, unsafeInterleaveIO) is often viewed as `mostly
harmless'. The function unsafeInterleaveIO underlies Lazy IO, an
(unfortunately) widely used feature. The interleaving functions
bear hardly any stigma. On the contrary: Hackage has the monad-interleave package dedicated to them, encouraging their use.

It is unfortunate that the admonition of the people who introduced interleaveST has gone unheeded. In their 1995 paper ``State in Haskell''
Launchbury and Peyton Jones wrote:
``It should be clear by now that interleaveST has very undesirable
properties. It duplicates and discards the state, which gives rise to
a very subtle class of programming errors. We have so far failed to
develop good techniques for reasoning about its correctness''
[Sec 10.5]. Their fears are justified: unsafeInterleaveSTis unsafe.
It lets us write Bool expressions with observable mutable-state
side-effects, hence proving that True equals False.
In other words, referential transparency, or the substitution
of equals for equals, may fail even for Bool expressions.

To see how bad this is, recall the claims about ST from Launchbury and
Peyton Jones' paper, excerpted from Section 1 below:

``Complete referential transparency is maintained...
But in addition, we guarantee that the state is used in a
single-threaded way. Consequently, the final state can be constructed
by modifying the input state in-place. This efficient implementation
respects the purely-functional semantics of the state-transformer
function, so all the usual techniques for reasoning about functional
programs continue to work. Similarly, stateful programs can be exposed
to the full range of program transformations applied by a compiler,
with no special cases or side conditions.''

``It is possible to encapsulate stateful computations so that they
appear to the rest of the program as pure (stateless) functions which
are guaranteed by the type system to have no interactions whatever
with other computations, whether stateful or otherwise (except via the
values of arguments and results, of course).''

``Stateful computations can even be performed lazily without losing safety.''

The function unsafeInterleaveST breaks each of these claims. It is unsafe.
(The authors define safety as the Church-Rosser property:
the program giving the same result regardless of the evaluation order provided
data dependencies are respected; see the beginning of Sec. 10.)
Launchbury and
Peyton Jones knew that very well. In the hindsight, the bad_ctx example
has the same kernel of badness as the unique-supply tree example
in Sec 10.2 of their paper. The authors
point out a subtlety in that code, admitting that they fell into the
trap themselves. The fact that unsafeInterleaveST is unsafe and dangerous,
and also apparently useful lead the authors to pessimistically
conclude Sec 10.5:
``We fear that there may be no absolutely secure system -- that is, one
which guarantees the Church-Rosser property -- which is also
expressive enough to describe the programs which systems programmers
(at least) want to write, such as those above. We do, however, regard interleaveST as useful primarily for systems programmers.''

The function unsafeInterleaveST ought to bear the same stigma as
does unsafePerformIO. After all, both masquerade side-effecting
computations as pure. Both break the equational reasoning, the
greatest asset of Haskell.

Version

The current version is April 2013.

References

John Launchbury and Simon L. Peyton Jones: State in Haskell
Lisp and Symbolic Computation, 1995, v8 N4, pp. 293-341.
This paper has introduced the ST monad and also lazy IO.

Observing strictness

A strictness test takes a function and returns True if the function
is strict and False otherwise. The article shows several implementations
of the test. After a brief reminder of strictness, we explain
why the strictness test is impossible to implement in Haskell.
Nonetheless several implementations exist -- which should tell us about
the features of GHC and Haskell to be leery of.

In a non-strict language, a function generally receives as its
argument an unevaluated computation rather than a value. If a function
can produce the result value without executing the argument
computation -- without really needing the value of the argument -- the
function is called non-strict. It is strict otherwise. For example,
the function const 1 :: Int -> Int is non-strict (since it returns 1
without looking at its argument), and succ :: Int -> Int is strict
since it needs the value to increment. The function id :: Int ->
Int is also strict: obtaining the value (the weak head normal form) of the
result requires the value of the argument. A well-known non-strict
function is cons, or (:). For the reason why, see the famous paper
that begat lazy evaluation, on why cons should not evaluate its
arguments.

To tell if a function is strict we should be able to observe the
evaluation of its argument. Functions are opaque and we cannot look
inside them. Hence the only way to tell if the argument has been
evaluated is by observing the side-effect of the evaluation. However,
the type of the argument may preclude side effects. For example, the
argument of an Int->Int function has the pure type Int. Computations of that type cannot have observable side-effects
in a pure language. To put this another way: if x::Int is a
computation whose evaluation is observable and if v is the value of x, then v and x are not substitutable for each other since f x
and f v are observably different if f is strict. Therefore, the
strictness test should not be possible. If it were, it would break the
referential transparency.

There is an escape hatch in Haskell: a non-termination is not considered
a side-effect. Therefore, a computation of the type Int is allowed
not to evaluate to anything and loop forever. Haskell has
a convenient term for such a looping computation: undefined.
Thus we come to the conventional definition of non-strict functions:
a function f is non-strict if f undefined returns a value.

This escape hatch does not contradict the impossibility to observe
strictness: if f undefined takes long time to compute a value, we
cannot be sure that it really loops (and the cause of looping is
really the attempt to evaluate the looping argument). Perhaps f just
takes a very long time to finish. In short, we cannot really observe
divergence -- otherwise, we would solve the Halting problem.

And yet there are at least two ways to write the strictness test in
practice, because GHC lets us create observably side-effecting
computations of pure types, and IO contexts are not
referentially transparent. Lazy IO makes the test possible already in Haskell98.
The computation length str is side-effecting
if str::String is the result of getContents. Evaluating length
str has the side-effect of reading the file and closing the
handle. The status of the handle can be detected. GHC brings imprecise
exceptions, thus permitting `multiple bottoms'. Here is a simple test
of strictness with imprecise exceptions.

Lazy IO breaks equational reasoning

Lazy IO creates an attractive illusion that files are ordinary Haskell
strings, letting the pure Haskell code handle a file as if it were
loaded in memory. Furthermore, just as an ordinary string is built
lazily, so could a file be read as needed. The illusion has a dark
side: the real danger of deadlocks; running out of file descriptors
and similar scarce resources; unpredictable, volatile and sometimes
unbearably excessive use of memory. Lazy IO also has the fundamental
problem: it breaks the most attractive feature of Haskell, equational
reasoning. The problem was alluded to in Launchbury and Peyton Jones'
paper that introduced the ST monad, interleaveST, and
lazy IO. We demonstrate that the problem is as bad as they
feared.

UnsafePerformIO is known to be unsafe, breaking equational
reasoning; unsafeInterleaveIO, which underlies Lazy IO, gets a free
pass because any computation with it has to be embedded in the IO
context in order to be evaluated -- and we can expect anything from
IO. But unsafeInterleaveIO has essentially the same code as unsafeInterleaveST: compare unsafeInterleaveST from GHC/ST.lhs
with unsafeDupableInterleaveIO from GHC/IO.hs keeping in mind that IO and ST have the same representation, as described in GHC/IO.hs. And unsafeInterleaveST is really unsafe -- not just
mildly or somewhat or vaguely unsafe. In breaks equational reasoning,
in pure contexts.

Lazy IO is likewise problematic: On one hand, a simple
equational proof shows that for all Boolean x
and y, x < y is the same as not (y <= x). On the other hand,
we exhibit a context that distinguishes the two expressions: plugging
them into the context gives two programs that print different
results. Equational reasoning is thus unsound.

Equational reasoning is regarded as the greatest advantage of pure
functional languages: we may derive programs or prove their
correctness using the elementary, high-school algebra. Equational
reasoning is based on the principle of
`substitution of equals for equals'. If an expression e1 is equal
to e2 (which we write as e1 === e2) and e1 occurs as part
of a larger expression e (written as e[e1]), then we may replace
that e1 with e2 obtaining the equal expression: e[e1] === e[e2].
The familiar symmetry and transitivity properties of equality can
be easily derived from the substitution principle. Here a few more
examples of substitution: 2+2 === 4 derives (2+2) * 5 === 4 * 5, (\x -> (2 + 2) + x) === (\x -> 4 + x), and print (2+2) === print 4. We may thus substitute within a function
and within an action. The expression to substitute into
may be as big as the entire program. Therefore, we may reason equationally about
program behavior -- which is the reason of existence of equational reasoning
in Computer Science. The whole program in Haskell
has the type of IO () and its behavior is doing an observable action,
for example, printing something. If the equational reasoning is sound,
the programs that are proven equal must behave the same: print the same
results.

As an example of equational reasoning, we prove that for all
Boolean x and y, x < y === not (y <= x), using elementary algebra:

The fact that x < y is equal to not (y <= x) lets us substitute
the former expression with the latter in any program it may
appear. The result should be an equal program. Alas, the enclosed code
shows a program in which such a substitution changes the printed
result. The program

main = counterex_ctx $ \(x,y) -> x < y

prints True whereas

main = counterex_ctx $ \(x,y) -> not (y <= x)

prints False.

The problematic counterex_ctx used lazy IO, which, recall, creates an
illusion that file data is just a String. Pattern-matching on such a
string may however cause IO, to read a chunk of a file. IO
has side-effects, affecting other, side, expressions in
context. Therefore, one may not freely substitute side-effecting
expressions. Although reading a file may appear a benign effect, it
has side-effects still: the file-descriptor and buffer allocation,
locking, advancing the file position, or consuming data from a
pipe. They are all observable. Our expression x < y
appeared pure and had the pure type Bool, and so its substitution looked justified. With lazy IO, even a
pure expression may have an observable side-effect. Equational
reasoning becomes unsound.

Some object by saying that IO is inherently non-deterministic. If
there is a race condition, one should not be surprised if a program
that printed True prints False on the next run. The
non-determinism of IO will hence explain why unsafeInterleaveIO is `fine'
whereas unsafeInterleaveST is truly unsafe: ST is supposed to be
deterministic.

Recall however that IO is just an instance of ST, by
design. Launchbury and Peyton Jones, in the paper ``State in Haskell''
(1995) that introduced ST and Lazy IO, stated as one of the
contributions: ``Input/output takes its place as a specialised form of
stateful computation. Indeed, the type of I/O-performing
computations is an instance of the (more polymorphic) type of stateful
computations. Along with I/O comes the ability to call imperative
procedures written in other languages.''

To expand on this connection, here is the IO version of the earlier ST demonstration that the equality on Booleans is, worryingly,
not symmetric.

This code has no ``real'' IO, no interaction with the external
non-deterministic world. Furthermore, all the IO operations in the code
are the type-specialized versions of the corresponding ST operations, with
essentially the same code. For example, the type IORef is defined in GHC/IORef.hs as

newtype IORef a = IORef (STRef RealWorld a)

It is hard to believe that the supposedly deterministic ST code becomes
non-deterministic when the fully polymorphic state type is
instantiated to RealWorld (recall, types are erased at run-time.)

We have seen that, disturbingly, replacing one pure Bool expression
with another equal expression changes the result printed by
the program. Now that
there are many alternatives to lazy IO for incremental file
processing, it is high time to banish lazy IO from Haskell.

describe.pdf [334K]
The full version of the paper presented at FLOPS 2012. A shorter version is published in FLOPS Proceedings, Springer's LNCS 7294, pp. 166-181.
Section 2 of the paper illustrates many practical problems with Lazy IO.

Catching exceptions in a general MonadIO

Handling exceptions via catch, bracket, catchDyn, etc. in a
MonadIO other than the IO itself has been a fairly frequently
requested feature. The requests tend to recur, probably because these
functions cannot be defined for a general MonadIO. However, we can
define the generic exception handling for a large and interesting
subset of MonadIO, which includes various (repeated) transformations
of IO by ReaderT, WriterT, ErrorT, StateT, and newtype
wrapping.

The generic catch has been successfully used since 2006 in a
database library Takusen, where many operations work in the monad
ReaderT Session IO; the database session data are always
available as the environment. Many other foreign libraries are structured
around sessions, connections, library environments, which can be
encapsulated in a monad. We should nevertheless be able to handle IO errors
and user exceptions that arise in such computations.

Back in 2006 Jesse Tov has done an admirably thorough job of
implementing exception handling in general monads. His code defines
two classes: EMonad and EMonadIO -- which
contain most of the interesting monads. The latter is the subclass of
the former, permitting arbitrary IO via liftIO. In either
case, we use gthrow, gbracket, gcatch, ghandle, gfinally,
etc. -- without even thinking which Monad we are in and how error
handling is actually implemented, via ErrorT or via IO exceptions. It works
universally for most of monads of interest. The experience of
using EMonadIO since 2006 in large production code has
been most positive.

Version

The current version is 1.1, February 2006.

References

CaughtMonadIO.lhs [6K]
The complete literate Haskell code
The code includes two tests, illustrating throwing and
catching of (dynamic) exceptions in monads obtained from
IO by several applications of
ReaderT, WriterT and ErrorT.
The code was originally posted as generic catch in a MonadIO on the Haskell mailing list on Tue Feb 7 22:48:24 EST 2006

That is, we add one simple line to the definition of the
function, without changing the proper body of the function at all. We
prepend an extra clause with the guard that always yields False. Before failing, however, the guard forces the evaluation
of the selected arguments of the function. If needed, seq
can be replaced with deepSeq.

Making (some of the) arguments of a function strict can make
the function run in constant space, and so an iteration (Runge-Kutta
iteration in the above example) can proceed without running out
of (stack) space. The function iterate2 (and a similar
function rk4Next) were applied to arguments like n-1 and y + a1/6 -- which is a red
flag. These are exactly the kinds of expressions that lead to memory
exhaustion. Perhaps it is because the size of an unevaluated thunk for n-1 is so much bigger than the size of the evaluated
thunk. It seems that arithmetic expressions are the best candidates
for some opportunistic evaluation...

Partial signatures
Another application of the trick of adding a clause with an always failing guard

Simple and reliable uni- and bi-directional pipes

The MySysOpen module offers a reliable, proven way of interacting
with another local or remote process via a unidirectional or
bidirectional channel. It supports pipes and Unix and TCP sockets. MySysOpen is a simple and explicit alternative to
the multi-threaded IO processing of the GHC run-time
system. The module is the Haskell binding to sys_open -- the extended, user-level file opening interface.

The second half of MySysOpen.hs contains several
bi-directional channel interaction tests. One checks repeated
sending and receiving of data; the amount of received data is
intentionally large, about 510K. Two other tests interact with
programs that are not specifically written for interactive use,
such as sort. The latter cannot produce any
output before it has read all of the input, accepting no input terminator
other than the EOF condition. One test uses shutdown to set
the EOF. The other test programs the handler
for a custom EOF indicator, literally in the file name of
the communication pipe. The final example tests the handling
of the forceful closure of the communication channel in the middle
of writing.

Newer FastCGI and the memory-efficient IO interface

We present an efficient library for writing CGI and
FastCGI programs. Our goal is the least latency and the least memory
consumption. The library underlies a production web application
server Metcast. The server has been running since February 2007,
sending large amounts of text and binary data in response to a
continuous stream of requests. The server processes a request in constant and small memory imposing little latency, encoding and
sending to the client chunks of data as soon as the database delivers
them. The server handles hundreds of such requests without allocating
memory; in fact, it uses only one 16KB buffer for all of its I/O. With
the exception of an occasional rank-2 type and extended pattern
guards, all of the code is in Haskell98. Not a single
unsafeHaskell function occurs in the code.

The library is originally based on the code written by Jesse
Tov, who in turn used NewCGI by Bjorn Bringert et al. Most of the code
has been re-written. The biggest change is minimizing the amount of
state. The library is centered around generalized input and output
ports: a simple IO interface to read, write and copy via a single,
large, once allocated buffer. The buffer and its dimensions are
hidden, to discourage aliasing. We use this interface for the IO to
and from the client, to and from files or the PostgreSQL database, and for
reading from external servers.

A generalized input port is a procedure to read at most the
specified number of bytes into a given buffer. The procedure should
return the number of bytes actually read, or 0 on EOF. It may throw
various exceptions if reading didn't go well. The procedure is like hGetBuf partially applied to a handle.

One can construct an Input from a Posix file descriptor
or a String. One can combine two Inputs into
one, which reads from the first generalized port until EOF and then
reads from the second. The frequently mentioned EMonadIO
is a class of monads permitting IO along with throwing and
catching of arbitrary errors. The IO monad and most of its transformations
are in that class. EMonadIO lets us write gthrow, gcatch, gbracket, etc.
without even thinking of the current monad.

Dually, a generalized output port is a procedure to write the
specified number of bytes from the given buffer. It may throw various
exceptions if writing didn't go well.

The CGI monad exports (Fast)CGI output and error streams as
Output, and lets us access request content as Input.
We can use the generalized ports for reading and writing
strings. The ports are intended though to be connected via BCopy,
which copies from Input to Output the desired number of bytes or
till EOF.

Version

The current version is February 2007.

References

NewerCGI.hs [29K]
The commented source code of the library for writing CGI and FastCGI programs
The code was first mentioned in a message Takusen and large PostgreSQL blobs [was: Handling custom types in Takusen] on the Haskell-Cafe mailing list on Fri, 27 Jul 2007 20:34:29 -0700 (PDT)

Takusen: a DBMS access library built around a left-fold enumerator

Takusen is a joint project with Alistair Bayley. Takusen
is a DBMS access library. It has a backend for Oracle on Unix, Linux
or Windows via OCI, and a stub to test the library without any
database. The infrastructure and the stub let one interface natively
with other databases.

The distinguished feature of Takusen is processing query
results using a left-fold enumerator. The user supplies an iteratee
function, which receives rows one-at-a-time from the result-set. The
number of the arguments to the iteratee is the number of the columns in the
result-set, plus the seed. Each column in the result-set has
its own Haskell type. The latter could be a Maybe type if the
particular iteratee wishes to process NULLs.

The benefits are: more convenient and intuitive enumeration,
iteration, and accumulation (see tests for examples); the retrieved
data are not merely strings but have native Haskell types: Int, Float, Date, etc.; buffer preallocation; pre-fetching; support for
both enumerators and cursors, proper handling of all errors including
various IO errors. No unsafe operations are used.

How to implement AND without pattern-matching

How to implement the function AND, i.e., (&&), without the explicit or
even the implicit pattern-matching? Keep in mind that the standard
Prelude not and (||) are defined in terms
of pattern-matching. The article below gives several answers.
One of them relies on pointer arithmetic, in Haskell.

Version

The current version is 1.1, Jul 13, 2001.

References

bothTrue.txt [2K]
The article with the source code for three solutions. It was originally posted as Re: bothTrue on the newsgroup comp.lang.functional on Fri, 13 Jul 2001 17:45:22 -0700