Haskell and other call-by-need languages use lazy evaluation as their default evaluation strategy. For beginners and advanced programmers alike this can sometimes be confusing. At Well-Typed a core part of our business is teaching Haskell, which we do through public courses (such as the upcoming Skills Matter courses Fast Track to Haskell, Haskell Performance and Optimization and The Haskell Type System), private in-house courses targeted at specific client needs, and of course through writing blog posts.

In order to help us design these courses, we developed a tool called visualize-cbn. It is a simple interpreter for a mini Haskell-like language which outputs the state of the program at every step in a human-readable format. It can also generate an HTML/JavaScript version with “Previous” and “Next” buttons to step through a program. We released the tool as open source on github and Hackage, in the hope that it will be useful to others.

The README.md file in the repo explains how to run the tool. In this blog post we will illustrate how one might take advantage of it. We will revisit the infamous triple of functions foldr, foldl, foldl', and show how they behave. As a slightly more advanced example, we will also study the memory behaviour of mapM in the Maybe monad. Hopefully, this show-rather-than-tell blog post might help some people understand these functions better.

Throughout this section we will use this definition of enumFromTo:

enumFromTo n m = if n <= m then n : enumFromTo (n + 1) m
                 else []

so that, say, [1..3] corresponds to (enumFromTo 1 3).

foldr/foldl/foldl'

In this section we will examine the difference between these three functions. We will not study these functions directly, however, but study a slightly simpler variant in the form of three definitions of length on lists. For a more in-depth discussion of this triple of functions, see our earlier blog post on this topic.

foldr

Consider the naive definition of length:

length xs = case xs of
  []      -> 0
  (x:xs') -> 1 + length xs'

This corresponds to defining length = foldr (\x n -> 1 + n) 0.

Let’s consider what happens when we compute length [1..3]; you can click on Prev and Next to step through the execution:
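Schematically (writing add for the suspended additions), the evaluation develops as

length (enumFromTo 1 3)
~> length (1 : enumFromTo (add 1 1) 3)
~> add 1 (length (enumFromTo 2 3))
~> add 1 (add 1 (length (enumFromTo 3 3)))
~> add 1 (add 1 (add 1 (length [])))
~> add 1 (add 1 (add 1 0))
~> 3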

Then in step 1, length needs to do a case analysis, which forces us to apply enumFromTo and to evaluate it until we have a top-level Cons constructor (in step 4).

At that point we can execute the pattern match and the process continues.

When we evaluate enumFromTo (add 1 1) 3 in step 6, note that the expression add 1 1 is only evaluated once, although it is used twice; this sharing of computation is what makes Haskell a truly lazy (call-by-need) language (as opposed to a call-by-name language).

In these animations these shared expressions are shown separately below the expression; you can think of this as the “heap”, and accordingly the animation also shows when these expressions are garbage collected (e.g. step 14).

Note as you step through that we have a build-up of calls to add which are only resolved at the very end. This is the source of the memory leak in this definition of length (corresponding to foldr).

foldl

In a foldl-style definition of length, we introduce an accumulator:

length acc xs = case xs of
  []    -> acc
  x:xs' -> length (1 + acc) xs'

This corresponds to defining length = foldl (\n x -> 1 + n) 0.

Unlike the previous definition, this is tail-recursive; however, it still suffers from a memory leak due to Haskell’s extremely lazy nature. You will see why when you step through the execution:
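Schematically, because nothing ever forces the accumulator, the evaluation develops as

length 0 [1,2,3]
~> length (1 + 0) [2,3]
~> length (1 + (1 + 0)) [3]
~> length (1 + (1 + (1 + 0))) []
~> 1 + (1 + (1 + 0))
~> 3

building up a chain of unevaluated additions in the accumulator. The foldl'-style definition avoids this by forcing the accumulator at every step; a sketch:

length acc xs = case xs of
  []    -> acc
  x:xs' -> let acc' = 1 + acc in acc' `seq` length acc' xs'

This corresponds to defining length = foldl' (\n x -> 1 + n) 0 and runs in constant space.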

mapM over the Maybe monad

As a final example of a slightly more advanced nature, try predicting what will happen when we run this in ghci:

case mapM return [1..] of Just (x:_) -> x

If you try it out and the result is not what you expected, perhaps stepping through the following evaluation of mapM return [1..3] to weak-head normal form (whnf: when there is a constructor at the top-level) will help you understand:
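Schematically, mapM in the Maybe monad behaves like

mapM f []     = Just []
mapM f (x:xs) = case f x of
  Nothing -> Nothing
  Just y  -> case mapM f xs of
               Nothing -> Nothing
               Just ys -> Just (y : ys)

so reducing mapM return [1..3] to whnf requires first reducing mapM return [2..3], which in turn requires first reducing mapM return [3..3], and so on: the top-level Just is only produced once the entire list has been traversed, with one nested pattern match per element.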

Note that this expression reduces to whnf only after the entire list has been evaluated, and moreover that this requires O(n) nested pattern matches. The take-away point from this example is that mapM should not be applied to long lists in most monads, as this will result in a memory leak.

Conclusion

Laziness can be tricky to understand sometimes, and being able to go through the evaluation of a program step by step can be very helpful. The visualize-cbn tool can be used to generate HTML/JavaScript files that can be used to visualize this evaluation as shown in this blog post; alternatively, it can also write the evaluation trace to the console. The source files (the various definitions of length) can be found in the repo. Feedback and pull requests are of course always welcome :)

Binary instances for GADTs (or: RTTI in Haskell)

In this blog post we consider the problem of defining Binary instances for GADTs such as

data Val :: * -> * where
  VI :: Int    -> Val Int
  VD :: Double -> Val Double

If you want to play along, the full source code for the examples in this blog post can be found on github.
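The obvious first attempt is to serialize a tag byte alongside the value, along the lines of this sketch (the tag encoding here is an assumption):

instance Binary (Val a) where
  put (VI i) = putWord8 0 >> put i
  put (VD d) = putWord8 1 >> put d
  get = do
    tag <- getWord8
    case tag of
      0 -> VI <$> get  -- rejected: ghc cannot conclude a ~ Int
      1 -> VD <$> get  -- rejected: ghc cannot conclude a ~ Double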

However, this does not work. The definition of put is type correct (but dubious), but the definition of get is not. And actually this makes sense: we are claiming that we can define Binary (Val a) for any a; but if the tag is 0, then that a can only be Int, and if the tag is 1, then that a can only be Double.

One option is to instead give a Binary (Some Val) instance with Some defined as

data Some :: (* -> *) -> * where
  Exists :: forall f x. f x -> Some f

That is often independently useful, but is a different goal: in such a case we are discovering type information when we deserialize. That’s not what we’re trying to achieve in this blog post; we want to write a Binary instance that can be used when we know from the context what the type must be.

Working, but inconvenient

The next thing we might try is to introduce Binary instances for the specific instantiations of that a type variable:
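-- a sketch of those per-index instances:
instance Binary (Val Int) where
  put (VI i) = put i
  get        = VI <$> get

instance Binary (Val Double) where
  put (VD d) = put d
  get        = VD <$> get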

Note that there is no need to worry about any tags in the encoded bytestring; we always know the type. Although this works, it’s not very convenient; for example, we cannot define

encodeVal :: Val a -> ByteString
encodeVal = encode

because we don’t have a polymorphic instance Binary (Val a). Instead we’d have to define

encodeVal :: Binary (Val a) => Val a -> ByteString
encodeVal = encode

but that’s annoying: we know that that a can only be Int or Double, and we have Binary instances for both of those cases. Can’t we do better?

Introducing RTTI

Although we know that a can only be Int or Double, we cannot take advantage of this information in the code. Haskell types are erased at compile time, and hence we cannot do any kind of pattern matching on them. The key to solving this problem then is to introduce some explicit runtime type information (RTTI).

We start by introducing a data family associating with each indexed datatype a corresponding datatype with RTTI:

data family RTTI (f :: k -> *) :: (k -> *)

For the example Val this runtime type information tells us whether we’re dealing with Int or Double:
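-- a sketch; the constructor names are illustrative
data instance RTTI Val a where
  RttiValInt    :: RTTI Val Int
  RttiValDouble :: RTTI Val Double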

We’re now almost done: the last thing we need to express is that if we know at the type level that we have some RTTI available, then we can serialize. For this purpose we introduce a type class that returns the RTTI:
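-- a sketch of the class, its Val instances, and the Binary instance
-- that dispatches on the RTTI (a reconstruction of the elided code)
class HasRTTI f a where
  rtti :: RTTI f a

instance HasRTTI Val Int    where rtti = RttiValInt
instance HasRTTI Val Double where rtti = RttiValDouble

instance HasRTTI Val a => Binary (Val a) where
  put (VI i) = put i
  put (VD d) = put d
  get = case rtti :: RTTI Val a of  -- needs ScopedTypeVariables
          RttiValInt    -> VI <$> get
          RttiValDouble -> VD <$> get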

and the good news is that this means that whenever we construct specific Vals we never have to construct the RTTI by hand; ghc’s type class resolution takes care of it for us.

Taking stock

Instead of writing

encodeVal :: Binary (Val a) => Val a -> ByteString
encodeVal = encode

we can now write

encodeVal :: HasRTTI Val a => Val a -> ByteString
encodeVal = encode

While it may seem we haven’t gained very much, HasRTTI is a much more fine-grained constraint than Binary; from HasRTTI we can derive Binary constraints, like we have done here, but also other constraints that rely on RTTI. So while we do still have to carry these RTTI constraints around, those are – ideally – the only constraints that we still need to carry around. Moreover, as we shall see a little bit further down, RTTI also scales nicely to composite type-level structures such as type-level lists.

Another example: heterogeneous lists

As a second—slightly more involved—example, let’s consider heterogeneous lists or n-ary products:
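-- the definition was elided above; the usual shape of such n-ary products is
data NP (f :: k -> *) (xs :: [k]) where
  Nil  :: NP f '[]
  (:*) :: f x -> NP f xs -> NP f (x ': xs)

infixr 5 :*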

As was the case for Val, we always statically know how long such a list is, so there should be no need to include any kind of length information in the encoded bytestring. Again, for serialization we don’t need to do anything very special:
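-- a sketch of the elided code, assuming an RTTI instance for NP that
-- mirrors the list structure (RttiNpNil/RttiNpCons are referenced below)
data instance RTTI (NP f) xs where
  RttiNpNil  :: RTTI (NP f) '[]
  RttiNpCons :: (HasRTTI f x, HasRTTI (NP f) xs) => RTTI (NP f) (x ': xs)

instance (HasRTTI (NP f) xs, All Binary f xs) => Binary (NP f xs) where
  put Nil       = return ()
  put (x :* xs) = put x >> put xs
  get = case rtti :: RTTI (NP f) xs of
          RttiNpNil  -> return Nil
          RttiNpCons -> (:*) <$> get <*> get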

The only minor complication here is that we need Binary instances for all the elements of the list; we guarantee this using the All type family (which is a minor generalization of the All type family explained in the same set of lecture notes linked above):
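-- a version of All that fits the use here (All Binary Val xs); a sketch
type family All (c :: * -> Constraint) (f :: k -> *) (xs :: [k]) :: Constraint where
  All c f '[]       = ()
  All c f (x ': xs) = (c (f x), All c f xs)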

Serializing lists of Vals

This All Binary Val xs constraint however is unfortunate, because we know that all Vals can be deserialized! Fortunately, we can do better. The RTTI for the (:*) case (RttiNpCons) included RTTI for the elements of the list. We made no use of that above, but we can make use of that when giving a specialized instance for lists of Vals:
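A sketch of that specialized instance; since RttiNpCons carries HasRTTI Val x for every element, and Binary (Val x) is derivable from that, the All constraint disappears:

instance {-# OVERLAPPING #-} HasRTTI (NP Val) xs => Binary (NP Val xs) where
  put Nil       = return ()
  put (v :* vs) = put v >> put vs
  get = case rtti :: RTTI (NP Val) xs of
          RttiNpNil  -> return Nil
          RttiNpCons -> (:*) <$> get <*> get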

Note that this use of overlapping type class instances is perfectly safe: the overlapping instance is fully compatible with the overlapped instance, so it doesn’t make a difference which one gets picked. The overlapped instance just allows us to be more economical with our constraints.

Here we can appreciate the choice of RTTI being a data family indexed by f; indeed, the constraint HasRTTI f x in RttiNpCons is as generic as possible. Concretely, decodeVals required only a single HasRTTI constraint, as promised above. It is this compositionality, along with the fact that we can derive many type classes from just having RTTI around, that gives this approach its strength.

Advanced example

To show how all this might work in a more advanced example, consider the following EDSL describing simple functions:
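-- a sketch reconstructed from the description below; constructor names
-- and the pair-kinded index are assumptions
data Fn :: (*, *) -> * where
  Exp   :: Fn '(Double, Double)
  Sqrt  :: Fn '(Double, Double)
  Mod   :: Int -> Fn '(Int, Int)
  Round :: Fn '(Double, Int)
  Comp  :: Fn '(b, c) -> Fn '(a, b) -> Fn '(a, c)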

If you are new to EDSLs (embedded languages) in Haskell, you may wish to watch the Well-Typed talk Haskell for embedded domain-specific languages. However, hopefully the intent behind Fn is not too difficult to see: we have a datatype that describes functions: exponentiation, square root, integer modulus, rounding, and function composition. The two type indices of Fn describe the function input and output types. A simple interpreter for Fn would be
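roughly the following, given the reconstructed Fn above:

evalFn :: Fn '(a, b) -> a -> b
evalFn Exp        = exp
evalFn Sqrt       = sqrt
evalFn (Mod m)    = (`mod` m)
evalFn Round      = round
evalFn (Comp g f) = evalFn g . evalFn f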

In the remainder of this blog post we will consider how we can define a Binary instance for Fn. Compared to the previous examples, Fn poses two new challenges:

The type index does not uniquely determine which constructor is used; if the type is (Double, Double) then it could be Exp, Sqrt or indeed the composition of some functions.

Trickier still, Comp actually introduces an existential type: the type “in the middle” b. This means that when we serialize and deserialize we do need to include some type information in the encoded bytestring.

For our DSL of functions, we only have functions from Double to Double, from Int to Int, and from Double to Int (and this is closed under composition).
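For this closed set of function types, the RTTI might look like (a sketch, with illustrative constructor names):

data instance RTTI Fn a where
  RttiFnDD :: RTTI Fn '(Double, Double)
  RttiFnII :: RTTI Fn '(Int, Int)
  RttiFnDI :: RTTI Fn '(Double, Int)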

Serializing type information

The next question is: when we serialize a Comp constructor, how much information do we need to serialize about that existential type? To bring this into focus, let’s consider the type information we have when we are dealing with composition:
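-- a sketch: package the RTTI of the two composed functions,
-- hiding the type b in the middle
data RttiComp :: (*, *) -> * where
  RttiComp :: RTTI Fn '(b, c) -> RTTI Fn '(a, b) -> RttiComp '(a, c)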

Whenever we are deserializing a Fn, if that Fn happens to be the composition of two other functions we know RTTI about the composition; but since the “type in the middle” is unknown, we have no information about that at all. So what do we need to store? Let’s start with serialization:

putRttiComp :: RTTI Fn '(a, c) -> RttiComp '(a, c) -> Put

The first argument here is the RTTI about the composition as a whole, and sets the context. We can look at that context to determine what we need to output:
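A sketch, following the case analysis described next (the real code needs extra work to convince ghc that the remaining combinations are impossible):

putRttiComp :: RTTI Fn '(a, c) -> RttiComp '(a, c) -> Put
putRttiComp RttiFnDD (RttiComp RttiFnDD RttiFnDD) = return ()   -- no choice
putRttiComp RttiFnII (RttiComp RttiFnII RttiFnII) = return ()   -- no choice
putRttiComp RttiFnDI (RttiComp RttiFnII RttiFnDI) = putWord8 0  -- middle type Int
putRttiComp RttiFnDI (RttiComp RttiFnDI RttiFnDD) = putWord8 1  -- middle type Double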

Let’s take a look at what’s going on here. When we know from the context that the composition has type Double -> Double, then we know that the types of both functions in the composition must also be Double -> Double, and hence we don’t need to output any type information. The same goes when the composition has type Int -> Int, although we need to work a bit harder to convince ghc in this case. However, when the composition has type Double -> Int then the first function might be Double -> Int and the second might be Int -> Int, or the first function might be Double -> Double and the second might be Double -> Int. Thus, we need to distinguish between these two cases (in principle a single bit would suffice).

Having gone through this thought process, deserialization is now easy: remember that we know the context (the RTTI for the composition):
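-- continuing the sketch from above
getRttiComp :: RTTI Fn '(a, c) -> Get (RttiComp '(a, c))
getRttiComp RttiFnDD = return (RttiComp RttiFnDD RttiFnDD)
getRttiComp RttiFnII = return (RttiComp RttiFnII RttiFnII)
getRttiComp RttiFnDI = do
    tag <- getWord8
    return $ case tag of
      0 -> RttiComp RttiFnII RttiFnDI
      _ -> RttiComp RttiFnDI RttiFnDD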

Binary instance for Fn

The hard work is now mostly done. Although it is probably not essential, during serialization we can clarify the code by looking at the RTTI context to know which possibilities we need to consider at each type index. For example, if we are serializing a function of type Double -> Double, there are three possibilities (Exp, Sqrt, Comp). We did something similar in the previous section.

Deserialization proceeds along very similar lines; the only difficulty is that when we deserialize RTTI using getRttiComp we somehow need to reflect that to the type level; for this purpose we can provide a function

reflectRTTI :: RTTI f a -> (HasRTTI f a => b) -> b

Its definition is beyond the scope of this blog post; refer to the source code on github instead. With this function in hand however deserialization is no longer difficult.

If desired, a specialized instance for HList Fn can be defined that relies only on RTTI, just like we did for Val (left as exercise for the reader).

Conclusion

Giving type class instances for GADTs, in particular for type classes that produce values of these GADTs (deserialization, translation from Java values, etc.), can be tricky. If not kept in check, this can result in a code base with a lot of unnecessarily complicated function signatures or frequent explicit construction of type class evidence. By using run-time type information we can avoid this, keeping the code clean and allowing programmers to focus on the problems at hand rather than worry about type class instances.

PS: Singletons

RTTI looks a lot like singletons, and indeed things can be set up in such a way that singletons would do the job. The key here is to define a new kind for the type indices; for example, instead of
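data Val :: * -> * where
  VI :: Int    -> Val Int
  VD :: Double -> Val Double

we would define a dedicated kind of codes together with a decoding type family, something like (the names U and Decode are illustrative):

data U = UInt | UDouble

type family Decode (u :: U) :: * where
  Decode 'UInt    = Int
  Decode 'UDouble = Double

data Val :: U -> * where
  VI :: Int    -> Val 'UInt
  VD :: Double -> Val 'UDouble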

In such a setup singletons can be used as RTTI. Which approach is preferable depends on questions such as whether singletons are already in use in the project, how much of their infrastructure can be reused, etc. A downside of using singletons rather than the more direct encoding using RTTI as I’ve presented it in this blog post is that using singletons probably means that some kind of type-level decoding needs to be introduced (in this example, a type family U -> *); on the other hand, having specific kinds for specific purposes may also clarify the code. Either way the main ideas are the same.

Haskell development jobs with Well-Typed

tl;dr: If you’d like a job with us, send your application as soon as possible.

We are looking for several (probably two) Haskell experts to join our team at Well-Typed. This is a great opportunity for someone who is passionate about Haskell and who is keen to improve and promote Haskell in a professional context.

About Well-Typed

We are a team of top notch Haskell experts. Founded in 2008, we were the first company dedicated to promoting the mainstream commercial use of Haskell. To achieve this aim, we help companies that are using or moving to Haskell by providing a range of services including consulting, development, training, and support and improvement of the Haskell development tools. We work with a wide range of clients, from tiny startups to well-known multinationals. We have established a track record of technical excellence and satisfied customers.

Our company has a strong engineering culture. All our managers and decision makers are themselves Haskell developers. Most of us have an academic background and we are not afraid to apply proper computer science to customers’ problems, particularly the fruits of FP and PL research.

We are a self-funded company so we are not beholden to external investors and can concentrate on the interests of our clients, our staff and the Haskell community.

About the jobs

Generally, the roles are not tied to a single specific project or task, and allow remote work. However, we are also looking for someone to work on a specific project with one of our clients, and that requires work on-site in London.

Please indicate in your application whether on-site work in London is an option for you.

In general, work for Well-Typed could cover any of the projects and activities that we are involved in as a company. The work may involve:

working on GHC, libraries and tools;

Haskell application development;

working directly with clients to solve their problems;

teaching Haskell and developing training materials.

We try wherever possible to arrange tasks within our team to suit people’s preferences and to rotate to provide variety and interest.

Well-Typed has a variety of clients. For some we do proprietary Haskell development and consulting. For others, much of the work involves open-source development and cooperating with the rest of the Haskell community: the commercial, open-source and academic users.

Our ideal candidate has excellent knowledge of Haskell, whether from industry, academia or personal interest. Familiarity with other languages, low-level programming and good software engineering practices are also useful. Good organisation and ability to manage your own time and reliably meet deadlines is important. You should also have good communication skills.

You are likely to have a bachelor’s degree or higher in computer science or a related field, although this isn’t a requirement.

Further (optional) bonus skills:

experience in teaching Haskell or other technical topics,

experience of consulting or running a business,

knowledge of and experience in applying formal methods,

familiarity with (E)DSL design,

knowledge of concurrency and/or systems programming,

experience with working on GHC,

experience with web programming (in particular front-end),

… (you tell us!)

Offer details

The offer is initially for one year full time, with the intention of a long term arrangement. For the remote role(s), living in England is not required. For the on-site role, you have to be allowed to work in England. We may be able to offer either employment or sub-contracting, depending on the jurisdiction in which you live.

If you are interested, please apply via info@well-typed.com. Tell us why you are interested and why you would be a good fit for Well-Typed, and attach your CV. Please indicate whether the on-site work in London is an option for you. Please also indicate how soon you might be able to start.

One-day course covering most of GHC’s extensions of the Haskell type system, such as GADTs, data kinds, and type families. Suitable for Haskellers who want to get the most out of Haskell’s powerful type system and understand the trade-offs involved.

Two-day course focusing on the internals of GHC, the evaluation strategy, how programs are compiled and executed at run-time. Explains how to choose the right data structure for your program in a lazy functional language, what kind of optimizations you can expect the compiler to perform, and how to write beautiful programs that scale.

Each of these courses is open for registration, with reduced rates available if you book soon.

The courses will also be held again in October 2017, in connection with the Haskell eXchange.

We also provide on-site (and remote) courses tailored to your specific needs. If you want to know more, have a look at our training page or contact us.

Other upcoming events

The following are some other events in 2017 we are planning to participate in (we may be at other events, too):

As in previous years, we’ll be at ZuriHac again, which is the largest European Haskell Hackathon. Whether you’re a newcomer who wants to try Haskell for the first time or an experienced Haskeller with many years of experience, we are looking forward to meeting you there.

The annual International Conference on Functional Programming will take place in Oxford this year. A full week of events focused on functional programming, including the two-day Haskell Symposium and the Haskell Implementors Workshop. There’s also the Commercial Users of Functional Programming conference which features several tutorials on various programming languages and techniques.

The two-day Haskell developer conference organized by us and Skills Matter in London is back for another year. We are currently looking for talk proposals for this conference, so if you have anything you would like to present, please submit! Registration is also open already, and tickets are cheaper the earlier you book.

There’s also going to be a two-day hackathon / unconference on the weekend after the Haskell eXchange.

If you would be interested in sponsoring the Haskell eXchange, please let us know.

Hackage reliability via mirroring

TL;DR: Hackage now has multiple secure mirrors which can be used fully automatically by clients such as cabal.

In the last several years, as a community, we’ve come to greatly rely on services like Hackage and Stackage being available 24/7. There is always enormous frustration when either of these services goes down.

I think as a community we’ve also been raising our expectations. We’re all used to services like Google which appear to be completely reliable. Of course these are developed and operated by huge teams of professionals, whereas our community services are developed, maintained and operated by comparatively tiny teams on shoestring budgets.

A path to greater reliability

Nevertheless, reliability is important to us all, and so there has been a fair bit of effort put in over the last few years to improve reliability. I’ll talk primarily about Hackage since that is what I am familiar with.

Firstly, a couple years ago Hackage and haskell.org were moved from super-cheap VM hosting (where our machines tended to go down several times a year) to actually rather good quality hosting provided by Rackspace. Thanks to Rackspace for donating that, and the haskell.org infrastructure team for getting that organised and implemented. That in itself has made a huge difference: we’ve had far fewer incidents of downtime since then.

Obviously even with good quality hosting we’re still only one step away from unscheduled downtime, because the architecture is too centralised.

There were two approaches that people proposed. One was classic mirroring: spread things out over multiple mirrors for redundancy. The other proposal was to adjust the Hackage architecture somewhat so that while the main active Hackage server runs on some host, the core Hackage archive would be placed on an ultra-reliable 3rd party service like AWS S3, so that this would stay available even if the main server was unavailable.

The approach we decided to take was the classic mirroring one. In some ways this is the harder path, but I think ultimately it gives the best results. This approach also tied in with the new security architecture (The Update Framework – TUF) that we were implementing. The TUF design includes mirrors and works in such a way that mirrors do not need to be trusted. If we (or rather end users) do not have to trust the operators of all the mirrors then this makes a mirroring approach much more secure and much easier to deploy.

Where we are today

The new system has been in beta for some time and we’re just short of flipping the switch for end users. The new Hackage security system is in place on the server side, while on the client side the latest release of cabal-install can be configured to use it, and the development version uses it by default.

There is lots to say about the security system, but that has been (1, 2, 3) and will be covered elsewhere. This post is about mirroring.

For mirrors, we currently have two official public mirrors, and a third in the works. One mirror is operated by FP Complete and the other by Herbert Valerio Riedel. For now, Herbert and I manage the list of mirrors and we will be accepting contributions of further public mirrors. It is also possible to run private mirrors.

Once you are using a release of cabal-install that uses the new system then no further configuration is required to make use of the mirrors (or indeed the security). The list of public mirrors is published by the Hackage server (along with the security metadata) and cabal-install (and other clients using hackage-security) will automatically make use of them.

Reliability in the new system

Both of the initial mirrors are individually using rather reliable hosting. One is on AWS S3 and one on DreamHost S3. Indeed the weak point in the system is no longer the hosting. It is other factors like reliability of the hosts running the agents that do the mirroring, and the ever present possibility of human error.

The fact that the mirrors are hosted and operated independently is the key to improved reliability. We want to reduce the correlation of failures.

Failures in hosting can be mitigated by using multiple providers. Even AWS S3 goes down occasionally. Failures in the machines driving the mirroring are mitigated by using a normal decentralised pull design (rather than pushing out from the centre) and hosting the mirroring agents separately. Failures due to misconfiguration and other human errors are mitigated by having different mirrors operated independently by different people.

So all these failures can and will happen, but if they are not correlated and we have enough mirrors then the system overall can be quite reliable.

There is of course still the possibility that the upstream server goes down. It is annoying not to be able to upload new packages, but it is far more important that people be able to download packages. The mirrors mean there should be no interruption in the download service, and it gives the upstream server operators the breathing space to fix things.

TL;DR: Sharing conduit values leads to space leaks. Make sure that conduits are completely reconstructed on every call to runConduit; this implies we have to be careful not to create any (potentially large) conduit CAFs (skip to the final section “Avoiding space leaks” for some details on how to do this). Similar considerations apply to other streaming libraries and indeed any Haskell code that uses lazy data structures to drive computation.

Motivation

We use large lazy data structures in Haskell all the time to drive our programs. For example, consider

main1 :: IO ()
main1 = forM_ [1..5] $ \_ -> mapM_ print [1..1000000]

It’s quite remarkable that this works and that this program runs in constant memory. But this stands on a delicate cusp. Consider the following minor variation on the above code:
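-- presumably along these lines: a NOINLINE wrapper prevents the fusion
-- that eliminated the list in main1 (the same ni_mapM_ reappears below)
ni_mapM_ :: (a -> IO ()) -> [a] -> IO ()
{-# NOINLINE ni_mapM_ #-}
ni_mapM_ = mapM_

main2 :: IO ()
main2 = forM_ [1..5] $ \_ -> ni_mapM_ print [1..1000000]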

This program runs, but unlike main1, it has a maximum residency of 27 MB; in other words, this program suffers from a space leak. As it turns out, main1 was running in constant memory because the optimizer was able to eliminate the list altogether (due to the fold/build rewrite rule), but it is unable to do so in main2.

But why is main2 leaking? In fact, we can recover constant space behaviour by recompiling the code with -fno-full-laziness. The full laziness transformation is effectively turning main2 into
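main2 :: IO ()
main2 = let xs = [1..1000000]
        in forM_ [1..5] $ \_ -> ni_mapM_ print xs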

The first iteration of the forM_ loop constructs the list, which is then retained to be used by the next iterations. Hence, the large list is retained for the duration of the program, which is the aforementioned space leak.

The full laziness optimization is taking away our ability to control when data structures are not shared. That ability is crucial when we have actions driven by large lazy data structures. One particularly important example of such lazy structures that drive computation are conduits or pipes. For example, consider the following conduit code:
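-- a sketch of the kind of pipeline meant here, where the streamed values
-- come from IO; the names getConduit and readValue are illustrative
getConduit :: Int -> Source IO Int
getConduit 0 = return ()
getConduit n = do
    x <- liftIO readValue
    yield x
    getConduit (n - 1)

Suppose we run such a conduit under the following exception-handling wrapper (its definition is given again in the footnotes):

retry :: IO a -> IO a
retry io = catch io (\(_ :: SomeException) -> retry io)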

The important point to notice about this exception handler is that it retains a reference to the action io as it executes that action, since it might potentially have to execute it again if an exception is thrown. However, all the space leaks we discuss in this blog post arise even when an exception is never thrown and hence the action is run only once; simply maintaining a reference to the action until the end of the program is enough to cause the space leak.

If we run this conduit inside retry, we again end up with a large space leak, this time consisting of values of type Pipe and of function closures returning Pipe (conduit’s internal type).

Although the values that stream through the conduit come from IO, the conduit itself is fully constructed and retained in memory. In this blog post we examine what exactly is being retained here, and why. We will finish with some suggestions on how to avoid such space-leaks, although sadly there is no easy answer. Note that these problems are not specific to the conduit library, but apply equally to all other similar libraries.

We will not assume any knowledge of conduit but start from first principles; however, if you have never used any of these libraries before this blog post is probably not the best starting point; you might for example first want to watch my presentation Lazy I/O and Alternatives in Haskell.

Lists

Before we look at the more complicated case, let’s first consider another program using just lists:

main :: IO ()
main = retry $ ni_mapM_ print [1..1000000]

This program suffers from a space leak for similar reasons to the example with lists we saw in the introduction, but it’s worth spelling out the details here: where exactly is the list being maintained?

Recall that the IO monad is effectively a state monad over a token RealWorld state (if that doesn’t make any sense to you, you might want to read ezyang’s article Unraveling the mystery of the IO monad first). Hence, ni_mapM_ (just a wrapper around mapM_) is really a function of three arguments: the action to execute for every element of the list, the list itself, and the world token. That means that

ni_mapM_ print [1..1000000]

is a partial application, and hence we are constructing a PAP object. Such a PAP object is a runtime representation of a partial application of a function; it records the function we want to execute (ni_mapM_), as well as the arguments we have already provided. It is this PAP object that we give to retry, and which retry retains until the action completes because it might need it in the exception handler. The long list in turn is being retained because there is a reference from the PAP object to the list (as one of the arguments that we provided).

Full laziness does not make a difference in this example; whether or not that [1..1000000] expression gets floated out makes no difference.

Reminder: Conduits/Pipes

Just to make sure we don’t get lost in the details, let’s define a simple conduit-like or pipe-like data structure:
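data Pipe i o m r =
    Yield o (Pipe i o m r)
  | Await (Either r i -> Pipe i o m r)
  | Effect (m (Pipe i o m r))
  | Done r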

The argument to Await is passed an Either; we give it a Left value if upstream terminated, or a Right value if upstream yielded a value.1

This definition is not quite the same as the one used in real streaming libraries and ignores various difficulties (in particular exception safety, as well as other features such as leftovers); however, it will suffice for the sake of this blog post. We will use the terms “conduit” and “pipe” interchangeably in the remainder of this article.

Sources

The various Pipe constructors differ in their memory behaviour and the kinds of space leaks that they can create. We therefore consider them one by one. We will start with sources, because their memory behaviour is relatively straightforward.

A source is a pipe that only ever yields values downstream.2 For example, here is a source that yields the values [n, n-1 .. 1]:
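-- a sketch; the name yieldFrom is illustrative
yieldFrom :: Int -> Pipe i Int m ()
yieldFrom 0 = Done ()
yieldFrom n = Yield n (yieldFrom (n - 1))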

If we construct such a source once and then run it under the retry wrapper, we get a space leak. This space leak is very similar to the space leak we discussed in section Lists above, with Done () playing the role of the empty list and Yield playing the role of (:). As in the list example, this program has a space leak independent of full laziness.

Sinks

A sink is a conduit that only ever awaits values from upstream; it never yields anything downstream.2 The memory behaviour of sinks is considerably more subtle than the memory behaviour of sources and we will examine it in detail. As a reminder, the constructor for Await is

data Pipe i o m r = Await (Either r i -> Pipe i o m r) | ...

As an example of a sink, consider this pipe that counts the number of characters it receives:
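countChars :: Int -> Pipe Char o m Int
countChars cnt = Await $ \mi -> case mi of
    Left  _ -> Done cnt
    Right _ -> countChars $! cnt + 1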

If we run this as follows and compile with optimizations enabled, we once again end up with a space leak:

main :: IO ()
main = retry $ feed 'A' (countChars 0)

We can recover constant space behaviour by disabling full laziness; however, the effect of full laziness on this example is a lot more subtle than the example we described in the introduction.

Full laziness

Let’s take a brief moment to describe what full laziness is, exactly. Full laziness is one of the optimizations that ghc applies by default when optimizations are enabled; it is described in the paper “Let-floating: moving bindings to give faster programs”. The idea is simple; if we have something like
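f = \x y -> let e = .. in ..

then full laziness will float the binding out over the \y lambda:

f = \x -> let e = .. in \y -> ..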

This potentially avoids unnecessarily recomputing e for different values of y. Full laziness is a useful transformation; for example, it turns something like

f x y = ..
  where
    go = .. -- some local function

into

f x y = ..
f_go .. = ..

which avoids allocating a function closure every time f is called. It is also quite a notorious optimization, because it can create unexpected CAFs (constant applicative forms; top-level definitions of values); for example, if you write
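say, a function that indexes into a locally defined list of all primes:

nthPrime :: Int -> Integer
nthPrime n = allPrimes !! n
  where
    allPrimes :: [Integer]
    allPrimes = ..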

you might expect nthPrime to recompute allPrimes every time it is invoked; but full laziness might move that allPrimes definition to the top-level, resulting in a large space leak (the full list of primes would be retained for the lifetime of the program). This goes back to the point we made in the introduction: full laziness is taking away our ability to control when values are not shared.

Full laziness versus sinks

Back to the sink example. What exactly is full laziness doing here? Is it constructing a CAF we weren’t expecting? Actually, no; it’s more subtle than that. Our definition of countChars was
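countChars :: Int -> Pipe Char o m Int
countChars cnt = Await $ \mi -> case mi of
    Left  _ -> Done cnt
    Right _ -> countChars $! cnt + 1

and full laziness effectively rewrites this to (a sketch, priming the name of the transformed version to tell the two apart):

countChars' :: Int -> Pipe Char o m Int
countChars' cnt =
    let k = countChars' $! cnt + 1
    in Await $ \mi -> case mi of
         Left  _ -> Done cnt
         Right _ -> k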

Note how the computation of countChars' $! cnt + 1 has been floated over the lambda; ghc can do that, since this expression does not depend on mi. So in memory the countChars 0 expression from our main function (retained, if you recall, because of the surrounding retry wrapper) develops something like this. It starts off as a simple thunk.

Then when feed matches on it, it gets reduced to weak head normal form, exposing the top-most Await constructor.

The body of the await is a function closure pointing to the function inside countChars (\mi -> case mi ..), which has countChars $! (cnt + 1) as an unevaluated thunk in its environment. Evaluating that thunk one step further yields the next Await constructor, whose closure in turn contains a thunk for the step after that.

So where for a source the data structure in memory was a straightforward “list” consisting of Yield nodes, for a sink the situation is more subtle: we build up a chain of Await constructors, each of which points to a function closure which in its environment has a reference to the next Await constructor. This wouldn’t matter of course if the garbage collector could clean up after us; but if the conduit itself is shared, then this results in a space leak.

Without full laziness, incidentally, evaluating countChars 0 yields a single Await constructor and the chain stops there; the only thing in the function closure now is cnt. Since we don’t allocate the next Await constructor before running the function, we never construct a chain of Await constructors and hence we have no space leak.

Depending on values

It is tempting to think that if the conduit varies its behaviour depending on the values it receives from upstream the same chain of Await constructors cannot be constructed and we avoid a space leak. For example, consider this variation on countChars which only counts spaces:
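countSpaces :: Int -> Pipe Char o m Int
countSpaces cnt = Await $ \mi -> case mi of
    Left  _   -> Done cnt
    Right ' ' -> countSpaces $! cnt + 1
    Right _   -> countSpaces $! cnt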

If we substitute this conduit for countChars in the previous program, do we fare any better? Alas, the memory behaviour of this conduit, when shared, is in fact far, far worse.

The reason is that both the countSpaces $! cnt + 1 and the countSpaces $! cnt expressions can be floated out by the full laziness optimization. Hence, now every Await constructor will have a function closure in its payload with two thunks, one for each alternative way to execute the conduit. What’s more, both of these thunks are retained for as long as we retain a reference to the top-level conduit.

The first feed ' ' explores a path through the conduit where every character is a space; so this constructs (and retains) one long chain of Await constructors. The next two calls to feed ' ' however walk over the exact same path, and hence memory usage does not increase for a while. But then we explore a different path, in which every character is a non-space, and hence memory behaviour will go up again. Then during the second call to feed 'A' memory usage is stable again, until we start executing the last feed 'A', at which point the garbage collector can finally start cleaning things up.

What’s worse, there is an infinite number of paths through this conduit. Every different combination of space and non-space characters will explore a different path, leading to combinatorial explosion and terrifying memory usage.

Effects

The precise situation for effects depends on the underlying monad, but let’s explore one common case: IO. As we will see, for the case of IO the memory behaviour of Effect is actually similar to the memory behaviour of Await. Recall that the Effect constructor is defined as
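data Pipe i o m r = Effect (m (Pipe i o m r)) | ...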

In order to understand the memory behaviour of Effect, we need to understand how the underlying monad behaves. For the case of IO, IO actions are state transformers over a token RealWorld state. This means that the Effect constructor actually looks rather similar to the Await constructor. Both have a function as payload; Await a function that receives an upstream value, and Effect a function that receives a RealWorld token. To illustrate what printFrom might look like with full laziness, we can rewrite it as
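printFrom :: Int -> Pipe i o IO ()
-- original (a sketch): printFrom n = Effect $ print n >> return (printFrom (n - 1))
printFrom 0 = Done ()
printFrom n =
    let k = printFrom (n - 1)   -- floated out of the IO action by full laziness
    in Effect $ IO $ \w ->
         case unIO (print n) w of
           (w', _) -> (w', k)
-- IO and unIO wrap and unwrap the IO monad (schematically, as in Addendum 1)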

If we visualize the heap (using ghc-vis), we can see that it does indeed look very similar to the picture for Await.

Increasing sharing

If we cannot guarantee that our conduits are not shared, then perhaps we should try to increase sharing instead. If we can avoid allocating these chains of pipes, but instead have pipes refer back to themselves, perhaps we can avoid these space leaks.

In theory, this is possible. For example, when using the conduit library, we could try to take advantage of monad transformers and rewrite our feed source and our count sink as:
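-- a sketch; evalStateC is from Data.Conduit.Lift, and the exact
-- original definitions were elided
feed :: Monad m => Source m Char
feed = go
  where
    go = yield 'A' >> go

count :: Monad m => Sink Char m Int
count = evalStateC 0 go
  where
    go = await >>= maybe get (\_ -> modify' (+1) >> go)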

In both definitions go refers back to itself directly, with no arguments; hence, it ought to be self-referential, without any long chain of sources or sinks ever being constructed. This works; the following program runs in constant space:

main :: IO ()
main = retry $ print =<< (feed $$ count)

However, this kind of code is extremely brittle. For example, consider the following minor variation on count:
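-- reconstructed from the fragment quoted in the summary below; the
-- definition of withValue here is a guess
count :: Monad m => Sink Char m Int
count = evalStateC 0 go
  where
    go = withValue $ \_ -> modify' (+1) >> go

withValue :: Monad m
          => (i -> ConduitM i o (StateT Int m) Int)
          -> ConduitM i o (StateT Int m) Int
withValue k = await >>= maybe get k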

In the conduit library, ConduitM is a codensity transformation of an internal Pipe datatype; the latter corresponds more or less to the Pipe datastructure we’ve been describing here. But we can ignore these details: the important point here is that this has the same typical shape that we’ve been studying above, with an allocation inside a lambda but before an await.

Ironically, it would seem that full laziness here could have helped us by floating out that modify' (+1) >> go expression for us. The reason that it didn’t is probably related to the exact way the k continuation is threaded through in the compiled code (I simplified a bit above). Whatever the reason, tracking down problems like these is difficult and incredibly time consuming; I’ve spent many many hours studying the output of -ddump-simpl and comparing before and after pictures. Not a particularly productive way to spend my time, and this kind of low-level thinking is not what I want to do when writing application level Haskell code!

Composed pipes

Normally we construct pipes by composing components together. Composition of pipes can be defined as
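-- a sketch; the downstream pipe is the first argument
compose :: Monad m => Pipe x o m r -> Pipe i x m r -> Pipe i o m r
compose (Yield o d) up = Yield o (compose d up)
compose (Done r)    _  = Done r
compose (Effect e)  up = Effect (fmap (`compose` up) e)
compose (Await k)   up = case up of
    Yield x up' -> compose (k (Right x)) up'
    Done r      -> compose (k (Left r)) (Done r)
    Effect e    -> Effect (fmap (compose (Await k)) e)
    Await k'    -> Await (compose (Await k) . k')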

The downstream pipe “is in charge”; the upstream pipe only plays a role when downstream awaits. This mirrors Haskell’s lazy “demand-driven” evaluation model.

Typically we only run self-contained pipes that don’t have any Awaits or Yields left (after composition), so we are only left with Effects. The good news is that if the pipe components don’t consist of long chains, then their composition won’t either; at every Effect point we wait for either upstream or downstream to complete its effect; only once that is done do we receive the next part of the pipeline and hence no chains can be constructed.

On the other hand, of course composition doesn’t get rid of these space leaks either. As an example, we can define a pipe equivalent to the getConduit from the introduction
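A sketch, reusing the hypothetical readValue from the introduction and composing with a simple effectful sink:

getPipe :: Int -> Pipe i Int IO ()
getPipe 0 = Done ()
getPipe n = Effect $ do
    x <- readValue
    return $ Yield x (getPipe (n - 1))

printAll :: Pipe Int o IO ()
printAll = Await $ \mi -> case mi of
    Left  _ -> Done ()
    Right x -> Effect $ print x >> return printAll

main :: IO ()
main = retry $ runPipe (compose printAll (getPipe 1000000))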

This program suffers from the same space leaks as before because the individual pipeline components are kept in memory. As in the sink example, memory behaviour would be much worse still if there were different paths through the conduit network.

Summary

At Well-Typed we’ve been developing an application for a client to do streaming data processing. We’ve been using the conduit library to do this, with great success. However, occasionally space leaks arise that are difficult to fix, and even harder to track down; of course, we’re not the first to suffer from these problems; for example, see ghc ticket #9520 or issue #6 for the streaming library (a library similar to conduit).

In this blog post we described how such space leaks arise. Similar space leaks can arise with any kind of code that uses large lazy data structures to drive computation, including other streaming libraries such as pipes or streaming, but the problem is not restricted to streaming libraries.

The conduit library tries to avoid these intermediate data structures by means of fusion rules; naturally, when this is successful the problem is avoided. We can increase the likelihood of this happening by using combinators such as folds etc., but in general the intermediate pipe data structures are difficult to avoid.

The core of the problem is that in the presence of the full laziness optimization we have no control over when values are not shared. While it is possible in theory to write code in such a way that the lazy data structures are self-referential and hence keeping them in memory does not cause a space leak, in practice the resulting code is too brittle and writing code like this is just too difficult. Just to provide one more example: in our application, adding one more clause to a function introduced a space leak.

This was true even when that additional clause was never used; it had nothing to do with the change in the runtime behaviour of the code. Instead, when we added the additional clause some limit got exceeded in ghc’s bowels and suddenly something got allocated that wasn’t getting allocated before.

Full laziness can be disabled using -fno-full-laziness, but sadly this throws out the baby with the bathwater. In many cases, full laziness is a useful optimization. In particular, there is probably never any point allocating a thunk for something that is entirely static. We saw one such example above; it’s unexpected that when we write

go = withValue $ \_ -> modify' (+1) >> go

we get memory allocations corresponding to the modify' (+1) >> go expression.

Avoiding space leaks

So how do we avoid these space leaks? The key idea is pretty simple: we have to make sure the conduit is fully reconstructed on every call to runConduit. Conduit code typically looks like
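runMyConduit :: .. -> IO ..
runMyConduit args = runConduit (stage1 args =$= stage2 .. =$= stageN)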

You should put all top-level calls to runConduit into a module of their own, and disable full laziness in that module by declaring

{-# OPTIONS_GHC -fno-full-laziness #-}

at the top of the file. This means the computation of the conduit (stage1 =$= stage2 .. =$= stageN) won’t get floated to the top and the conduit will be recomputed on every invocation of runMyConduit (note that this relies on runMyConduit having some arguments; if it doesn’t, you should add a dummy one).

This might not be enough, however. In the example above, stageN is still a CAF, and the evaluation of the conduit stage1 =$= ... =$= stageN will cause that CAF to be evaluated and potentially retained in memory. CAFs are fine for conduits that are guaranteed to be small, or that loop back onto themselves; however, as discussed in section “Increasing sharing”, writing such conduit values is not an easy task, although it is manageable for simple conduits.

To avoid CAFs, conduits like stageN must be given a dummy argument and full laziness must be disabled for the module where stageN is defined. But it’s more subtle than that; even if a conduit does have real (non-dummy) arguments, part of that conduit might still be independent of those arguments and hence be floated to the top by the full laziness optimization, creating yet more unwanted CAF values. Full laziness must again be disabled to stop this from happening.

If you are sure that full laziness cannot float anything harmful to the top, you can leave it enabled; however, verifying that this is the case is highly non-trivial. You can of course test the code, but if you are unlucky the memory leak will only arise under certain specific usage conditions. Moreover, a small modification to the codebase, the libraries it uses, or even the compiler, perhaps years down the line, might change the program and reintroduce a memory leak.

Neil Mitchell has a more introductory level SkillsMatter talk Plugging Space Leaks, Improving Performance on space leaks and how to debug them. It doesn’t cover the kinds of leaks we discuss in this blog post however.

Addendum 1: ghc’s “state hack”

Let’s go back to the section about sinks; if you recall, we considered this example:
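main :: IO ()
main = retry $ feedFrom 10000000 (countChars 0)

where feedFrom is a driver that feeds the sink a given number of 'A' characters and prints the final result; a sketch consistent with the fragments below:

feedFrom :: Int -> Pipe Char o IO Int -> IO ()
feedFrom _ (Done r)  = print r
feedFrom 0 (Await k) = feedFrom 0 (k (Left 0))
feedFrom n (Await k) = feedFrom (n - 1) (k (Right 'A'))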

We explained how countChars 0 results in a chain of Await constructors and function closures. However, you might be wondering, why would this be retained at all? After all, feedFrom is just an ordinary function, albeit one that computes an IO action. Why shouldn’t the whole expression

feedFrom 10000000 (countChars 0)

just be reduced to a single print 10000000 action, leaving no trace of the pipe at all? Indeed, this is precisely what happens when we disable ghc’s “state hack”; if we compile this program with -fno-state-hack it runs in constant space.

So what is the state hack? You can think of it as the opposite of the full laziness transformation; where full laziness transforms
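f = \x y -> let e = .. in ..

into

f = \x -> let e = .. in \y -> ..

the state hack transforms the latter back into the former,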

though only for arguments y of type State# <token>. In general this is not sound, of course, as it might duplicate work; hence, the name “state hack”. Joachim Breitner’s StackOverflow answer explains why this optimization is necessary; my own blog post Understanding the RealWorld provides more background.

Let’s leave aside the question of why this optimization exists, and consider the effect on the code above. If you ask ghc to dump the optimized core (-ddump-stg), and translate the result back to readable Haskell, you will realize that it boils down to a single line change. With the state hack disabled the last line of feedFrom is effectively:

feedFrom n (Await k) = IO $
    unIO (feedFrom (n - 1) (k (Right 'A')))

where IO and unIO just wrap and unwrap the IO monad. But when the state hack is enabled (the default), this turns into
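feedFrom n (Await k) = IO $ \w ->
    unIO (feedFrom (n - 1) (k (Right 'A'))) w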

Note how this floats the recursive call to feedFrom into the lambda. This means that

feedFrom 10000000 (countChars 0)

no longer reduces to a single print statement (after an expensive computation); instead, it reduces immediately to a function closure, waiting for its world argument. It’s this function closure that retains the Await/function chain and hence causes the space leak.

Addendum 2: Interaction with cost-centres (SCC)

A final cautionary tale. Suppose we are studying a space leak, and so we are compiling our code with profiling enabled. At some point we add some cost centres, or use -fprof-auto perhaps, and suddenly find that the space leak disappeared! What gives?

Consider one last time the sink example. We can make the space leak disappear by adding a single cost centre:
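for example, on the recursive case of feedFrom (a sketch; the exact placement is an assumption):

feedFrom n (Await k) = {-# SCC "feedFrom" #-}
    feedFrom (n - 1) (k (Right 'A'))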

Adding this cost centre effectively has the same result as specifying -fno-state-hack; with the cost centre present, the state hack can no longer float the computations into the lambda.

Footnotes

The ability to detect upstream termination is one of the characteristics that sets conduit apart from the pipes package, in which this is impossible (or at least hard to do). Personally, I consider this an essential feature. Note that the definition of Pipe in conduit takes an additional type argument to avoid insisting that the type of the upstream return value matches the type of the downstream return value. For simplicity I’ve omitted this additional type argument here.

Sinks and sources can also execute effects, of course; since we are interested in the memory behaviour of the individual constructors, we treat effects separately.

runPipe is (close to) the actual runPipe we would normally use; we connect pipes that await or yield into a single self contained pipe that does neither.

For these simple examples actually the optimizer can work its magic and the space leak doesn’t appear, unless evalStateC is declared NOINLINE. Again, for larger examples problems arise whether it’s inlined or not.

The original definition of retry used in this blogpost was

retry io = catch io (\(_ :: SomeException) -> retry io)

but as Eric Mertens rightly points out, this is not correct as catch runs the exception handler with exceptions masked. For the purposes of this blog post however the difference is not important; in fact, none of the examples in this blog post run the exception handler at all.

Haskell eXchange, Hackathon, and Courses

In October 2016, we are co-organizing various events in London. Now is the time to register for:

the Haskell eXchange, a two-day three-track conference with a large number of Haskell-related talks and workshops on a wide variety of topics, including keynotes by Simon Peyton Jones, Don Stewart, Conor McBride and Graham Hutton;

the Haskell eXchange Hackathon, a two-day event for everyone who wants to get involved coding on projects related to the Haskell infrastructure, such as Hackage and Cabal;

our Haskell courses, including a two-day introductory course, a one-day course on type-level programming in GHC, and a two-day course on lazy evaluation and performance.

The Haskell eXchange is a general Haskell conference aimed at Haskell enthusiasts of all skill levels. The Haskell eXchange is organized annually, and 2016 is its fifth year. For the second year in a row, the venue will be Skills Matter’s CodeNode, where we have space for three parallel tracks. New this year: a large number of beginner-focused talks. At all times, at least one track will be available with a talk aimed at (relative) newcomers to Haskell. Of course, there are also plenty of talks on more advanced topics. The four keynote speakers are Simon Peyton Jones, Don Stewart, Conor McBride and Graham Hutton.

We are going to repeat the successful Haskell Infrastructure Hackathon that we organized last year directly after the Haskell eXchange. Once again, everyone who is already contributing to Haskell projects related to the Haskell infrastructure as well as everyone who wants to get involved and talk to active contributors is invited to spend two days hacking on various projects, such as Hackage and Cabal.

Registration is open. This event is free to attend (and you can attend independently of the Haskell eXchange), but there is limited space, so you have to register.

Haskell courses

This is a two-day general introduction to Haskell, aimed at developers who have experience with other (usually non-functional) programming languages and want to learn about Haskell. Topics include defining basic datatypes and functions, the importance of type-driven design, abstraction via higher-order functions, handling effects (such as input/output) explicitly, and general programming patterns such as applicative functors and monads. This hands-on course includes several small exercises and programming assignments that allow participants to practise and to receive feedback from the instructor during the course.

This one-day course focuses on several of the type-system-oriented language extensions that GHC offers and shows how to put them to good use. Topics include the kind system and promoting datatypes, GADTs, type families, and moving even more towards dependent types via the new TypeInType. The extensions will be explained, illustrated with examples, and we provide advice on how and when to best use them.

In this two-day course, we focus on how to write performant Haskell code that scales. We systematically explain how lazy evaluation works, and how one can reason about the time and space performance of code that is evaluated lazily. We look at various common pitfalls and explain them. We look at data structures and their performance characteristics and discuss their suitability for various tasks. We also discuss how one can best debug the performance of Haskell code, and look at existing high-performance Haskell libraries and their implementation to learn general techniques that can be reused.

This Hackathon is intended for everyone who is interested in writing programs in Haskell, whether beginner or expert, whether hobbyist or professional.

In the tradition of other Haskell Hackathons such as ZuriHac, HacBerlin, UHac and many more, the plan is to bring together up to a hundred Haskell enthusiasts to work together on any Haskell-related projects they like, to share experiences, and to learn new things.

Attendance is free of charge, but there is a limited capacity, so you must register!

We are going to set up a mentor program and special events for Haskell newcomers. So if you are a Haskell beginner, you are very much welcome! And if you’re an expert, we’d appreciate if you’d be willing to spend some of your time during the Hackathon mentoring newcomers. We will ask you about this during the registration process.

We’re also planning to have a number of keynote talks at the Hackathon. We’re going to announce these soon.

We are looking for several (probably two) Haskell experts to join our team at Well-Typed. This is a great opportunity for someone who is passionate about Haskell and who is keen to improve and promote Haskell in a professional context.

About Well-Typed

We are a team of top notch Haskell experts. Founded in 2008, we were the first company dedicated to promoting the mainstream commercial use of Haskell. To achieve this aim, we help companies that are using or moving to Haskell by providing a range of services including consulting, development, training, and support and improvement of the Haskell development tools. We work with a wide range of clients, from tiny startups to well-known multinationals. We have established a track record of technical excellence and satisfied customers.

Our company has a strong engineering culture. All our managers and decision makers are themselves Haskell developers. Most of us have an academic background and we are not afraid to apply proper computer science to customers’ problems, particularly the fruits of FP and PL research.

We are a self-funded company so we are not beholden to external investors and can concentrate on the interests of our clients, our staff and the Haskell community.

About the jobs

One of the roles is for a specific project with one of our clients, and requires work on-site in London. The other role is more general and not tied to a single specific project or task, and allows remote work.

Please indicate in your application whether on-site work in London is an option for you.

In general, work for Well-Typed could cover any of the projects and activities that we are involved in as a company. The work may involve:

working on GHC, libraries and tools;

Haskell application development;

working directly with clients to solve their problems;

teaching Haskell and developing training materials.

We try wherever possible to arrange tasks within our team to suit people’s preferences, and we rotate tasks to provide variety and interest.

Well-Typed has a variety of clients. For some we do proprietary Haskell development and consulting. For others, much of the work involves open-source development and cooperating with the rest of the Haskell community: the commercial, open-source and academic users.

Our ideal candidate has excellent knowledge of Haskell, whether from industry, academia or personal interest. Familiarity with other languages, low-level programming and good software engineering practices are also useful. Good organisation and the ability to manage your own time and reliably meet deadlines are important. You should also have good communication skills. An interest in, or experience of, teaching Haskell (or other technical topics) is a bonus. Experience of consulting or running a business is also a bonus. You are likely to have a bachelor’s degree or higher in computer science or a related field, although this isn’t a requirement.

Offer details

The offer is initially for one year full time, with the intention of a long term arrangement. For the remote role, living in England is not required. For the on-site role, you have to be allowed to work in England. We may be able to offer either employment or sub-contracting, depending on the jurisdiction in which you live.

If you are interested, please apply via info@well-typed.com. Tell us why you are interested and why you would be a good fit for Well-Typed, and attach your CV. Please indicate whether the on-site work in London is an option for you. Please also indicate how soon you might be able to start.

A queue is a datastructure that provides efficient—O(1)—operations to remove an element from the front of the queue and to insert an element at the rear of the queue. In this blog post we will discuss how we can take advantage of laziness to implement such queues in Haskell, both with amortised and with worst-case O(1) bounds.

The results in this blog post are not new, and can be found in Chris Okasaki’s book “Purely Functional Data Structures”. However, the implementation and presentation here are different from Okasaki’s. In particular, the technique we use for real-time datastructures is more explicit, and should scale more easily than Okasaki’s to datastructures other than queues.
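Concretely, the simplest possible representation is a single list, with head inspecting the front of the queue and snoc appending at the rear. A minimal sketch (our reconstruction; head shadows the Prelude name):

import Prelude hiding (head)

newtype Queue a = Queue [a]

empty :: Queue a
empty = Queue []

-- Inspect the front of the queue.
head :: Queue a -> a
head (Queue (x:_)) = x

-- Insert an element at the rear of the queue.
snoc :: Queue a -> a -> Queue a
snoc (Queue xs) x = Queue (xs ++ [x])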

What is the complexity of head and snoc in this representation? Your first instinct might be to say that head has O(1) complexity (after all, it doesn’t do anything but a pattern match) and that snoc has O(n) complexity, because it needs to traverse the entire list before it can append the element.

However, Haskell is a lazy language. All that happens when we call snoc is that we create a thunk (a suspended computation), which happens in O(1) time. Consider adding the elements [1..5] into an empty queue, one at a time:
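Schematically (our illustration), nothing but thunk construction has happened; the resulting queue is a chain of suspended appends:

Queue ((((([] ++ [1]) ++ [2]) ++ [3]) ++ [4]) ++ [5])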

Now when we call head on the resulting queue, (++) needs to traverse this entire chain before it can find the first element; since that chain has O(n) length, the complexity of head is O(n).

Strict, Non-Persistent Queues

Thinking about complexity in a lazy setting can be confusing, so let’s first think about a spine-strict queue. In order to define it, we will need a spine-strict list:

data StrictList a = SNil | SCons a !(StrictList a)

The bang annotation here means that evaluating an SCons node to weak-head normal form (for instance, by pattern matching on it) will also force its tail to weak-head normal form, and hence the entire spine of the list; we cannot have an SCons node with a pointer to an unevaluated tail.

With the definition of strict lists in hand, we can attempt our next queue implementation:

data Queue1 a = Q1 !Int !(StrictList a) !Int !(StrictList a)

Instead of using a single list, we split the queue into two parts: the front of the queue and the rear of the queue. The front of the queue will be stored in normal order, so that we can easily remove elements from the front of the queue; the rear of the queue will be stored in reverse order, so that we can also easily insert new elements at the end of the queue.

In addition, we also record the size of both lists. We will use this to enforce the following invariant:

Queue Invariant: The front of the queue cannot be shorter than the rear.

(Simpler invariants are also possible, but this invariant is the one we will need later, so we will use it throughout this blog post.)

When the invariant is violated, we restore it by moving the elements from the rear of the queue to the front; since the rear of the queue is stored in reverse order, but the front is not, the rear must be reversed:
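A sketch of how this might look; the helpers app and rev on strict lists, and the name inv1, are ours:

-- Append and reverse for strict lists.
app :: StrictList a -> StrictList a -> StrictList a
app SNil         ys = ys
app (SCons x xs) ys = SCons x (app xs ys)

rev :: StrictList a -> StrictList a
rev = go SNil
  where
    go acc SNil         = acc
    go acc (SCons x xs) = go (SCons x acc) xs

-- Restore the invariant: when the front has become shorter than the
-- rear, move the reversed rear to the end of the front.
inv1 :: Queue1 a -> Queue1 a
inv1 q@(Q1 f xs r ys)
  | f < r     = Q1 (f + r) (xs `app` rev ys) 0 SNil
  | otherwise = q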

Worst-Case versus Amortised Complexity

Since we don’t have to think about laziness, the complexity of this queue implementation is a bit easier to determine. Clearly, head is O(1), and both tail and snoc have worst-case O(n) complexity, because rev has O(n) complexity. However, consider what happens when we insert [1..7] into an empty queue:
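For instance, with inv1 as sketched above, we get the following trace (our pseudo-notation, writing strict lists as list literals):

snoc 1:   0 < 1, reverse 1 element:   Q1 1 [1] 0 []
snoc 2:                               Q1 1 [1] 1 [2]
snoc 3:   1 < 2, reverse 2 elements:  Q1 3 [1,2,3] 0 []
snoc 4-6: the rear grows:             Q1 3 [1,2,3] 3 [6,5,4]
snoc 7:   3 < 4, reverse 4 elements:  Q1 7 [1..7] 0 []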

Notice what happens: we only need to reverse n elements after having inserted n elements; we therefore say that the amortised complexity (the complexity averaged over all operations) of the reverse is in fact O(1)—with one proviso, as we shall see in the next section.

Amortisation versus Persistence

The analysis in the previous section conveniently overlooked one fact: since values are immutable in Haskell, nothing is stopping us from reusing a queue multiple times. For instance, if we started from

Q1 3 [1..3] 3 [6,5,4]

we might attempt to insert 7, then 8, then 9, and finally 10 into this (same) queue:
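In pseudo-notation (our illustration), each of these insertions starts from the same immutable value, so each one triggers its own reverse:

snoc (Q1 3 [1,2,3] 3 [6,5,4]) 7   -- 3 < 4: reverses [7,6,5,4]
snoc (Q1 3 [1,2,3] 3 [6,5,4]) 8   -- 3 < 4: reverses [8,6,5,4], from scratch
snoc (Q1 3 [1,2,3] 3 [6,5,4]) 9   -- 3 < 4: reverses [9,6,5,4], from scratch
snoc (Q1 3 [1,2,3] 3 [6,5,4]) 10  -- 3 < 4: reverses [10,6,5,4], from scratch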

Notice that each of these single insertions incurs the full cost of a reverse. Thus, claiming an amortised O(1) complexity is only valid if we use the queue linearly (i.e., never reusing queues). If we want to lift this restriction, we need to take advantage of laziness.

Amortised Complexity for Persistent Queues

In order to get amortised constant time bounds even when the queue is not used linearly, we need to take advantage of lazy evaluation. We will change the front of the queue back to be a lazy list:

data Queue2 a = Q2 !Int [a] !Int !(StrictList a)

The remainder of the implementation is the same as it was for Queue1, except that reverse now needs to take a strict list as input and return a lazy list as result:
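A sketch of these two pieces (the names rev' and inv2 are ours; compare inv1 above):

-- Reverse a strict list into a lazy list.
rev' :: StrictList a -> [a]
rev' = go []
  where
    go acc SNil         = acc
    go acc (SCons x xs) = go (x : acc) xs

-- Restoring the invariant now merely constructs a thunk.
inv2 :: Queue2 a -> Queue2 a
inv2 q@(Q2 f xs r ys)
  | f < r     = Q2 (f + r) (xs ++ rev' ys) 0 SNil
  | otherwise = q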

The genius of this representation lies in two facts. First, notice that when we construct the thunk (xs ++ rev' ys), we know that the rev' ys will not be forced until we have exhausted xs. Since we construct this thunk only when the rear is one longer than the front, we are indeed justified in saying that the cost of the reverse is amortised O(1).

But what about reusing the same queue twice? This is where we rely crucially on laziness. Suppose we have a sequence of operations:
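For instance (our illustration; we write the strict rear that was passed to rev' in list notation), the operations might leave us with the queue

q = Q2 9 ([1,2,3,4] ++ rev' [9,8,7,6,5]) 0 SNil

in which the call to rev' is suspended until the first four elements of the front have been consumed.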

While it is true that we might call tail on this resulting queue any number of times, those calls will not each incur the full cost of rev': since the thunks are all shared, laziness makes sure that once this rev' has been evaluated (“forced”) once, it will not be forced again.

Of course, if we started from that initial queue and inserted various elements, then each of those insertions would create a separate (not shared) thunk with a call to rev'; but those calls to rev' will only be forced if, for each of those separate queues, we first make f calls to tail (in this case, 4 calls).

From Amortised to Worst-Case Bounds

The queues from the previous section will suffice for lots of applications. However, in some applications amortised complexity bounds are not good enough. For instance, in real-time systems it is not acceptable for normally-cheap operations to occasionally take a long time; each operation should take approximately the same amount of time, even if that means that the overall efficiency of the system is slightly lower.

There are two sources of delays in the implementation from the previous section. The first is that when we come across the call to reverse, that whole reverse needs to happen in one go. The second source comes from the fact that we might still chain calls to append; consider what happens when we insert the elements [1..7]:
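Schematically (our illustration, in the same pseudo-notation as before), the front of the queue then contains a nest of suspended appends:

(([] ++ rev' [1]) ++ rev' [3,2]) ++ rev' [7,6,5,4]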

This is similar to the behaviour we saw for the queues based on a single list, except we now have a maximum of O(log n) calls rather than O(n), because the distance between two calls to reverse doubles each time.

Intuitively, we can solve both of these problems by doing a little bit of the append and a little bit of the reverse each time we call tail or snoc. We need to reestablish the invariant when r = f + 1. At this point the append will take f steps, and the reverse r steps, and we will not need to reestablish the invariant again until we have added r + f + 2 elements to the rear of the queue (or added some to the rear and removed some from the front). This therefore gives us plenty of time to do the append and the reverse, if we take one step on each call to tail and snoc.

Progress

How might we “do one step of a reverse”? This is where we diverge from Okasaki, and give a more direct implementation of this idea. We can implement a datatype that describes the “progress” of an operation:

data Progress = Done | NotYet Progress

The idea is that we can execute one step of an operation by pattern matching on an appropriate value of type Progress:

step :: Progress -> Progress
step Done       = Done
step (NotYet p) = p

For (++) it is easy to construct a Progress value which will execute the append; all we need to do is force (part of) the spine of the resulting list:
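A sketch (the name forceSpine is ours): the following Progress forces the first n cons cells of a list, one per step.

forceSpine :: Int -> [a] -> Progress
forceSpine 0 _      = Done
forceSpine _ []     = Done
forceSpine n (_:xs) = NotYet (forceSpine (n - 1) xs)

Pattern matching (_:xs) forces the next cell of the list to weak-head normal form; because the recursive call is hidden under NotYet, each application of step forces exactly one more cell.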

For other operations this is more difficult. We need some way to express a computation split into multiple steps. We can use the following datatype for this purpose:

data Delay a = Now a | Later (Delay a)

Delay a is a computation of an a, but we mark the various steps of the computation using the Later constructor (this datatype is variously known as the delay monad or the partiality monad, but we will not need the fact that it is a monad in this blog post). For example, here is reverse:
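A sketch (we call it revDelay), marking each step of the usual accumulator loop with a Later constructor:

revDelay :: StrictList a -> Delay [a]
revDelay = go []
  where
    go acc SNil         = Now acc
    go acc (SCons x xs) = Later (go (x : acc) xs)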

We then need to be able to execute one step of such a computation. For this purpose we can introduce

runDelay :: Delay a -> (a, Progress)

which returns the final value, as well as a Progress value which allows us to execute the computation step by step. The definition of runDelay is somewhat difficult (see appendix, below), but the idea hopefully is clear: evaluating the resulting Progress for n steps will execute precisely n steps of the computation; if you look at the resulting a value before having stepped the entire Progress, the remainder of the computation will run at that point.

Finally, we can execute two operations in lockstep by pattern matching on two Progress values at the same time:
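A sketch (the name par is ours):

par :: Progress -> Progress -> Progress
par p          Done        = p
par Done       p'          = p'
par (NotYet p) (NotYet p') = NotYet (par p p')

When either argument is finished, the result simply is the other argument, so demanding the combined Progress still steps the remaining computation; otherwise, one step of the combined Progress takes one step of each.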

Real-Time Queues

We can use the Progress datatype to implement real-time queues: queues where both insertion and deletion have O(1) worst-case complexity. The representation is much like the one we used in the previous section, but we add a Progress field (Progress is an example implementation of what Okasaki calls a “schedule”):

data Queue3 a = Q3 !Int [a] !Int !(StrictList a) !Progress

Re-establishing the invariant happens much as before, except that we record the resulting Progress on the queue:
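A sketch of how the pieces might fit together (inv3, and the use of step in tail and snoc, are our reconstruction of the idea described above):

inv3 :: Queue3 a -> Queue3 a
inv3 q@(Q3 f xs r ys _)
  | f < r     = let (ys', p1) = runDelay (revDelay ys)
                    xs'       = xs ++ ys'
                    p2        = forceSpine f xs'
                in  Q3 (f + r) xs' 0 SNil (par p1 p2)
  | otherwise = q

-- tail and snoc take one step of the schedule on every call
-- (these shadow the Prelude names):
tail :: Queue3 a -> Queue3 a
tail (Q3 f (_:xs') r ys p) = inv3 (Q3 (f - 1) xs' r ys (step p))

snoc :: Queue3 a -> a -> Queue3 a
snoc (Q3 f xs r ys p) y = inv3 (Q3 f xs (r + 1) (SCons y ys) (step p))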

Conclusions

It is difficult to develop data structures with amortised complexity bounds in strict but pure languages; laziness is essential for making sure that operations don’t unnecessarily get repeated. For applications where amortised bounds are insufficient, we can use an explicit schedule to make sure that operations get executed bit by bit; we can use this to develop a pure and persistent queue with O(1) insertion and deletion.

In his book, Okasaki does not introduce a Progress datatype or any of its related functionality; instead he makes very clever use of standard datatypes to get the same behaviour somehow implicitly. Although this is very elegant, it also requires a lot of ingenuity and does not immediately suggest how to apply the same techniques to other datatypes. The Progress datatype we use here is perhaps somewhat cruder, but it might make it easier to implement other real-time data structures.

Random access to (any of the variations on) the queue we implemented is still O(n); if you want a datastructure that provides O(1) insertion and deletion as well as O(log n) random access, you could have a look at Data.Sequence; be aware, however, that this datatype provides amortised, not real-time, bounds. Modifying Sequence to provide worst-case complexity bounds is left as an exercise for the reader ;-)

Appendix: Implementation of runDelay

The definition of runDelay is tricky. The most elegant way we have found is to use the lazy ST monad:
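A sketch of one way to write this (our reconstruction; the helpers next and runNow are the ones referred to below):

import Control.Monad.ST.Lazy (ST, runST)
import Control.Monad.ST.Lazy.Unsafe (unsafeInterleaveST)
import Data.STRef.Lazy (STRef, newSTRef, readSTRef, writeSTRef)

runDelay :: Delay a -> (a, Progress)
runDelay d0 = runST $ do
    ref <- newSTRef d0
    -- Read the reference only when the value is demanded, so that we
    -- see the writes that next has performed in the meantime.
    x <- unsafeInterleaveST $ readSTRef ref
    p <- next ref
    return (runNow x, p)
  where
    -- Execute one Later step, saving the remaining computation.
    next :: STRef s (Delay a) -> ST s Progress
    next ref = do
      d <- readSTRef ref
      case d of
        Now _    -> return Done
        Later d' -> do
          writeSTRef ref d'
          p <- unsafeInterleaveST $ next ref
          return (NotYet p)

    runNow :: Delay a -> a
    runNow (Now   x) = x
    runNow (Later d) = runNow d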

In the lazy ST monad effects are only executed when their results are demanded, but are always executed in the same order. We take advantage of this to make sure that the calls to next only happen when pattern matching on the resulting Progress value. However, it is crucial that for the value of x we read the contents of the STRef only when the value of x is demanded, so that we can take advantage of any writes that next will have done in the meantime.

This does leave us with a proof obligation that this code is safe; in particular, that the value of x that we return does not depend on when we execute this readSTRef; in other words, that invoking next any number of times does not change this value. However, hopefully this is relatively easy to see. Indeed, it follows from parametricity: since runDelay is polymorphic in a, the only a it can return is the one that gets passed in.

To see that pattern matching on the resulting Progress has the intended effect, note that the STRef starts with a computation of “cost n”, where n is the number of Later constructors, and note further that each call to next reduces n by one. Hence, by the time we reach Done, the computation has indeed been executed (we have reached the Now constructor).

Note that for the case of the queue implementation, by the time we demand the value of the reversed list, we are sure that we will have fully evaluated it, so the definition

runNow (Later d) = runNow d

could actually be replaced by

runNow (Later _) = error "something went horribly wrong!"

Indeed, this can be useful when designing such real-time data structures, as a way to verify that things are indeed fully evaluated by the time you expect them to be. In general, however, it makes the runDelay combinator somewhat less general, and strictly speaking it also breaks referential transparency, because now the value of x does depend on how much of the Progress value you evaluate.

For more information about the (lazy) ST monad, see Lazy Functional State Threads, the original paper introducing it. Section 7.2, “Interleaved and parallel operations” discusses unsafeInterleaveST.