Space leak zoo

A big thanks to everyone who everyone who sent in space leak specimens. All of the leaks have been inspected and cataloged by our experts, and we are quite pleased to open the doors of the space leak zoo to the public!

There are a few different types of space leak here, but they are quite different and a visitor would do well not to confuse them (the methods for handling them if encountered in the wild vary, and using the wrong technique could exacerbate the situation).

A memory leak is when a program is unable to release memory back to the operating system. It's a rare beast, since it has been mostly eliminated by modern garbage collection. We won’t see any examples of it in this series, though it is strictly possible for Haskell programs to exhibit this type of leak if they use non-bracketed low-level FFI allocation functions.

A strong reference leak is when a program keeps around a reference to memory that in principle could be used but actually will never be used anymore. A confluence of purity and laziness make these types of leaks uncommon in idiomatic Haskell programs. Purity sidesteps these leaks by discouraging the use of mutable references, which can leak memory if they are not cleared when appropriate. Laziness sidesteps these leaks by making it difficult to accidentally generate too much of a data structure when you only want parts of it: we just use less memory to begin with. Of course, using mutable references or strictness in Haskell can reintroduce these errors (sometimes you can fix instances of the former with weak references—thus the name, “strong reference” leak), and live variable leaks (described below) are a type of strong reference leak that catch people unfamiliar with closures by surprise.

A thunk leak is when a program builds up a lot of unevaluated thunks in memory which intrinsically take up a lot of space. It can be observed when a heap profile shows a large number of THUNK objects, or when you stack overflow from evaluating a chain of these thunks. These leaks thrive on lazy evaluation, and as such are relatively rare in the imperative world. You can fix them by introducing appropriate strictness.

A live variable leak is when some closure (either a thunk or a lambda) contains a reference to memory that a programmer expects to have been freed already. They arise because references to memory in thunks and functions tend to be implicit (via live variables) as opposed to explicit (as is the case for a data record.) In instances where the result of the function is substantially smaller, these leaks can be fixed by introducing strictness. However, these are not as easy to fix as thunk leaks, as you must ensure all of the references are dropped. Furthermore, the presence of a large chunk of evaluated memory may not necessarily indicate a live variable leak; rather, it may mean that streaming has failed. See below.

A streaming leak is when a program should only need a small amount of the input to produce a small amount of output, thus using only a small amount of memory, but it doesn’t. Instead, large amounts of the input are forced and kept in memory. These leaks thrive on laziness and intermediate data structures, but, unlike the case of thunk leaks, introducing strictness can exacerbate the situation. You can fix them by duplicating work and carefully tracking data dependencies.

A stack overflow is when a program builds up a lot of pending operations that need to performed after the current execution. It can be observed when your program runs out of stack space. It is, strictly speaking, not a space leak, but an improper fix to a thunk leak or a streaming leak can convert it into a stack overflow, so we include it here. (We also emphasize these are not the same as thunk leaks, although sometimes they look the same.) These leaks thrive on recursion. You can fix them by converting recursive code to iterative style, which can be tail-call optimized, or using a better data structure. Turning on optimization also tends to help.

A selector leak is a sub-species of a thunk leak when a thunk that only uses a subset of a record, but because it hasn't been evaluated it causes the entire record to be retained. These have mostly been killed off by the treatment of GHC’s selector thunks by the RTS, but they also occasionally show due to optimizations. (See below.)

An optimization induced leak is a camouflaged version of any of the leaks here, where the source code claims there is no leak, but an optimization by the compiler introduces the space leak. These are very tricky to identify; we would not put these in a petting zoo! (You can fix them by posting a bug to GHC Trac.)

A thread leak is when too many Haskell threads are lying around. You can identify this by a contingent of TSO on the heap profile: TSO stands for thread-state object. These are interesting to debug, because there are a variety of reasons why a thread may not dying.

In the next post, we’ll draw some pictures and give examples of each of these leaks. As an exercise, I invite interested readers to categorize the leaks we saw last time.

Update. I’ve separated thunk leaks from what I will now call “live variable leaks,” and re-clarified some other points, especially with respect to strong references. I’ll expand on what I think is the crucial conceptual difference between them in later posts.

18 Responses to “Space leak zoo”

“These leaks thrive on laziness and intermediate data structures, but, unlike the case of thunk leaks, thunks are good and should be preserved. You can fix them by duplicating work and carefully tracking data dependencies.”

I was expecting your hints about fixing stack overflows to mention thunks. The traditional example I’m aware of (i.e. main2 from the previous post) is essentially solved by the strictness annotation in foldl’.

Anonymous 1: Thunks are good not in the sense that it’s fine if they show up in the heap profile, but good in the sense that when we’re streaming data, we should only have the current bit of data we’re working with, and the thunk to produce the next bit. Hopefully the next post will make it a bit clearer.

Nick: Yes; I’d like to distinguish between thunk leaks and stack overflow leaks. I’ve clarified the text a little. You can stack overflow a strict functional language too, but thunk leaks are a special kind of stack overflow that can show up even if your code is written iteratively.

I am surprised that you didn’t mention loss of sharing of function application as being one, if not the most common, source of memory leaks. It’s also the entire reason for memoization to exist, just consider the general type for memo :: (a -> b) -> a -> b, and the many cases where memoization (in the form of using a memo function or knot tying) helps to seal leaks in recursive functions.

Memoization does not imply great space usage when used properly. The classic example:

fib n = case n of { 0 -> 0; 1 -> 1; _ -> fib (n – 1) + fib (n – 2) }

fib’ n = let l = 0 : 1 : zipWith (+) l (tail l) in l !! n

The above fib’ is less leaky than fib since it uses a list to memoize previous computation steps.

Not all leaks are as obvious as in fib, where work is apparently not shared. Spotting them may get really tricky in nested or mutual recusions.

On the other hand, not all leaks due to loss of sharing can be easily solved by memoization. As a shameless self-plug, see the paper Plugging a Space Leak with Arrows and Ode to Arrows for more examples.

Your example will have lots of stack usage. Also, fib’ will leak memory if you accidentally retain the whole list (which could happen with a full laziness transformation). This is why I want to avoid talking about stack overflows: while they can be related to space leaks, they really are different beasts.

GHC will hold onto memory for a little bit, even if it’s not being used, so that it doesn’t rapidly release and then re-request memory if the heap grows again. Eventually, it will release memory to the OS. This was not the case for Windows, but was fixed recently: http://hackage.haskell.org/trac/ghc/ticket/698

I’ve been screwing Erlang daily for about 4.5 months now, built in applications, compiler tool-chain, binary handling, and all, after stalking its profiles for over a year. After figuring out how an OTP application works, I took a break. I’ve been ch…