Saturday, 31 January 2009

Mono 2.2 still leaks memory

We have previously discussed the fact that Mono is still built upon a conservative garbage collector (Boehm's GC). This means that Mono is not capable of identifying exactly what data is reachable and, consequently, has to resort to conservative guesses that can fail to deallocate garbage, i.e. leaking memory.

Boehm's own literature describes situations where the GC might be expected to leak (lazy lists and queues) but claims that no case has ever been found in practice and that they could not even construct a contrived example where memory was actually leaked. Readers of our previous posts have stated that our claims of memory leaks are "bogus". So we decided to put this issue to rest.

The following trivial F# program creates a cyclic list representing a queue, adds one element and then repeatedly adds one element and removes it again:

This obviously requires only enough memory for at most two queue items, so any memory leaks will be obvious. On .NET, the program's memory consumption is steady at 11 MB. On Mono 2.2, the program leaks away the entire memory of the computer within 60 seconds, the OS goes to swap and everything grinds to a halt.
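For readers who want something concrete to experiment with, a queue of the kind described above might be sketched in F# roughly as follows. This is an illustrative sketch rather than the program measured in this post: the names `Cell`, `CyclicQueue`, `Push` and `Pop` are hypothetical, and the loop is bounded at an arbitrary one million iterations so that it terminates.

```fsharp
// Hypothetical sketch of a cyclic singly-linked queue (not the original listing).
[<AllowNullLiteral>]
type Cell<'a>(content: 'a) =
    member val Content = content
    member val Next : Cell<'a> = null with get, set

type CyclicQueue<'a>() =
    // Points at the most recently pushed cell; last.Next is the oldest cell.
    let mutable last : Cell<'a> = null

    member this.Push(x: 'a) =
        let cell = Cell<'a>(x)
        if isNull last then
            cell.Next <- cell            // a single cell points at itself
        else
            cell.Next <- last.Next       // new cell links round to the oldest cell
            last.Next <- cell
        last <- cell

    member this.Pop() : 'a =
        if isNull last then invalidOp "queue is empty"
        let first = last.Next
        if obj.ReferenceEquals(first, last) then last <- null
        else last.Next <- first.Next     // unlink the oldest cell from the cycle
        first.Content

// Add one element, then repeatedly add one element and remove one again.
let queue = CyclicQueue<int>()
queue.Push 0
for i in 1 .. 1000000 do
    queue.Push i
    queue.Pop() |> ignore
```

At any moment at most two cells are reachable from `queue`, so a collector that identifies reachability exactly should keep the heap size flat however long the loop runs.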

We have also described situations where Mono 2.2 leaks stack space until the stack overflows. These results may be of interest to anyone else trying to find a usable VM to build upon.

25 comments:

One of the Mono developers, Rodrigo Kumpera, has responded to our example with the assertion that "it is quite rare to cause pathological leaks such as this one". However, we have written dozens of different programs based upon this queue implementation and they all leak in Mono.

It is rare because this sort of leak can *only* happen if pointers are retained in the stack. In normal applications, this rarely (if ever) happens.

If this exact code were used in a production application, it would not 'leak'.

There was also a question as to whether or not the code shown does exactly what you think it does. You were asked to supply a C# version which does the exact same task so that it could be compared. Is there any chance you could provide that?[1]

[0] "This kind of leaks are usually caused by unused stack slots that retain the dead value. Regular code will overwrite those stack slots on method calls and let the GC collect."

[1] "If your description of the code is correct, it shouldn't "leak" even with the Boehm GC. Write the equivalent code in C#, for example. My guess is that either your code doesn't do what you describe or there is a bug triggered by F# in the runtime and what is actually leaking is not managed memory"

The example I presented here was actually taken from a much larger multithreaded application that we discovered was leaking on Mono. I boiled it down to the example you see here and Mono leaked on every intermediate program that I created in the process.

I don't have time to translate the code into other languages myself to see if they also break Mono. Moreover, it is obviously not feasible for us to completely rewrite our F# code bases just to work around design flaws in Mono. So the results of translating even this tiny example would not be very interesting to us.

If you would like to have a go at recreating these bugs from your favorite language, I recommend you start by decompiling this trivial F# program using Reflector and then boiling the code down in your own language.

"Moreover, it is obviously not feasible for us to completely rewrite our F# code bases just to work around design flaws in Mono"

If conservative garbage collectors were a design flaw, they wouldn't exist. Interoperating with C or C++ requires at least a partially conservative collector. It *cannot* be done with a precise garbage collector.

"So the results of translating even this tiny example would not be very interesting to us."

Yes, it would: it would help find the actual cause of the bug. As was stated in your post to the mono list, that code should not leak managed memory. So that'd imply that it's not a garbage collection issue. It's something else.

If you attach a compiled version of your F# program to your original post to the mono list, that'd be perfect. I don't have an F# compiler, nor do I know if one is available for Linux.

The existence of conservative GCs does not make them good design. Your statement that C/C++ interoperability requires a conservative collector is complete nonsense. OCaml, Haskell, Erlang, SML, .NET and the JVM are all obvious counterexamples.

F# is freely available from here. Just install it and compile the code from this blog post.

As was stated in the bug report, the bug won't affect any real world applications (or at least very very very very few) because this bug is 100% entirely due to the fact that stack slots are not being overwritten, which is exactly what we said was the issue.

I've sent a new email on the mono list containing both the leaking and fixed versions and an explanation if you care to read it.

Actually the explanation given to me by Paolo "lupus" Molaro (author of Mono's JITs) turned out to be completely wrong:

"If your description of the code is correct, it shouldn't "leak" even with the Boehm GC. Write the equivalent code in C#, for example. My guess is that either your code doesn't do what you describe or there is a bug triggered by F# in the runtime and what is actually leaking is not managed memory. Post the equivalent C# code and we can easily check which case it is."

In reality, the code does exactly what I said it does, your C# repro proves that this is not an F#-specific problem, and it really is leaking managed memory. The only possible conclusion is that this is another serious bug in Mono.

I can well believe your explanation but it leaves two serious problems. Firstly, there is no way for a programmer to tell what stack slots correspond to in F# source code even if they were willing to try to work around these bugs in Mono by hand. Secondly, the lavishness of your workaround really highlights the fact that this bug in Mono persists across many variations of this program. For example, if you make the queue global the bug still persists. If you split the push and pop operations into separate functions, the bug persists. Indeed, your workaround of injecting multiple redundant non-tail recursive calls interspersed with conditionals is the only alteration I have found that manages to evade the bug.

Take the code you presented on this list, make the queue a static variable in the class and factor the push and pop operations into separate functions and you will see that Mono still leaks memory even though there are now two additional functions with separate stack frames being called from the loop.

Boehm never pretended that his GC could be relied upon to reclaim memory automatically. So you cannot blame this leak on a bug in Boehm's GC because it is doing everything that it claims to do, i.e. nothing. This is precisely why choosing to use Boehm's GC was a fundamental design mistake for Mono.

I appreciate that you can set the pointer to null by hand if you know exactly what you are doing but that is not practical for anyone with an established code base that relies upon automatic memory management having been implemented correctly in the VM.

Alan, are you saying that this is not a bug in Mono? Even if the bug is because of a library that Mono is using, then it's still a bug that should be fixed.

I have seen this attitude towards bugs before. Mono implements lambdas in C# by translating all the lambdas in a method into one class. The class contains all the local variables that any of the lambdas need, and methods for the bodies of the lambdas. In a scope, one object is created for all the lambdas. So if you do this:

@Jules: The exact same 'leak' was just demonstrated to exist in MS.NET as well. This issue is *not* limited to Mono. In the Boehm docs for "An Embarrassing Failure Scenario", MS.NET is hitting exactly the issue described in point 3.

As for the lambdas, yes, that could be an issue. I remember a discussion happening about this before, but I don't recall the outcome of it. Filing a bug report on it would be the best way to ensure that it is resolved.

Note that I tested this code compiled with Microsoft Visual C# 2008 Express on .NET 3.5 x86 and it does not leak as Alan claims. I also tested the assembly generated by Mono's C# compiler and that also leaks on Mono and not on .NET.

Also, your recommendation of nulling out references by hand as a workaround does not work in this case because the bogus pointer Mono is leaving in the stack frame of the "Main" function does not correspond to any variables in the source code of the C# program, i.e. it must be a temporary. Consequently, the programmer could not even nullify the pointer if they wanted to.

Memory leak stopped. Comment out the 'next.next = null' and you'll leak again. As per the documented workaround, this results in a *single* object being retained unnecessarily rather than an entire chain of objects.
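The workaround described here amounts to clearing a removed node's link before letting go of it. A minimal F# sketch of that idea follows. It is not the code posted to the mono list: the `Node` type and the `popOldest` function are hypothetical names, and the list is assumed to be cyclic with at least two nodes.

```fsharp
// Hypothetical node type for a cyclic singly-linked list.
[<AllowNullLiteral>]
type Node(value: int) =
    member val Value = value
    member val Next : Node = null with get, set

// Unlink and return the oldest node (last.Next).
// Assumes the cycle contains at least two nodes.
let popOldest (last: Node) : Node =
    let oldest = last.Next
    last.Next <- oldest.Next
    // The workaround: clear the removed node's outgoing link, so that a stale
    // reference to `oldest` (for example in an unused stack slot) can keep
    // only this one node alive rather than everything reachable from it.
    oldest.Next <- null
    oldest
```

Without the `oldest.Next <- null` line, a single stale pointer to one removed node would also keep alive every node that follows it, which is the chain of retained objects mentioned above.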