Justin Lebar enabled jemalloc on MacOS 10.7. This means that jemalloc is finally used on all supported versions of our major OSes: Windows, Mac, Linux and Android. Having a common heap allocator across these platforms is great for consistency of testing and behaviour, and makes future improvements involving jemalloc easier.

Other Bugs Fixed

I registered jemalloc with SQLite’s pluggable allocator interface. This had two benefits. First, it means that SQLite no longer needs to store the size of each allocation next to the allocation itself, avoiding some clownshoes allocations that wasted space. This reduces SQLite’s total memory usage by a few percent. Second, it makes the SQLite numbers in about:memory 100% accurate; previously SQLite was under-reporting its memory usage, sometimes significantly.

Relatedly, Marco Bonardo made three changes (here, here and here) that reduce the amount of memory used by the Places database.

Bug Counts

I accidentally deleted my record of the live bugs from last week, so I don’t have the +/- numbers for each priority this week.

P1: 29 (last week: 35)

P2: 126 (last week: 116)

P3: 59 (last week: 55)

Unprioritized: 0 (last week: 5)

The P1 result was great this week — six fewer than last week. Three of those were fixed, and three of those I downgraded to P2 because they’d been partially addressed.

For a longer view of things, here is a graph showing the MemShrink bug count since the project started in early June.

There was an early spike as many existing bugs were tagged with “MemShrink”, and a smaller spike in the middle when Marco Castellucio tagged a big pile of older bugs. Other than that, the count has marched steadily upward at the rate of about six per week. Many bugs are being fixed and definite improvements are being made, but this upward trend has been concerning me.

Future Directions

So in today’s MemShrink meeting we spent some time discussing future directions of MemShrink. Should we continue as is? Should we change our focus, e.g. by concentrating more on mobile, or setting some specific targets?

The discussion was long and wide-ranging and not easy to summarize. One topic was “what is the purpose of MemShrink?” The point being that memory usage is really a secondary measure. By and large, people don’t really care how many MB of memory Firefox is using; they care how responsive it is, and it’s just assumed that reducing memory usage will help with that. With that in mind, I’ll attempt to paraphrase and extrapolate some goals (apologies if I’ve misrepresented people’s opinions).

On 64-bit desktop, the primary goal is that Firefox’s performance should not degrade after using it heavily (e.g. many tabs) for a long time. This means it shouldn’t page excessively, and that operations like garbage collection and cycle collection shouldn’t get slower and slower.

On mobile, the primary goal probably is to reduce actual memory usage. This is because usage on mobile tends to be lighter (e.g. not many tabs) so the longer term issues are less important. However, Firefox will typically be killed by the OS if it takes up too much memory.

On 32-bit desktop, both goals are relevant.

As for how these goals would change our process, it’s not entirely clear. For desktop, it would be great to have a benchmark that simulates a lot of browsing (opening and closing many sites and interacting with them in non-trivial ways). At the end we could measure various things, such a memory usage, garbage and cycle collection time, and we could set targets to reduce those. For mobile, the current MemShrink process probably doesn’t need to change that much, though more profiling on mobile devices would be good.

Personally, I’ve been spreading myself thinly over a lot of MemShrink bugs. In particular, I try to push them along and not let them stall by doing things like trying to reproduce them, asking questions, etc. I’ve been feeling lately like it would be a better use of my time to do less of that and instead dig deeply into a particular area. I thought about working on making JS script compilation lazy, but now I’ve decided instead to focus primarily on improving the measurements in about:memory, in particular, reducing the size of “heap-unclassified” by improving existing memory reporters and adding new ones. I’ve decided this because it’s an area where I have expertise, clear ideas on how to make progress, and tools to help me. Plus it’s important; we can’t make improvements without measurements, and about:memory is the best memory measurement tool we have. Hopefully other people agree that this is important to work on

Marco Bonardo changed the way the places.sqlite database is handled. I’m reluctant to describe the change in much detail because I’ll probably get something wrong, and Marco told me he’s planning to write a blog post about it soon. So I’ll just quote from the bug: “Globally on my system (8GBs) I’ve often seen places.sqlite cache going over 100MB, with the patch I plan to force a maximum of 60MB (remember this will vary based on hardware specs), that is a >40% improvement. We may further reduce in future but better being on the safe side for now.” This was a MemShrink:P1 bug.

I rearranged nsCSSCompressedDataBlock to avoid some unnecessary padding on 64-bit platforms. This can save a megabyte or two if you have several CSS-heavy (e.g. Gmail) tabs open. It makes no difference on 32-bit platforms.

But it was a very busy week in terms of bug activity. Let’s look at the numbers.

P1: 29 (-2, +2)

P2: 76 (-10, +20)

P3: 38 (-1, +2)

Unprioritized: 22 (-5, +23)

Several things happened here.

Marco Castelluccio looked through old bugs and found a lot (30 or more) that were related to memory usage and tagged them with “MemShrink”.

Nine new bugs were filed to reduce about:memory’s “heap-unclassified” number by adding memory reporters; many of these were thanks to Boris Zbarsky’s insights into the output produced by DMD.

I closed out a number of bugs that were incomplete, stale, or finished; this included some of those newly marked by Marco, and some ones that were already tagged with “MemShrink”.

I tagged five leaks that were found with the cppcheck static analysis tool.

We spent the entire MemShrink meeting today triaging unprioritized bugs and we got through 23 of them. Of the remaining unprioritized bugs, the older ones tagged by Marco and the cppcheck ones (which I tagged after the meeting) constitute most of them.

It’s clear that the rate of problem/improvement identification is outstripping the rate of fixes. We have a standing agenda item in MemShrink meetings to go through Steve Fink’s ideas list, but we haven’t touched it in the past two meetings because we’ve spent the entire time on triage. And when we do go through that list, it will only result in more bugs being filed. I’m hoping that this glut of MemShrink-tagged bugs is temporary and the new bug rate will slow again in the coming weeks.

In the meantime, if you want to help, please look through the lists of open bugs, or contact me if you aren’t sure where to start, and I’ll do my best to find something you can work on. Thanks!

TL;DR: if you’re familiar with any Mozilla (C++ or JS) code that opens an async database connection, please go and check that it calls asyncClose() to close the connection; not doing so can lead to crashes and/or memory leaks.

Specifically, in Firefox 6, when such a connection is destroyed (because it’s explicitly deallocated or its refcount drops to zero in C++, or it’s garbage collected in JS) it’ll currently cause sporadic crashes in about:memory or telemetry code or Test Pilot code. This is because memory reporters end up pointing to connections that have been freed, and so when they are used they end up (innocently) accessing memory they shouldn’t.

I’ve opened bug 662989 to avoid the crashes, but even once it’s implemented, I think that if you fail to call asyncClose() it will still cause a memory leak, because sqlite3_close() is never called on the connection and so SQLite won’t free up memory used by the connection. Also, connections that needlessly remain open can have active locks that can starve out other connections or cause extreme growth of the write-ahead-log.

As I write this, thanks to Marco Bonardo, I’m aware of three places in JS code that fail to call asyncClose() when they should:

browser/components/preferences/aboutPermissions.js. Bug 654573 covers this, I’m about to land a patch to fix it.

browser/app/profile/extensions/testpilot@labs.mozilla.com/modules/experiment_data_store.js. Bug 662985 covers this. Note that Test Pilot seems to fail to use asyncClose() when it should, and it also uses memory reporters. So it’s unclear which of these two things is responsible for the Test Pilot crashes of this sort seen so far; in other words, Test Pilot may be a culprit or an innocent bystander, or both.

These were found via a crude MXR search. I haven’t looked for C++ code that makes this mistake.

If you only do synchronous transactions over your database connection, the connection will be cleaned up properly, even if you don’t call close(), when it is deallocated or garbage collected. However, it’s better to close the connection as soon as you can so that the resources (memory, locks) are freed immediately.

See this MDC page for more details about database connections and the relevant APIs. It appears that these APIs are somewhat error-prone. As it happens, an NS_ENSURE_TRUE macro will fail when an async connection is destroyed without having been asyncClose’d first, and this causes a warning message to be printed to stderr. Unfortunately, stderr is spammed with all sorts of warnings (something that was discussed in the comments of a previous post of mine), and so this warning gets lost among the noise. Addressing this problem is something I’ll write about shortly.

Thanks to Andrew Sutherland and Marco Bonardo for helping me greatly with the bugs mentioned above. Any errors in this post are mine!

UPDATE: I did some investigation and found that about:permissions leaked about 180KB of memory on my machine every time it failed to asyncClose a DB connection. This showed up under the explicit/storage/sqlite/other reporter in about:memory.