OS X and virtual “bloat”

There’s a lot of work going on these days to improve Mozilla’s memory usage, and it’s a complicated issue with different facets. When discussing this with users, one thing that sometimes comes up is the difference between a process’s working set and its total virtual memory size. To simplify grossly, the working set is often the more important number, as it’s the amount of physical memory actually being used. A process could have gigabytes of virtual memory assigned to it without any measurable performance impact on the system, as long as the working set stays small. I’m skimming over a lot of details, but the point is that a large virtual memory size may or may not be a practical problem.
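To make the distinction concrete, here is a minimal Python sketch (not anything Mozilla does, just an illustration). It reserves a chunk of address space with an anonymous mapping and then touches only a few pages of it. The function name `reserve_and_touch` and the sizes are made up for the example. On a typical system the whole mapping counts toward virtual size immediately, while only the touched pages should end up in the working set.

```python
import mmap

PAGE = mmap.PAGESIZE  # page size reported by the OS


def reserve_and_touch(reserve_bytes, touch_pages):
    """Map anonymous memory, then write to only the first few pages."""
    # The whole mapping is address space the process now "has".
    region = mmap.mmap(-1, reserve_bytes)
    for i in range(touch_pages):
        # Touching a page is what makes the OS back it with physical memory.
        region[i * PAGE] = 1
    return region


# Hypothetical sizes: reserve 64MB, touch 4 pages.
region = reserve_and_touch(64 * 1024 * 1024, 4)
reserved_mb = len(region) / (1024 * 1024)
touched_kb = 4 * PAGE / 1024
print(f"reserved {reserved_mb:.0f}MB of address space, touched {touched_kb:.0f}KB")
```

Running this and watching the process in top should show a virtual size that jumps by roughly the reserved amount while the resident size barely moves, though the exact figures depend on the OS and the Python runtime.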

I’ve noticed that on OS X, in particular, the amount of virtual memory a process is using seems to be a rather strange value. Here are a few lines of output from the “top” command on my MacBook. Note the rightmost VSIZE column (total address space allocated) and the RSIZE column (the resident size, or working set) next to it. You can also use the OS X “Activity Monitor” tool, which reports the same numbers as “Real Memory” and “Virtual Memory”.

Gosh, there’s Firefox with 542MB of virtual memory. I’ve been browsing a while with lots of tabs, so maybe I shouldn’t expect it to be tiny. Then again, starting it with a blank page results in just a 39MB RSIZE, but VSIZE is still over 540MB. Look at iCal and Colloquy (an IRC client), which both weigh in around 400MB… Hmm, that seems like a lot. Quite a few other processes are also in the 350MB ballpark; in fact, top reports a total of over 10GB of virtual memory assigned on my system. And, hmmmmmm, even standard Unix programs like bash and ntpd are grabbing 27MB of VM — what’s going on?

OS X has a nifty little utility called vmmap that lets you see exactly what’s consuming address space in a process. The full output is rather verbose, but it has a summary too:

That’s the summary for the “27MB” bash process. It looks like 8MB is reserved for the stack, 18.5MB is reserved for the “DefaultMallocZone”, and about 2.5MB (__TEXT) is code and static data. [The full listing shows that the bash code is only about 500K; the rest of the 2.5MB is all system libraries.] Another nifty OS X utility, heap, confirms that only 85K of that 18.5MB malloc area is actually being used. So, the conclusion here is that most of the alarming 27MB of bash’s VM size is just unused address space (which is dirt cheap) and default system stuff. The amount of memory usage directly attributable to bash is really quite small. Smaller, in fact, than the 784K working set top reports.
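For a sense of scale, here is the malloc arithmetic as a quick Python sketch. The figures are the ones quoted above; the only assumption is that 1MB means 1024K here.

```python
KB_PER_MB = 1024  # assumption: the tools report binary megabytes

malloc_zone_mb = 18.5  # DefaultMallocZone size from the vmmap summary
malloc_used_kb = 85    # what heap says is actually in use

# Fraction of the reserved malloc zone that bash is really using.
malloc_used_fraction = malloc_used_kb / (malloc_zone_mb * KB_PER_MB)

# Address space that is reserved for malloc but sitting idle.
reserved_but_idle_mb = malloc_zone_mb - malloc_used_kb / KB_PER_MB

print(f"malloc zone in use: {malloc_used_fraction:.2%}")
print(f"reserved but idle: {reserved_but_idle_mb:.1f}MB")
```

In other words, well under 1% of the malloc zone is doing any work; the rest is just address space held in reserve.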

So, now the $542,000,000 question… What’s up with Mozilla’s virtual memory size? (after the jump, to avoid annoying planet.mozilla.org readers!)

vmmap dumps out over a thousand lines of data for the firefox-bin process like:

The heap utility reports that 29% (17MB) of the 96MB of malloc space is unused.

I think the most surprising thing about these numbers is just how much VM space is consumed by system stuff: 256MB (!) for IOKit, 64MB for fonts, and 13MB for Carbon/CoreGraphics. [A peek at the vmmap for Colloquy, iCal, and Terminal shows basically the same thing.] That, plus the unused heap space, accounts for 66% of the 534MB address space. [I’m really curious what the 256MB IOKit item is. Maybe graphics memory mapped into the process? Anyone know?]
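The 66% figure is just the sum of those items over the total. As a small Python sketch, using only the numbers quoted above (the dictionary labels are mine):

```python
# Sizes in MB, taken from the vmmap and heap figures quoted in the post.
regions_mb = {
    "IOKit": 256,
    "fonts": 64,
    "Carbon/CoreGraphics": 13,
    "unused heap": 17,
}
total_vm_mb = 534

overhead_mb = sum(regions_mb.values())
share = overhead_mb / total_vm_mb

print(f"{overhead_mb}MB of {total_vm_mb}MB ({share:.0%})")
```

That works out to 350MB, or roughly two thirds of the address space, before counting anything Firefox itself allocated and is actually using.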

So, what do the numbers say about performance and bloat? Well, that’s hard to say. But I hope it is clearer that there’s more to a program’s memory usage than just a single, simple number.

IIRC we link IOKit into widget just for something silly like the idle service. If it’s costing us a lot, maybe we should reconsider?

Shouldn’t most of these frameworks (which by and large are just collections of shared libraries) be in shared memory anyway? Or is this the allocation of private memory every user of these shared libraries needs to put up with?

Yikes, I can’t believe I overlooked the “-resident” flag for vmmap! Very useful. It does seem to confirm my suspicion that most of the biggest VM regions are not being used. Basically what one would expect, but now with numbers to prove it.

I deliberately avoided getting into examining malloc usage; that’s a topic for some other blog post! Rather, the interesting data point here is how much *isn’t* being used. I’ve been planning to poke around with libumem on Solaris, and it would be interesting to compare its capabilities with libMallocDebug and leaks(1)… in a different blog post. :-)

Håkan:

Yeah, IOKit is what originally motivated me to post this, as it looked like an interesting example of how “cost” can mean different things in different contexts. It felt unlikely to be a serious problem (I think we would have noticed if OS X was thrashing around an extra 256MB, compared to Linux/Windows), not to mention the “vmmap -resident” data. But still, it does have some sort of cost, as measured by the impression users get if they look at the VM size and think Firefox is using gobs of memory.

Eliminating or pruning IOKit would reduce that “cost”… But it’s sort of bullshit performance tuning (because the only effect is in someone’s head, and I think we’re far more interested in real gains). Then again, a big reduction in a fairly visible and confusing number like VSIZE might be worth taking if it was easy. I’d first want to understand it better.

Finally (phew!), yes, most of the framework and system stuff should be shared data, and so the real cost of having it in physical memory is spread across all the processes sharing it. Yet another reason why a single number like VSIZE doesn’t tell the whole story!
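To put a rough number on “spread across all the processes”, here is a back-of-the-envelope Python sketch. Everything in it is hypothetical: the function name, the 40MB of private memory, and the count of four sharing processes are made up for illustration, and it treats the 64MB and 13MB regions as if they were fully resident, which they may well not be.

```python
def proportional_cost_mb(private_mb, shared_regions):
    """Private memory plus each shared region's size divided by its sharer count."""
    return private_mb + sum(size / sharers for size, sharers in shared_regions)


# Hypothetical process: 40MB private, plus two regions shared by 4 processes.
shared = [(64, 4), (13, 4)]  # (size in MB, number of processes sharing it)
cost = proportional_cost_mb(40, shared)

naive = 40 + sum(size for size, _ in shared)
print(f"naive total: {naive}MB, proportional cost: {cost}MB")
```

Charging each process the full size of every shared region counts the same physical pages several times over, which is one more way a single per-process number can mislead.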

Having played with mallocdebug/leaks and the other Mac OS X debugging tools for a long time, I can say that they are really good tools.
– leaks can operate at runtime (when using MallocStackLogging, you can get the stack trace of each allocation)
– the same goes for mallocdebug, and it can tell you which memory blocks were added between two moments.
– you can also have a look at objectalloc, which is useful for malloc as well as for Objective-C object allocations. It may have a better user interface than mallocdebug.