MemShrink progress, week 33

about:memory

Up until this week, about:memory was a static page; once generated, it couldn’t change. This week, I landed a patch that makes every sub-tree expandable and collapsible. This can be quite useful, because it gives you fine control over which details to ignore. The following screenshot shows an example where all the level 1 sub-trees in the “explicit” tree are collapsed except for “startup-cache”. (“++” indicates sub-trees that can be expanded, “--” indicates sub-trees that can be collapsed, and the remaining nodes (such as “heap-unclassified”) are leaf nodes.)
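For a sense of how simple the mechanics are, here is a minimal sketch of such a toggleable tree. This is illustrative only, not the actual about:memory code; the node shape (`name`, `collapsed`, `children`) and function names are made up:

```javascript
// Sketch of expandable/collapsible tree nodes, as in the new about:memory.
// Node shape and function names are hypothetical.
function renderNode(node, depth) {
  const indent = "  ".repeat(depth);
  // Leaf nodes get no marker; internal nodes show "++" (expandable)
  // or "--" (collapsible) depending on their current state.
  const marker = node.children.length === 0 ? ""
               : node.collapsed ? "++ " : "-- ";
  let out = indent + marker + node.name + "\n";
  if (!node.collapsed) {
    for (const child of node.children) {
      out += renderNode(child, depth + 1);
    }
  }
  return out;
}

function toggle(node) {
  // Only sub-trees with children can be collapsed or expanded.
  if (node.children.length > 0) node.collapsed = !node.collapsed;
}
```

Collapsing a sub-tree simply stops its children from being rendered, which is why collapsing everything except the part you care about makes a huge report readable.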

Memory Reporters

I added new memory reporters for style sheets. This was the single biggest remaining chunk of dark matter. On a 64-bit Linux build, this reduces the “heap-unclassified” count when I have Gmail and TechCrunch open from ~23% to ~15%. The counts mostly show up under the “explicit/dom+style” sub-tree, in “style-sheets” leaf nodes.
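As background, “heap-unclassified” is the heap memory that no reporter accounts for: total heap-allocated bytes minus the sum of the explicit reports that measure heap blocks, which is why each new reporter shrinks it. A simplified sketch of that arithmetic (the report shape here is a made-up reduction of the real reporter interface, which carries additional fields):

```javascript
// Simplified sketch: "heap-unclassified" is total heap-allocated bytes
// minus everything the explicit heap reporters account for.
// Report objects here are hypothetical simplifications.
function heapUnclassified(heapAllocated, heapReports) {
  const classified = heapReports.reduce((sum, r) => sum + r.amount, 0);
  return heapAllocated - classified;
}

// Fraction of the heap still unaccounted for, i.e. the kind of number
// behind the ~23% -> ~15% improvement mentioned above.
function unclassifiedFraction(heapAllocated, heapReports) {
  return heapUnclassified(heapAllocated, heapReports) / heapAllocated;
}
```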

Add-ons

I mentioned last week that I think leaky add-ons are the #1 problem for Firefox’s memory consumption, and mentioned the idea of telling users when they have an add-on installed that is known to have performance problems. Bug 720856 is open for this. Asa Dotzler, Firefox Product Manager, said:

I will secure Firefox client developer resources for this feature where I have some input into resourcing. If this plan is deemed appropriate, I will work with Justin to secure AMO side resources as well and we can nail this problem.

We can’t keep going back and forth on this while our users suffer. We must act now. I understand that defining “bad add-ons” will be contentious but so long as the technical approach is righteous we can sort out how heavy handed we want to be on policy at a later time and move forward implementing this today.

There was already a feature page covering this basic idea, and Asa updated it. This feature would be a huge help to users. Fingers crossed we’ll see some progress soon!

Henrik Skupin has started working on an add-on called MemChaser (download it here) which is aimed at helping detect problems relating to memory consumption. Currently it just shows some stats in the add-on bar — resident memory consumption (updated every two seconds), how long the last garbage collection took, and how long the last cycle collection took — but Henrik has many ideas for the future. Worth watching!

46 Responses to MemShrink progress, week 33

Asa: “I understand that defining “bad add-ons” will be contentious but so long as the technical approach is righteous we can sort out how heavy handed we want to be on policy at a later time and move forward implementing this today.”

Based on the “bad add-ons” list published before, the policy decision is not really the important part of this. What matters most is the first set of measurements that gets published showing the list of bad add-ons.

After the first data was published before, some add-ons fixed their problems within a week or so, and a bunch of improvements were made to the measurements (which were quite flawed to start with). But having gone all over the tech media, the first-cut graph produced is still doing the rounds as data on which add-ons are “bad”, and incremental improvements are a non-story from a news point of view.

If at all possible, it would be good to communicate a bit more loudly to add-on authors before publishing a list, to give them an opportunity to fix things before the media tells the world their add-ons are bad.

The idea now is to have a manually curated list of known bad add-ons, rather than automatically generate data for all add-ons on AMO. So an add-on wouldn’t be labelled bad until a Mozilla developer had tested it and confirmed the problem. And then we’d give the developer a certain amount of time to fix the problem.

“We can’t keep going back and forth on this while our users suffer” – while I agree with the spirit, I hope that this doesn’t result in another wave of actionism. Protecting users is a great goal but add-on developers are also an important part of the ecosystem. Destroying the reputation of some top add-ons is the wrong way to make friends in the add-on developer community.

The current approach to ts measurements still has important issues (I filed bugs on a number of them) and this needs to be communicated clearly to the users. It’s especially important to say that comparing add-ons based on the “slowness” rating is absolutely pointless – the tests didn’t run at all for many add-ons, or failed for some reason, or resulted in a value that is far lower than it would be with a realistic configuration. What we have now is an indication that a particular add-on might be problematic (or not, outliers are very common in the Talos measurements). But it doesn’t mean that using a comparable add-on without such indication will solve the issue.

Finally, add-on authors need a way to check and verify the results. It’s nice to know that the hard numbers are exposed via AMO API – but I only learned about it today, by looking at a bug linked from the feature page. And even there I see a bunch of outdated numbers without the slightest indication that they are outdated (a bug commenter mentioned these exact numbers back in August, that’s how I know). As an add-on author I need to know: when was this measured, which version of my extension was it, where can I find the Talos log (yes, that log helped resolve a bunch of questions already).

And if AMO doesn’t do any plausibility checks on these numbers it should at least let add-on authors do it before going public with them…

See my response to Michael — if I have my way the list of bad add-ons will be manually curated, with no mass automated evaluations. And add-on authors will be informed well before their add-on is added to the “bad” list.

I am happy to report that with Firefox 10 on Mac the resident memory value now closely tracks the explicit value, with the former usually being only 100 MB higher than the latter. To test this I ran the ROME demo [1], which is very resource-intensive on Firefox. Explicit memory climbed to 1.5 GB and resident to 1.7 GB. After closing the tab a GC followed which lasted several minutes, but afterwards the memory measurements had dropped to 300 MB and 400 MB respectively. Great work; this will make it easier to argue that Firefox has competitive memory usage on Mac.

Nick & all the Memshrink devs, keep up the good work! Thanks for what you do.

I don’t know that I have a strong place to say this from, but would the visual indicators for expandable/collapsible not be better placed at the perpendicular line intersections (like the plus/minus signs in a file browser tree structure)?

I was waiting for someone to say that. I tried several different things, got agreement on this one, and it was a small change from what we had. So I figured it was good enough. Patches improving it further are welcome.

I noticed it but just assumed it was a convention that differed between operating systems and that for a diagnostic/debug report making it consistent with native styles on multiple platforms wasn’t worth the amount of additional effort it would take.

It just copies what is visible. I think this is a good thing — if a user tried to copy a small chunk and got much more than they could see it would be very confusing.

As for giving incomplete reports, people already do that a lot, they copy just the part they think is relevant. Sometimes this is ok, sometimes it misses crucial stuff. So there’s no real change there.

I created a new profile and started it up with automatic updates disabled, all add-ons/extensions disabled, all plugins disabled, hardware acceleration disabled, a blank start page, and “submit performance data” unchecked.

Then I close it and start it back up with -no-remote, type in about:memory, and just leave it running. From the user’s perspective the browser should be doing nothing (my guess is it still does an occasional compacting of a SQLite database).

Occasionally during the day I refresh it a few times.

It will show that the numbers go up, up, up after each refresh and then drop down again. I assume it is some kind of garbage collection.

The lowest number starts out after startup at 25+ MB; shortly after it is 35+ MB (I assume it kept loading some stuff in the background). After a day it hardly gets below 50 MB.

My comment probably sounded too negative – I very much appreciate your hard work. In fact, about:memory made finding issues a lot easier for me. In particular, with my restartless extensions I can now easily see how much memory they use (they always use only one compartment) and whether garbage collection brings it back to a constant level. Also, I can check whether that compartment goes away when the extension is disabled – if it doesn’t then something didn’t get unregistered properly, already found some issues using this approach.

For all those interested in an add-on that ‘spams’ compartments, look at the ‘Cheevos’ add-on.
about:memory?verbose shows 59 of them; on my 64-bit Linux Nightly 13 this adds up to 8,452,568 bytes.

The cause: it’s built with the Jetpack Add-on Builder.

Handcrafted add-ons like Wladimir’s Element Hiding Helper are much less wasteful with their resources (1 compartment: 596,976 bytes).

This shows: it’s not only a matter of knowledge, but also of the tools one uses. Jetpack CAN be OK, as Wladimir shows, but only if one is careful.

To be sarcastic: a clicky tool like Add-on Builder can build “bad” add-ons easily, and gives no hints about the waste it puts on the user’s system, neither during creation nor during packaging.

The old adage “bad tools, bad product” becomes visible once more.
“Add-on Builder” is not a bad tool per se, but it gives no feedback about later memory/CPU consumption.

So: how is “bad” defined and measured?
My take would be:
1. Measure a “warm” Fx (ts + resident memory) with no add-ons.
2. Install the add-on to measure; restart for non-restartless add-ons.
3. Measure Fx (ts + resident memory) with the add-on.
4. Note the rise in ts and resident memory, as a percentage and in absolute terms, including the base numbers.
5. Give feedback to the add-on developer about the review, including these numbers.

And:
– Document the process used to get these numbers, along with the definitions of good and bad.
– Make these docs available to add-on developers (at least a link on a getting-started page).
– Fix “Add-on Builder” to give feedback about compartments, size, and memory use.
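Steps 1–4 above boil down to a simple delta computation. A sketch, with made-up function and field names (what threshold counts as a “bad” rise would be a separate policy decision):

```javascript
// Sketch of step 4: the rise an add-on causes in a metric (ts or
// resident memory), reported as an absolute delta and as a percentage
// of the no-add-on baseline. Names are hypothetical.
function addonOverhead(baseline, withAddon) {
  const delta = withAddon - baseline;
  return {
    baseline,
    withAddon,
    delta,
    percent: (delta / baseline) * 100,
  };
}
```

For example, `addonOverhead(100, 125)` reports a delta of 25 and a 25% rise over the baseline, with both base numbers preserved for the developer.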

The requirements of the Add-on SDK are different – they need the scopes of the modules properly isolated, currently this only works if you have one compartment per module. On the other hand, the private framework I use for Element Hiding Helper and other extensions doesn’t pretend to be universal and using one compartment was more important to me than complete scope isolation. Anyway, a solution has been implemented in bug 677294 that will be available once Firefox 12 is released in 3 months – SDK extensions will be using only one compartment as well then (or maybe two, one for bootstrap.js and one for all their modules).

A bad add-on list is the wrong approach to this problem. Does that help add-ons that are good except for otherwise difficult-to-find leaks? Of course it doesn’t, which means it won’t help some of the most useful-yet-leaky add-ons out there. Yes, some add-ons may have obvious leaks that the developers could catch if they understood the gravity of the issue, but plenty of add-ons have leaks that the authors are desperate to find but can’t.

The developers of Firebug have been seeking more detailed info on where their leaks are for months now. A bad add-on list won’t tell them that, and it won’t tell users very much of value either. What they need is even more detailed memory reporting. There are some discussions ongoing about this I know, and it’s a longer-term project.

I understand that the bad add-on list is a sort of stopgap–but it’s not a very good one. At the very least it needs to distinguish between add-ons that are aware of certain leaks and trying to fix them, and those that aren’t. Leak severity also matters.

From a user’s point of view, what’s the difference between (a) an add-on that leaks and the author has tried to fix the leak and failed, and (b) an add-on that leaks and the author has not tried to fix it?

Distinguishing between mild leaks and severe leaks has merit, definitely.

The difference is that, having tolerated leaky add-ons for a long time already, and perhaps really needing the add-on in question because there’s no alternative, a user would feel better knowing that the add-on author is making an effort, and might even be inclined to help test or donate resources to see the add-on become leak-free. Last time I checked, Mozilla was about a ‘community’, and keeping that community in the dark does nothing to foster its effectiveness and growth.

As a user of several add-ons for several years (unlike, it seems, most Firefox developers — and hence how would they know what that experience is like?), I can tell you that keeping your Firefox in a state that you prefer is not easy. It’s like maintaining a toolset: you lose one, or another goes blunt, and you need to find a replacement or sharpen up another one. Rather than sending emails to add-on authors, potentially receiving no feedback, and sometimes not being able to find any evidence of life, it would be great to see Mozilla analyze and inform users about the state of add-ons which may have been abandoned to bit rot but only need a few lines fixed to work again, for example. Or which might need some testing to fix a leak.

Broadly speaking, Mozilla needs to become more communicative with its users regarding add-ons and their health from various perspectives (compatibility, security, author activity, etc.). More communicative, not less!

If an add-on is no longer being actively maintained, or its author is apathetic toward leaks, obviously that’s a good thing to know. But lumping that in the same category as an add-on whose author really wants to find the leak but can’t due to a lack of the proper tools is a bad idea. Finding leaks is not an easy process as you well know, and not all add-on developers are going to be using tools like Valgrind. Even then the info they get might be inadequate to find the leak in what is basically some part of their JavaScript.

If a bad add-on is merely one that is known to have bad leaks, then Firebug would certainly qualify. And the fact that it would qualify means the list is useless, because Firebug is a crucial add-on for many developers–or even users trying to diagnose why some sites give them grief. Firebug’s developers are looking for the known leak, and lack the proper tools to do so. If the list is nuanced enough to say that it has a leak under certain circumstances, here are the steps that can mitigate it, and yes the authors are working on it, then I guess I can live with that–but it still doesn’t solve the problem so much as dissuade people from using an otherwise fantastic add-on.

In the end this list will punish some add-ons that deserve it for being leaky and not bothering to update, but it will also punish add-ons that are diligently working to improve and need Mozilla’s support to make the crucial leap.

If the author has tried and failed to find the leak, but their add-on is still slowing Firefox down, any warm and fuzzy feeling from knowing they’ve tried isn’t much help. A leak is a leak is a leak.

Also, the “bad add-on” label is a straw man. The exact mechanism and language is not yet set, but it won’t say “this add-on sucks, don’t use it”, it’ll say “this add-on causes the browser to use substantially more memory than usual”, or something like that. Users can use this information to decide if the add-on’s benefits outweigh its costs. In the specific case of Firebug, one option is to have two profiles, one with Firebug installed and one without. That way you can gain the benefits when you need to do debugging, but avoid the costs for general browsing.

Finally, we obviously want better tools and documentation to help find leaks, there’s no argument about that.

“If the author has tried and failed to find the leak, but their add-on is still slowing Firefox down, any warm and fuzzy feeling from knowing they’ve tried isn’t much help. A leak is a leak is a leak.”

I disagree. The same warm and fuzzy feeling that using Firefox is the right thing to do is the *only thing* that stops me from switching to Chrome, because as you say “a leak is a leak is a leak”, or more to the point, “Firefox is one big leak, be it a browser leak or an add-on leak”.

Nick, you yourself have admitted that it took *six* ‘major’ versions, or *two years*, to make Firefox 4 as usable as Firefox 3.6. It’s been a *long* two years and many people haven’t lasted as long as I have! If I could be bothered tweaking Chrome as much as I need to tweak Firefox to make it the browser I want, I’d consider using Chrome!

Anyway, what’s the strategic merit of this post? Hmmm, in the end Add-on leakage must be stopped. However let’s not throw out the baby (loyal volunteer hard-working Add-on developers) with the bathwater.

No strawman intended; it’s merely that every discussion of the list has just talked about the fact that it’ll mention problematic add-ons, but I’ve seen little discussion to the effect that the information provided will have to be nuanced enough for what is ultimately a difficult call.

And while I agree it’s on add-on authors to clean up their add-ons, my bigger point here is that more attention needs to be paid to giving them the tools to do that. The only thing standing between Firebug and its fix is more detailed reporting, and that’s on Mozilla. But this seems to be given a lower priority than the idea of a list of leaky add-ons, and undoubtedly part of that is that the more detailed reporting is a longer-term, more difficult goal. In your presentation for instance the list got way better billing than the issue of helping authors actually find their leaks, and while I understand this from a pragmatic standpoint it just seems to me like too much attention is being given to the list, which is a Band-Aid, and not to the real solution.

Mostly I’m just frustrated because I rely on Firebug on a daily basis. Its zombie compartments are a source of annoyance, and I know that’s never going to change until the developers either stumble across the issue, or memory reporting advances to where they can finally pin it down. I’m not blaming anyone for that or trying to suggest no one’s moving fast enough, but I’d hate for more resources to be poured into the list than need be.

I’m worried that about:memory can itself use a substantial amount of memory, especially in the verbose view. For example, pdf.js uses data: URLs which can be very long (hundreds of thousands of characters). It would be useful to truncate these to a sane length (1k chars or so).
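Truncating those URLs in the rendered output would be straightforward; a sketch (the function name is made up, and the 1000-character cutoff is the “1k chars or so” suggested above):

```javascript
// Sketch: truncate very long URLs (e.g. data: URLs from pdf.js) in
// about:memory output so the page itself doesn't bloat. The 1000-char
// default matches the "1k chars or so" suggestion; name is hypothetical.
function truncateUrl(url, maxLen = 1000) {
  if (url.length <= maxLen) return url;
  return url.slice(0, maxLen) +
         "... (truncated, " + url.length + " chars total)";
}
```

Keeping the total length in the suffix preserves the information that matters for memory analysis (how big the string is) while dropping the bulk of the data.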

The fact that I can run Gregor Wagner’s MemBench (I suspect it’s based on the original MemBuster page, with more current sites and a more brutal loading schedule) on current Nightlies on a 2 GB machine with my dirty profile and suffer neither extensive paging (I have set memory.low_physical_memory_threshold_mb to 64) nor any zombie compartments afterwards is a great indicator that the MemShrink effort has produced great results. Older (<6) Firefox versions and Chrome Dev will eventually exhaust the address space and crash. The only thing that could improve is (non-JS) heap fragmentation, but that might not be easy to tackle.

Just browsing the new about:memory page to check the mentioned style information reporting, and was wondering if the duplication of those reports is deliberate (i.e. whether they’re actually reporting different things). dom+style lists style-sheets under most inner-windows, while layout lists styledata under most shells, using a similar size for each. Most are 600k–700k each (though the dom+style ones appear to always be a little larger than the corresponding layout entry), so with 20-odd tabs open that could be an extra 14 MB or so.

Also, under dom+style, all the major sub-groupings are titled something like “top=674 (inner=820)”, which is kind of confusing when trying to identify anything. It looks like “inner=820” is pointing at the id of the primary contained inner-window. It seems like something I could get used to, but also like something that could be better labeled.

The “style-sheets” reports and the “styledata” reporters are definitely measuring different things; I have a tool (DMD) that detects if any heap blocks are reported more than once. If the numbers are similar that’s just coincidence. The naming could be improved to help distinguish them.

Do you guys have a target for RAM usage? Should Firefox be usable on a system with only 128M RAM?

I have several obsolete computers running Arch Linux: 133 MHz Pentium netbook w/ 96M, 400 MHz Pentium II w/ 128M, a 350 MHz Pentium II w/ 192M, and 2 Pentium IIIs w/ 256M and crummy integrated graphics. Firefox 3.5 works slowly on the 96M machine (30 seconds to launch with a blank start screen), Firefox 3.6.8 is acceptable on the 192M machine. Firefox 10 is pretty much unusable on the 128M machine but works ok on the 256M machines. I can even watch Youtube video clips on the P3s at a frame rate of something like 2/second.

On the machine with 128M RAM, I tried LXDE and ended up dumping it to free up more RAM. Am now running just a window manager, jwm. I also tried btrfs on this machine, and that turned out to be a mistake. Seems Firefox does a lot of syncing, and btrfs performs very badly on sync. A process called btrfs-endio eats up a lot of CPU time. I installed Adblock and Noscript. Under Edit->Preferences->Content, I unchecked “Load images automatically” and “Enable JavaScript”. I deleted the news feed bookmarks. This almost makes Firefox usable. Still thrashes swap. Still often get popup messages about Firefox’s own scripts taking too long. Takes about a minute to bring up a site like news.google.com, but at least it comes up correctly.

Is there anything else to be done? (I could give up on this worthless computer.) Maybe find a small footprint theme? Change toolbars to text only?

Thanks for the data points, they are interesting. We don’t really have a minimum target.

I’ve done some low-memory stress testing which I will blog about soon. The fact that you say FF10 is unusable on 128MB but ok on 256MB is consistent with what I found. I also found that Chrome tended to do better when memory is very low.

If you look at about:memory you’ll see that JavaScript often dominates, so disabling JavaScript is the single biggest thing you can do. (Note that JS will still be used for some in-browser stuff, but not for web content.) I’m not aware of much else you can do that’ll make a noticeable difference. Actually, I wonder if using a pre-release version such as Aurora 12 would help, since our trend since FF7 has been that memory consumption has been dropping. That’d be interesting to know.
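For reference, disabling JavaScript for web content corresponds to a single preference, which can be flipped in the Preferences UI, in about:config, or in the profile’s user.js. The pref name below is the standard one, but verify it in about:config on your version:

```javascript
// user.js in the Firefox profile directory.
// Disables JavaScript for web content (in-browser chrome JS still runs).
user_pref("javascript.enabled", false);
```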

These machines must be 15 years old or more? Again, we don’t have targets in terms of supporting old machines, but more than about 10 years old sounds to me like it’s pushing it :/

The P2 computers were bought in 1998. I got rid of the 128MB machine, but I’ll keep the 192MB one for a while. Mplayer actually works better on it than on the 1 GHz P3 because it has a Riva TNT (NV04, the very oldest card supported by the nouveau driver), while the P3 has integrated Intel graphics. Can’t do much with the 96MB Pentium based netbook (which came with Windows 98)– would need an i586 binary, and seems the world settled on i686 as the baseline. I was lucky to get Arch Linux installed on that while the archlinux-i586 site existed.

I tried the latest Aurora nightly on the 192MB machine, without Adblock, and with JS and images, and it worked okay. No thrashing. I have LXDE there. Best startup time I saw was 7 seconds for 3.6.8, and 8 seconds for Aurora. Since Aurora worked, I pushed harder and tried watching a video on Youtube. It played the sound fine, showed about 20 frames per minute (that’s minute, not second), and kept the CPU at 100%. Still no thrashing. The flash plugin crashed when the video ended. Mplayer works better on the P2, but the P3 wins the Firefox+flash test.

One other thing I’ve found helpful for improved performance on these older machines is turning off the hinting and antialiasing for the fonts. This of course makes most of the fonts look terrible. But fixed still looks good, and that’s what I use.

I’ve been using Mozilla Firefox 10 since I downloaded it, but it seems Firefox 10 has no answer for how to reduce its memory hogging. I tried about:config tweaks, just like on any version above 3.6, but nothing happened. After 1 hour, my Firefox crashed. I like Firefox more than Chrome; I believe in a community, not a corporation like Google. But this is sad: a new release with an old issue: memory.

about:config didn’t work? Do you have any add-ons installed? I bet either you have a bad add-on or there’s some problem with your profile. Try restarting in Safe Mode (http://support.mozilla.org/en-US/kb/Safe%20Mode), which disables all add-ons and lets you restore settings to default. If that fixes the problem, then you know an add-on or a configuration setting is the culprit, and you can narrow it down further, e.g. by selectively disabling add-ons one at a time.