Sunday, October 22, 2017

TenFourFox FPR4b1 available

I didn't get everything into this release that I was hoping to; CSS Grid and some additional DOM features are going to have to wait until FPR5. Still, there's quite a bit in FPR4, including more AltiVec conversions (this time the library function we're making over is strchr()), layout speed enhancements and hopefully a final fix for issue 72. That was a particularly interesting fix because it turns out there are actually two OS bugs in 10.5 that not only caused the issue but made it a little more involved to mitigate; read the issue if you're interested in the gory technical details, but basically we can overwhelm Leopard with our popup window events, and on top of that we can't even detect the misplaced clicks that result because the NSEvent's underlying CGEvent has incorrectly displaced coordinates. Since it does much the same work to patch around the OS as the fix for issue 248 (which also affects 10.4), even though the two issues have completely different root causes, I mostly combined the code for the two fixes to simplify the situation. It's not well tested, however, so I haven't uploaded it to the tree yet in case I have to back it out like I did the last time. Once we've determined it fixes the problem and it doesn't regress anything, I'll commit and push it.

The two major user-facing changes relate to fonts and HTML5 video. On the font side, we now have the same versions of the Brotli, OTS, WOFF2 and Harfbuzz libraries as Firefox 57, meaning we now support the latest iteration of WOFF2 webfonts as well and pick up all the rendering and performance improvements along the way. (This also upgrades Brotli decompression for the websites that support it, and I added some PowerPC-specific enhancements to the in-tree Brotli to use our native assembly byteswapping instructions for endian conversion. I should try to push this upstream when I get a round tuit.) This version of TenFourFox also fixes a longstanding issue where we couldn't display Graphite fonts for minority writing systems; they just wouldn't load due to a compiler issue where one of the key structs was computed with the wrong size, causing the browser to bail out. Before you upgrade, look at that link in FPR3 and note that because of this fallback the Burmese Padauk font has the wrong washwes and the Nastaʿlīq font at the bottom is missing all the ligatures and glyph substitutions shown in the comparison screenshot. In FPR4, this is all corrected and everything appears perfectly. As a formally trained linguist (BA, University of California) and a Christian, I find the work SIL International is doing with writing systems to be fascinating and hopefully this will make TenFourFox more useful to our users in foreign climes.

On the video side, the YouTube redesign has been an unmitigated dumpster fire for performance on older machines. Not only does it require a lot more system resources, it also broke a lot of older video-download addons that depended on the prior layout (on purpose?). It's not entirely misguided, though: while the lazy loader they appear to be using makes it very hard to deterministically reason about what loads when, after the first video finally grinds through, subsequent ones do require much less work. (This is part of Google's attempt to get you to just "leave YouTube on" like your TV, I suspect.) I tried to retune the media decoder state machine to deal with these changes, and the baseline I hit on makes the browser pre-render a lot more frames (not just buffer, but actually pre-decode prior to playback) and pushes much smaller sets to the compositor instead of drowning it in frames that arrive too late and then have to be taken back out. With this change my Quad G5 is able to play most videos in Reduced mode nearly as well as before -- it does not completely erase the loss in performance, but it does improve things.

This retuning also benefits HTML5 video playback in general, not just on YouTube. You can see the difference on other WebM and Theora videos, such as the ones on Mozilla's own pages, or Wikipedia (WebM VP8 example, Theora VP3 example) — although there is an initial delay while the video pre-decodes, playback should be a fair bit less choppy. Even full-screen playback is no longer "LOL" in theory, though in practice still probably more stuttery than people would like. The same general limitations apply as before; for example, my Quad G5 handles VP9 with MSE fine, but my 10.5 DLSD PowerBook G4 becomes a slideshow due to VP9's higher bitrate and strongly prefers VP8. As such, the default setting is still to disable MSE, and I discourage enabling it except on low-spec G4 systems near the 1.25GHz cutoff (to use the lower 144p and 240p resolutions) and high-end 2.5GHz/2.7GHz G5 systems (to use the 360p and 480p options if desired).

FPR4 also introduces an experimental (disabled by default) set of features specifically for YouTube but possibly beneficial elsewhere, namely decode delay and Mach monitoring. Decode delay adds a "wait state" between page load and video playback so that the rest of the page can (eventually) load and the video won't get stomped on by other page display tasks requiring the CPU. In a similar fashion, Mach monitoring looks at the kernel-provided Mach factor at various intervals and if not enough CPU resources are available, inserts a "wait state" right then and there to temporarily delay playback until the CPU load goes down.

These aren't enabled by default because 1) I'm not sure what the proper values should be, or what a reasonable default is, and 2) longer values can cause some issues on YouTube with very short clips (particularly the interstitial ads) because their code doesn't expect the browser to suddenly take a timeout during playback. When this happens and an ad won't play, you can probably get around it by reloading the page. But you can still play with these settings and see what works for you. Post your findings in the comments along with your system specs, speed, RAM, etc. NB: You may need to restart the browser for some of these settings to stick, as they are cached for performance.

To introduce a decode delay, create a new integer preference tenfourfox.media.decode_delay in about:config and set the number of seconds you want. If you say zero (0), or delete the preference, there is no decode delay (the default). Every video played will have the decode delay, but only once upon initial playback. The idea with YouTube is a nice long decode delay to let all the other crap lazy-load, and then your video can queue up in peace.
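To make the "only once upon initial playback" behaviour concrete, here is a toy Python model of it. This is not the browser's actual code (that lives in the C++ media decoder state machine); the class and method names are invented for illustration, and I'm assuming "once" means once per video rather than once per page.

```python
class DecodeDelayGate:
    """Toy model of a one-time decode delay per video (hypothetical names)."""

    def __init__(self, delay_seconds=0):
        # Zero, or a missing preference, means no decode delay.
        self.delay_seconds = max(0, int(delay_seconds))
        self._already_delayed = set()

    def delay_before_playback(self, video_id):
        """Return how many seconds to wait before this playback attempt."""
        if self.delay_seconds == 0 or video_id in self._already_delayed:
            return 0
        self._already_delayed.add(video_id)
        return self.delay_seconds


gate = DecodeDelayGate(delay_seconds=30)
first = gate.delay_before_playback("video-a")   # initial playback: wait
again = gate.delay_before_playback("video-a")   # same video again: no wait
other = gate.delay_before_playback("video-b")   # a different video: wait
```

With a 30-second setting, the first playback of each video waits and later attempts on the same video do not.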

Mach monitoring is based on Mach factor: the lower the factor, the more load is on the system (the reverse of load average in concept); zero, then, means all cores are 100% occupied. The default is a critical Mach factor of 450 (tenfourfox.media.mach_factor_min), a delay of five (5) seconds (tenfourfox.media.mach_factor_delay), and zero (0) maximum tries (tenfourfox.media.mach_factor_max_tries) which essentially disables the feature. If the preferences do not exist (the default), these defaults are used, meaning monitoring is not in effect. At various times the state machine will sample the Mach factor for the entire computer. If the Mach factor is less than the critical point, such as when the browser is trying to load YouTube comments, a playback delay is introduced (note that a delay of zero may still cause the browser to buffer video without an explicit delay, so this is not the same thing as disabling the feature entirely). The browser will only do this up to the maximum number of tries per video to prevent playback thrashing. Systems that are at their limit decoding video or very busy otherwise will likely need the Mach factor set rather low or the browser will blow through all the tries back to back before it even plays a single frame. Likewise, more maximum tries rather than longer delays may reduce problems with short clips but can cause irritating stalls later on; you'll have to find the balance that works for you. A tool like iStat or MenuMeters can give you an idea about how much processing headroom your system has.
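If the interaction between the three preferences is hard to picture, the following Python sketch models the decision as described above. It is only an approximation for reasoning about settings, not the real implementation: the class and method names are made up, the Mach factor samples are passed in by hand rather than read from the kernel, and it ignores the extra buffering that a delay of zero may still cause.

```python
from dataclasses import dataclass


@dataclass
class MachMonitor:
    """Toy model of Mach monitoring for one video (hypothetical names)."""

    mach_factor_min: int = 450  # tenfourfox.media.mach_factor_min
    delay_seconds: int = 5      # tenfourfox.media.mach_factor_delay
    max_tries: int = 0          # tenfourfox.media.mach_factor_max_tries
    tries_used: int = 0

    def on_sample(self, mach_factor):
        """Return seconds to stall playback for this sample (0 = keep going)."""
        if self.tries_used >= self.max_tries:
            return 0  # out of tries, or the feature is effectively disabled
        if mach_factor >= self.mach_factor_min:
            return 0  # enough CPU headroom, no stall needed
        self.tries_used += 1
        return self.delay_seconds


# One monitor per video; here we allow two stalls.
monitor = MachMonitor(max_tries=2)
samples = [900, 300, 200, 100, 800]  # made-up Mach factor readings
stalls = [monitor.on_sample(s) for s in samples]
```

In this run the second and third samples each trigger a five-second stall, and the fourth does not because the tries are used up. That is the "blow through all the tries" case: on a busy system a high critical value spends the whole budget before the load ever drops.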

Finally, this version removes the "Get Add-ons" tab from the Add-ons Manager, as threatened (er, promised). Since the future is WebExtensions, and TenFourFox isn't compatible, there's no point in advertising them to our userbase. You can still download older legacy addons from AMO; I do still support them (remember: "best effort" only), and they will still install. I may resurrect this tab if someone(tm) develops a site to host these old addons.

For FPR5 my plan is to expand the use of our VMX-accelerated strchr() to more places, add CSS Grid, add some additional DOM features, and maybe start work on date and time pickers. The other major change I'd like to make is an overhaul of the session store system. The argument is that session stores run too frequently and chew up SSDs by writing the state of the browser to disk too often. As a fellow SSD user (a Samsung 512GB 850 PRO) I agree with this concern up to a point (which is why we have a 25-second interval instead of the default 15-second interval used in Firefox), but I think it's more profitable to reduce the size of the writes instead of making the interval excessively long: our systems aren't getting any younger and some of the instability we get reports on turned out to be undiagnosed system conflicts or even failing hardware. If we had a very long interval, it's possible these people might have lost data. The session store, like any backup method, only works if you use it!
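A bit of back-of-the-envelope arithmetic shows why write size matters more than interval here. The Python below assumes the worst case where the browser writes the session on every interval, and the 500 KiB session size is a made-up figure purely for illustration.

```python
SECONDS_PER_HOUR = 3600


def writes_per_hour(interval_seconds):
    """Upper bound: one session store write per interval."""
    return SECONDS_PER_HOUR / interval_seconds


def kib_per_hour(interval_seconds, write_size_kib):
    """Data written per hour for a given interval and write size."""
    return writes_per_hour(interval_seconds) * write_size_kib


BASELINE_KIB = 500  # hypothetical session file size, not a measured value

firefox_default = kib_per_hour(15, BASELINE_KIB)     # 15-second interval
tenfourfox_now = kib_per_hour(25, BASELINE_KIB)      # 25-second interval
longer_interval = kib_per_hour(30, BASELINE_KIB)     # 30-second interval
smaller_writes = kib_per_hour(25, BASELINE_KIB / 2)  # same interval, half size
```

Under these assumptions, stretching the interval from 25 to 30 seconds saves about 17% of the writes, while halving the size of each write saves 50% without making the backup any staler.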

Like everything else, you can tune this to your taste (though don't come crying to me if you muck it up). However, I think a reasonable first pass would be to do some code cleanup and then reduce the amount of history that gets serialized by reducing the number of back and forward documents to 5 (currently 10 and no limit respectively), and automatically purging closed windows and tabs after a certain timeframe, maybe a couple hours (see issue 444 for the relevant prefs). Making the interval 30 seconds instead of 25 shouldn't be a big loss either. But if you have other ideas, feel free to post them in the comments.
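As a rough illustration of what that first pass could look like, here is a Python sketch of the two ideas: capping back and forward entries around the current document, and dropping closed tabs after a timeout. The function names and the data layout are invented for the example and do not correspond to the real session store code or its prefs.

```python
def trim_history(entries, current_index, max_back=5, max_forward=5):
    """Keep at most max_back entries before and max_forward after the
    current document. Returns (trimmed_entries, new_current_index)."""
    if not 0 <= current_index < len(entries):
        raise IndexError("current_index out of range")
    start = max(0, current_index - max_back)
    end = min(len(entries), current_index + max_forward + 1)
    return entries[start:end], current_index - start


def purge_closed(closed_items, now, max_age_seconds=2 * 3600):
    """Drop closed tabs or windows older than max_age_seconds.
    Each item is assumed to be a dict with a 'closed_at' timestamp."""
    return [item for item in closed_items
            if now - item["closed_at"] <= max_age_seconds]


# A tab with 20 history entries, currently showing entry 10.
history = list(range(20))
trimmed, index = trim_history(history, current_index=10)

closed = [{"url": "a", "closed_at": 0}, {"url": "b", "closed_at": 9000}]
kept = purge_closed(closed, now=10000)
```

The trimmed tab serializes 11 entries instead of 20 and still points at the same current document, and only the recently closed tab survives the purge.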

I haven't tried playing with the Mach monitoring settings since I don't have the data bandwidth to experiment with videos right now. However: I use 30 seconds for tenfourfox.media.decode_delay and it works nicely to let the YouTube site load before the video starts to play. I did actually have a click-to-play add-on installed for exactly this purpose, but it stopped working a while ago and never worked reliably, so this pref is definitely a win.