Quantum Flow Engineering Newsletter #19

Hi everyone,
As usual, I have some quick updates to share about what we've been up to on
improving the performance of the browser in the past week or so. Let's
first look at our progress on the Speedometer benchmark. Our performance
goal for Firefox 57 was to get within 20% of Chrome's benchmark score on
our Acer reference hardware <https://www.amazon.com/dp/B01K1IO3QW> on
Win64. Those of you who watch the Firefox Health Dashboards
<https://health.graphics/quantum/> every once in a while may have noticed
that now we are well within that target:
[image: Speedometer Progress Chart from the Firefox Health Dashboard,
within 14.86% of Chrome's benchmark score]
<http://ehsanakhgari.org/wp-content/uploads/2017/08/Screenshot-2017-8-10-Firefox-Health-Dashboards.png>
It's nice to see the smiley face on this chart, finally! You can see the
more detailed downward slope on the AWFY graph that shows the progress
<https://arewefastyet.com/#machine=36&view=single&suite=speedometer-misc&subtest=score>
in the past couple of weeks or so (dark red dots are PGO builds, orange
dots are non-PGO builds, and of course green is Chrome):
[image: Detailed Speedometer progress in the past couple of weeks on Win64
(Acer reference hardware)]
<http://ehsanakhgari.org/wp-content/uploads/2017/08/Screenshot-2017-8-10-ARE-WE-FAST-YET.png>
The situation on Win32 is a bit worse, due to Chrome's recent switch
<https://groups.google.com/a/chromium.org/forum/#!msg/chromium-dev/Y3OEIKkdlu0/TCcT1SvwAwAJ>
to use clang-cl <https://clang.llvm.org/docs/MSVCCompatibility.html> on
Windows instead of MSVC, which gave them a speed boost of around 30% on the
32-bit Speedometer score, but we have made progress nonetheless. Such is
the nature of tracking moving targets!
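To make the "moving target" concrete, here is a minimal sketch of how a gap relative to a reference score could be computed. It assumes the gap is measured as (reference - ours) / reference, which may not be exactly how the Health Dashboard arrives at its percentage, and every score below is a made-up illustrative number, not a measurement.

```python
def gap_to_reference(our_score: float, reference_score: float) -> float:
    """Fraction by which our_score trails reference_score (higher is better).

    Assumed definition: (reference - ours) / reference.
    A negative result means we are ahead of the reference.
    """
    if reference_score <= 0:
        raise ValueError("reference_score must be positive")
    return (reference_score - our_score) / reference_score


def within_target(our_score: float, reference_score: float,
                  target: float = 0.20) -> bool:
    """True if we trail the reference by no more than `target` (20% by default)."""
    return gap_to_reference(our_score, reference_score) <= target


if __name__ == "__main__":
    # Hypothetical scores, chosen only to illustrate the arithmetic.
    ours, theirs = 85.0, 100.0
    print(f"gap: {gap_to_reference(ours, theirs):.1%}, "
          f"on target: {within_target(ours, theirs)}")

    # If the reference gets about 30% faster while we stand still,
    # the same score of ours no longer meets the target.
    theirs_faster = theirs * 1.30
    print(f"gap: {gap_to_reference(ours, theirs_faster):.1%}, "
          f"on target: {within_target(ours, theirs_faster)}")
```

Under this assumed definition, a 30% improvement on the reference side is enough to push a comfortable gap well outside a 20% target without any regression on our side.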
[image: Speedometer progress chart on Win32]
<http://ehsanakhgari.org/wp-content/uploads/2017/08/Screenshot-2017-8-10-ARE-WE-FAST-YET1.png>
The other performance aspect to have a look at again is our progress at
eliminating slow synchronous IPC calls. I last wrote about this about three
weeks ago
<https://ehsanakhgari.org/blog/2017-07-21/quantum-flow-engineering-newsletter-16>,
and since then at least one major change happened: the infamous
document.cookie synchronous IPC
<https://bugzilla.mozilla.org/show_bug.cgi?id=1331680> call was eliminated,
so I figured it may be a good time to look at the data
<https://docs.google.com/spreadsheets/d/1x_BWVlnQPg0DHbsrvPFX7g89lnFGa3lAIHWD_pLa_dE/edit#gid=1608422555&fvid=1493325006>
again.
[image: Sync IPC Analysis for 2017-08-10]
<http://ehsanakhgari.org/wp-content/uploads/2017/08/Screenshot-2017-8-10-Sync-IPC-Analysis.png>
Telemetry data is laggy since it includes data from older versions of
Nightly, but if you compare this to the previous chart
<http://ehsanakhgari.org/wp-content/uploads/2017/07/Screenshot-from-2017-07-20-20-41-37.png>,
there should be a stark difference visible:
PCookieService::Msg_GetCookieString is now a much smaller part of the
overall data (at around 26.1%). Looking at the list of the top ten
messages, the next ones in order are the usual suspects for those who have
followed these newsletters for a while: some JS initiated IPC,
PAPZCTreeManager::Msg_ReceiveMouseInputEvent, followed by more JS IPC,
followed by PBrowser::Msg_NotifyIMEFocus
<https://bugzilla.mozilla.org/show_bug.cgi?id=1349255>, followed by even
more JS IPC, followed by two new messages that are now surfacing as we've
fixed the worst of these: PDocAccessible::Msg_SyncTextChangeEvent,
which is related to accessibility and, judging by its low submission rate,
affects a relatively small number of sessions, and
PContent::Msg_ClassifyLocal, which probably comes from making the Flash
plugin click-to-play by default
<https://bugzilla.mozilla.org/show_bug.cgi?id=1345058>.
Now let's look at the breakdown of synchronous IPC messages initiated from
JS:
[image: JS Sync IPC Analysis for 2017-08-10]
<http://ehsanakhgari.org/wp-content/uploads/2017/08/Screenshot-2017-8-10-Sync-IPC-Analysis1.png>
The story here remains unchanged: most of the sync IPC messages we're
seeing come from legacy extensions, and there is also the contextmenu sync
IPC <https://bugzilla.mozilla.org/show_bug.cgi?id=1360406>, which has a
patch pending review. However, the picture here may start changing quite
soon. You may have seen the recent announcement
<https://mail.mozilla.org/pipermail/dev-addons/2017-August/003059.html>
about legacy extensions being disabled on Nightly starting from tomorrow,
so hopefully this data (and the C++ sync IPC data) will soon start to shift
to reflect more of the performance characteristics that our users on the
release channel will experience for Firefox 57.
Now please let me acknowledge the great work done over the past week:
- Ting-Yu Chou removed some needless copying from SpiderMonkey
HashTable::lookupForAdd()
<https://bugzilla.mozilla.org/show_bug.cgi?id=1385181>.
- Jim Chen fixed a hang during text selection
<https://bugzilla.mozilla.org/show_bug.cgi?id=1383242> which happened as
a result of a recent regression.
- Marco Castelluccio made it so that we don’t use Preferences.jsm at all
before first paint <https://bugzilla.mozilla.org/show_bug.cgi?id=1357517>,
which should help improve first paint time, and gets us closer to removing
that module entirely.
- Marco Bonardo improved the performance of loading the preferences used
by UnifiedComplete.js
<https://bugzilla.mozilla.org/show_bug.cgi?id=1371611>.
- Kirk Steuber made it so that we preload and cache strings for DOM
errors <https://bugzilla.mozilla.org/show_bug.cgi?id=1377377> when
idle. He also moved the handling of the Browser:Thumbnail:CheckState
message to the idle queue
<https://bugzilla.mozilla.org/show_bug.cgi?id=1376511>.
- Paolo Amadini reduced Promise overhead in DownloadLegacy.js progress
events <https://bugzilla.mozilla.org/show_bug.cgi?id=1382899>, which
used to slow down file downloads in some situations.
- Adam Gashlin created a Windows background thread for kicking off a
readahead for a few DLLs that can take a significant amount of time to load
on the main thread during startup
<https://bugzilla.mozilla.org/show_bug.cgi?id=1367416>.
- David Keeler made us load the loadable roots PKCS#11 module
asynchronously on a background thread during startup
<https://bugzilla.mozilla.org/show_bug.cgi?id=1372656>. This module
provides the built-in CA root store, and as such is only needed when
issuing a certificate or querying the trust of a certificate, which is
hopefully something that startup doesn’t need to be blocked on for most
users.
- Doug Thayer moved the generation of exponential telemetry histogram
buckets from startup to compile-time
<https://bugzilla.mozilla.org/show_bug.cgi?id=1383210>. The computation
of these buckets involved some math-heavy code that showed up in profiles,
and such code is better run on our build machines and not on our users’
machines! Furthermore, Doug ensured early calls to
setExperimentActive() during telemetry initialization don’t have
undesirable side effects
<https://bugzilla.mozilla.org/show_bug.cgi?id=1385396> such as forcing
initialization of our graphics stack.
- Ryan Hunt enabled asynchronous keyboard scrolling on pages which
register passive event listeners, behind a pref
<https://bugzilla.mozilla.org/show_bug.cgi?id=1385071>, as a way to
allow web pages to assist the browser in enabling asynchronous keyboard
scrolling if the page’s event listeners don’t need to call
preventDefault(). See the documentation
<https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/addEventListener#Improving_scrolling_performance_with_passive_listeners>
for how a similar idea is used to improve the performance of touch-based
scrolling if the web page cooperates with the browser.
- Jan de Mooij ensured we don’t interrupt regex JIT code for non-urgent
interrupts <https://bugzilla.mozilla.org/show_bug.cgi?id=1386199>
arriving from background threads such as the IonMonkey compilation thread.
He also inlined the constructor and destructor of AutoGeckoProfilerEntry
<https://bugzilla.mozilla.org/show_bug.cgi?id=1386555> and removed some
debug-only code from them.
- Perry Jiang made it so that we attempt to capture page thumbnails off
of an idle callback
<https://bugzilla.mozilla.org/show_bug.cgi?id=1353584> when the browser
is less likely to be busy.
- Henry Chang moved the main-thread portion of HTML parsing on behalf of
background tabs to happen within idle periods
<https://bugzilla.mozilla.org/show_bug.cgi?id=1355746> when possible.
Previously this would run asynchronously off of a 50ms timer.
- Jonathan Kew used a flag to ensure that expensive property accesses on
text nodes when their character data is modified only happen when needed
<https://bugzilla.mozilla.org/show_bug.cgi?id=1385395>.
- Dão Gottwald ensured we use instant scroll behavior when doing pixel
scrolling <https://bugzilla.mozilla.org/show_bug.cgi?id=1387084>. This
fixed a regression from last week’s landing of using smooth scrolling to
scroll the tab bar <https://bugzilla.mozilla.org/show_bug.cgi?id=1356705>.
- Alexandre Poirot made sure we only forward the console API calls to
the parent process when the web console (or browser console) is open
<https://bugzilla.mozilla.org/show_bug.cgi?id=1382968>. This avoids the
overhead of forwarding these calls when their result is completely
invisible.
- Felipe Gomes landed a large set of patches to move various
initialization tasks to the idle queue
<https://bugzilla.mozilla.org/show_bug.cgi?id=1388145> instead of
running them off of random timeouts.
- Kannan Vijayan optimized Array.prototype.join for empty and
single-item arrays <https://bugzilla.mozilla.org/show_bug.cgi?id=1382837>.
- André Bargull optimized GetElemBaseForLambda
<https://bugzilla.mozilla.org/show_bug.cgi?id=1387400>.
- Jan Varga moved the localStorage API to use the PBackground protocol
instead of PContent
<https://bugzilla.mozilla.org/show_bug.cgi?id=1350637>. This is an
important optimization to speed up preloading of the localStorage data
(which is a synchronous API) in the content process upon page load.
- Kris Maglione removed the old unused add-on SDK modules
<https://bugzilla.mozilla.org/show_bug.cgi?id=1350646>. These modules,
which were used in some legacy extensions, were the source of various
performance issues.
- Jim Chen ensured that we properly compare node to traversal range
under different modes
<https://bugzilla.mozilla.org/show_bug.cgi?id=1383242>. This fixed a
severe performance regression which could render a tab unresponsive by
getting Gecko into an infinite loop.
Many thanks to everyone who helped make Firefox faster last week. I hope
I'm not forgetting any names by mistake!
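One item in the list above, moving the generation of exponential histogram buckets from startup to build time, lends itself to a small illustration. The sketch below is not the actual patch: it shows one common way of spacing bucket boundaries geometrically, run as a build-time script that emits a C++ table so nothing has to be computed on the user's machine. The bucket formula is an assumption here, as are the names exponential_buckets, emit_cpp_table and kExampleBuckets, and it assumes the value range is wide enough for the requested number of buckets.

```python
import math
from typing import List


def exponential_buckets(dmin: int, dmax: int, n_buckets: int) -> List[int]:
    """Lower bounds of n_buckets buckets: 0, dmin, then roughly geometric up to dmax."""
    if dmin < 1 or dmax <= dmin or n_buckets < 3:
        raise ValueError("need 1 <= dmin < dmax and at least 3 buckets")
    log_max = math.log(dmax)
    buckets = [0, dmin]  # bucket 0 catches everything below dmin
    current = dmin
    for index in range(2, n_buckets):
        # Spread the remaining log-distance evenly over the remaining buckets.
        log_current = math.log(current)
        log_ratio = (log_max - log_current) / (n_buckets - index)
        candidate = int(math.floor(math.exp(log_current + log_ratio) + 0.5))
        # Rounding can stall for small values, so always advance by at least 1.
        current = candidate if candidate > current else current + 1
        buckets.append(current)
    return buckets


def emit_cpp_table(name: str, buckets: List[int]) -> str:
    """Render the buckets as a C++ array literal for a generated header."""
    values = ", ".join(str(b) for b in buckets)
    return f"static const int {name}[] = {{ {values} }};"


if __name__ == "__main__":
    # Hypothetical histogram: values from 1 to 1000 spread over 10 buckets.
    print(emit_cpp_table("kExampleBuckets", exponential_buckets(1, 1000, 10)))
```

The logarithms and exponentials run once on the build machine, and the browser only ever sees a constant array.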
Cheers,
-- 
Ehsan