In bug 1187864 comment 3 we found out that some pings report too many thread hang stats.
Since the repeated "(chrome script)" stack entries are not really useful we should collapse those into one "(chrome script)" or "(chrome script <N>)".

So, per bug 1187864 there are different things we can do here:
(1) limit the number of hang stat entries
(2) limit the stack depth to a sane number
(3) collapse repeated successive "(chrome script)" to one "(chrome script)" or "(chrome script <N>)"
For (3), I think we should just stick with collapsing into the simple "(chrome script)" (unless there are compelling reasons to make it more complicated).
I'd be fine with moving this part to a follow-up if it is no big win.

Analysing the data from bug 1191846, we found that addressing points (2) and (3) from comment 2, along with the future fix from bug 1213780, is enough to bring the size of most of the big pings below 1 MB.
This means that we're not limiting the number of thread hang stat entries, for the moment.
To find a reasonable maximum stack depth for (2), we used a Python notebook [1] to check the 95th percentile of BHR stack depths received by the Telemetry servers for the Nightly population. This value is 11, so that's the maximum depth our reported stacks will have.
Thanks Vladan and Georg for their feedback and suggestions.
[1] - https://gist.github.com/Dexterp37/fc4a043fb442d4e4be90
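As a rough illustration of how a depth cap can be derived from a percentile, here is a minimal sketch using only the Python standard library. It is not the code from the notebook in [1]; the function name `percentile_nearest_rank`, the nearest-rank method, and the sample depths are assumptions made for this example.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method.

    pct is expected to be an integer in the range 1..100.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    # Nearest rank: the smallest 1-based rank covering pct percent of the data.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical stack depths, one per reported hang stack (not real Telemetry data).
sample_depths = [3, 4, 4, 5, 6, 6, 7, 8, 9, 30]
max_depth = percentile_nearest_rank(sample_depths, 95)
```

Using a high percentile instead of the maximum keeps a few extremely deep stacks from dictating the limit.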

Created attachment 8673517: bug1211411.patch
This patch changes the BackgroundHangMonitor to clean up the reported hang stacks by collapsing repeated "(chrome script)"/"(content script)" entries.
It also limits the stack depth by removing the topmost frames when the stack is deeper than the maximum allowed depth.

Created attachment 8674405: bug1211411.patch
Thanks for the review. This patch addresses your first comment and, as discussed over IRC, I've kept the second change and added a little comment there.
Moreover, this patch adds the "(reduced stack)" entry to mark that the stack was reduced.
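The clean-up described in these comments can be sketched as follows. This is an illustrative Python model, not the BackgroundHangMonitor patch itself. All function and constant names are hypothetical. It also assumes that the stack is a list ordered oldest frame first, that the most recent frames are the ones kept, and that the "(reduced stack)" marker takes the first slot and counts toward the depth limit. The thread does not spell out any of those three details.

```python
MAX_STACK_DEPTH = 11  # 95th percentile of BHR stack depths quoted in this bug
SCRIPT_LABELS = {"(chrome script)", "(content script)"}
REDUCED_MARKER = "(reduced stack)"


def collapse_script_frames(stack):
    """Collapse successive identical script pseudo-frames into one entry."""
    collapsed = []
    for frame in stack:
        if frame in SCRIPT_LABELS and collapsed and collapsed[-1] == frame:
            continue  # Same script label as the previous entry: skip it.
        collapsed.append(frame)
    return collapsed


def limit_depth(stack, max_depth=MAX_STACK_DEPTH):
    """Cap the stack at max_depth entries, flagging any truncation."""
    if len(stack) <= max_depth:
        return list(stack)
    keep = max_depth - 1  # Assumption: one slot is reserved for the marker.
    return [REDUCED_MARKER] + list(stack[len(stack) - keep:])


def clean_hang_stack(stack, max_depth=MAX_STACK_DEPTH):
    """Collapse repeated script frames first, then enforce the depth limit."""
    return limit_depth(collapse_script_frames(stack), max_depth)
```

Collapsing before truncating means repeated script entries do not use up the depth budget, so more distinct frames survive the cut.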

Alessio: how do you feel about uplifting this patch and the patch in bug 1219751 to Beta 43? It's already on Aurora 44 afaict.
If you're ok with uplifting these, I'd like to get them in by the end of the week, in time for next week's A/B experiment.

Comment on attachment 8674878: bug1211411.patch
Approval Request Comment
[Feature/regressing bug #]: Limit the depth of thread hang stacks, consistently reducing the size of the pings sent to Telemetry servers.
[User impact if declined]: This isn't a user-facing feature. However, without these patches, bug 1222894 could cause a lot of oversized pings to be sent, wasting both storage and processing time server-side and thus increasing costs.
[Describe test coverage new/current, TreeHerder]: This has been on m-c for 3 weeks without any reported issue.
[Risks and why]: Low risk. This only touches Telemetry & Thread Hang stacks, no front facing features or other systems.
[String/UUID change made/needed]: None.
Please note that this is part of an uplift stack required for bug 1222894. This is the stack (the patches apply correctly to mozilla-beta in this order): 1213780, 1211411, 1215540, 1219751.