Week of Jan 02 -> 09, 2017

Turning on leak checking (bug 1325148) – note: we enabled this Dec 29th and whitelisted many existing leaks; plenty still remain, but many great fixes have already landed

some infrastructure issues, other timeouts, and general failures

I am excited for the coming weeks as we reduce the orange factor back down below 7 and get the high-frequency bugs below 20.

Outside of these tracking stats there are a few active projects we are working on:

adding BUG_COMPONENTS to all files in m-c (bug 1328351) – this will let us match up triage contacts for each component, so test case ownership has a path to a live person

retrigger an existing job with additional debugging arguments (bug 1322433) – makes it easier to get debug information; possibly extend to special runs like ‘rr-chaos’

add |mach test-info| support (bug 1324470) – allows us to get historical timing/run/pass data for a given test file

add a test-lint job to linux64/mochitest (bug 1323044) – ensure a test runs reliably both by itself and in --repeat mode
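For context on the BUG_COMPONENTS work above: moz.build files support per-file metadata, so ownership can be declared next to the code. A minimal sketch (the product/component values here are illustrative, not taken from any real moz.build file):

```python
# moz.build fragment (moz.build files use Python syntax).
# Map every file in this directory tree to a Bugzilla product/component,
# so triage tooling can route test failures to a live person.
with Files("**"):
    BUG_COMPONENT = ("Core", "General")  # illustrative values

# A more specific pattern can override the catch-all for a subset of files:
with Files("tests/**"):
    BUG_COMPONENT = ("Testing", "General")  # illustrative values
```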

While these seem small, we are also actively triaging all bugs that are high frequency (>=30 failures/week). In January, triage means letting people know a bug is high frequency and trying to add more data to it.

Now that we have a better process for taking action on Talos alerts and pushing them to resolution, it is time to take a step back and see if any trends show up in our bugs.

First I want to look at bugs filed/week:

This is fun to see, now what if we stack this up side by side with the alerts we receive:

We started tracking alerts halfway through this process. The data shows that we file a bug for roughly 1 out of every 25 alerts. I had previously stated it was closer to 1 in 33 (that figure appears to have been skewed by averaging in the first few weeks).

Let's see where these bugs are filed; here is a view of the different Bugzilla products:

The Testing product is used for bugs where we cannot figure out the exact changeset, so they get filed in Testing::Talos. As bugs are filed across almost 30 unique components, I took a few minutes to look at the Core product specifically; here is where the bugs live in Core:

Pardon my bad graphing attempt here with the component names cut off. Graphics is the clear winner for regressions (with “Graphics: Layers” being a large part of it). Of course the JavaScript Engine and DOM show up as well (many of our tests are sensitive to changes there). This really shows where our test coverage is, more than where bad code lives.

Now that I know where the bugs are, here is a view of how long the bugs stay open:

The fantastic news is that most of our bugs are resolved in <=15 days! I think this is a metric we can track and get better at – ideally closing all Talos regression bugs in under 30 days.

Looking over all the bugs we have, what is the status of them?

Yay for the blue pacman! We have a lot of new bugs rather than assigned ones; we might adjust that by assigning an owner once a bug is confirmed and briefly discussed – that is still up in the air.

The burning question is what are all the bugs resolved as?

To me this seems healthy, and it is a starting point. Tracking this over time will probably be a useful metric!

In summary, many developers have done great work to make improvements and fix regressions over the last 6 months that we have been tracking this information. There are things we can do better, and I want to know-