Monthly Archives: June 2014

This is the first post in a series where I will post some ideas. These are just ideas, not active projects (although these ideas could be implemented within many active projects).

My first idea is around the concept of AutoLand. Mozilla has talked about this for a long time. In fact, a conversation I had last week got me thinking more about the value of AutoLand versus blocking on various aspects of it. There are a handful of issues blocking us from a system where we push to Try and, if it goes well, we magically land the patch on the tip of a tree. My vested interest comes in the “if it goes well” part.

The argument here has been that we have so many intermittent oranges that until we fix those we cannot determine whether a run is green. A joke for many years has been that it would be easier to win the lottery than to get an all-green run on TBPL. I have seen a lot of cases where people push to Try and then land on Inbound, only to be backed out by a test failure: a test failure that was already visible on Try (for the record, I personally have done this once). I am sure someone could write a book on the human behavior, tooling, and analysis behind why failures land on integration branches when we have a try server.

My current thought is this:

* push to try server with a final patch, run a full set of tests and builds

* when all the jobs are done [1], we analyze the results and look for 2 patterns

* pattern 1: for a given build, at most 1 job fails

* pattern 2: for a given job [2], at most 1 platform fails

* if patterns 1 and 2 both pass, we put the patch in the queue for landing by the robots (a sketch of this check follows the footnotes)

[1] – we can determine the minimal set of jobs required, or verify with more analysis (e.g. 1 mochitest can fail, 1 reftest can fail, 1 other job can fail)

[2] – some jobs are run in different chunks: on opt builds ‘dt’ runs all browser-chrome/devtools tests, but this becomes ‘dt1’, ‘dt2’, ‘dt3’ on debug builds
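To make this concrete, here is a minimal Python sketch of the two pattern checks, assuming job results arrive as (platform, job, status) tuples; the sample results and the safe_to_autoland helper are illustrative, not a real Mozilla API:

```python
import re
from collections import defaultdict

def normalize(job):
    # collapse chunk suffixes like 'dt2' or 'mochitest-3' into 'dt' /
    # 'mochitest', so chunked jobs count as one job (see [2])
    return re.sub(r"-?\d+$", "", job)

def safe_to_autoland(results):
    """results: iterable of (platform, job, status) tuples from a try run."""
    failures_per_build = defaultdict(int)  # for pattern 1
    failures_per_job = defaultdict(int)    # for pattern 2

    for platform, job, status in results:
        if status != "green":
            failures_per_build[platform] += 1
            failures_per_job[normalize(job)] += 1

    # pattern 1: for a given build, at most 1 job fails
    pattern1 = all(n <= 1 for n in failures_per_build.values())
    # pattern 2: for a given job, at most 1 platform fails
    pattern2 = all(n <= 1 for n in failures_per_job.values())
    return pattern1 and pattern2

# hypothetical results from a full try push
results = [
    ("linux64 opt", "mochitest-1", "green"),
    ("linux64 opt", "reftest", "orange"),
    ("win7 opt", "reftest", "green"),
    ("osx10.8 debug", "dt2", "orange"),
]
print("queue for autoland" if safe_to_autoland(results) else "needs a human")
```

The per-suite refinement from [1] (1 mochitest, 1 reftest, 1 other) would just swap the normalize() bucketing for suite-level buckets.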

This simple approach would give us the confidence we need to reliably land patches on integration branches, and to achieve results as good as, if not better than, what humans achieve today.

As a bonus, we could optimize our machine usage by not building and running all jobs on the integration commit, since we already have a complete set of results from the try server.

Now that we have an uplift completed and enough data has been collected to automatically confirm sustained changes, it is time for the triple-fortnightly (6 week) report of what performance looks like. For reference, there is some data in a blog post about general Talos numbers.

As you can see, Firefox 32 has a lot of improvements and fewer regressions (of those 20 regressions, about half are related to rebasing numbers).

Let's look at the bugs:

* 36 bugs filed to date for Firefox 32 Talos regressions

* 16 are resolved (7 as wontfix)

* 20 are open (17 of these only show up on non-pgo builds)

After reviewing the process of investigating alerts, it makes sense to continue with the same process in 6 week intervals; any changes will be made on uplift day and will apply only to trunk. Some future changes we are considering:

* not filing bugs on minimal regressions (e.g. <4%)

* not filing bugs on non-pgo-only regressions (since we only build pgo on Aurora, Beta, and Release)

* generating alerts per test (not per suite), and only filing bugs if a single test regresses by >10%

* adjusting the graph server alert calculation to not drop the page with the highest value, and to report the geometric mean of the pages instead of the average (this and the per-test threshold are sketched below)

* any other great ideas you have on how to be efficient with our time while continuing to identify and document our regressions
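To illustrate the last two calculation changes, here is a rough Python sketch; the page names and timings are made up, and real input would come from graph server:

```python
import math

# hypothetical per-page results (ms) for one Talos suite, before and after a push
old_pages = {"page1": 120.0, "page2": 300.0, "page3": 95.0}
new_pages = {"page1": 121.0, "page2": 340.0, "page3": 96.0}

def geometric_mean(values):
    # proposed summary: geometric mean of all pages, keeping the page
    # with the highest value instead of dropping it
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

old_score = geometric_mean(old_pages.values())
new_score = geometric_mean(new_pages.values())
print("suite change: %.2f%%" % ((new_score - old_score) / old_score * 100))

# per-test alerting: only file a bug if a single page regresses by >10%
for page, old in old_pages.items():
    delta = (new_pages[page] - old) / old * 100
    if delta > 10:
        print("would file a bug: %s regressed %.1f%%" % (page, delta))
```

With these numbers the suite moves about 5% while page2 alone regresses 13%, which is exactly the case where per-test alerting catches what a suite average would smooth over.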