(It’s worth pointing out that the coloring for the JS-related tests is wrong; I think those tests are “bigger is better” tests.)

One meta-point before diving into suggestions for various tests: emails don’t always get triggered for the correct changesets. To see what I mean, take a look at the following examples:

FF17 Ts, MED Dirty Profile: We were quite fortunate during the FF17 cycle; our Talos emails identified a single changeset as causing significant regressions in several areas, notably startup. But if you look at the above linked chart, you’ll see that on x86-64 Linux, the regression is linked to changesets occurring after the regressing changeset. I’m not sure how this happens, but it’s clearly a problem for identifying regressions.

FF18 DHTML Row Major MozAfterPaint: XP shows an improvement that’s almost certainly related to DLBI landing, except that the improvement is attributed to the changesets before the landing, which is bizarre.

All this suggests that there’s a bug in how we’re benchmarking our trees and generating our results; I haven’t investigated any further.

This entry was posted on Friday, October 5, 2012, at 9:21 pm by Nathan Froyd. Filed in Uncategorized.

5 Comments

mccr8

The constructors one is particularly weird, given that it is a quantity that can be computed precisely, compared to performance, which can have some variance.

Yeah, I do not understand how the “previous” score for constructors is calculated. I would have to go and look at the code. It certainly doesn’t encourage people to look at it when you tell them their changeset added 0.68 of a constructor.

So I’ll ask questions and make a bunch of requests to try to slow you down:

1. Is the changeset range a rollup, or the finest available granularity? (As in, does each range correspond to an interval between talos runs?)

2. I’d like to be able to look at the table to find an interesting platform, then click on the platform heading to show a graph of just that platform over time. Maybe that points to the graph server with appropriate parameters? (I haven’t looked at the graph server in ages, ever since I gave up on interpreting whatever the heck its mass of lines was trying to show me.)

3. I want to be able to feed in a changeset hash and have it highlight the row containing it on all graphs. (“Did changeset X break/improve things?”)

4. I’d like push datetime ranges for the rev ranges.

5. DD/MM/YYYY sucks almost as much as MM/DD/YYYY. Pretty please use YYYY/MM/DD?

6. I sort of want to be able to star the big jumps (comment on them with any known explanation). But that raises all kinds of issues.

7. Is the final row a percent? If so, add a percent sign so I don’t have to wonder.

8. White gaps mean talos was not run for anything in that changeset range? Then what does a multi-row cell just after a white gap mean?

9. Could you add a changeset count to the rows? Just to have a feel for whether it was a merge or individual change. I suppose it would be more direct to label merges vs regular commits (vs backouts?)
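As an aside on point 5 above: one concrete advantage of YYYY/MM/DD over DD/MM/YYYY or MM/DD/YYYY is that ISO-ordered dates sort chronologically as plain strings, so date columns in a table (or filenames) stay in order with no parsing at all. A minimal illustration in Python:

```python
# YYYY/MM/DD dates sort chronologically under plain string comparison,
# because the most significant field comes first.
iso_dates = ["2012/10/05", "2012/01/17", "2011/12/31"]
assert sorted(iso_dates) == ["2011/12/31", "2012/01/17", "2012/10/05"]

# The same dates in DD/MM/YYYY sort by day-of-month instead,
# scrambling the chronology:
ddmm_dates = ["05/10/2012", "17/01/2012", "31/12/2011"]
assert sorted(ddmm_dates) == ["05/10/2012", "17/01/2012", "31/12/2011"]
```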

Glad to see the visualizer is getting some comments! I should write a post explaining things in a bit more detail; the graphserver ought to show all this stuff too…

1: Each colored block in a table corresponds to an email sent to dev-tree-management. Whitespace means no email was sent for that range of changesets, which I take to imply there was no significant change for that platform over that range.

2: Correlating this with the graphserver would be useful. Even just letting the individual changes link back to graphserver output would be a start.