Goal

The Chromium Perf waterfall runs many different tests against the latest Chrome to measure Chrome’s performance. This data is automatically analyzed by the perf dashboard, which reports anomalies on the alerts page. These anomalies likely correspond to performance regressions.

What to Do

Keep bots green

Label all waterfall bugs Performance-Waterfall.

Purple bot?

Purple indicates infrastructure problems. Purple on the waterfall/console page is almost always bad; the exception is that slaves temporarily go purple on the /builders or /buildslaves page during reboots between runs. If a bot is purple on the waterfall/console page, or purple for more than 10 minutes on the /builders or /buildslaves page, that’s a problem.

- File a bug with this template. Make sure to fill in the OS label, and file separate bugs for each OS, as they may be handled by different infra sub-teams.
- If the failure is in the checkout or upload steps, check the step logs and note in the bug comments if it’s one of the following:
  - Low disk space in the checkout step
  - A corrupted checkout
  - Permissions issues in the checkout or update step
- Get attention on the bug. Ping chrome-troopers@google.com when you file it. If it’s urgent, locate the on-call trooper (how to find the trooper). If the on-call trooper is unresponsive and the issue is truly urgent, escalate via chat/email to abw/jparent.
- Ping the bug daily if there is no activity.

Red bot?

Red indicates a failure. Usually this is a telemetry test failure, but sometimes infra failures turn bots red.

Is it an infra failure? Some signs it might be:

- It occurs in a non-test step, like bot_update
- The logs are unrelated to the tests running

If it’s an infra failure, file two bugs:

1. File a bug as above for a purple bot, and ping the same way.
2. File a bug on the fact that the bot went red rather than purple.

If it’s a test failure, file a bug with this template and cc the test owner. In the bug comments:

- Make sure it contains a link to the failing step's build log.
- Copy and paste relevant info from the log (e.g. the stack trace).
- Include the revision range where the first failure was seen.

Then investigate the bug. Scan the build history for the first failure; click it and analyze the CLs in that build for what could have caused the failure. If the failure happens on multiple bots, cross-reference the revision ranges to narrow it down further. If you are able to identify the culprit change with reasonable certainty, add the author to the bug and in the meantime politely revert the CL; there is no need to wait for the author to confirm in the case of a breakage. If you are not able to identify the culprit, use the tips in Telemetry: Diagnosing Test Failures to diagnose.
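The cross-referencing step above amounts to intersecting the first-failure revision ranges from each bot. A minimal sketch, assuming each range is an inclusive (start, end) pair of commit positions (`intersect_ranges` is a hypothetical helper, not part of any Chromium tool):

```python
def intersect_ranges(ranges):
    """Intersect (start, end) revision ranges from several bots.

    The culprit CL must lie in every bot's first-failure range, so
    the suspect range is the overlap of all of them. Returns None
    if the ranges do not overlap (which suggests the failures have
    different causes).
    """
    lo = max(start for start, _ in ranges)
    hi = min(end for _, end in ranges)
    if lo > hi:
        return None
    return (lo, hi)

# Two bots first failed in overlapping ranges; the culprit is in
# the intersection.
print(intersect_ranges([(1000, 1050), (1030, 1080)]))  # (1030, 1050)
```

If the intersection is empty, treat the failures as separate issues and file separate bugs.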

Stopping pointless builds

If you know that a build will fail and you have a CL waiting that will fix it, it is possible to stop builds mid-way by connecting to the bot via:

By the end of your shift, try to leave the next sheriff with a green tree!

Triage dashboard alerts

At the start of your shift, it's a good idea to look over what bugs were reported recently.

If you're not signed in, click "sign in" in the top right corner. Use your google.com account, since internal-only data is visible only when you are signed in with it.

Visit the alerts dashboard page and make sure that the "Chromium Perf Sheriff" rotation is selected; now you can start to triage groups of related alerts.

You can sort the alerts table by clicking on the table column headers.

Check one or more related alerts, then click the "Graph" button.

Examine the existing graphs to see whether they appear to be caused by the same root regression, and close out the graphs that are not part of the same issue.

Look in the list of alerts to see whether there are any for which a bug has already been filed; if so, associate the new alerts with the existing bug. Otherwise, file a new bug.

Remember, the perf dashboard is not as smart as you -- in some cases you may need to mark alerts as invalid, or nudge them. Watch out for the following patterns:

If the alert looks like noise, mark the alert "Invalid" and don't file a bug. If it looks like the noise level sharply increased recently, you may file a bug to track the noise increase. If you see many invalid alerts on the same graph, you may file a bug to have the noisy test disabled, and/or update the anomaly threshold settings to be less sensitive.
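When judging whether an alert looks like noise, it can help to compare the size of the suspected step against the series' usual variation. A minimal illustrative sketch (the dashboard's real anomaly detection is more sophisticated than this):

```python
import statistics

def step_vs_noise(values, split):
    """Compare the step at index `split` to the pre-step noise level.

    Returns (step, noise): the difference between the mean after and
    before the split, and the standard deviation before it. A real
    regression's step should dwarf the noise; an alert whose step is
    comparable to the noise is probably spurious.
    """
    before, after = values[:split], values[split:]
    step = abs(statistics.mean(after) - statistics.mean(before))
    noise = statistics.pstdev(before)
    return step, noise

# A clear ~2-unit step on a series whose noise is ~0.1 units.
step, noise = step_vs_noise(
    [10.0, 10.2, 9.9, 10.1, 12.0, 12.1, 11.9, 12.2], split=4)
```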

If the reference trace moved by the same amount as the target trace, this means something likely changed in the test or the bot, not in Chrome. Mark the alert "Invalid".

If the alert is not placed on the first bad data point, use the "Nudge" option to move the alert into that position. In the example below, the alert is incorrectly placed on the last good data point. It requires a Nudge +1 to the right. Remember, the Bisect button and bug title will only work if you nudge the alert into the right spot first.

If the alert is a revert of a very recent improvement, choose the "Ignore Alert" option. You have bigger fish to fry.

If the regression has already recovered at tip of tree, choose the "Ignore Alert" option. No sense in spending time on it.

After going through the above, you will be left with alerts that may be for real regressions.

It is worth spending considerable effort at this point to group alerts that appear to have the same underlying cause. Look for similar sets of tests that regressed at the same revision range and associate them to the same bug.
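The grouping step can be thought of as clustering alerts whose first-regression revision ranges overlap. A sketch under assumed inputs (the alert names and the `(name, start, end)` shape are hypothetical, not the dashboard's actual data model):

```python
def group_alerts(alerts):
    """Group alerts whose revision ranges overlap.

    Each alert is (name, start, end). Overlapping alerts are merged
    into one group, and the group's range is narrowed to the
    intersection, which is the range the shared culprit must be in.
    """
    groups = []
    for name, start, end in sorted(alerts, key=lambda a: a[1]):
        for g in groups:
            if start <= g["end"] and end >= g["start"]:  # ranges overlap
                g["names"].append(name)
                g["start"] = max(g["start"], start)
                g["end"] = min(g["end"], end)
                break
        else:
            groups.append({"names": [name], "start": start, "end": end})
    return groups

groups = group_alerts([
    ("sunspider/linux", 100, 120),
    ("sunspider/mac", 110, 130),
    ("page_cycler/win", 200, 210),
])
# The two sunspider alerts share a range and go on one bug; the
# page_cycler alert gets its own.
```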

If the alert is a real regression, file a bug and track down the right owner (see detailed instructions under "Diagnose regressions" below).

By the end of your shift, try to leave the next sheriff with an empty alerts page!

Diagnose regressions

As perf sheriff, you are responsible for following through with the regression bugs filed during your shift. Assign yourself as the owner until you find a more appropriate owner. Be sure the bugs have the labels "Performance" and "Type-Bug-Regression" so they show up in our weekly triage meetings. To further prevent severe regressions from slipping through the cracks, consider applying milestone and release block labels to the bug.

Use the "Bisect" button on the dashboard to kick off a bisect job to pinpoint the culprit change. The dashboard will run the bisect and update the bug with the results when complete.

When the results are posted to the bug, verify that they make sense. The magnitude of the regression detected by the bisect should roughly match that on the dashboard, and the confidence should be high. If not, change something about the bisect and try again. For example: more iterations, a wider revision range, a different platform, or a more specific test (e.g. one page of the page_cycler or one suite of dromaeo).
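The "roughly match" check can be made concrete as a relative-difference test. A sketch; the 50% tolerance here is an illustrative choice, not a documented threshold:

```python
def bisect_plausible(dashboard_delta, bisect_delta, tolerance=0.5):
    """Rough sanity check on bisect results.

    Returns True if the regression magnitude the bisect found is
    within `tolerance` (as a fraction) of the magnitude the
    dashboard reported. The 0.5 default is an arbitrary example.
    """
    if dashboard_delta == 0:
        return bisect_delta == 0
    return abs(bisect_delta - dashboard_delta) / abs(dashboard_delta) <= tolerance

print(bisect_plausible(10.0, 8.5))  # True: within 50% of the dashboard delta
print(bisect_plausible(10.0, 2.0))  # False: rerun the bisect with tweaks
```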

Once you find the culprit change, assign the bug to the author and ask them to revert. Chrome has a no-regression policy as specified in our core principle of speed. Because there are sometimes tradeoffs involved, or other considerations, it is usually best to let the author do this rather than doing it yourself. If the author claims the regression is expected but you have any doubts, feel free to loop in rschoen/tonyg or reach out to perf-sheriffs at chromium dot org for more support.

Update perf expectations

Find the line corresponding to the new failure. For example, if you're looking at a failure on the perf dashboard in linux-release-64, the sizes test, and a failing expectation for 'chrome/chrome', the line you need to update in perf_expectations.json will have the key 'linux-release-64/sizes/chrome/chrome'.

Select a new reva and revb. This should be a range of commits (x-axis values) that exhibit the new value of the metric; regardless of whether the metric rose or fell, you want to include two points on the new plateau. It is good to include at least 50 revisions in your range; this allows make_expectations.py to get a sense of how noisy the metric is.

For that test result, update reva and revb to match the new range and save your changes to the file. When make_expectations.py computes values for regress and improve, it will use a tolerance (5%, by default) around the actual dashboard y-axis values over the interval you've specified. A handful of metrics use a non-default tolerance. Typically there is no need to touch any of the other fields; make_expectations.py will do that for you.
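The tolerance band described above can be sketched as follows. This is an illustrative approximation of how make_expectations.py might derive the regress/improve bounds from the values in your reva..revb range; the field names mirror perf_expectations.json, but the script's exact algorithm may differ:

```python
def expectation_bounds(values, tolerance=0.05, higher_is_worse=True):
    """Derive regress/improve bounds from observed metric values.

    `values` are the dashboard y-axis values over the chosen
    reva..revb range, and `tolerance` is the fractional band
    (5% by default) applied around them. Simplified sketch only.
    """
    if higher_is_worse:
        regress = max(values) * (1 + tolerance)  # alert if above this
        improve = min(values) * (1 - tolerance)  # improvement if below this
    else:
        regress = min(values) * (1 - tolerance)
        improve = max(values) * (1 + tolerance)
    return regress, improve

# e.g. a size metric hovering around 100 on the new plateau:
regress, improve = expectation_bounds([100.0, 102.0, 98.0])
```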

Run make_expectations.py; this updates perf_expectations.json with new result values and a new checksum for that line.

Open a new bug (example) with label Performance and cc all authors of CLs in the regression revision range. Assign the bug to the first or most likely author in the list and ask them all to verify if their change could have introduced the regression.

Upload a CL (example) with your change. Like the example, include in the commit message the key that is being updated along with a working persistent link to the change on the graph. Make sure the presubmit tests run, since these tests check for common syntax errors. Send your CL for review to the current perf sheriffs (check the calendar if you're unsure). Include at the bottom of the description:

BUG=<your new bug>

Publish your CL on codereview.chromium.org so that it can be reviewed.

If the tree is open, land your CL then notify the sheriff(s).

If the tree is closed, check with the sheriffs if you are free to land your change. Let them review it if they want, in addition to the TBR.

Join Us

If you'd like to help keep the Chromium Perf waterfall green, send email to rschoen@chromium.org. The rotation lasts 3 days, once per quarter.