Bisecting Performance Regressions

Summary

A Python script for automating the process of syncing, building, and testing commits for performance regressions.

The script works by syncing to both revisions specified on the command line, building them, and running performance tests to get baseline values for the bisection. After that, it will perform a binary search on the range of commits between the known good and bad revisions. If the pinpointed revision turns out to be a change in Blink, V8, or Skia, the script will attempt to gather a range of revisions for that repository and continue bisecting.
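The search described above can be sketched as follows. This is an illustrative Python sketch of the bisection loop, not the script's real implementation; `bisect`, `measure`, and the commit list are hypothetical stand-ins for the script's sync/build/test steps:

```python
# Hypothetical sketch of the bisection loop: given an ordered commit
# range and the baseline metric values measured at the known-good and
# known-bad endpoints, repeatedly test the midpoint commit and narrow
# the range until a single culprit commit remains.

def bisect(commits, measure, good_value, bad_value):
    """Return the first commit whose measurement looks 'bad'.

    commits[0] is the known-good revision, commits[-1] the known-bad one.
    measure(commit) stands in for syncing, building, and running the
    perf test at that commit, returning the metric value.
    """
    lo, hi = 0, len(commits) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        value = measure(commits[mid])
        # Classify the midpoint by whichever baseline it is closer to.
        if abs(value - good_value) < abs(value - bad_value):
            lo = mid  # still looks good: the regression landed later
        else:
            hi = mid  # looks bad: the regression is here or earlier
    return commits[hi]
```

In the real script each `measure` call is expensive (a full sync, build, and test run), which is why narrowing log2(N) commits this way is far cheaper than testing the whole range.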

The try bot will send an email on completion as a regular "Try Success" email, showing whether bisect was successful and linking to output (the Results stdio link at the bottom).

Also note that the trybots run on LKGR, not ToT. If you just made a change to the bisect script itself, make sure you pass -r REV to ensure the trybot picks up a revision that contains your change.

If the bot seems to be down, you can try pinging a trooper.

Run Locally

You probably don't want to run the bisect locally, except to debug your settings before sending a job to a try bot, or when running overnight: the tests run in your local session and will make your computer nearly unusable, and anything else you run on the machine risks interfering with the tests.

It is recommended that you set your power management features to "performance" mode.

For googlers:

sudo goobuntu-config set powermanagement performance

To run locally in a private checkout (the recommended way of running locally), first set CHROMIUM_DIR="$CHROMIUM_ROOT/src".

Tips

Often you can get a clearer regression by looking at the other metrics in the same test. For example, if the metric is warm_times/page_load_time for a page_cycler regression, look at the individual pages. Often there's a page where the regression clearly stands out that you can bisect on.

With tests that suddenly become noisy, bisecting on changes in the mean isn't all that useful. There's a "bisect_mode" parameter in the config that allows you to specify "std_dev", and the bisect script will bisect on changes in standard deviation instead. There is currently no way to do this from the dashboard, so you'll have to initiate the bisect manually. The list of available modes can be found in bisect_utils.py: 'mean', 'std_dev', and 'return_code' are the values as of January 2015. 'return_code' can be used to find the point where perf tests begin failing (when metrics aren't being produced).
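As a sketch, a manually written bisect config using this parameter might look like the following. The field names other than "bisect_mode" mirror a typical bisect job config, but treat the exact keys, command, and revision values as illustrative rather than authoritative:

```python
# Illustrative bisect config for bisecting on noise rather than the mean.
# Every value here is a placeholder; only "bisect_mode": "std_dev" is the
# point of the example.
config = {
    "command": "tools/perf/run_benchmark -v --browser=release page_cycler.typical_25",
    "metric": "warm_times/page_load_time",
    "good_revision": "308466",
    "bad_revision": "308529",
    "bisect_mode": "std_dev",  # bisect on changes in standard deviation
}
```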

You can also use the bisect script to find functional breakages: specify "return_code" for the "bisect_mode". You can leave "metric" empty, since it won't be used. There is currently no way to do this from the dashboard, so you'll have to initiate the bisect manually.
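A minimal sketch of such a config, again with placeholder command and revision values (only the "bisect_mode" and empty "metric" are the point here):

```python
# Illustrative config for a functional (pass/fail) bisect: "return_code"
# mode bisects on the test's exit code, so "metric" is left empty.
config = {
    "command": "tools/perf/run_benchmark -v --browser=release some_benchmark",
    "metric": "",  # unused in return_code mode
    "good_revision": "308466",
    "bad_revision": "308529",
    "bisect_mode": "return_code",
}
```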

Gotchas

If you suspect a Blink/V8/Skia roll, be sure that the range you specify includes when the DEPS file was submitted; the script will attempt to detect this situation and expand the range to include the DEPS roll.