Bootstrapping Time Series

We have a “compare” feature in our product at work. Here’s what that looks like:

That picture shows the change in total query execution time for two time ranges.
The percent change is calculated against the sums of points in the “Time” (execution time) time
series.

You’ll notice that a couple of those queries have huge change percents. Over 600%! There are also a
few spikes in the time series. Those outliers can really mess things up. Imagine what happens when
a database server stalls up: nothing gets done for a short period of time and every query’s
execution time goes up. If you’re looking for changes during a period that had a stall, you’re out
of luck. All of your percents will probably be really high because of that stall.

What would make this feature a lot better is some notion of statistical significance. If there’s a
large change, you should be able to know if it’s because of an actual change and not an unlucky
combination of points. This is what hypothesis tests are used for in statistics.

Usually with hypothesis tests, you have a bunch of samples that come from a single population that
you use to create a model for the population. Then, you take a brand new sample that you know
almost nothing about and try to see if it fits within the model you created before. Unfortunately,
we can’t take the same approach in this case because we only have one prior sample.

That’s where bootstrapping comes in.

Bootstrap

Bootstrapping is a technique
that lets you generate new samples from an existing sample. It relies on sampling with
replacement. Bootstrapping is not a new technique. Bradley Efron proposed it in the late 1970s. I
heard it didn’t gain much use because of its high computational requirements. Now that computers are
powerful enough, bootstrapping is getting more popular.

Suppose you had a 5-element array and its sum:

[1, 2, 3, 4, 5] => 15

Sampling with replacement can give you results like this. Notice how some of the sums are smaller
than the original 15, and some are larger.

…you get a nice distribution! You can do a lot of cool things with that distribution, like
calculate confidence intervals. This is what you can use for hypothesis tests.

Examples

These examples all use sums of time series. Points of series 1 are sampled with replacement to
generate a distribution of sums. The sum of series 2 is compared against that distribution to check
for significance.

In this first example, both time series are based on Math.random() with an average around 3, but
series 1 has a value of 80 at index 25.

Series 2’s sum is 47% lower, but this change is not significant. Clearly our spike had a big impact
on the series 1 sum.

This next example is real data. These time series represent error counts for a single query.
The percent change tells us that the number of errors in the second time range went down by almost
a half. I wouldn’t put too much faith into that number, though. Bootstrapping tells us this change
isn’t significant.

I’ve only covered sums in this post, but this technique can be applied with any kind of aggregation.
You can create distributions of averages, quantiles, minimums, maximums… anything that can take
advantage of sampling with replacement. I think bootstrapping has lots of potential, and it’s really
easy to implement!

EDIT: The takeaway in the original post may not have been clear. This approach is not a new way
of detecting changes without outliers or anything like that. I don’t even think there’s anything
wrong with the percent change itself. The main takeaway is that a single number like the percent
change can be affected by outliers and typical variability of systems. A number like that is hard to
interpret by itself. By bootstrapping, you can create some sort of confidence or significance
indicator that can help you interpret results. For example, an average is much more useful if you
also had the standard deviation because the standard deviation gives you an idea of what the spread
of the data is like. The time series example I showed in this post uses bootstrapping to give you a
better idea about whether or not there is an actual, significant change in a time series.