NumXL Support Desk

Which autocorrelation (ACF) estimation method should I use?

Before we answer this question, let's quickly review the definition of autocorrelation. In principle, the autocorrelation of a time series $\{x_t\}$ at lag k is the correlation between the series and its own k-lagged version (i.e. $\{x_{t-k}\}$).

$$\rho_k = \frac{\gamma_k}{\gamma_0}$$

Where

$\rho_k$ is the population autocorrelation for lag k.

$\gamma_k$ is the population autocovariance for lag k.

$\gamma_0$ is the population variance (the autocovariance at lag 0).

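Method 1: Sample Autocorrelation

Using a finite-length time series sample, an estimate of the autocorrelation ($\hat{\rho}_k$) can be obtained by replacing the population autocovariance and variance in the ratio above with their sample counterparts.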
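For reference, the form of this estimator usually given in textbooks is shown below. The notation is an assumption on our part: N is the number of observations and $\bar{x}$ is the mean of the full sample.

$$\hat{\rho}_k = \frac{\sum_{t=k+1}^{N}(x_t-\bar{x})(x_{t-k}-\bar{x})}{\sum_{t=1}^{N}(x_t-\bar{x})^2}$$

Note that the numerator sums N-k products while the denominator sums N squared deviations, and that both use the same full-sample mean.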

This method yields a biased estimator of the autocorrelation. To make things worse, the values calculated (as a function of k) don't form a valid autocorrelation function, in the sense that we can't define a theoretical process having exactly those values.

This method is implemented in the NumXL ACF function as the “sample autocorrelation method (default)”.
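To follow the arithmetic outside of a spreadsheet, here is a minimal Python sketch of the textbook sample autocorrelation. It is not NumXL's implementation: it assumes NumPy is available, takes deviations from the full-sample mean, and divides by the lag-0 sum of squares. The function name `sample_acf` and the five-point series are made up for illustration.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Textbook sample autocorrelation for lags 0..max_lag (illustrative)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()          # deviations from the full-sample mean
    denom = np.dot(d, d)      # lag-0 sum of squared deviations
    # For each lag k, sum the products of deviations that are k steps apart.
    return np.array([np.dot(d[k:], d[:n - k]) / denom
                     for k in range(max_lag + 1)])

x = [1.0, 3.0, 2.0, 5.0, 4.0]   # hypothetical toy series
acf = sample_acf(x, 2)
print(acf)                      # approximately [1.0, 0.0, 0.1]
```

With such a short series the values are only useful for checking the calculation by hand.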

Why do we care about this method?

The “sample autocorrelation” method is found in many academic textbooks and is implemented in many popular software packages. NumXL includes this method for benchmarking and for completeness.

Method 2: Periodogram-based (Spectral Density) Estimate

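There is a strong relationship between the periodogram of a time series (spectral analysis) and its autocovariance function: the two are related through the Fourier transform, so autocovariances can be recovered from the periodogram by an inverse transform.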

Although the periodogram-based method computes a biased estimate of the autocorrelation, its error is generally smaller than that of other methods (e.g. Method 1).

This method suffers from the same issues as Method 1: the estimate is biased, and the calculated values (as a function of k) don't always form a valid autocorrelation function.

This method is implemented in the NumXL ACF function as the “periodogram-based estimate”.
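The idea of going from the periodogram back to autocorrelations can be sketched in Python with NumPy's FFT routines. This is only an illustration under our own assumptions: the function name `periodogram_acf`, the `zero_pad` switch and the toy series are hypothetical, and how NumXL evaluates the periodogram internally is not described here, so its values may differ from both variants below.

```python
import numpy as np

def periodogram_acf(x, max_lag, zero_pad=True):
    """Autocorrelation via the inverse FFT of the periodogram (illustrative).

    max_lag must be smaller than the number of observations.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()                      # remove the sample mean
    m = 2 * n if zero_pad else n          # FFT length (padding avoids wrap-around)
    spectrum = np.fft.rfft(d, n=m)
    power = np.abs(spectrum) ** 2         # unnormalized periodogram ordinates
    acov = np.fft.irfft(power, n=m)[:max_lag + 1]   # back to the lag domain
    return acov / acov[0]                 # normalize by the lag-0 value

x = [1.0, 3.0, 2.0, 5.0, 4.0]             # hypothetical toy series
padded = periodogram_acf(x, 2, zero_pad=True)
circular = periodogram_acf(x, 2, zero_pad=False)
```

With zero padding, the result coincides with the textbook sample autocorrelation; without it, the lags wrap around the end of the series (a circular estimate), which is one reason periodogram-based values can differ from Method 1.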

Method 3: Cross-correlation

We treat the original time series and its k-lagged version as two separate time series and calculate the Pearson cross-correlation value.

Consider a finite sample of N observations $\{x_t\}$ from a stationary time series.
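Written out, the Pearson correlation between the series and its k-lagged version takes the following form. The segment means $\bar{x}_{(1)}$ and $\bar{x}_{(2)}$ are notation we introduce here for illustration; they are not taken from the NumXL reference.

$$\hat{\rho}_k = \frac{\sum_{t=k+1}^{N}(x_t-\bar{x}_{(1)})(x_{t-k}-\bar{x}_{(2)})}{\sqrt{\sum_{t=k+1}^{N}(x_t-\bar{x}_{(1)})^2}\,\sqrt{\sum_{t=k+1}^{N}(x_{t-k}-\bar{x}_{(2)})^2}}$$

$$\bar{x}_{(1)} = \frac{1}{N-k}\sum_{t=k+1}^{N}x_t, \qquad \bar{x}_{(2)} = \frac{1}{N-k}\sum_{t=k+1}^{N}x_{t-k}$$

Unlike the textbook sample autocorrelation, each of the two overlapping segments has its own mean and its own sum of squares, and all sums run over the same N-k terms.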

The autocorrelation values computed using this method (as a function of k) form a valid autocorrelation function, in the sense that it is possible to define a theoretical process having exactly that autocorrelation. This is not the case with Method 1 and Method 2.
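A minimal Python sketch of the cross-correlation approach is shown below. It is an illustration rather than NumXL's code: it assumes NumPy is available and relies on `np.corrcoef` for the Pearson correlation; the function name `pearson_acf` and the toy series are hypothetical.

```python
import numpy as np

def pearson_acf(x, max_lag):
    """Lag-k Pearson correlation between a series and its lagged copy (illustrative)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    values = [1.0]                          # lag 0: a series against itself
    for k in range(1, max_lag + 1):
        lead = x[k:]                        # x_{k+1}, ..., x_N
        lagged = x[:n - k]                  # x_1, ..., x_{N-k}
        # corrcoef centers and scales each segment with its own mean and variance
        values.append(np.corrcoef(lead, lagged)[0, 1])
    return np.array(values)

x = [1.0, 3.0, 2.0, 5.0, 4.0]               # hypothetical toy series
acf = pearson_acf(x, 2)
```

Each lag needs at least two overlapping observations with non-zero variance, so keep `max_lag` well below the sample size.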

Which method to use?

It depends on your objective.

To compare our values with those from a third-party package, you should use Method 1 (sample autocorrelation method).

Otherwise, we strongly recommend using the cross-correlation method, as its values are consistent and form a valid autocorrelation function.

Note that, for large data samples, the differences between the values calculated by the different methods are very small.
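The effect of sample size can be checked with a small simulation. The sketch below is only an illustration under our own assumptions: it simulates a hypothetical AR(1) series with coefficient 0.6, uses a fixed random seed, and compares the lag-1 value from the textbook sample autocorrelation with the lag-1 Pearson correlation for a short and a long sample. The helper names are made up.

```python
import numpy as np

def lag1_sample(x):
    """Lag-1 textbook sample autocorrelation (full-sample mean and variance)."""
    d = x - x.mean()
    return np.dot(d[1:], d[:-1]) / np.dot(d, d)

def lag1_pearson(x):
    """Lag-1 Pearson correlation between the series and its lagged copy."""
    return np.corrcoef(x[1:], x[:-1])[0, 1]

rng = np.random.default_rng(0)              # fixed seed for repeatability
e = rng.standard_normal(20000)
x = np.empty_like(e)
x[0] = e[0]
for t in range(1, x.size):                  # hypothetical AR(1): x_t = 0.6 x_{t-1} + e_t
    x[t] = 0.6 * x[t - 1] + e[t]

# Absolute gap between the two estimates for a short and a long sample
gaps = {n: abs(lag1_sample(x[:n]) - lag1_pearson(x[:n])) for n in (50, 20000)}
print(gaps)
```

The two estimates differ only in how they treat the observations at the ends of the sample, so the gap for the long sample should be tiny.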