3.2 Diagnostics

The Ljung-Box statistic, also called the modified Box-Pierce statistic, is a function of the accumulated sample autocorrelations, \(r_j\), up to any specified time lag \(m\). As a function of \(m\), it is defined as:

\(Q(m) = n(n+2)\sum_{j=1}^{m}\frac{r^2_j}{n-j},\)

where \(n\) is the number of usable data points after any differencing operations. (Please visit forvo for the proper pronunciation of Ljung.)
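As a rough illustration of the formula, the following Python sketch computes \(Q(m)\) directly from a series. The function names are hypothetical, and the sample autocorrelation is assumed to use the usual definition (lag-\(j\) sum of cross-products of deviations divided by the lag-0 sum of squares), which this section does not restate.

```python
def sample_autocorrelations(x, max_lag):
    """Return [r_1, ..., r_max_lag] for the sequence x.

    Assumes the usual estimator: lag-j sum of cross-products of
    deviations from the mean, divided by the lag-0 sum of squares.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + j] for t in range(n - j)) / c0
        for j in range(1, max_lag + 1)
    ]


def ljung_box_q(x, m):
    """Q(m) = n (n + 2) * sum_{j=1}^{m} r_j^2 / (n - j)."""
    n = len(x)
    r = sample_autocorrelations(x, m)
    return n * (n + 2) * sum(r[j - 1] ** 2 / (n - j) for j in range(1, m + 1))


# Tiny hand-checkable case: for x = 1, 2, 3, 4, 5 we get r_1 = 0.4,
# so Q(1) = 5 * 7 * 0.16 / 4 = 1.4 (up to floating-point rounding).
print(ljung_box_q([1, 2, 3, 4, 5], 1))
```

When the statistic is applied to residuals, \(x\) would be the residual series and \(n\) its length after any differencing.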

Use of the Statistic

This statistic can be used to examine residuals from a time series model in order to see whether all underlying population autocorrelations for the errors may be 0 (up to a specified lag).

For nearly all models that we consider in this course, the residuals are assumed to be “white noise,” meaning that they are independently and identically distributed. Thus, as we saw last week, the ideal ACF for residuals is that all autocorrelations are 0. Sample autocorrelations will not be exactly 0, so this means that \(Q(m)\) should be small (no larger than chance variation would produce) for any lag \(m\). A significant \(Q(m)\) for residuals indicates a possible problem with the model.

Distribution of \(Q(m)\)

There are two cases:

When the \(r_j\) are sample autocorrelations for residuals from a time series model, the null hypothesis distribution of \(Q(m)\) is approximately a \(\chi^2\) distribution with df = \(m-p\), where \(p\) = number of coefficients in the model.

Note!
\(m\) = lag to which we’re accumulating, so in essence, the statistic is not defined until \(m>p\).

When no model has been used, so that the ACF is for raw data, \(p\) = 0 and the null distribution of \(Q(m)\) is approximately a \(\chi^2\) distribution with df = \(m\).

p-Value Determination

In both cases, a p-value is calculated as the probability to the right of \(Q(m)\) (the upper-tail area) in the relevant distribution. A small p-value (for instance, p-value < .05) indicates the possibility of non-zero autocorrelation within the first \(m\) lags.

Below is Minitab output for the Lake Erie level data that were used for homework 1 and in Lesson 3.1. A useful model is an AR(1) with a constant. So, \(p\) = 2.

Final Estimates of Parameters

Type        Coef     SE Coef      T       P
AR 1        0.7078   0.1161     6.10   0.000
Constant    4.2761   0.1953    21.89   0.000
Mean       14.6349   0.6684

Modified Box-Pierce (Ljung-Box) Chi-Square statistic

Lag            12      24      36      48
Chi-Square    9.4    23.2    30.0       *
DF             10      22      34       *
P-Value     0.493   0.390   0.662       *

Minitab gives p-values for accumulated lags that are multiples of 12. The R sarima command will give a graph that shows p-values of the Ljung-Box-Pierce tests for each lag (in steps of 1) up to a certain lag, usually up to lag 20 for nonseasonal models.
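The p-values in the Minitab output above can be checked approximately by hand. The sketch below is only an illustration: it uses the closed-form upper tail of a \(\chi^2\) distribution that holds when the degrees of freedom are even, which happens to cover df = 10, 22, and 34 here. The function names are hypothetical, and in practice a library routine for the \(\chi^2\) upper tail would normally be used instead.

```python
import math


def chi2_upper_tail_even_df(q, df):
    """P(X > q) for X ~ chi-square with an even number of df.

    For df = 2k the upper tail equals
    exp(-q/2) * sum_{i=0}^{k-1} (q/2)^i / i!.
    This shortcut is NOT valid for odd df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = q / 2.0
    term = 1.0   # (q/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def ljung_box_p_value(q, m, p):
    """Upper-tail p-value for Q(m) with df = m - p (p = 0 for raw data)."""
    return chi2_upper_tail_even_df(q, m - p)


# Q values as printed (rounded) in the Minitab output; p = 2 for AR(1) + constant.
for m, q in [(12, 9.4), (24, 23.2), (36, 30.0)]:
    print(m, round(ljung_box_p_value(q, m, p=2), 3))
```

Because the printed chi-square values are rounded to one decimal place, the results should agree with Minitab's p-values only to about two decimal places.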

Interpretation of the Box-Pierce Results

Notice that the p-values for the modified Box-Pierce statistic are all well above .05, indicating “non-significance.” This is a desirable result. Remember that there are only 40 data values, so there’s not much data contributing to correlations at high lags. Thus, the results for \(m\) = 24 and \(m\) = 36 may not be meaningful.

When you request a graph of the ACF values, "significance" limits are shown by R and by Minitab. In general, the limits for the autocorrelation are placed at \(0 \pm 2\) standard errors of \(r_k\). The formula used for standard error depends upon the situation.

Within the ACF of residuals as part of the ARIMA routine, the standard errors are determined assuming the residuals are white noise. The approximate formula for any lag is that s.e. of \(r_k=1/\sqrt{n}\).

For the ACF of raw data (the ACF command), the standard error at lag \(k\) is found as if the correct model were an MA(\(k-1\)). This allows the possible interpretation that if all autocorrelations past a certain lag are within the limits, the model might be an MA of order defined by the last significant autocorrelation.
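The two kinds of limits can be sketched as follows. The white-noise limit uses the \(1/\sqrt{n}\) formula given above. For the raw-data case, the sketch assumes Bartlett's approximation for an MA(\(k-1\)) process, s.e. of \(r_k \approx \sqrt{(1 + 2\sum_{j=1}^{k-1} r_j^2)/n}\); whether a given package uses exactly this expression is an assumption, and the function names are hypothetical.

```python
import math


def white_noise_limit(n):
    """Half-width of the 0 +/- 2 s.e. band when s.e.(r_k) = 1 / sqrt(n)."""
    return 2.0 / math.sqrt(n)


def bartlett_limit(r, k, n):
    """Half-width of the band at lag k, treating the series as MA(k - 1).

    r is [r_1, r_2, ...]; only r_1 .. r_{k-1} enter the formula
    (Bartlett's approximation, assumed here).
    """
    se = math.sqrt((1.0 + 2.0 * sum(v * v for v in r[: k - 1])) / n)
    return 2.0 * se


n = 40  # number of data values in the Lake Erie example
print(round(white_noise_limit(n), 3))  # 2 / sqrt(40), roughly 0.316

# Illustrative (made-up) autocorrelations, only to show how the band widens with lag.
r = [0.5, 0.3, 0.1]
for k in (1, 2, 3):
    print(k, round(bartlett_limit(r, k, n), 3))
```

At lag 1 the two limits coincide, since no earlier autocorrelations enter the sum; at higher lags the MA-based band is at least as wide as the white-noise band.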

What are standardized residuals in a time series framework? One of the diagnostics that we need to examine for a regression fit is a graph of the standardized residuals. Let's review what this is for regular regression where the standard deviation is \(\sigma\). The standardized residual at observation \(i\),

\(\dfrac{y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}}{\sigma},\)

should be N(0, 1). We hope to see normality when we look at the diagnostic plots. Another way to think about this is: