serialCorrelationTest: Test for the Presence of Serial Correlation

Description

serialCorrelationTest is a generic function used to test for the
presence of lag-one serial correlation using either the rank
von Neumann ratio test, the normal approximation based on the Yule-Walker
estimate of lag-one correlation, or the normal approximation based on the
MLE of lag-one correlation. The function invokes particular
methods which depend on the class of the first
argument.

Currently, there is a default method and a method for objects of class "lm".

Arguments

x

numeric vector of observations, a numeric univariate time series of
class "ts", or an object of class "lm". Undefined (NaN) and
infinite (Inf, -Inf) values are not allowed for x
when x is a numeric vector or time series, nor for the residuals
associated with x when x is an object of class "lm".

When test="AR1.mle", missing (NA) values are allowed, otherwise
they are not allowed. When x is a numeric vector of observations
or a numeric univariate time series of class "ts", it must contain at least
3 non-missing values. When x is an object of class "lm", the
residuals must contain at least 3 non-missing values.

Note: when x is an object of class "lm", the linear model
should have been fit using the argument na.action=na.exclude in the
call to lm in order to correctly deal with missing values.

test

character string indicating which test to use. The possible values are:
"rank.von.Neumann" (rank von Neumann ratio test; the default),
"AR1.yw" (z-test based on the Yule-Walker lag-one estimate of correlation), and
"AR1.mle" (z-test based on the MLE of lag-one correlation).

alternative

character string indicating the kind of alternative hypothesis. The possible
values are "two.sided" (the default), "greater", and "less".

conf.level

numeric scalar between 0 and 1 indicating the confidence level associated with
the confidence interval for the population lag-one autocorrelation. The default
value is conf.level=0.95.

...

optional arguments for possible future methods. Currently not used.

Details

Let \underline{x} = x_1, x_2, …, x_n denote n observations from a
stationary time series sampled at equispaced points in time with normal (Gaussian)
errors. The function serialCorrelationTest tests the null hypothesis:

H_0: ρ_1 = 0 \;\;\;\;\;\; (1)

where ρ_1 denotes the true lag-1 autocorrelation (also called the lag-1
serial correlation coefficient). Actually, the null hypothesis is that the
lag-k autocorrelation is 0 for all values of k greater than 0 (i.e.,
the time series is purely random).

In the case when the argument x is a linear model, the function
serialCorrelationTest tests the null hypothesis (1) for the
residuals.

The three possible alternative hypotheses are the upper one-sided alternative
(alternative="greater"):

H_a: ρ_1 > 0 \;\;\;\;\;\; (2)

the lower one-sided alternative (alternative="less"):

H_a: ρ_1 < 0 \;\;\;\;\;\; (3)

and the two-sided alternative:

H_a: ρ_1 \ne 0 \;\;\;\;\;\; (4)

Testing the Null Hypothesis of No Lag-1 Autocorrelation
There are several possible methods for testing the null hypothesis (1) versus any
of the three alternatives (2)-(4). The function serialCorrelationTest allows
you to use one of three possible tests:

The rank von Neumann ratio test.

The test based on the normal approximation for the distribution of the
Yule-Walker estimate of lag-one correlation.

The test based on the normal approximation for the distribution of the
maximum likelihood estimate (MLE) of lag-one correlation.

Each of these tests is described below.

Test Based on Yule-Walker Estimate (test="AR1.yw")
The Yule-Walker estimate of the lag-1 autocorrelation is given by:

\hat{ρ}_1 = \frac{\hat{γ}_1}{\hat{γ}_0} \;\;\;\;\;\; (5)

where

\hat{γ}_k = \frac{1}{n} ∑_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \;\;\;\;\;\; (6)

is the estimate of the lag-k autocovariance.
(This estimator does not allow for missing values.)

Under the null hypothesis (1), the estimator of lag-1 correlation in Equation (5) is
approximately distributed as a normal (Gaussian) random variable with mean 0 and
variance given by:

Var(\hat{ρ}_1) \approx \frac{1}{n} \;\;\;\;\;\; (7)

(Box and Jenkins, 1976, pp.34-35). Thus, the null hypothesis (1) can be tested
with the statistic

z = √{n} \hat{ρ}_1 \;\;\;\;\;\; (8)

which is distributed approximately as a standard normal random variable under the
null hypothesis that the lag-1 autocorrelation is 0.
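Equations (5)-(8) are straightforward to compute directly. The following Python fragment is an illustration only, not the EnvStats implementation; the function names are hypothetical:

```python
import math

def yule_walker_lag1(x):
    """Yule-Walker estimate of the lag-1 autocorrelation (Equation (5)):
    rho_hat = gamma_hat(1) / gamma_hat(0), where gamma_hat(k) is the
    divide-by-n autocovariance estimate (Equation (6))."""
    n = len(x)
    xbar = sum(x) / n
    gamma0 = sum((xi - xbar) ** 2 for xi in x) / n
    gamma1 = sum((x[t] - xbar) * (x[t + 1] - xbar) for t in range(n - 1)) / n
    return gamma1 / gamma0

def z_statistic(x):
    """z = sqrt(n) * rho_hat (Equation (8)); approximately N(0, 1) under
    the null hypothesis of no lag-1 autocorrelation."""
    return math.sqrt(len(x)) * yule_walker_lag1(x)
```

For example, a strictly alternating series such as 1, -1, 1, -1, ... has a Yule-Walker lag-1 estimate close to -1 and a large negative z-statistic, consistent with the lower one-sided alternative (3).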

Test Based on the MLE (test="AR1.mle")
The function serialCorrelationTest uses the R function arima to
compute the MLE of the lag-one autocorrelation and the estimated variance of this
estimator. As for the test based on the Yule-Walker estimate, the z-statistic is
computed as the estimated lag-one autocorrelation divided by the square root of the
estimated variance.

Test Based on Rank von Neumann Ratio (test="rank.von.Neumann")
The null distribution of the serial correlation coefficient may be badly affected
by departures from normality in the underlying process (Cox, 1966; Bartels, 1977).
It is therefore a good idea to consider using a nonparametric test for randomness if
the normality of the underlying process is in doubt (Bartels, 1982).

Wald and Wolfowitz (1943) introduced the rank serial correlation coefficient, which
for lag-1 autocorrelation is simply the Yule-Walker estimate (Equation (5) above)
with the actual observations replaced with their ranks.

von Neumann et al. (1941) introduced a test for randomness in the context of
testing for trend in the mean of a process. Their statistic is given by:

V = \frac{∑_{i=2}^n (x_i - x_{i-1})^2}{∑_{i=1}^n (x_i - \bar{x})^2} \;\;\;\;\;\; (9)

which is the ratio of the sum of squared successive differences to the usual sum of
squared deviations from the mean. This statistic is bounded between 0 and 4, and
for a purely random process is symmetric about 2. Small values of this statistic
indicate possible positive autocorrelation, and large values of this statistic
indicate possible negative autocorrelation. Durbin and Watson (1950, 1951, 1971)
proposed using this statistic in the context of checking the independence of
residuals from a linear regression model and provided tables for the distribution
of this statistic. This statistic is therefore often called the
“Durbin-Watson statistic” (Draper and Smith, 1998, p.181).

The rank version of the von Neumann ratio statistic is given by:

V_{rank} = \frac{∑_{i=2}^n (R_i - R_{i-1})^2}{∑_{i=1}^n (R_i - \bar{R})^2} \;\;\;\;\;\; (10)

where R_i denotes the rank of the i'th observation (Bartels, 1982).
(This test statistic does not allow for missing values.) In the absence of ties,
the denominator of this test statistic is equal to

∑_{i=1}^n (R_i - \bar{R})^2 = \frac{n(n^2 - 1)}{12} \;\;\;\;\;\; (11)

The range of the V_{rank} test statistic is given by:

[\frac{12}{n(n+1)} \;,\; 4 - \frac{12}{n(n+1)}] \;\;\;\;\;\; (12)

if n is even, with a negligible adjustment if n is odd (Bartels, 1982), so
asymptotically the range is from 0 to 4, just as for the V test statistic in
Equation (9) above.

Bartels (1982) shows that asymptotically, the rank von Neumann ratio statistic is a
linear transformation of the rank serial correlation coefficient, so any asymptotic
results apply to both statistics.

For any fixed sample size n, the exact distribution of the V_{rank}
statistic in Equation (10) above can be computed by simply computing the value of
V_{rank} for all possible permutations of the serial order of the ranks.
Based on this exact distribution, Bartels (1982) presents a table of critical
values for the numerator of the RVN statistic for sample sizes between 4 and 10.
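The exact-distribution calculation described above can be sketched directly: compute V_{rank} over every ordering of the ranks 1, ..., n and tally the results. The Python fragment below is illustrative only (the function names are hypothetical), and enumeration is feasible only for small n, as noted:

```python
from itertools import permutations
from collections import Counter

def rvn(ranks):
    """Rank von Neumann ratio (Equation (10)): sum of squared successive
    rank differences divided by the sum of squared rank deviations."""
    n = len(ranks)
    rbar = sum(ranks) / n
    num = sum((ranks[i] - ranks[i - 1]) ** 2 for i in range(1, n))
    den = sum((r - rbar) ** 2 for r in ranks)
    return num / den

def exact_rvn_distribution(n):
    """Tally V_rank over all n! serial orderings of the ranks 1..n,
    giving the exact null distribution (practical only for small n;
    Bartels tabulated critical values for n = 4 to 10)."""
    counts = Counter(round(rvn(p), 10) for p in permutations(range(1, n + 1)))
    total = sum(counts.values())
    return {v: c / total for v, c in sorted(counts.items())}
```

A lower-tail p-value is then the total probability of all values of V_{rank} less than or equal to the observed one. Note that for untied ranks the denominator equals n(n^2 - 1)/12, as in Equation (11).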

Determining the exact distribution of V_{rank} becomes impractical as the
sample size increases. For values of n between 10 and 100, Bartels (1982)
approximated the distribution of V_{rank} by a
beta distribution over the range 0 to 4 with shape parameters
shape1=ν and shape2=ω, where:

ν = ω = \frac{5n(n+1)(n-1)^2}{2(n-2)(5n^2 - 2n - 9)} - \frac{1}{2} \;\;\;\;\;\; (13)
Bartels (1982) checked this approximation by simulating the distribution of
V_{rank} for n=25 and n=50 and comparing the empirical quantiles
at 0.005, 0.01, 0.025, 0.05, and 0.1 with the
approximated quantiles based on the beta distribution. He found that the quantiles
agreed to 2 decimal places for eight of the 10 values, and differed by 0.01
for the other two values.

Note: The definition of the beta distribution assumes the
random variable ranges from 0 to 1. This definition can be generalized as follows.
Suppose the random variable Y has a beta distribution over the range
a ≤ y ≤ b, with shape parameters ν and ω. Then the
random variable X defined as:

X = \frac{Y-a}{b-a} \;\;\;\;\;\; (14)

has the “standard beta distribution” as described in the help file for Beta
(Johnson et al., 1995, p.210).

Bartels (1982) shows that asymptotically, V_{rank} has a normal distribution
with mean 2 and variance 4/n, but notes that a slightly better approximation
is given by using a variance of 20/(5n + 7).

To test the null hypothesis (1) when test="rank.von.Neumann", the function serialCorrelationTest does the following:

When the sample size is between 3 and 10, the exact distribution of V_{rank}
is used to compute the p-value.

When the sample size is between 11 and 100, the beta approximation to the
distribution of V_{rank} is used to compute the p-value.

When the sample size is larger than 100, the normal approximation to the
distribution of V_{rank} is used to compute the p-value.
(This uses the variance 20/(5n + 7).)
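The large-sample branch of this procedure can be sketched as follows, assuming the normal approximation with mean 2 and variance 20/(5n + 7). This is an illustration, not the EnvStats code; the function name is hypothetical, and the tail conventions follow the direction noted above (small values of V_{rank} signal positive autocorrelation):

```python
import math

def rvn_pvalue_normal(v_rank, n, alternative="two.sided"):
    """p-value for the rank von Neumann ratio using the large-sample
    (n > 100) approximation: V_rank ~ N(2, 20/(5n+7)) under H0."""
    z = (v_rank - 2.0) / math.sqrt(20.0 / (5.0 * n + 7.0))
    lower = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # P(Z <= z)
    if alternative == "greater":   # rho_1 > 0 corresponds to small V_rank
        return lower
    if alternative == "less":      # rho_1 < 0 corresponds to large V_rank
        return 1.0 - lower
    return 2.0 * min(lower, 1.0 - lower)
```

For the exact (n ≤ 10) and beta-approximation (11 ≤ n ≤ 100) branches, the same tail logic applies with the normal CDF replaced by the appropriate distribution function.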

When ties are present in the observations and midranks are used for the tied
observations, the distribution of the V_{rank} statistic based on the
assumption of no ties is not applicable. If the number of ties is small, however,
they may not grossly affect the assumed p-value.

When ties are present, the function serialCorrelationTest issues a warning.
When the sample size is between 3 and 10, the p-value is computed based on
rounding up the computed value of V_{rank} to the nearest possible value
that could be observed in the case of no ties.

Computing a Confidence Interval for the Lag-1 Autocorrelation
The function serialCorrelationTest computes an approximate
100(1-α)\% confidence interval for the lag-1 autocorrelation as follows:

[\hat{ρ}_1 - z_{1-α/2} \hat{σ}_{\hat{ρ}_1} \;,\; \hat{ρ}_1 + z_{1-α/2} \hat{σ}_{\hat{ρ}_1}] \;\;\;\;\;\; (15)

where \hat{σ}_{\hat{ρ}_1} denotes the estimated standard deviation of
the estimate of the lag-1 autocorrelation and z_p denotes the p'th quantile
of the standard normal distribution.

When test="AR1.yw" or test="rank.von.Neumann", the Yule-Walker
estimate of lag-1 autocorrelation is used and the variance of the estimated
lag-1 autocorrelation is approximately:

Var(\hat{ρ}_1) \approx \frac{1}{n} (1 - ρ_1^2) \;\;\;\;\;\; (16)

(Box and Jenkins, 1976, p.34), so

\hat{σ}_{\hat{ρ}_1} = √{\frac{1 - \hat{ρ}_1^2}{n}} \;\;\;\;\;\; (17)

When test="AR1.mle", the MLE of the lag-1 autocorrelation is used, and its
standard deviation is estimated with the square root of the estimated variance
returned by arima.
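For the Yule-Walker case, Equations (15)-(17) combine into a short sketch (Python for illustration only, not the EnvStats code; the function name is hypothetical):

```python
import math
from statistics import NormalDist

def lag1_confint(rho_hat, n, conf_level=0.95):
    """Approximate confidence interval for the lag-1 autocorrelation:
    rho_hat +/- z_{1-alpha/2} * sqrt((1 - rho_hat^2) / n),
    per Equations (15)-(17)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    se = math.sqrt((1.0 - rho_hat ** 2) / n)
    return (rho_hat - z * se, rho_hat + z * se)
```

For example, with n = 100 and an estimated lag-1 autocorrelation of 0, the 95% interval is roughly (-0.196, 0.196), reflecting the 1/n variance in Equation (16) when ρ_1 = 0.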

Value

A list of class "htest" containing the results of the hypothesis test.
See the help file for htest.object for details.

Note

Data collected over time on the same phenomenon are called a time series.
A time series is usually modeled as a single realization of a stochastic process;
that is, if we could go back in time and repeat the experiment, we would get
different results that would vary according to some probabilistic law.
The simplest kind of time series is a stationary time series, in which the mean
value is constant over time, the variability of the observations is constant over
time, etc. That is, the probability distribution associated with each future
observation is the same.

A common concern in applying standard statistical tests to time series data is
the assumption of independence. Most conventional statistical hypothesis tests
assume the observations are independent, but data collected sequentially in time
may not satisfy this assumption. For example, high observations may tend to
follow high observations (positive serial correlation), or low observations may
tend to follow high observations (negative serial correlation). One way to
investigate the assumption of independence is to estimate the lag-one serial
correlation and test whether it is significantly different from 0.

The null distribution of the serial correlation coefficient may be badly affected
by departures from normality in the underlying process (Cox, 1966; Bartels, 1977).
It is therefore a good idea to consider using a nonparametric test for randomness
if the normality of the underlying process is in doubt (Bartels, 1982).
Knoke (1977) showed that under normality, the test based on the rank serial
correlation coefficient (and hence the test based on the rank von Neumann ratio
statistic) has asymptotic relative efficiency of 0.91 with respect to using the
test based on the ordinary serial correlation coefficient against the alternative
of first-order autocorrelation.

Bartels (1982) performed an extensive simulation study of the power of the
rank von Neumann ratio test relative to the standard von Neumann ratio test
(based on the statistic in Equation (9) above) and the runs test
(Lehmann, 1975, pp.313-315). He generated a first-order autoregressive process for
sample sizes of 10, 25, and 50, using 6 different parent distributions: normal,
Cauchy, contaminated normal, Johnson, Stable, and exponential. Values of
lag-1 autocorrelation ranged from -0.8 to 0.8. Bartels (1982) found three
important results:

The rank von Neumann ratio test is far more powerful than the runs test.

For the normal process, the power of the rank von Neumann ratio test was
never less than 89% of the power of the standard von Neumann ratio test.

For non-normal processes, the rank von Neumann ratio test was often much
more powerful than the standard von Neumann ratio test.