Significance tests for autocorrelation indices

As noted in the preceding sections, the various global and local spatial autocorrelation coefficients discussed can be tested for statistical significance under two rather different model assumptions. The first is the classical statistical assumption of Normality, whereby it is assumed that the observed value of the coefficient is the result of the set {zi} of values being independent and identically distributed drawings from a Normal distribution, implying that variances are constant across the region. The second model is one of randomization, whereby the observed pattern of the set {zi} of values is assumed to be just one realization from all possible random permutations of the observed values across all the zones.
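To make the Normality model concrete, the following minimal sketch standardizes a global Moran I using the usual textbook moments for that assumption: E[I] = -1/(n-1), and a variance that depends only on the weights through the sums S0, S1 and S2. It is an illustration under stated assumptions, not the procedure of any particular package: the weights are held as a dense list-of-lists matrix with a zero diagonal, the index uses overall (not row-wise) weight adjustment, and the function names and the four-zone example are hypothetical.

```python
from math import erfc, sqrt

def morans_i(z, w):
    """Global Moran I with overall weight adjustment (n / S0)."""
    n = len(z)
    mean = sum(z) / n
    d = [v - mean for v in z]                      # deviations from the mean
    s0 = sum(sum(row) for row in w)                # sum of all weights
    cross = sum(w[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in d)

def z_score_normality(z, w):
    """z-score of Moran I under the Normality assumption (illustrative helper)."""
    n = len(z)
    s0 = sum(sum(row) for row in w)
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    # S2 sums, over zones, the squared (row total + column total) of the weights
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2 for i in range(n))
    expected = -1.0 / (n - 1)
    variance = ((n * n * s1 - n * s2 + 3 * s0 * s0)
                / ((n * n - 1) * s0 * s0)) - expected ** 2
    i_obs = morans_i(z, w)
    score = (i_obs - expected) / sqrt(variance)
    p_two_sided = erfc(abs(score) / sqrt(2))       # two-tailed Normal probability
    return i_obs, expected, variance, score, p_two_sided

# Hypothetical example: four zones in a line, binary contiguity weights
w_line = [[0, 1, 0, 0],
          [1, 0, 1, 0],
          [0, 1, 0, 1],
          [0, 0, 1, 0]]
z_line = [1.0, 2.0, 3.0, 4.0]
i_obs, expected, variance, score, p_two_sided = z_score_normality(z_line, w_line)
print(i_obs, expected, variance, score, p_two_sided)
```

With so few zones the Normal approximation is poor, so the example is only intended to show the arithmetic.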

Both models have important weaknesses, for example as a result of underlying population size variation and lack of homogeneity of probabilities, but both are widely implemented in software packages to provide estimates of the significance of observed results. In the case of the randomization model, many software packages generate a set of N random permutations of the input data, where N is user specified. For each simulation run the index value is computed, and the set of such values is used to provide a pseudo-probability distribution for the given problem, against which the observed value can be compared. A z-transform of the coefficient under either the Normality or the Randomization assumption is distributed approximately N(0,1), so it may be compared with percentage points of the Normal distribution to identify particularly high or low values.
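The permutation procedure can be sketched as follows. This is a minimal illustration, not the implementation used by any of the packages mentioned in this section: it assumes a dense weights matrix, an upper-tail test (looking for positive autocorrelation), and the common convention of counting the observed value as one of the N+1 outcomes when forming the pseudo p-value. The function names, the 5x5 grid and its gradient values are all hypothetical.

```python
import random
from statistics import mean, stdev

def rook_weights(n_rows, n_cols):
    """Binary rook-contiguity weights for a regular grid (row-major zone order)."""
    n = n_rows * n_cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            k = r * n_cols + c
            if c + 1 < n_cols:                     # neighbor to the east
                w[k][k + 1] = w[k + 1][k] = 1.0
            if r + 1 < n_rows:                     # neighbor to the south
                w[k][k + n_cols] = w[k + n_cols][k] = 1.0
    return w

def morans_i(z, w):
    """Global Moran I with overall weight adjustment (n / S0)."""
    n = len(z)
    m = sum(z) / n
    d = [v - m for v in z]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in d)

def permutation_test(z, w, n_permutations=999, seed=42):
    """Pseudo p-value and pseudo z-score from random permutations of z."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    i_obs = morans_i(z, w)
    sims = []
    for _ in range(n_permutations):
        shuffled = rng.sample(z, len(z))           # reassign values to zones at random
        sims.append(morans_i(shuffled, w))
    n_as_large = sum(1 for s in sims if s >= i_obs)
    pseudo_p = (n_as_large + 1) / (n_permutations + 1)
    # z-score built from the simulated mean and standard deviation
    pseudo_z = (i_obs - mean(sims)) / stdev(sims)
    return i_obs, pseudo_p, pseudo_z

# Hypothetical example: a smooth gradient across a 5x5 grid
w_grid = rook_weights(5, 5)
z_grid = [float(r + c) for r in range(5) for c in range(5)]
i_obs, pseudo_p, pseudo_z = permutation_test(z_grid, w_grid)
print(i_obs, pseudo_p, pseudo_z)
```

Note that the pseudo p-value can never be smaller than 1/(N+1), so the choice of N limits the significance level that can be reported.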

The Rookcase add-in for Excel computes both values, for the global Moran I and Geary C, and the summary results in Figure 5‑37 illustrate this for the revised test dataset shown in Figure 5‑32. Note that these results are calculated using overall weight adjustment rather than row-wise adjustment. The results suggest the observed index value of 0.26 could reasonably be expected to have occurred by chance. With the local variant of Moran’s I the same procedures as for the Global statistic can be adopted.

Crimestat provides the option to compute z-scores for this statistic but notes that it may be very slow to calculate, depending on the number of zones. GeoDa uses random permutations to obtain an estimated mean and standard deviation, which may then be used to create an estimated z-score.

The local variants of these statistics use the values in individual cells or zones over and over again. As a result the values obtained for adjacent neighbors will not be independent, which breaches the requirements for significance testing, a problem exacerbated by the fact that such cells are also likely to exhibit spatial autocorrelation. There are no generally agreed solutions to these problems, but one approach is to regard each test as one of a multiple set. In classical statistics, multiple tests of independent sets (the members of which are not correlated) are made using a reduced or corrected significance level, α. The most basic of these adjustments (known as a Bonferroni correction, after its original protagonist) is to use a significance level defined by α/n, where n is the number of tests carried out or, in the present case, the number of zones or features being tested. If n is very large, the use of such corrections results in extremely conservative significance levels that may be inappropriate. Monte Carlo randomization techniques that seek to estimate the sampling distribution may be more appropriate in such cases.