How Spatial Autocorrelation (Global Moran's I) works


The Spatial Autocorrelation (Global Moran's I) tool measures spatial autocorrelation based on both feature locations and feature values simultaneously. Given a set of features and an associated attribute, it evaluates whether the pattern expressed is clustered, dispersed, or random. The tool calculates the Moran's I Index value and both a z-score and p-value to evaluate the significance of that Index. P-values are numerical approximations of the area under the curve for a known distribution, limited by the test statistic.

Calculations
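For reference, the Global Moran's I statistic is commonly written in the following form. This is the standard textbook notation, offered here as a stand-in where the original equation graphic is not available; the symbol names are assumptions rather than a copy of that graphic. Here x_i is the attribute value for feature i, w_{i,j} is the spatial weight between features i and j, and n is the number of features.

```latex
% Global Moran's I, with z_i the deviation of feature i from the mean
I = \frac{n}{S_0} \,
    \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{i,j} \, z_i z_j}
         {\sum_{i=1}^{n} z_i^{2}},
\qquad z_i = x_i - \bar{X},
\qquad S_0 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{i,j}

% z-score for the statistic
z_I = \frac{I - \mathrm{E}[I]}{\sqrt{\mathrm{V}[I]}},
\qquad \mathrm{E}[I] = \frac{-1}{n - 1},
\qquad \mathrm{V}[I] = \mathrm{E}[I^{2}] - \mathrm{E}[I]^{2}
```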

The math behind the Global Moran's I statistic is shown above. The tool computes the mean and variance for the attribute being evaluated. Then, for each feature value, it subtracts the mean, creating a deviation from the mean. Deviation values for all neighboring features (features within the specified distance band, for example) are multiplied together to create a cross-product. Notice that the numerator for the Global Moran's I statistic includes these summed cross-products. Suppose features A and B are neighbors, and the mean for all feature values is 10. Notice the range of possible cross-product results:

| Feature values | Deviations | Cross-products |
| --- | --- | --- |
| A=50, B=40 | 40, 30 | 1200 |
| A=8, B=6 | -2, -4 | 8 |
| A=20, B=2 | 10, -8 | -80 |
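The arithmetic in the table can be reproduced with a few lines of Python. This is only an illustrative sketch; the names `MEAN`, `pairs`, and `cross_product` are made up for this example and are not part of the tool.

```python
# Mean of all feature values, taken from the example above.
MEAN = 10

# Neighboring pairs (A, B) from the table.
pairs = [(50, 40), (8, 6), (20, 2)]


def cross_product(a, b, mean):
    """Multiply the two deviations from the mean for a pair of neighbors."""
    return (a - mean) * (b - mean)


cross_products = [cross_product(a, b, MEAN) for a, b in pairs]

for (a, b), cp in zip(pairs, cross_products):
    print(f"A={a}, B={b}: deviations {a - MEAN}, {b - MEAN} -> cross-product {cp}")
```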

When values for neighboring features are either both larger than the mean or both smaller than the mean, the cross-product will be positive. When one value is smaller than the mean and the other is larger than the mean, the cross-product will be negative. In all cases, the larger the deviations from the mean, the larger the magnitude of the cross-product. If the values in the dataset tend to cluster spatially (high values cluster near other high values; low values cluster near other low values), the Moran's Index will be positive. When high values repel other high values, and tend to be near low values, the Index will be negative. If positive cross-product values balance negative cross-product values, the Index will be near zero. The numerator is normalized by the variance so that Index values fall between -1.0 and +1.0 (see the FAQ section below for exceptions).
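To make the sign behavior concrete, here is a minimal sketch of the Index calculation in plain Python. It assumes the commonly used form of the statistic (weighted cross-products divided by the sum of squared deviations, scaled by n over the sum of weights) and a hypothetical layout of four features in a row with binary weights between adjacent features. It is not the tool's implementation.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix.

    weights[i][j] is the spatial weight between features i and j
    (0 when they are not neighbors).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]              # deviations from the mean
    s0 = sum(sum(row) for row in weights)       # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]     # weighted cross-products
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def chain_weights(n):
    """Binary weights for n features in a row; only adjacent features are neighbors."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(4)
clustered = morans_i([1, 1, 10, 10], w)     # similar values sit next to each other
alternating = morans_i([1, 10, 1, 10], w)   # high values are always next to low values
print(clustered, alternating)
```

In this toy layout the clustered arrangement yields a positive Index and the alternating arrangement yields a negative one, matching the description above.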

After the Spatial Autocorrelation (Global Moran's I) tool computes the Index value, it computes the Expected Index value. The Expected and Observed Index values are then compared. Given the number of features in the dataset and the variance for the data values overall, the tool computes a z-score and p-value indicating whether this difference is statistically significant or not. Index values cannot be interpreted directly; they can only be interpreted within the context of the null hypothesis.
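The comparison of Observed and Expected Index values can be sketched as follows. This assumes the usual expected value of -1/(n - 1) under the null hypothesis and a two-tailed p-value from the standard normal distribution. The variance is passed in as a number, because its full formula depends on the spatial weights and is not reproduced here. The function names and the input values in the example call are illustrative only.

```python
import math


def expected_index(n):
    """Expected Moran's I under the null hypothesis for n features."""
    return -1.0 / (n - 1)


def z_score_and_p_value(observed_index, n, variance):
    """Return (z, p) for an observed Index, assuming a two-tailed normal test."""
    z = (observed_index - expected_index(n)) / math.sqrt(variance)
    # erfc(|z| / sqrt(2)) is the two-tailed area under the standard normal curve.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Illustrative inputs, not output from a real tool run.
z, p = z_score_and_p_value(observed_index=0.30, n=100, variance=0.01)
print(f"z-score = {z:.3f}, p-value = {p:.5f}")
```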

Interpretation

The Spatial Autocorrelation (Global Moran's I) tool computes an inferential statistic, which means that the results of the analysis are always interpreted within the context of its null hypothesis. For the Global Moran's I statistic, the null hypothesis states that the attribute being analyzed is randomly distributed among the features in your study area; said another way, the spatial process promoting the observed pattern of values is random chance. Imagine that you could pick up the values for the attribute you are analyzing and throw them down onto your features, letting each value fall where it may. This process (picking up and throwing down the values) is an example of a random chance spatial process.

When the p-value returned by this tool is statistically significant, you can reject the null hypothesis. The table below summarizes interpretation of results:

| Result | Interpretation |
| --- | --- |
| The p-value is not statistically significant. | You cannot reject the null hypothesis. It is quite possible that the spatial distribution of feature values is the result of random spatial processes. The observed spatial pattern of feature values could very well be one of many, many possible versions of complete spatial randomness (CSR). |
| The p-value is statistically significant, and the z-score is positive. | You may reject the null hypothesis. The spatial distribution of high values and/or low values in the dataset is more spatially clustered than would be expected if underlying spatial processes were random. |
| The p-value is statistically significant, and the z-score is negative. | You may reject the null hypothesis. The spatial distribution of high values and low values in the dataset is more spatially dispersed than would be expected if underlying spatial processes were random. A dispersed spatial pattern often reflects some type of competitive process: a feature with a high value repels other features with high values; similarly, a feature with a low value repels other features with low values. |
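The decision logic in the table above can be expressed as a small helper. This is a sketch only: the function name `interpret` and the 0.05 significance threshold are assumptions chosen for illustration, and the appropriate threshold depends on your analysis.

```python
def interpret(z_score, p_value, alpha=0.05):
    """Map a z-score and p-value to one of the three outcomes in the table.

    alpha is the significance level; 0.05 is an assumed default.
    """
    if p_value >= alpha:
        # Cannot reject the null hypothesis of spatial randomness.
        return "not significant"
    # The null hypothesis is rejected; the sign of z gives the direction.
    return "clustered" if z_score > 0 else "dispersed"


print(interpret(z_score=0.4, p_value=0.69))
print(interpret(z_score=3.1, p_value=0.002))
print(interpret(z_score=-2.8, p_value=0.005))
```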

Output

The Spatial Autocorrelation tool returns five values: the Moran's I Index, Expected Index, Variance, z-score, and p-value. These values are written as messages at the bottom of the Geoprocessing pane during tool execution and passed as derived output values for potential use in models or scripts. You may access the messages by hovering over the progress bar, clicking on the pop-out button, or expanding the messages section in the Geoprocessing pane. You may also access the messages for a previously run tool via the Geoprocessing History. Optionally, this tool will create an HTML report file with a graphical summary of results. The path to the report will be included with the messages summarizing the tool execution parameters. Clicking on that path will pop open the report file.

Best practice guidelines

Does the Input Feature Class contain at least 30 features? Results aren't reliable with fewer than 30 features.

A: Global statistics like the Spatial Autocorrelation (Global Moran's I) tool assess the overall pattern and trend of your data. They are most effective when the spatial pattern is consistent across the study area. Local statistics (like the Hot Spot Analysis (Getis-Ord Gi*) tool) assess each feature within the context of neighboring features and compare the local situation to the global situation. Consider an example. When you compute a mean or average for a set of values, you are also computing a global statistic. If all the values are near 20, the mean will also be near 20, and that result will be a very good representation/summary of the dataset as a whole. But if half of the values are near 1 and the other half of the values are near 100, the mean will be near 50. There might not be any data values anywhere near 50, so the mean value is not a good representation/summary of the dataset as a whole. If you create a histogram of the data values, you will see the bimodal distribution. Similarly, global spatial statistics, including the Spatial Autocorrelation (Global Moran's I) tool, are most effective when the spatial processes being measured are consistent across the study area. Results will then be a good representation/summary of the overall spatial pattern. For more information, see Getis and Ord (1992) cited below, and the analysis of SIDS they present.

Q: Why are the results from High Low Clustering (Getis-Ord General G) different than the results from Spatial Autocorrelation (Global Moran's I)?

Q: Can you compare the z-scores or p-values from this tool to results from analyses for different study areas?

A: Results are not comparable across different study areas. When the study area is fixed, however (for example, all analyses are for Counties in California), the Input Field is comparable (for example, all analyses involve some type of population count), and the tool parameters are the same (Fixed Distance with a Distance Band or Threshold Distance of 5,000 meters and Row Standardization, for example), you may compare statistically significant z-scores to get a sense of the intensity of spatial clustering or spatial dispersion or to better understand trends over time. You can also run the analysis for a series of increasing Distance Band or Threshold Distance values to see the distance/scale where the processes promoting spatial clustering are most pronounced.

Q: Why am I getting a Moran's Index greater than 1.0 or less than -1.0?

A: In general, the Global Moran's Index is bounded by -1.0 and 1.0. This is always the case when your weights are row standardized. When you don't row standardize the weights, there may be instances where the Index value falls outside the -1.0 to 1.0 range, and this indicates a problem with your parameter settings. The most common problems are the following:

The Input Field is strongly skewed (create a histogram of the data values to see this), and the Conceptualization of Spatial Relationships or Distance Band is such that some features have very few neighbors. The Global Moran's I statistic is asymptotically normal, which means for skewed data, you will want each feature to have at least eight neighbors. The default value computed for the Distance Band or Threshold Distance parameter ensures that every feature has at least one neighbor, but this may not be sufficient, especially when values in the Input Field are strongly skewed.

An Inverse Distance Conceptualization of Spatial Relationships is used, and the inverted distances are very small.

Row standardization is not selected, but should be. Whenever your data has been aggregated, unless the aggregation scheme relates directly to the field you are analyzing, you should select row standardization.
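Row standardization, referred to in the answer above, divides each weight by the sum of the weights in its row, so that the weights for every feature with at least one neighbor sum to 1. A minimal sketch, using a hypothetical three-feature weights matrix and an illustrative function name:

```python
def row_standardize(weights):
    """Divide each weight by its row sum; rows with no neighbors are left as-is."""
    standardized = []
    for row in weights:
        total = sum(row)
        if total:
            standardized.append([w / total for w in row])
        else:
            standardized.append(list(row))  # feature with no neighbors
    return standardized


# Feature 0 has two neighbors; features 1 and 2 have one neighbor each.
weights = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
standardized = row_standardize(weights)
for row in standardized:
    print(row)
```

After standardization, a feature with many neighbors no longer contributes more total weight than a feature with few neighbors.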

Potential applications

Help identify an appropriate neighborhood distance for a variety of spatial analysis methods by finding the distance where spatial autocorrelation is strongest.