About research in the West African Sahel and spatial data handling

Spurious Correlations

Correlations are a widely used way to express the relationship (and its strength) between two variables. Applications in environmental sciences range from relating satellite-based parameters to ground observations, to relationships between parameters such as vegetation and precipitation. Scientists also use correlations to look for links between datasets from entirely different disciplines and spatial scales, e.g. migration and environment. However, many scientists trust these statistical analyses blindly, and even low correlations are often interpreted in a highly speculative way without the results being questioned.

Too much reliance on statistical parameters can be dangerous, because two variables that are not related can still be strongly correlated. This is shown in this website where, for example, the per capita consumption of margarine (US) is correlated with the divorce rate in Maine, with a correlation coefficient of 0.992558. How would you interpret such a relationship? Does this prove that married people shouldn’t eat margarine? It’s a nonsense correlation: the two variables simply happened to change in a similar way over the same years (the correlation is based on time, not space). In this relationship, the scale problems are quite obvious. First of all, the variables do not have the same spatial extent, even though they overlap in Maine. The temporal detail can also be questioned. Many things happen during a year, so how would this correlation look at a finer temporal detail, for example monthly? We’re sure it would not be as strong.
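To make the point concrete, here is a minimal sketch in Python. It does not use the margarine or divorce figures. Instead it generates two made-up yearly series that are independent by construction and only share a steady decline over time. All names and numbers (`series_a`, `series_b`, the slopes, the noise levels, the 50-year length) are assumptions chosen for illustration. As one possible check, not the only one, it compares the correlation of the yearly values with the correlation of the year-to-year changes, which removes most of the shared trend.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def differences(series):
    """Year-to-year changes: value in year t+1 minus value in year t."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
years = range(50)        # hypothetical 50-year record

# Two invented series: each is a linear decline plus its own random noise.
# Nothing links them except that both happen to fall over the same years.
series_a = [8.0 - 0.10 * t + rng.gauss(0, 0.10) for t in years]
series_b = [5.0 - 0.03 * t + rng.gauss(0, 0.05) for t in years]

r_levels = pearson(series_a, series_b)
r_changes = pearson(differences(series_a), differences(series_b))

print(f"correlation of yearly values:        {r_levels:.3f}")
print(f"correlation of year-to-year changes: {r_changes:.3f}")
```

The yearly values correlate strongly because both series follow time, while the year-to-year changes, which reflect only the independent noise, should correlate far more weakly. The same kind of check can be applied to real data before reading any meaning into a high coefficient.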

Strong correlations can often be found between variables that are not directly linked, especially when the spatial and temporal details are coarse (e.g. nationwide, yearly).

Interpreting the output of a statistical analysis can be a challenge, so it is important to make sure that you know what you’re doing. Furthermore, the output values should be interpreted using common sense and an awareness of how scale issues might affect the results.