Oct 15, 2018

Most of the material in this lecture was taken from here and this wonderful dsbook from Rafael Irizarry.

Motivation

In this lecture, we will discuss one of the most important aspects of analyzing data: being skeptical of results. We explain why this skepticism is warranted and illustrate each reason with an example.

“Correlation is not causation” is perhaps the most important lesson one learns in a statistics class. So far, we have described tools useful for quantifying associations between variables. However, we must be careful not to overinterpret these associations.

There are many reasons that a variable \(X\) can be correlated with a variable \(Y\) without either being a cause of the other. Here we examine four common ways that data can be misinterpreted.

Spurious correlation

Outliers

Reversing cause and effect

Confounders

Next, we will discuss each of these in detail and give an example.

Spurious correlation

The following comical example underscores that correlation is not causation. It shows a very strong correlation between divorce rates and margarine consumption.
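One way such a correlation can arise is by searching through many unrelated pairs of variables and reporting only the strongest one. The sketch below is a minimal simulation of that idea, not an analysis of the divorce and margarine data: it draws pairs of independent normal samples and records the largest absolute Pearson correlation found. The function names `pearson` and `max_abs_correlation`, the sample size of 10, and the seed are illustrative assumptions.

```python
import random
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def max_abs_correlation(n_pairs, n_obs, seed=0):
    """Largest |r| among n_pairs pairs of independent standard normal samples.

    Hypothetical helper: every pair is generated independently, so any
    correlation it reports is due to chance alone.
    """
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_pairs):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(x, y)))
    return best


if __name__ == "__main__":
    # A single pair of unrelated variables.
    print(max_abs_correlation(n_pairs=1, n_obs=10))
    # The best of many unrelated pairs: typically a much larger value.
    print(max_abs_correlation(n_pairs=10_000, n_obs=10))
```

With only a handful of observations per variable, the strongest of thousands of unrelated pairs will usually look impressive, even though no pair has any real relationship.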