Saturday, July 25, 2015

Explainer: Correlation, causation, coincidence and more

Over a 10-year period, Americans’ fondness for margarine correlated strongly with the divorce rate in Maine. Yet there’s no reason to think one caused the other. It’s an instance of two unrelated data sets showing a coincidental pattern. Photo: Tyler Vigen/“Spurious Correlations”

Eating more mozzarella cheese shouldn’t make engineering schools hand
out more diplomas. Yet between 2000 and 2009, the more mozzarella that
Americans downed, the more doctorates in civil engineering that U.S.
universities awarded. Over a 10-year period, as levels of one went up,
so did the other. The two showed a strong positive correlation. Yet
almost certainly this happened by coincidence. One did not cause the other.

This is a cheesy example. Still, it shows an important point about statistics: Correlation is not the same thing as causation. Causation means that one thing actually caused the other.
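To make "strong positive correlation" concrete, here is a minimal sketch of the Pearson correlation coefficient, a common way to put a number on how closely two series rise and fall together. The ten yearly values below are invented for illustration and are not the real mozzarella or doctorate figures; the function name `pearson_r` is likewise just a label chosen for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two series move together around their means.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each series moves on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / math.sqrt(spread_x * spread_y)

# Hypothetical yearly values (NOT real data): cheese eaten per person,
# and engineering doctorates awarded, over ten years.
cheese = [9.0, 9.3, 9.6, 9.7, 9.9, 10.2, 10.5, 10.8, 11.0, 11.3]
doctorates = [480, 500, 540, 550, 580, 640, 680, 700, 760, 770]

r = pearson_r(cheese, doctorates)
print(f"correlation: {r:.2f}")
```

A value near +1 means the two series climb together, near -1 means one falls as the other rises, and near 0 means no straight-line relationship. Nothing in the calculation knows or cares whether one series causes the other.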

Another complication: Many events or trends can have multiple causes. And sometimes two variables might both be due to a third factor. All of this can sometimes confuse, or confound, a statistical study. (Statistics involves collecting and analyzing numerical data in large quantities and interpreting their meaning.)
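The third-factor situation can be imitated with a small simulation. In this sketch, which uses entirely made-up random numbers, a hidden factor `third` feeds into both `a` and `b`, while `a` and `b` never influence each other. All names and sizes here are assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / math.sqrt(spread_x * spread_y)

random.seed(0)  # fixed seed so the sketch is repeatable
n = 5000

# The hidden third factor, plus two variables that each depend on it
# and on their own independent noise. a never touches b, or vice versa.
third = [random.gauss(0, 1) for _ in range(n)]
a = [t + random.gauss(0, 1) for t in third]
b = [t + random.gauss(0, 1) for t in third]

r_all = pearson_r(a, b)

# Look only at cases where the third factor is nearly the same for everyone.
band = [(x, y) for t, x, y in zip(third, a, b) if abs(t) < 0.1]
r_band = pearson_r([x for x, _ in band], [y for _, y in band])

print(f"all cases:            r = {r_all:.2f}")
print(f"third factor held ~0: r = {r_band:.2f}")
```

Across all cases, `a` and `b` look clearly related (in theory the correlation is about 0.5 for this setup). Among cases where the third factor barely varies, the apparent link should mostly vanish, which is the signature of a confounder.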

Experiments can rule out such other — or confounding — causes by having a test group and a control group. But that’s not always possible or ethical. For example, researchers would not want to expose children to toxic chemicals just to see what bad effects might follow.

Fortunately, statistics offers mathematical tools that can account for possible confounders. That allows scientists to see how much a change in one variable might be linked to differences in something else.
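One such tool is regression adjustment. The sketch below shows the general idea on simulated numbers; it is an illustration under stated assumptions, not the method of any particular study. The variable names, effect sizes and the helper functions `slope` and `residuals` are all hypothetical. The trick is to strip out the part of each variable that the confounder explains, then compare what is left.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def residuals(xs, ys):
    """What remains of ys after removing its straight-line link to xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = slope(xs, ys)
    return [(y - mean_y) - b * (x - mean_x) for x, y in zip(xs, ys)]

random.seed(1)  # fixed seed so the sketch is repeatable
n = 5000
TRUE_EFFECT = -1.0  # assumed effect of exposure on outcome in this simulation

# Simulated data: the confounder lowers exposure and raises the outcome.
confounder = [random.gauss(0, 1) for _ in range(n)]
exposure = [-0.8 * c + random.gauss(0, 1) for c in confounder]
outcome = [TRUE_EFFECT * e + 2.0 * c + random.gauss(0, 1)
           for e, c in zip(exposure, confounder)]

# Naive estimate: ignores the confounder entirely.
naive = slope(exposure, outcome)

# Adjusted estimate: compare only the parts the confounder cannot explain.
adjusted = slope(residuals(confounder, exposure),
                 residuals(confounder, outcome))

print(f"naive slope:    {naive:.2f}")
print(f"adjusted slope: {adjusted:.2f}  (simulated true effect: {TRUE_EFFECT})")
```

In this setup the naive slope overstates the effect because it mixes in the confounder's influence, while the adjusted slope should land close to the simulated true effect. Real studies usually adjust for several confounders at once with multiple regression, but the logic is the same.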

Researchers built such a tool into their computer model for a recent study about lead. The model had data about lead in children’s blood and scores on a third-grade test. The researchers wanted to look for any link between those two variables. In addition, the model had data on family income, ethnicity and other things.

The statistical tool used math to rule out possible effects from those other factors. That let the model measure just the relationship between lead and test scores. Compared to children with no lead poisoning, children with even low levels of lead in their blood were more likely to fail the reading and math portions of the test. Environmental Health published the research on April 7, 2015.

Source: Science News for Students

About Me

Hello, my name is Helge Scherlund and I am the Education Editor and Online Educator of this personal weblog and the founder of eLearning • Computer-Mediated Communication Center.
I have an education in teaching adults and adult learning from Roskilde University, with Computer-Mediated Communication (CMC) and Human Resource Development (HRD) as specially studied subjects. I am the author of several articles and publications about the use of decision support tools, e-learning and computer-mediated communication. I am a member of The Danish Mathematical Society (DMF), The Danish Society for Theoretical Statistics (DSTS) and an individual member of the European Mathematical Society (EMS). Note: Comments published here are purely my own and do not reflect those of my current or future employers or other organizations.