A new website reveals relationships between seemingly unrelated variables and reminds us that correlation does not imply causation

ITworld|May 20, 2014

Image credit: REUTERS/Laszlo Balogh

There’s been a lot of hand-wringing in recent years about the alarming rate at which bees have been disappearing. Even though Einstein probably never actually said that if the bees go man will follow soon afterwards, it’s alarming anyway, partly because the reason for their decline is so mysterious. Well, good news, because we might finally have an answer to the bee mystery: it seems that the rate at which honey bees have been disappearing is negatively correlated with the number of U.S. doctorates being awarded in computer science.

What’s that you say? It seems improbable that the number of computer science PhDs could be affecting the rate of bee disappearance (or vice versa)? Well, you’re probably right, but such a correlation really does exist, which we now know thanks to Spurious Correlations, a new and very fun website created by Tyler Vigen, a student at Harvard Law School.

Vigen wrote some code to detect correlations between pairs of seemingly unrelated variables, such as U.S. crude oil imports from Venezuela and U.S. per capita consumption of high fructose corn syrup (correlation: 0.884883). The whole point of the site, Vigen explains in this video, is to show that, while computers can be great for identifying correlations between variables, it still takes humans to ultimately determine whether there’s truly a relationship between two things and, most importantly, why such a relationship might exist.
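To give a rough idea of what that kind of detection involves, here is a minimal sketch in Python. It is not Vigen's code, which the article does not show. It assumes the Pearson correlation coefficient as the measure, and the helper names (`pearson`, `rank_pairs`) and the three data series are invented purely for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rank_pairs(series):
    """Return (name_a, name_b, r) for every pair, strongest |r| first."""
    results = [
        (a, b, pearson(series[a], series[b]))
        for a, b in combinations(sorted(series), 2)
    ]
    return sorted(results, key=lambda item: abs(item[2]), reverse=True)


# Invented yearly figures, for illustration only (not real data).
made_up = {
    "series_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "series_b": [10.0, 8.1, 5.9, 4.2, 2.0],
    "series_c": [3.0, 1.0, 4.0, 1.0, 5.0],
}

for a, b, r in rank_pairs(made_up):
    print(f"{a} vs {b}: r = {r:+.3f}")
```

The code will happily rank every pair by the strength of its correlation, positive or negative, but it has no way of knowing whether any of them means anything. That part is still up to the human reading the output.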

Still, the site is fun to play with, and I decided to see if I could use it to identify any correlations that might be of interest to ITworld readers. Given that I wrote last week about how U.S. college students aren’t exactly tripping over themselves to choose computer science as a major, I was interested to see that Vigen included the rate at which U.S. schools are awarding PhD degrees in computer science as a variable.

Turns out that, at least at the PhD level, the number of U.S. students studying computer science correlates positively with, among other things, the number of accidental poisonings by nonopioid analgesics, antipyretics and antirheumatics (correlation: 0.896829). Who knew?

Vigen’s site also shows you which pairs of variables correlate negatively with each other. For example, the number of CS PhDs is strongly negatively correlated with the per capita consumption of whole milk (correlation: -0.944048).

Based on all this, one could conclude that, in order to increase the number of bees, we need fewer computer science PhD degrees to be awarded, which could be achieved by increasing the per capita consumption of whole milk. Perhaps, then, if the government would subsidize a steep price cut in the cost of whole milk, our disappearing bee problem would be solved!

OK, obviously, this is all tongue-in-cheek, and people choosing to get computer science PhDs has nothing to do with the disappearing bees. The point is, Spurious Correlations is a very fun site to play with! Take a look if you haven’t already.