Monthly Archives: July 2015

“It is becoming apparent that you do not know how to use the results from either system. The progress of science would be safer if you get some advice from a person that knows what they are doing.”

— David Winsemius (in response to a user that obtained different linear regression results in R and SPSS and wanted to know which one to use) R-help (July 2011)

I can always count on my fortunes R package for a good laugh (especially at the expense of SPSS users). However, this post raises an interesting point about the misuse of statistics.

First, let me digress. Before taking undergraduate-level coursework in psychology, I didn’t know much about the way people acted. After some undergraduate-level classes, I knew everything about the inner workings of the mind. I knew that priming people with words stereotypically associated with older adults reduced their walking speed (Bargh, Chen, & Burrows, 1996), that the Implicit Association Test (IAT; Greenwald et al., 2002) measured meaningful unconscious attitudes, that narcissism was associated with using more first-person pronouns (Raskin & Shaw, 1988), etc. It wasn’t until several years of graduate school, advanced statistical training, some reading in meta-research, and a visit from the replication police that I realized a) that findings are never as clear-cut as they seem and b) that all of these findings have been called into question (priming: Doyen, Klein, Pichon, & Cleeremans, 2012; pronouns: Carey et al., 2015; IAT: Blanton et al., 2009). Further reading revealed p-hacking (Simonsohn, Nelson, & Simmons, 2014), incredibility indices (Schimmack, 2012), and the argument that most published findings may be false (Ioannidis, 2005).

I hope this digression illustrates the point that a little knowledge and a false sense of understanding can be dangerous. A novice statistician who keeps running participants until the hypothesis test comes out statistically significant might not realize that this practice inflates the Type I error rate to 20% despite a nominal p < .05 threshold (Sherman, 2014), but those findings get published.
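To make the optional-stopping problem concrete, here is a minimal simulation sketch. It is not taken from Sherman (2014); the sample sizes, the number of looks, and the use of a simple z-test with a known standard deviation are all assumptions chosen for illustration. Every simulated study is drawn from a world where the null hypothesis is true, so any "significant" result is a false positive.

```python
import math
import random


def two_sided_p(sample):
    """Two-sided p-value for a z-test of mean = 0, assuming a known sd of 1."""
    n = len(sample)
    z = sum(sample) / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(peek, n_start=10, n_max=100, step=10,
                        n_sims=2000, alpha=0.05, seed=1):
    """Share of null-true studies declared significant.

    peek=True  : test after every batch and stop at the first p < alpha.
    peek=False : test once, when the planned sample size n_max is reached.
    All defaults are illustrative assumptions, not values from the post.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # The true effect is zero: data are pure noise.
        data = [rng.gauss(0, 1) for _ in range(n_start)]
        while True:
            at_end = len(data) >= n_max
            if (peek or at_end) and two_sided_p(data) < alpha:
                hits += 1
                break
            if at_end:
                break
            # Not significant yet, so run another batch of participants.
            data.extend(rng.gauss(0, 1) for _ in range(step))
    return hits / n_sims


if __name__ == "__main__":
    print("fixed sample size:", false_positive_rate(peek=False))
    print("stop when significant:", false_positive_rate(peek=True))
```

Under these assumptions, the fixed-sample design should stay close to the nominal 5%, while the stop-when-significant design should land well above it, because each extra look at the data is another chance for noise to cross the threshold.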

This brings me back to the original (humorous) quote from my fortunes R package. Misuse and misunderstanding of analyses are some of the reasons that so many findings across many scientific disciplines do not replicate (Freedman, Cockburn, & Simcoe, 2015). I think the ‘take away’ from this ‘fortune’ (and blog post) is that statistics are often misused and abused, sometimes knowingly and other times unwittingly. The scientific process is slow and self-correcting, but not perfect. Published papers are not necessarily error free. Interpret analyses cautiously. Interpret the research of others cautiously. Most importantly, use R, not SPSS.

I work as a Data Scientist for a database marketing company, and I spend a great deal of time predicting responders to credit-based marketing offers, defaults on loans, and similar outcomes. However, my graduate training (and expected PhD) is in Experimental Psychology.[i] When people find this out, I often get a confused look and the question: How did you get into this business?

Whenever I find myself having this conversation, I am reminded of a scene from the movie Margin Call, which portrays an insider’s view of the financial crisis. Here is an exchange between one of the “Quants” and a member of senior management.

What’s a specialty in propulsion, exactly?

My thesis was a study in the way that friction ratios affect steering outcomes in aeronautical use under reduced gravity loads.

So, you are a rocket scientist?

I was.

How did you end up here?

Well, it’s all just numbers, really. You’re just changing what you’re adding up . . .

While I am not exactly a rocket scientist, I share his sentiment. A Support Vector Machine Regression model does not care if I am trying to predict a personality trait or the likelihood of defaulting on a loan. The numbers are the same. That is the beauty of math. It is universal. Techniques that I apply to large-scale analyses on social media can be adapted to study nearly anything else that I find interesting. So the next time someone asks me how I got into the database marketing business, I will tell them, “Well, it’s all just numbers, really. You’re just changing what you’re adding up…” Or, I might just point them toward this blog post.

[i] This confusion is no doubt compounded by the tendency to conflate Psychologists, Counselors, Psychiatrists, etc. with research psychologists, but that is a conversation for another blog.