Hey, I Just did a Significance Test!

I’ve seen it happens quite often. The sig test. Somebody simply needs to know the p-value and that one number will provide all of the information about the study that they need to know. The dataset is presented and the client/boss/colleague/etc invariably asks the question “is it significant?” and “what’s the correlation?”. To quote R.A. Fisher “To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.”

So obviously in the previous example there are probably lots of problems ranging from the person making the request doesn’t know what they want to a bigger problem of the study itself may have core structural problems that undermines the overall integrity of any results. I want to focus on the former and share some thoughts on how to do a couple of hypothesis tests and selecting the appropriate test. This is mainly a topic of parametric vs. non-parametric tests. I’ll show the one most people are probably most familiar (parametric tests) and then show an alternative, more appropriate hypothesis test. I won’t get into the philosophical topics of hypothesis testing or whether .04999 is significant and .05001 is not significant. This simply provides some hypothesis testing options that too often get overlooked in market and/or social research. Yes, parametric tests provide more power but not everything in this world fits nicely into one package with a bow on top.

What’s the Correlation

I’ve had to happen more often that I would care to count. I’m given a dataset and then asked what’s the “correlation”? After some back and forth I find that they want the “correlation” on all permutations (with p-value) of the questions on the questionnaire. So here goes:

Pearson’s Correlation or, more formally, Pearson product-moment correlation coefficient. This is the correlation with which most people are familiar. Be sure to check the assumption (i.e. homoscedasticity, linearity, normality). However, Spearman’s Rho or Kendall’s Tau (depending on how you want to interpret the results) may in fact be the better options.
[sourceode language="css"]

This is a popular hypothesis test because people want to know if something is better (or worse) than something else. So the initial choice is a t-test. For example did Group 1 make more money than Group 2 or did more people remember seeing an ad on theater screen 1 versus theater screen 2. So rather than the parametric t-test we can use the Mann-Whitney U Test on our data that doesn’t meet the t-test assumptions.

Sure you can just run you chi-square test and be done with it. But there is one small problem. The assumptions for a chi-square test are in not met. Namely, the cell sizes are way too small. So what’s a researcher to do? This is where Fisher’s Exact Test works well. If we can assume that the marginal totals are given then we can solve the problem this way:

fisher.test(raw.chisq$X1, raw.chisq$X2)

Is There a Difference in My Three Groups

Yes, it’s true, you can run three t-tests on you groups (1 vs 2, 1 vs 3, 2 vs 3). But that causes not only extra work but problems with your hypothesis test itself. Plus why do multiple tests when you can be more efficient in you testing and do just one ANOVA. Here you can do a non-parametric Kruskal-Wallis Rank Sum Test when you can’t make the assumptions for the parametric analysis of variance.

Ultimately, it is important to understand what you’re doing and why. Just because R, SPSS, SAS, etc. gives you a number it doesn’t mean it’s correct. If you find that something is significant make sure you understand what is it saying and what your test. You don’t want to run an independent 2 sample test only to find out that it should have been a matched pairs test.