Monday, February 23, 2015

Journal bans null hypothesis significance tests

In a recent editorial [here], the journal Basic and Applied Social Psychology has banned the null hypothesis significance testing procedure (NHSTP). "We hope and anticipate that banning the NHSTP will have the effect of increasing the quality of submitted manuscripts by liberating authors from the stultified structure of NHSTP thinking thereby eliminating an important obstacle to creative thinking. The NHSTP has dominated psychology for decades; we hope that by instituting the first NHSTP ban, we demonstrate that psychology does not need the crutch of the NHSTP, and that other journals follow suit."

In a short bit about Bayesian analysis, the editorial says, "The usual problem with Bayesian procedures is that they depend on some sort of Laplacian assumption to generate numbers where none exist." I think here the editors are too focused on Bayesian hypothesis tests instead of on the much broader application of Bayesian methods to parameter estimation. For example, in the 750 pages of DBDA2E, I never mention the Laplacian assumption because the procedures do not depend on it. Despite their narrow view of Bayesian methods, I am encouraged by the bold move that might help dislodge NHST.

Yes, and sort of fun! Since they ban NHST and don't really embrace Bayes, it is as if they discourage the use of statistical models altogether. I would say this is an exciting experiment! I guess one will have to resort to Tukey-style exploratory data analysis, or similar...

I would ban all of hypothesis testing and model comparison not just NHST, so I think this is great news.

Since the editors do not recommend any alternative method, it will be interesting to see how authors and reviewers handle this inferential limbo. The problem in social psychology is that researchers use unvalidated measures and unvalidated, unstandardized manipulations whose relation to the research question is hazy at best. Of course, when you don't know what you are measuring, no amount of statistical sophistry will give you interpretable results. So I don't think this will remove junk research. But it may motivate some good research.

I'm as fanatical a pure Bayesian as it gets and I think this is a really really bad idea. People either need to be persuaded to avoid bad statistical methods or they're not persuaded and they keep on doing crap research. Those are the only two alternatives.

Trying to win debates by outlawing one side has never worked in science's favor in the long run. Frequentists did this sort of thing to Bayesians decades ago. How did that work out for them and for science?

I think that this is a scary precedent. The nominal "ban on NHSTP" is really misleading, since the journal is also banning confidence intervals and discouraging Bayesian intervals! Indeed, inferential procedures are no longer required. I believe that the journal is making a grave mistake by implementing this policy, because it will be very difficult for their editorial staff and readership to get a sense of the uncertainty in the statistics that are presented. The editorial seems to acknowledge this, but suggests that the remedy is larger sample sizes...

I wonder if the editorial board simply dislikes statistics, or fails to understand its fundamental purpose.

I don't think that a sense of uncertainty has to be lost just because no inferential statistics are used (even if I personally prefer the Bayesian approach). Uncertainty can still be conveyed by standard deviations and standard errors. Moreover, in applied fields, effect sizes should often be more relevant than the result of a hypothesis test. I can only hope that researchers will report those more often :-). Finally, the editors write that they will be stricter about required sample sizes. I think this is actually the best news, because larger samples do more to protect against inaccurate or incorrect inferences than frequentist or Bayesian stats alone can. So here's hoping that they enforce larger sample sizes than are typically seen nowadays.

Standard errors for the effect size?!? Perhaps the editors would instead accept percentiles of the sampling distribution? Say the 2.5th and 97.5th percentiles. Whoops, no, that's a confidence interval.

But seriously, this is all rather silly. The whole purpose of reporting effect size estimates, with or without standard errors, is to make inferences about the true but unknown effect size! If standard errors are also reported then we have exactly the same information as a Wald confidence interval. Do the editors not expect their readers to use that information to make inferences about the effect size?
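The equivalence is easy to see with a toy calculation: a 95% Wald interval is nothing more than the estimate plus or minus 1.96 standard errors, so a mean reported with its SE carries exactly the "banned" interval. A minimal sketch (the data values here are hypothetical, purely for illustration):

```python
import math
import statistics

# Hypothetical effect-size estimates (illustrative numbers only, not real data).
data = [0.42, 0.31, 0.55, 0.48, 0.60, 0.37, 0.51, 0.44, 0.39, 0.58]

mean = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(len(data))  # standard error of the mean

# A 95% Wald confidence interval is just the estimate +/- 1.96 standard errors,
# so anyone given the mean and SE can reconstruct it in one line.
lo, hi = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:.3f}, SE = {se:.3f}, 95% Wald CI = ({lo:.3f}, {hi:.3f})")
```

In other words, as long as means and standard errors appear in the journal, readers can recover the forbidden interval with a pocket calculator; the ban changes the typography, not the inference.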