'The first crack in the wall of significance testing'

If you have ever read a piece of psychology research cover to cover, you will almost certainly have encountered the p-value, the controversial statistical measure that we have discussed at length on this blog previously. Last month, a scientific journal made perhaps the boldest move since journals began opening their doors to open access. The journal, Basic and Applied Social Psychology (BASP), banned the use of null hypothesis significance testing, a technique used almost universally in psychology and widely across the sciences. Some, however, might call this the nuclear option: when used properly, the p-value can be a useful indicator. Instead of significance testing, the journal will rely on arguably more reliable measures that are often left out of modern psychology research:

"BASP will require strong descriptive statistics, including effect sizes. We also encourage the presentation of frequency or distributional data when this is feasible. Finally, we encourage the use of larger sample sizes than is typical in much psychology research, because as the sample size increases, descriptive statistics become increasingly stable and sampling error is less of a problem"

Significance testing is one of the most important, yet most widely misunderstood concepts in science. Over at the excellent Science-Based Medicine blog, Yale clinical neurologist Steven Novella sums up the problem well:

"The p-value was never meant to be the sole measure of whether or not a particular hypothesis is true. Rather it was meant only as a measure of whether or not the data should be taken seriously."

Novella's account refers to an absolutely beautiful Guitar-Hero-meets-Space-Invaders-meets-Tetris-meets-roulette statistical demonstration of the problem by Geoff Cumming, dubbed "The dance of the p-values." If it is not the most inspired stats lesson you've ever had, then I'll eat my hat:
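You can reproduce the gist of Cumming's "dance" in a few lines. This sketch is my own simulation, not Cumming's code: it runs the same experiment twenty times, with the same true effect and sample size each run, and computes a two-sided p-value via a simple z-test (assuming a known population SD, to keep it stdlib-only). The spread of the resulting p-values is the dance:

```python
import random
from statistics import NormalDist, mean

def two_sample_p(a, b, sigma=1.0):
    """Two-sided z-test p-value, assuming a known population SD (sigma)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(42)
# Replicate the identical experiment 20 times: true effect d = 0.5, n = 32 per group.
p_values = []
for _ in range(20):
    treatment = [random.gauss(0.5, 1.0) for _ in range(32)]
    control = [random.gauss(0.0, 1.0) for _ in range(32)]
    p_values.append(two_sample_p(treatment, control))

# Identical experiments, wildly different p-values: some "significant", some not.
print(f"min p = {min(p_values):.4f}, max p = {max(p_values):.4f}")
```

Even though every run samples from exactly the same populations, the p-values typically range from well below 0.05 to well above it, which is precisely why a single p-value makes a poor sole measure of evidence.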