Understanding Statistical Significance

In the main article, I said that "one out of every twenty significant results might be random" if you rely solely on statistical analysis. This is a bit of an oversimplification. Here's the detailed story.

"Statistical significance" refers to the probability that the observed result could have occurred randomly if there were no true underlying effect. This probability is usually referred to as "p" and, by convention, p should be smaller than 5% to consider a finding significant. Sometimes researchers insist on stronger significance and want p to be smaller than 1%, or even 0.1%, before they'll accept a finding with wide-reaching consequences.

Good researchers start by building their hypotheses on qualitative insights. For example, after having observed how people read online, a researcher might suspect that scannable layouts would make website content easier to read and understand. If you run statistical tests on questions that are likely to be true, your findings are less likely to be false.

As a thought experiment, let's assume that a researcher, Dr. Bob, has established 100 hypotheses, of which 80% are true. With the conventional 5% threshold for p, Bob will erroneously accept about one of the twenty false hypotheses (5% of 20). Assuming Bob is running a study with good statistical power, he'll accept most of the eighty true hypotheses, rejecting maybe ten as not significant. Bob will then publish seventy-one of his conclusions, of which seventy are true and one is false. In other words, only 1.4% of Bob's papers will be bogus.
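The arithmetic behind Dr. Bob's numbers can be checked with a short sketch. The function name and parameters below are hypothetical, introduced only for illustration. The 5% threshold and the power of 70 out of 80 are read off the thought experiment itself.

```python
def published_breakdown(n_hypotheses, share_true, alpha, power):
    """Expected counts of true and false findings that pass the significance test.

    alpha: chance that a false hypothesis is accepted anyway (the p threshold).
    power: chance that a true hypothesis is accepted.
    """
    n_true = n_hypotheses * share_true
    n_false = n_hypotheses - n_true
    true_pos = n_true * power      # true hypotheses accepted
    false_pos = n_false * alpha    # false hypotheses accepted by chance
    bogus = false_pos / (true_pos + false_pos)
    return true_pos, false_pos, bogus


# Dr. Bob: 100 hypotheses, 80% true, p < 5%, 70 of 80 true ones accepted.
true_pos, false_pos, bogus = published_breakdown(100, 0.80, 0.05, 70 / 80)
print(f"published true findings:  {true_pos:.0f}")
print(f"published false findings: {false_pos:.0f}")
print(f"share of bogus papers:    {bogus:.1%}")
```

Note that the result is an expected value: in any single batch of twenty false hypotheses, Bob might accept zero, one, or several by chance.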

Unfortunately, not all real-world researchers are good enough that 80% of their hypotheses will be correct. And not all studies have sufficient statistical power to accept seventy out of every eighty correct hypotheses. Thus, the percentage of bogus results in most published quantitative research is higher, but we can't determine that percentage exactly because it depends on both researchers' competence and their pre-study insights.
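To see how sensitive the bogus percentage is to these two factors, here is a minimal sketch that reuses the same arithmetic with different inputs. Apart from Dr. Bob's case, the scenarios are made-up illustrations, not measurements of any real research field, and the function name is hypothetical.

```python
def bogus_fraction(share_true, power, alpha=0.05):
    """Expected fraction of significant findings that are actually false."""
    false_pos = (1 - share_true) * alpha   # false hypotheses that pass by chance
    true_pos = share_true * power          # true hypotheses that pass
    return false_pos / (false_pos + true_pos)


# (label, share of hypotheses that are true, statistical power)
# Only the first row comes from the thought experiment; the rest are assumptions.
scenarios = [
    ("Dr. Bob", 0.80, 70 / 80),
    ("weaker insights, weaker studies", 0.50, 0.50),
    ("mostly guesswork, weaker studies", 0.10, 0.50),
]

for label, share_true, power in scenarios:
    print(f"{label}: {bogus_fraction(share_true, power):.1%} bogus")
```

The fraction grows as either the share of true hypotheses or the statistical power drops, which is why starting from good qualitative insights matters so much.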