Hypothesis Shopping Part III – when 95% confidence becomes 60%

Following up on yesterday's post, I want to briefly point out how to adjust confidence levels to account for more than one hypothesis being tested.

To recap, when doing standard confidence level analysis the question asked is the following:

If the null-hypothesis were true, what would be the chance of obtaining a result at least as extreme as the one observed?

The null-hypothesis is the opposite of the hypothesis: where the hypothesis claims ‘relationship X is there’, the null-hypothesis says ‘nope, no special relationship there’. Typically it is something like ‘there is no correlation’, ‘all outcomes are equally likely’, etc.

Asserting something at, say, a 95% level of confidence is akin to stating that, under the null-hypothesis, the chance of the observed outcome happening is 5% (or smaller, which would then correspond to a higher confidence).

So what happens if we throw multiple hypotheses at the same problem? Accepting something at the 95% level means that for every hypothesis tested there is a 5% chance of a false hypothesis being accepted. If we throw, say, 10 hypotheses at this problem (and assume the tests are independent), the chance that none of them sticks by accident is
\[
p_{eff} = (100\% - 5\%)^{10} = 59.9\%
\]
while the chance that at least one false hypothesis sticks is \(100\% - 59.9\% = 40.1\%\). So by using ten hypotheses instead of one we decreased the effective confidence to about 60%.
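As a quick sanity check on the arithmetic, here is a minimal Python sketch of the calculation. The function names are made up for illustration, and the sketch assumes that the tests are independent of each other, which real hypotheses thrown at the same data set often are not.

```python
def effective_confidence(per_test_confidence: float, n_hypotheses: int) -> float:
    """Chance that none of n independent tests accepts a false hypothesis.

    Assumption: the tests are independent, so the per-test chances multiply.
    """
    return per_test_confidence ** n_hypotheses


def chance_of_false_positive(per_test_confidence: float, n_hypotheses: int) -> float:
    """Chance that at least one of n independent tests sticks by accident."""
    return 1.0 - effective_confidence(per_test_confidence, n_hypotheses)


# Show how quickly a 95% per-test confidence erodes as hypotheses are added.
for n in (1, 2, 5, 10, 20):
    eff = effective_confidence(0.95, n)
    fp = chance_of_false_positive(0.95, n)
    print(f"n={n:2d}  effective confidence={eff:.1%}  false-positive chance={fp:.1%}")
```

For ten hypotheses at the 95% level this gives an effective confidence of about 60%, in line with the figure quoted in the text.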

We can of course do this calculation the other way round: given that we want to achieve a certain effective level of confidence (say, 95%), what is the confidence \(p_0\) to which we have to hold every single hypothesis if we throw multiple (\(n\)) hypotheses at the same problem? To do this we have to solve \(p_0^n = p_{eff}\) for \(p_0\), which is of course easy enough:
\[
p_0 = \sqrt[n]{p_{eff}}
\]
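To put some numbers to it (columns: number of hypotheses n; rows: target effective confidence; entries: required confidence on every single hypothesis):

p_eff \ n   1      2      3      4      5      6      7      8      9      10
95%         95.0%  97.5%  98.3%  98.7%  99.0%  99.1%  99.3%  99.4%  99.4%  99.5%
90%         90.0%  94.9%  96.5%  97.4%  97.9%  98.3%  98.5%  98.7%  98.8%  99.0%
80%         80.0%  89.4%  92.8%  94.6%  95.6%  96.3%  96.9%  97.2%  97.6%  97.8%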

So to achieve 90% confidence with 10 hypotheses, each of them must be tested against a 99% level, and even to achieve 80% confidence (which is not much) a 97.8% confidence is needed on every single hypothesis.
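The numbers in the table can be reproduced with a few lines of Python. This is only a sketch: `required_confidence` is a hypothetical helper name, and as before the calculation assumes independent tests.

```python
def required_confidence(target_effective: float, n_hypotheses: int) -> float:
    """Per-hypothesis confidence needed to reach a target effective confidence.

    Solves p_0 ** n = p_eff for p_0, assuming n independent tests.
    """
    return target_effective ** (1.0 / n_hypotheses)


# One row per target effective confidence, one column per n = 1..10.
for target in (0.95, 0.90, 0.80):
    row = "  ".join(f"{required_confidence(target, n):6.1%}" for n in range(1, 11))
    print(f"{target:4.0%}: {row}")
```

Raising the result back to the n-th power recovers the target, which is a handy check that the threshold was computed the right way round.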

Note: a very interesting related article on why most published research might be false is here