A) You want to test the theory that people who were born close to noon on July 7 are unusually tall. You randomly choose 100 Norwegian men over 25 years old and discover that the one person born closest to noon on July 7 is the 15th tallest among them. Then you choose 100 Nigerian women and discover that the woman born closest to noon on July 7 is the 10th tallest. You figure that if the putative effect is not real (in other words, under the null hypothesis), the chance of such results occurring at random is 1/10 times 3/20, which is 1.5%, and conclude that this lends significant support to your theory. Are you correct?

B) In a certain scientific area, the level of significance required for a statistical test is 5%. Would it serve the quality of scientific papers in this area to reduce the required significance level to, say, 0.5%, in order to exclude publishing papers which report experiments that were successful by sheer chance?

5 Responses to Test Your Intuition (9)

A) Well, there are roughly 365 different dates (or 200 in this case [ignoring the birthday effect]) you could try. The probability that none of these dates shows this phenomenon is roughly at most (1-1.5/100)^200 ~ 1/e^3. These events are not quite independent, but this estimate looks roughly valid.
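The arithmetic in this multiple-comparisons estimate is easy to check directly (a quick sketch; the 200 dates and the 1.5% per-date chance are the commenter's figures above):

```python
import math

# With ~200 candidate dates, each showing the "effect" by chance with
# probability 1.5%, the chance that NO date shows it is about (1 - 0.015)^200.
p_none = (1 - 1.5 / 100) ** 200
print(p_none)        # close to 0.049

# Compare with the 1/e^3 approximation quoted in the comment.
print(math.exp(-3))  # about 0.050
```

So under the (idealized) independence assumption, some date would look "significant at 1.5%" with probability around 95%.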

B) I was under the impression that in such fields a correlation is considered correct only if a large number of experiments “prove” it…

Probably there are various things one can say about questions A and B. Let me try to make question A more focused. We suppose that the researcher decided (in advance) to check the putative effect by looking at a random set of people and seeing how the person whose birth time is closest to noon on July 7 ranked in terms of height.

Suppose he checked 200 people and the person was 3rd tallest. This represents an event whose probability (under the assumption that the effect does not exist) can be estimated as 3/200. If the required significance level is 5%, or even 2%, the researcher can conclude that he has significant statistical evidence supporting the theory.

The question is whether having two independent experiments with outcomes 15/100 and 10/100 can be regarded as equivalent to one experiment with outcome 3/200.
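One way to probe this question: under the null hypothesis, each experiment's p-value (rank divided by sample size) is roughly uniform on (0,1], so we can simulate how often the *product* of two independent p-values falls at or below the observed 0.015. A quick Monte Carlo sketch (the simulation setup is mine, not from the discussion above):

```python
import random

random.seed(0)
N = 200_000

# Under the null, model each experiment's p-value as Uniform(0, 1).
# The observed product is (15/100) * (10/100) = 0.015.
hits = 0
for _ in range(N):
    p1, p2 = random.random(), random.random()
    if p1 * p2 <= 0.015:
        hits += 1

print(hits / N)  # noticeably larger than 0.015
```

The simulated tail probability comes out near 0.078, not 1.5%, which already suggests the two setups are not equivalent.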

Sariel, multiple testing isn’t the problem here since as Gil has laid things out, we have the date specified beforehand.

That being said, yikes, I’m pretty horrified at how badly I failed at this, considering how close this stuff lies to my interests. At first, I thought, “Sure, they’re independent, just multiply the probabilities!”. After staring at the numbers for a couple seconds though, I became a bit skeptical and then the obvious occurred to me: p-values are always less than one so of course as you do more and more of these little experiments, the products of the p-values will (a.s.) go to 0 even under the null hypothesis.

So then I thought: okay, let’s take the product of the p-values. We get 0.015. Now the question we should ask ourselves is: what’s the probability of getting a product of p-values so small under the null hypothesis? Under that hypothesis, our two p-values are independent and uniformly distributed on [0,1] so we just do a little integral and arrive at 0.0779956. A bit better than the two p-values we started with, but not by a lot and not significant at any rate.
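The "little integral" mentioned above has a closed form: for independent U, V uniform on [0,1], the region {uv ≤ c} in the unit square has area c(1 − ln c). Checking the quoted number:

```python
import math

# P(U*V <= c) for independent U, V ~ Uniform(0,1):
# integrate over the unit square to get c * (1 - ln(c)).
c = 0.015
p_combined = c * (1 - math.log(c))
print(p_combined)  # ≈ 0.0779956, as quoted in the comment
```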

Then I realized: I’ve read about this before. Fisher had a way of doing this (surprise surprise) and a bit of googling reveals that it’s called Fisher’s method (surprise surprise). Throwing it into R produces a p-value of 0.077977. Interesting!
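Fisher's method can be reproduced without R: under the null, X = −2(ln p₁ + ln p₂) follows a chi-squared distribution with 2k = 4 degrees of freedom (k = 2 p-values), and for 4 degrees of freedom the survival function has the closed form exp(−x/2)(1 + x/2). A sketch using only the standard library:

```python
import math

# Fisher's combined test statistic for two p-values.
p1, p2 = 15 / 100, 10 / 100
x = -2.0 * (math.log(p1) + math.log(p2))

# Chi-squared survival function with 4 degrees of freedom:
# S(x) = exp(-x/2) * (1 + x/2).
p_fisher = math.exp(-x / 2) * (1 + x / 2)
print(p_fisher)  # ≈ 0.078, matching the value from R up to rounding
```

With exactly two p-values this coincides analytically with the product-of-uniforms integral, since −2 ln(p₁p₂) is a monotone function of the product; that explains why the two routes land on essentially the same number.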