Multiple hypothesis testing

Fortran must die

I am reading one of those books that make my head hurt, because it raises many issues and disagrees with what I take to be established views on a topic (that is, things I think are settled).

"A concern is sometimes expressed that if you test a large number of hypotheses, then you're bound to reject some [even if they are right].....From our data analysis perspective, however, we are not concerned about multiple comparisons [and thus about corrections like Bonferroni]. For one thing, we almost never expect any of our 'point null hypotheses' (that is hypotheses that a parameter equals zero, or that two parameters are equal) to be true, and so we are not particularly worried about the possibility of rejecting them too often.....There is no need to correct for the multiplicity of tests if we accept that they will be mistaken on occasion."

"The second problem [with statistical significance] is that changes in statistical significance are not themselves significant. By this, we are not merely making the commonplace observation that any particular threshold is arbitrary [so a 5 percent significance level is really not that different than a 4.9 percent level].....Rather we are pointing out that even large changes in significance levels can correspond to small, nonsignificant changes in the underlying variable."

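To make the second quote concrete, here is a minimal sketch. The two estimates and their standard errors are illustrative numbers chosen for this sketch (they are not taken from the book), the studies are assumed independent, and the p-values use a normal approximation.

```python
import math

def two_sided_p(estimate, se):
    """Two-sided p-value for estimate / se under a normal approximation."""
    z = abs(estimate) / se
    return math.erfc(z / math.sqrt(2))

# Hypothetical results from two independent studies: (estimate, standard error).
est_a, se_a = 25.0, 10.0
est_b, se_b = 10.0, 10.0

p_a = two_sided_p(est_a, se_a)  # clears the 5 percent threshold
p_b = two_sided_p(est_b, se_b)  # does not clear it

# The difference between the two estimates has its own standard error.
diff = est_a - est_b
se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
p_diff = two_sided_p(diff, se_diff)

print(f"study A:   p = {p_a:.3f}")
print(f"study B:   p = {p_b:.3f}")
print(f"A minus B: p = {p_diff:.3f}")
```

One study is "significant" and the other is not, yet the difference between them is well short of significance, so the change in significance status says little about whether the underlying effects differ.
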
TS Contributor

I would phrase this differently. Statistical significance does not link directly to the effect size. Therefore, you could have a p-value of .049 with a very large effect (due to small sample size or large variation within groups), and another p-value of 0.001 with a very small effect (due to large sample sizes and small variation within groups).
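
A rough sketch of that point, under stated assumptions: two equal-sized groups, a common within-group standard deviation of 1 (so the mean difference is the standardized effect size), and a normal approximation instead of a proper t-test. The function name `z_test_p` and both scenarios are hypothetical.

```python
import math

def z_test_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in two group means.

    Normal approximation with equal sd and equal n in both groups.
    """
    se = sd * math.sqrt(2.0 / n_per_group)
    z = abs(mean_diff) / se
    return math.erfc(z / math.sqrt(2))

# Hypothetical scenario 1: very large effect, very small sample.
d_large, n_small = 1.0, 8
# Hypothetical scenario 2: very small effect, very large sample.
d_small, n_large = 0.05, 10_000

p_large_effect = z_test_p(d_large, 1.0, n_small)
p_small_effect = z_test_p(d_small, 1.0, n_large)

print(f"effect {d_large:.2f}, n = {n_small} per group: p = {p_large_effect:.4f}")
print(f"effect {d_small:.2f}, n = {n_large} per group: p = {p_small_effect:.4f}")
```

The effect that is twenty times larger ends up with the p-value just under .05, while the tiny effect gets the p-value below .001. With only 8 per group a t-test would give a somewhat larger p-value than this approximation, which only reinforces the point.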

Fortran must die

That makes sense, Miner, although I am not sure the authors agree or are making that point. The sense I get is that they see significance levels as generally unimportant. It's the same issue as whether, assuming statistical power is not a problem, a lower p-value means something is more certain, so that a p-value of .02 makes you much more certain the null should be rejected than a p-value of .04. Or worse, whether it says anything about the relative impact or importance of an effect.

Having said that, following a suggestion I found in the literature, I use the size of the Wald statistic in logistic regression (the statistic used to test each predictor) to rank the relative impact of predictors, so I violate my own rule.
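
For readers unfamiliar with that ranking, a minimal sketch follows. The predictor names, coefficients, and standard errors are invented for illustration, and the Wald statistic is taken here as the squared ratio of coefficient to standard error.

```python
# Hypothetical logistic regression output: predictor -> (coefficient, standard error).
fit = {
    "age": (0.04, 0.01),
    "smoker": (0.90, 0.40),
    "income": (-0.30, 0.25),
}

def wald_chi2(coef, se):
    """Wald chi-square statistic (1 df) for a single coefficient."""
    return (coef / se) ** 2

# Rank predictors from largest to smallest Wald statistic.
ranking = sorted(fit, key=lambda name: wald_chi2(*fit[name]), reverse=True)

for name in ranking:
    coef, se = fit[name]
    print(f"{name}: coef = {coef:+.2f}, Wald = {wald_chi2(coef, se):.2f}")
```

In this made-up output the predictor with the smallest coefficient ranks first because its standard error is small, so the ranking mixes effect size with precision and with the scale each predictor is measured on.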

Super Moderator

There's lots of debate about whether corrections for familywise Type 1 error rate such as the Bonferroni are necessary (e.g., Google Scholar search). I don't think it's a particularly resolvable debate - or one where you can say that one position is "wrong". I say that because ultimately deciding whether a correction is necessary depends on what risks you are willing to take when publishing findings (or more specifically, the relative costs of Type 1 and Type 2 errors, and by extension what risk of committing each you're willing to live with). It's kinda like asking if someone is "wrong" to try sky-diving - it depends entirely on how much you value the fun of jumping out of a plane vs. the risk of ending up squashed on the ground.
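
To put numbers on that trade-off, here is a small sketch of the familywise error rate, assuming the tests are independent and every null hypothesis is true. The choice of 20 tests at a 5 percent level is illustrative only.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false rejection across m independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m

alpha, m = 0.05, 20

uncorrected = familywise_error_rate(alpha, m)     # each test run at alpha
bonferroni = familywise_error_rate(alpha / m, m)  # each test run at alpha / m

print(f"uncorrected: {uncorrected:.3f}")
print(f"Bonferroni:  {bonferroni:.3f}")
```

Without a correction the chance of at least one false rejection is well over half; with Bonferroni it stays just under 5 percent, at the price of a much stricter per-test threshold and so a higher Type 2 error risk for each individual test.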

Fortran must die

It's not from an online source. It is from the book "Data Analysis Using Regression and Multilevel/Hierarchical Models" by Andrew Gelman and Jennifer Hill. I usually don't cite sources that are not online because I don't think anyone is going to get hold of a book just to check the comments.

It is in one of the early chapters, although I did not write the page number down so I am not sure where specifically it is cited. BTW although I have just started reading it, I think it's excellent, although pieces go way over my head (probably won't be an issue for more sophisticated readers here).

I do accept the logic that corrections are necessary - which seems to me to be the position theoretically of most authors (certainly I had not run into a previous writer who suggested it was not desirable). Thus my comment that not doing it is wrong.

Omega Contributor

I didn't want to do this, because the last thing you need is more material to cite for your view that all of stats is lumpy and not standardized or unified in ideology, but the link below (posted yesterday) is Gelman talking about p-values for over an hour. Knock yourself out:

TS Contributor

Another thing to consider is the author. Andrew Gelman is a Bayesian, so he doesn't buy into the concept of p-values in the first place. Second, he focuses on what he terms Type M (errors in magnitude) and Type S (errors in sign or direction) errors. Both of these will influence his views on frequentist statistics. This isn't to say he's wrong. I've found him to be very insightful on many topics, but he is definitely approaching it from a non-frequentist perspective.