Significance testing is one of the more lively areas of statistics. In general,
the idea is not to make too many mistakes in our conclusions. If this
only applied to Type I errors, we could all relax, apply the most conservative
tests of significance possible and restrict ourselves to the study of the
glaringly obvious. R attends to the problem
of significance testing in some ways, but sensibly avoids prescribing methods
which may not be appropriate for particular analyses.

Adjustments for multiple comparisons

The basic method of adjusting for multiple comparisons is to define the group of
comparisons that are to be tested and select an appropriate method of adjustment
and the overall probability of a Type I error (perhaps considering the
implications of Type II errors). Then either define a critical probability that
the p-value of any test in that group must fall below to be judged significant,
or adjust the probability of each test
individually and compare that to the selected overall probability of Type I
errors. The latter method is implemented in
R in the function p.adjust(),
but it's a bit awkward to integrate with functions like anova()
that may produce a table with a number of probabilities.
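
One way around that awkwardness is to pull the column of probabilities out of the table, adjust it, and put the results side by side. The following is only a sketch: the mtcars data and the choice of predictors are assumptions made for illustration, and the "Pr(>F)" column name is the one used by anova() for linear models.

```r
# Fit a model with several terms so that anova() returns several p-values.
# mtcars and the chosen predictors are illustrative assumptions only.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
tab <- anova(fit)

# The p-values live in the "Pr(>F)" column; the Residuals row holds NA.
p_raw <- tab[["Pr(>F)"]]
is_term <- !is.na(p_raw)

# Adjust only the real p-values and show them beside the raw ones.
out <- data.frame(
  term = rownames(tab)[is_term],
  p.raw = p_raw[is_term],
  p.adjusted = p.adjust(p_raw[is_term], method = "bonferroni")
)
print(out)
```

Each adjusted value can then be compared directly with the chosen overall probability of a Type I error.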

We'll use the infert data set to apply the Bonferroni correction to
multiple tests of the prevalence of induced abortion within groups defined by
educational attainment. First let's go
through the function
group.prop.test()
that I found useful for repetitive testing of groups of Bernoulli trial
(success/failure) data where the outcome of interest was which groups differed
from the overall proportion, that is, which groups were better or worse than the
average level of success by a fairly conservative test.

The usual checks of the input data are performed, then the overall proportion is
calculated and the result list is set up, filled with blanks and zeros.
For each group defined by the grouping vector by, a test of
proportions is conducted, and the adjusted probability stored in the appropriate
element of gptest. Notice that the formatting of the group names
was performed after the calculation. Otherwise the comparison performed by
subset would have failed. After the calculation, the results are
printed out and the list of results is returned invisibly. By playing around
with these data, you may discover that a simple test of the contingency table
indicates that the groups do not come from the same population, but in fact
none differ from the average prevalence of induced abortion, at least by this test.
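
The steps just described can be sketched roughly as follows. This is not the original group.prop.test(): the name group_prop_sketch, its arguments, the use of prop.test() against the overall proportion, and the coding of "success" as infert$induced > 0 are all assumptions made for illustration.

```r
# Hypothetical sketch of a group-wise proportion test with adjusted p-values.
group_prop_sketch <- function(success, by, method = "bonferroni") {
  # Basic checks of the input data
  stopifnot(length(success) == length(by), !anyNA(success), !anyNA(by))
  success <- as.logical(success)
  by <- factor(by)
  groups <- levels(by)

  # Overall proportion of successes, used as the null value for every group
  p_overall <- mean(success)

  # Test each group's proportion against the overall proportion
  p_raw <- vapply(groups, function(g) {
    in_group <- by == g
    prop.test(sum(success[in_group]), sum(in_group), p = p_overall)$p.value
  }, numeric(1))

  # Collect the results, adjusting the p-values across the whole group of tests
  result <- data.frame(
    group = groups,
    n = as.vector(table(by)),
    proportion = as.vector(tapply(success, by, mean)),
    p.raw = p_raw,
    p.adjusted = p.adjust(p_raw, method = method),
    row.names = NULL
  )
  print(result, digits = 3)
  invisible(result)
}

# Assumed coding: "success" means at least one prior induced abortion
res <- group_prop_sketch(infert$induced > 0, infert$education)
```

Because the result is returned invisibly, it is printed once by the function itself and can still be captured for later use, as with res above.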

When I originally wrote the function, it simply printed out the critical
(corrected) p-value at the top of the table, and all of the observed values
were compared with that.
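
For the Bonferroni correction the two styles of presentation lead to the same decisions, as this small sketch shows. The p-values and group labels here are made up purely for illustration.

```r
alpha <- 0.05

# Hypothetical raw p-values from five tests in one group of comparisons
p_raw <- c(A = 0.004, B = 0.012, C = 0.030, D = 0.045, E = 0.200)

# Critical (corrected) p-value printed at the top of the table
p_crit <- alpha / length(p_raw)
cat("Critical p-value (Bonferroni):", p_crit, "\n")

# Each observed p-value is compared with the critical value
tab <- data.frame(p.raw = p_raw, below.critical = p_raw < p_crit)
print(tab)

# Adjusting each p-value and comparing with alpha gives the same decisions
same <- identical(
  unname(p_raw < p_crit),
  unname(p.adjust(p_raw, method = "bonferroni") < alpha)
)
```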