The data set has 63 records (N = 63) and a grand mean of 62.1 years (standard deviation = 8.1 years).

For these data to be analyzed in Epi Info, they must be structured with two variables -- one for the dependent (outcome) variable and
one for the independent (group) variable. Data for the illustrative data set are stored in AGEBYCEN.ZIP as the file AGEBYCEN.REC
with variables AGE (dependent variable) and CENTER (independent variable). The first three records and last record of this data set
are:

Thus, n1 = 22, n2 = 18, and n3 = 23. We also see that groups have similar means (means are 62.5, 63.3, and 60.8, respectively) and
standard deviations are not too dissimilar.

Side-by-side quartile plots can be drawn (by hand) with minimal effort by graphing each group median as a dot and whiskers from the
group's minimum to the 25th percentile (bottom whisker) and from the 75th percentile to the maximum (top whisker). Click here for an example.
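
The five values needed for one group's quartile plot (minimum, 25th percentile, median, 75th percentile, maximum) can also be computed rather than read off by hand. The following is a minimal Python sketch, not part of Epi Info. The function name quartile_plot_points and the example ages are hypothetical, and the "inclusive" quartile rule is only one of several conventions, so other software may report slightly different 25th and 75th percentiles.

```python
from statistics import median, quantiles


def quartile_plot_points(values):
    """Return the five values needed to draw one group's quartile plot."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    # method="inclusive" is an assumption; other quartile rules exist.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],          # start of the bottom whisker
        "q1": q1,                # end of the bottom whisker
        "median": median(data),  # the dot
        "q3": q3,                # start of the top whisker
        "max": data[-1],         # end of the top whisker
    }


# Hypothetical ages for a single group (NOT taken from AGEBYCEN.REC).
example_ages = [50, 55, 58, 60, 62, 63, 65, 70, 75]
points = quartile_plot_points(example_ages)
print(points)
```

Running the function once per group gives the values to plot side by side.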

The objective of ANOVA is to determine whether at least one of the k population means differs from the others. The null and
alternative hypotheses are:

H0: µ1 = µ2 = ... = µk
H1: at least one population mean differs

where µi represents the population mean of group i {i: 1, 2, . . . k}.

Briefly, ANOVA partitions the variance in the data into the variance or mean square between (MSB) and the variance or mean square
within (MSW). The ratio of these mean squares is the F statistic:

Fstat = (MSB) / (MSW)

Under the null hypothesis, this test statistic has an F distribution with dfB = k-1 and dfW = N-k. The test is one-tailed, focusing on the
upper tail of the F(dfB, dfW) distribution. For the illustrative data set, the ANOVA table and F statistic are:
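
The entries of such a table can be approximated from the summary statistics quoted earlier (group sizes 22, 18, and 23; group means 62.5, 63.3, and 60.8; overall standard deviation 8.1). The Python sketch below is an illustration under stated assumptions, not Epi Info output: it assumes the reported 8.1 is the sample standard deviation (N - 1 denominator) of all 63 ages, and it works from rounded values, so its results are only approximate. The function name anova_from_summary is hypothetical.

```python
def anova_from_summary(ns, means, total_sd):
    """Approximate one-way ANOVA quantities from group sizes, group means,
    and the standard deviation of all observations combined."""
    N = sum(ns)   # total sample size
    k = len(ns)   # number of groups
    grand_mean = sum(n * m for n, m in zip(ns, means)) / N

    # Between-group sum of squares: spread of group means about the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Total sum of squares, ASSUMING total_sd uses the N - 1 denominator.
    ss_total = (N - 1) * total_sd ** 2
    # Within-group sum of squares is what remains of the total.
    ss_within = ss_total - ss_between

    df_between = k - 1
    df_within = N - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "grand_mean": grand_mean,
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Rounded summary values quoted in the text for the three centers.
table = anova_from_summary([22, 18, 23], [62.5, 63.3, 60.8], 8.1)
for name, value in table.items():
    print(f"{name:>11}: {value:.3f}")
```

With these rounded inputs the sketch gives dfB = 2, dfW = 60, and an F statistic of roughly 0.5. A table computed from the raw data may differ slightly.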

Assumptions: The ANOVA test has several hidden assumptions. Traditionally, we speak of the assumptions of independence,
normality, and equal variance. In addition, statistical inferences assume validity of the data (i.e., freedom from selection bias and
information bias) and minimal confounding.

We frequently want to know how large a sample is needed when testing k means. Although there is no simple answer to this question, a
reasonable sample size can be determined if certain assumptions are made. Let us concern ourselves with trying to establish a
significant difference among k means (via ANOVA) by asking how big a sample size is needed to (a) detect a difference between two
means of Δ, (b) at a type I error rate of α, (c) with probability (power) 1 - β. It is necessary to have a prior estimate of the
variability σ of the outcome variable, with such estimates coming from a pilot study, prior published results, a preliminary analysis,
or intuition. Computational solutions are possible once these underlying assumptions are made clear, with formulas available in
Sokal & Rohlf, 1996, pp. 263-264 (for instance). Calculations have been programmed into the website of the Dept of OB/GYN at the
University of Hong Kong, via the URL http://department.obg.cuhk.edu.hk/ResearchSupport/Sample_size_CompMean.asp. (If this link does
not take you directly to the sample size calculator, click Sample Size > Comparing Means.)

Illustrative example. Suppose we test H0: µ1 = µ2 = µ3. A prior study suggests the measurement has σ ≈ 8.
To find a mean difference of 5, the University of Hong Kong website derives the following results.

             Type I error=0.05   Type I error=0.01   Type I error=0.001
Power=80%           41                  60                  87
Power=90%           54                  76                  107
Power=95%           67                  91                  125

Notice that the output provides sample sizes per group (ni) at various power and alpha levels. For example, under the stated
assumptions, we need n = 54 per group for 90% power at α = .05.
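
To see where numbers of this size come from, the Python sketch below applies a common normal-approximation formula for comparing two means: n per group = 2 (z[1 - alpha/2] + z[power])^2 (SD / difference)^2, rounded up. This is an assumption made for illustration. It is not necessarily the method used by the website or the one given by Sokal & Rohlf, and the function name n_per_group is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(difference, sd, alpha, power):
    """Approximate sample size per group needed to detect a given difference
    between two means (two-sided test, normal approximation)."""
    z = NormalDist()                        # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)      # critical value for the type I error rate
    z_power = z.inv_cdf(power)              # value corresponding to the desired power
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return ceil(n)                          # round up to a whole subject


# Illustrative example from the text: difference of 5, standard deviation of 8.
for power in (0.80, 0.90, 0.95):
    print(f"power={power:.0%}: n per group = {n_per_group(5, 8, 0.05, power)}")
```

Under these assumptions the sketch reproduces the Type I error = 0.05 column (41, 54, and 67). Other cells can differ from the website's output by about one subject, which suggests the website uses a somewhat different calculation or rounding rule.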

(A) Univariate description of ALCS. Before performing ANOVA, describe the distribution of alcohol scores for all groups combined
(MEANS ALCS). Show the distribution of this variable in the form of a histogram (HISTOGRAM ALCS). Are data skewed? What
percentage of people in the sample are non-drinkers? (B) ALCS by INC. Assess alcohol consumption by income.

(2) DEERMICE: Weight Gain in White-Footed Deer Mice (Hampton, 1994, p. 118, modified). Fifteen deermice are randomly
assigned to one of three groups. Group A receives a standard diet, Group B receives a diet of junk food, and Group C receives a diet of
health food. The research question is to determine whether WTGAIN differs by DIET. Weight gains (gms.) are as follows:

(4) MAT-ROLE.ZIP: Adaptation to Maternal Roles (Howell, 1995, pp. 302 - 304). In a study of the development of low-birthweight
(LBW) infants, three groups of newborns differed in terms of birthweight and whether their mothers had participated in a training
program about the special needs of low-birthweight infants. The mothers were then interviewed when the infants were 6 months old.
There were three groups in the experiment: an LBW Experimental group (Group 1), an LBW Control group (Group 2), and a
Full-Term Control group (Group 3). The two control groups received no special training, and so serve as a reference against which to
compare the performance of the experimental intervention. The LBW Experimental group was part of the intervention program, and
the researchers hoped to show that those mothers would adapt to their new role as well as the mothers of full-term infants. On the other
hand, they expected mothers of LBW infants who did not receive the intervention to have some difficulty adapting. The outcome
measure is an adaptation scale, whereby high values indicate some trouble adapting. (Being a parent of a low-birthweight baby is not
an easy task, especially for the first few months; see Achenbach et al., 1993.) Data are contained in MAT-ROLE, which can be
downloaded from the server by clicking on the highlighted filename, above. Download the data set and analyze these data.