Contingency Tables

Do different treatments cause different effects?

Fruit trees are subject to a bacteria-caused disease commonly called
fire blight (because the resulting dead branches look like they
have been burned). One can imagine several different treatments
for this disease: treatment A: no action (a control group),
treatment B: careful removal of clearly affected branches, and
treatment C: frequent spraying of the foliage with an
antibiotic in addition to careful removal of clearly affected branches.
One can also imagine several different outcomes from the disease:
outcome 1: tree dies in the same year the disease was noticed,
outcome 2: tree dies 2-4 years after the disease was noticed,
outcome 3: tree survives beyond 4 years. A group of
N trees is divided among the treatments
(i.e., every tree falls into exactly one of the following
treatment categories [ A | B | C ] )
and over the next few years the outcome is recorded
(i.e., every tree falls into exactly one of the following
outcome categories [ 1 | 2 | 3 ] ). If we count the number of trees
in a particular treatment/outcome pair (e.g., the number of trees
that received treatment B and lived beyond 4 years: #B3),
we can display the results in a table called a contingency table:

                        Treatment
Outcome          A         B         C         Row Totals
1                #A1       #B1       #C1       total 1
2                #A2       #B2       #C2       total 2
3                #A3       #B3       #C3       total 3
Column Totals    total A   total B   total C   Grand Total

For example, the contingency table

     A   B   C
1    5   3   2   10
2    2   3   4    9
3    0   2   3    5
     7   8   9   24

reports that of the 24 trees selected for this experiment, 4 of them
received the full treatment (C) and nevertheless died in 2-4 years (outcome 2).
It looks as if either treatment B or C is better than the
control, and maybe C is a better treatment than B. But how
can we quantify whether treatment actually helps?
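
As a quick check on the table above, the following Python sketch recomputes the row, column, and grand totals and the fraction of each treatment group that survived beyond 4 years. The variable names (observed, survival_fraction, and so on) are illustrative choices, not part of the original page.

```python
# Observed counts from the example table:
# rows are outcomes 1-3, columns are treatments A, B, C.
observed = [
    [5, 3, 2],
    [2, 3, 4],
    [0, 2, 3],
]
treatments = ["A", "B", "C"]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Fraction of each treatment group with outcome 3 (survives beyond 4 years).
survival_fraction = {
    t: observed[2][j] / col_totals[j] for j, t in enumerate(treatments)
}

print("row totals:   ", row_totals)
print("column totals:", col_totals)
print("grand total:  ", grand_total)
for t, frac in survival_fraction.items():
    print(f"treatment {t}: {frac:.2f} survived beyond 4 years")
```

The survival fractions rise from A to C, which is the informal impression described above; whether that rise is more than chance is the question the rest of the page addresses.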

The null hypothesis is that the probabilities for each outcome
are independent of the treatment. For example, since overall 10/24 of the
trees died in the first year, if treatment has no effect
one might estimate that 10/24 is the probability
for early death; that, in turn, would suggest that in the control group early death
should have been expected for 7×10/24=2.92 trees whereas 5 were observed.
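Following this logic through, under the null hypothesis the expected contingency
table (each entry is row total × column total / 24) would be:

     A      B      C
1    2.92   3.33   3.75   10
2    2.63   3.00   3.38    9
3    1.46   1.67   1.88    5
     7      8      9      24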
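
The same calculation can be sketched in Python: each expected count is the row total times the column total divided by the grand total. The function name expected_counts is a hypothetical helper introduced here for illustration.

```python
# Observed counts: rows are outcomes 1-3, columns are treatments A, B, C.
observed = [
    [5, 3, 2],
    [2, 3, 4],
    [0, 2, 3],
]

def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print("  ".join(f"{e:6.3f}" for e in row))
```

The first entry is 7×10/24, the value worked out by hand above for early death in the control group.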

One can now use chi-square (actually X²) to compare the expected contingency table
to the observed contingency table. Please note a problem: we said chi-square
is suspect if expected values are less than 5, and every expected value here is
below 5! For what it's worth, chi-square reports that the null hypothesis
is still viable with these data. (In fact, re-binning by putting outcomes 1 and 2
together into "tree death" and treatments B and C together under
"some action", and then using the Fisher Exact Test on the resulting 2x2 table, still
shows a viable null hypothesis.)
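
A minimal sketch of the X² comparison, using only the Python standard library, is given below. It assumes the usual Pearson statistic, the sum of (observed - expected)²/expected over all cells, with (rows - 1) × (columns - 1) degrees of freedom. The helper names are hypothetical, and the p-value routine uses a closed form that holds only when the degrees of freedom are even (here there are 4).

```python
import math

# Observed counts: rows are outcomes 1-3, columns are treatments A, B, C.
observed = [
    [5, 3, 2],
    [2, 3, 4],
    [0, 2, 3],
]

def chi_square_statistic(table):
    """Return (X^2, degrees of freedom) for a rows x columns table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi_square_p_even_dof(stat, dof):
    """Upper-tail chi-square probability; closed form valid for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form requires a positive, even dof")
    half = stat / 2
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(dof // 2)
    )

stat, dof = chi_square_statistic(observed)
p_value = chi_square_p_even_dof(stat, dof)
print(f"X^2 = {stat:.3f}, dof = {dof}, p = {p_value:.3f}")
```

For this table the statistic comes out near 4.8 on 4 degrees of freedom, giving p of roughly 0.3, which is why the null hypothesis survives. The caution about small expected values still applies to that p.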

Possible solutions to the problem of sparsely populated cells are discussed
in greater detail here.

The previous example is called a 3x3 contingency table; more generally
we have #row x #column contingency tables.

While contingency tables are most commonly analyzed using
X², there is an exact method (Fisher Exact Test) which
avoids the concerns about small expected values, but which is more difficult
to compute. This server will compute exact p for up to
6x6 contingency tables. Click here for more information.
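
To give a feel for the exact calculation, here is a sketch of the Fisher Exact Test for the simplest case, a 2x2 table, applied to the re-binned fire blight data (tree death vs. survival, no action vs. some action). It is not the algorithm this server uses. The function name fisher_exact_2x2 is hypothetical, and the two-sided p shown follows one common convention: sum the probabilities of every table with the same margins that is no more probable than the observed one.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p for the table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; p sums the probabilities of all tables that are no more
    probable than the observed one (an assumed two-sided convention).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Re-binned table: rows are death (outcomes 1 and 2) and survival (outcome 3);
# columns are no action (A) and some action (B and C).
p_two_sided = fisher_exact_2x2(7, 12, 0, 5)
print(f"two-sided Fisher exact p = {p_two_sided:.3f}")
```

For larger tables the number of tables sharing the observed margins grows quickly, which is the computational difficulty mentioned above.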