Chi-square with Excel

For contingency tables, we start with the heart attack data. If you have not
worked with this data before you can find a description here. This is a very large data set and so is
provided as zip files. (You may need a program such as winzip to unzip them). Available
are plain
text (with tabs separating entries), Excel
versions of the data. If possible, download and unzip the Minitab version of the
heart attack data and open it in your version of Minitab. (It may be too big for
student versions.) We would like to see if the mortality rate is different
between men and women.

To get 2x2 contingency table Select Data > Pivote Table
and Pivote charts next select two columns of Sex and Died, click on next
and then choose Layout, then drag the variable Sex and place
into the ROWS, similarly drag the variable Died and place into the
COLUMNS, also drag any of the variables Sex or Died and place into
DATA, then double click and select count. Finally click on finish. You
will get following table.

DIED

SEX

0

1

Grand Total

F

4298

767

5065

M

7136

643

7779

Grand Total

11434

1410

12844

Now we will find expected value from the above table. Formula for expected
value = (Row Total)*(Column Total)/Grand Total. Let's first find the expected
value for first cell. For first cell we have row total is 5065 and Column
total is 11434 and grand total is 12844. Therefore Expected value
= 5065*11434/12844 = 4508.97 like this find for all cell. Then we get the matrix
as follows:

Select Insert > Function> StatisticalCHITEST then in actual range select the 2x2 observed value matrix and in
expected range select the 2x2 expected value matrix, finally click Ok. We
get the p-value for the Chi-square test = 3.75555E-34 = 3.75*10-34.
Here p-value is less than 0.05 (level of Significance). Therefore we will reject
H0 at 5% of level of significance.

With 12,844 observations, getting the table is a lot more work than
computing chi-square, and it is best to let the computer do it. If you have an
existing table, then no need to go pivot table just find the expected value and
follow the steps as given above. For example, here are some data from the
University of Texas Southwestern Medical Center reported in De Veaux, Velleman
and Bock, Stats.: Data and Models 2nd ed., 2008, Addison Wesley, Boston.
(It's the last example in Chapter 26, p.645.) The disease hepatitis C can be
transmitted through needle pricks including those involved in tattoos. The goal
here was to compare infection rates between people with differing tattoo status.
A summary of their data is

Hepatitus C

No Hepatitus C

Tattoo from parlor

17

35

Tattoo from elsewhere

8

53

No tattoo

22

491

We get expected value matrix is as follow:

Hepatitus C

No Hepatitus C

Tattoo from parlor

3.90

48.10

Tattoo from elsewhere

4.58

56.42

No tattoo

38.52

474.48

When you go this route you have no control over what gets printed and have to
put up with this congested table. In this case, the information in that crowded
table is useful. A rule of thumb is that for the approximations in using the
continuous chi-squared distribution for discrete counts to work well we should
have expected counts above 5 in each cell. Here we have two violations out of
six cells. One reason we check this is that because expected counts go into the
denominator in computing the sample chi-squared, they could inflate the result.
One common after-the-fact remedy is see if there are some rows in the table we
can combine is a rational way. In this case we could concede that we do not have
enough data to answer questions about where people got their tattoos and pool
the two sources together to get a new table with these entries.

25

88

22

491

The analysis is left as an exercise but the conclusion is that the hepatitis
rates are quite different for the tattoo vs. no tattoo groups.