Chi-squared test for the relationship between two categorical variables - overview



Null hypothesis

There is no association between the row and column variables.
More precise statement:

If there are $I$ independent random samples of size $n_i$, one from each of $I$ populations defined by the independent variable: the distribution of the dependent variable is the same in each of the $I$ populations

If there is one random sample of size $N$ from the total population: the row and column variables are independent

Alternative hypothesis

There is an association between the row and column variables.
More precise statement:

If there are $I$ independent random samples of size $n_i$, one from each of $I$ populations defined by the independent variable: the distribution of the dependent variable is not the same in all of the $I$ populations

If there is one random sample of size $N$ from the total population: the row and column variables are dependent

Assumptions

Sample size is large enough for $X^2$ to be approximately chi-squared distributed under the null hypothesis. Rule of thumb:

2 $\times$ 2 table: all four expected cell counts are 5 or more

Tables larger than 2 $\times$ 2: average of the expected cell counts is 5 or more, smallest expected cell count is 1 or more

There are $I$ independent simple random samples, one from each of $I$ populations defined by the independent variable, or there is one simple random sample from the total population
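
The sample-size rule of thumb above can be checked mechanically once the expected cell counts are known. The following is a minimal Python sketch, not a definitive implementation: the function name `sample_size_ok` is hypothetical, and it assumes the expected counts are supplied as a list of rows.

```python
def sample_size_ok(expected):
    """Apply the rule of thumb to a table of expected cell counts.

    `expected` is a list of rows (hypothetical input format chosen
    for this sketch), each row a list of expected counts.
    """
    cells = [e for row in expected for e in row]
    n_rows, n_cols = len(expected), len(expected[0])

    if n_rows == 2 and n_cols == 2:
        # 2 x 2 table: all four expected counts must be 5 or more
        return all(e >= 5 for e in cells)

    # Larger tables: average expected count 5 or more, smallest 1 or more
    return sum(cells) / len(cells) >= 5 and min(cells) >= 1
```

The check only covers the sample-size condition; whether the samples are simple random samples has to be judged from the study design.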

Test statistic

$X^2 = \sum{\frac{(\mbox{observed cell count} - \mbox{expected cell count})^2}{\mbox{expected cell count}}}$
where for each cell, the expected cell count = $\dfrac{\mbox{row total} \times \mbox{column total}}{\mbox{total sample size}}$, the observed cell count is the observed sample count in that same cell, and the sum is over all $I \times J$ cells
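
To make the formula concrete, here is a short Python sketch that computes the expected cell counts and $X^2$ from a table of observed counts. The function names and the example table are illustrative assumptions, not part of the original page.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / total sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all I x J cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Made-up 2 x 2 table of observed counts, for illustration only
observed = [[10, 20],
            [30, 40]]

print(expected_counts(observed))        # [[12.0, 18.0], [28.0, 42.0]]
print(chi_squared_statistic(observed))  # approximately 0.794
```

In this example the row totals are 30 and 70, the column totals are 40 and 60, and the total sample size is 100, so the first expected count is $30 \times 40 / 100 = 12$.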