Clustering the rows and columns of a contingency table

Abstract

A number of ways of investigating heterogeneity in a two-way contingency table are reviewed. In particular, we consider chi-square decompositions of the Pearson chi-square statistic with respect to the nodes of a hierarchical clustering of the rows and/or the columns of the table. A cut-off point which indicates “significant clustering” may be defined on the binary trees associated with the respective row and column cluster analyses. This approach provides a simple graphical procedure which is useful in interpreting a significant chi-square statistic of a contingency table.

CLEVELAND, W.S., and RELLES, D.A. (1975), “Clustering by Identification with Special Application to Two-way Tables of Counts,”Journal of the American Statistical Association, 70, 626–630.Google Scholar