NAG Toolbox: nag_stat_contingency_table (g01af)

Purpose

nag_stat_contingency_table (g01af) performs the analysis of a two-way r × c contingency table or classification. If r = c = 2, and the total number of objects classified is 40 or fewer, then the probabilities for Fisher's exact test are computed. Otherwise, a test statistic is computed (with Yates' correction when r = c = 2), which under the assumption of no association between the classifications has approximately a chi-square distribution with (r − 1) × (c − 1) degrees of freedom.

Description

The data consist of the frequencies for the two-way classification, denoted by n_ij, for i = 1,2,…,m and j = 1,2,…,n with m,n > 1.

A check is made to see whether any row or column of the matrix of frequencies consists entirely of zeros, and if so, the matrix of frequencies is reduced by omitting that row or column. Suppose the final size of the matrix is m1 by n1 (m1,n1 > 1), and let

R_i = ∑_{j=1}^{n1} n_ij, the total frequency for the ith row, for i = 1,2,…,m1,

C_j = ∑_{i=1}^{m1} n_ij, the total frequency for the jth column, for j = 1,2,…,n1, and

T = ∑_{i=1}^{m1} R_i = ∑_{j=1}^{n1} C_j, the total frequency.

Under the assumption that there is no association between the two classifications, χ² will have approximately a chi-square distribution with (m1 − 1) × (n1 − 1) degrees of freedom.
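The computation described above can be sketched in Python as follows. This is an illustrative sketch only, not the routine's implementation: the function name chi_square_statistic is hypothetical, and the expected frequency r_ij = R_i × C_j / T and the form of Yates' correction (subtracting 0.5 from each absolute difference) are the standard textbook definitions, assumed here rather than taken from this page. The table is assumed to contain at least one nonzero count.

```python
def chi_square_statistic(table):
    """Return (chi2, ndf, expected) for a two-way table of counts.

    Hypothetical helper: sketches the steps described in the text.
    """
    # Omit any row or column that consists entirely of zeros.
    rows = [row for row in table if any(row)]
    keep = [j for j in range(len(rows[0])) if any(row[j] for row in rows)]
    obs = [[row[j] for j in keep] for row in rows]
    m1, n1 = len(obs), len(obs[0])

    row_tot = [sum(row) for row in obs]        # R_i
    col_tot = [sum(col) for col in zip(*obs)]  # C_j
    total = sum(row_tot)                       # T

    # Assumption: expected frequency r_ij = R_i * C_j / T.
    expected = [[ri * cj / total for cj in col_tot] for ri in row_tot]

    # Assumption: Yates' correction of 0.5 is applied only to a 2 x 2 table.
    yates = 0.5 if (m1 == 2 and n1 == 2) else 0.0
    chi2 = sum(
        (abs(o - e) - yates) ** 2 / e
        for obs_row, exp_row in zip(obs, expected)
        for o, e in zip(obs_row, exp_row)
    )
    ndf = (m1 - 1) * (n1 - 1)
    return chi2, ndf, expected


chi2, ndf, expected = chi_square_statistic([[10, 20, 30], [20, 20, 20], [0, 0, 0]])
print(chi2, ndf)
```

In the usage example the all-zero third row is dropped, so the statistic is computed on a 2 × 3 table with 2 degrees of freedom.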

An option exists which allows for further ‘shrinkage’ of the matrix of frequencies in the case where the expected frequency r_ij < 1 for the (i,j)th cell. If this is the case, then row i or column j will be combined with the adjacent row or column with smaller total. Row i is selected for combination if R_i × m1 ≤ C_j × n1; otherwise column j is selected. This ‘shrinking’ process is continued until r_ij ≥ 1 for all cells (i,j).
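The selection rule above can be sketched in Python as follows. This is a hedged illustration, not the routine's code: the names shrink_table and _merge_adjacent are hypothetical, the expected frequency is assumed to be R_i × C_j / T, and two details the text does not specify are assumptions of this sketch: offending cells are visited in row-major order, and shrinking stops if the selected dimension already has only two lines.

```python
def _merge_adjacent(lines, k):
    """Combine line k with whichever adjacent line has the smaller total."""
    neighbours = [a for a in (k - 1, k + 1) if 0 <= a < len(lines)]
    target = min(neighbours, key=lambda a: sum(lines[a]))
    lo, hi = sorted((k, target))  # adjacent, so hi == lo + 1
    merged = [x + y for x, y in zip(lines[lo], lines[hi])]
    return lines[:lo] + [merged] + lines[hi + 1:]


def shrink_table(obs):
    """Merge rows or columns until every expected frequency is at least 1."""
    obs = [list(row) for row in obs]
    while True:
        m1, n1 = len(obs), len(obs[0])
        row_tot = [sum(row) for row in obs]        # R_i
        col_tot = [sum(col) for col in zip(*obs)]  # C_j
        total = sum(row_tot)                       # T

        # Assumption: take the first cell (row-major order) with r_ij < 1.
        cell = next(
            ((i, j) for i in range(m1) for j in range(n1)
             if row_tot[i] * col_tot[j] / total < 1),
            None,
        )
        if cell is None:
            return obs
        i, j = cell

        if row_tot[i] * m1 <= col_tot[j] * n1:
            # Row i is selected for combination.
            if m1 <= 2:
                return obs  # assumption: never shrink below two rows
            obs = _merge_adjacent(obs, i)
        else:
            # Column j is selected: merge on the transposed table.
            if n1 <= 2:
                return obs  # assumption: never shrink below two columns
            cols = _merge_adjacent([list(col) for col in zip(*obs)], j)
            obs = [list(row) for row in zip(*cols)]


print(shrink_table([[50, 50], [0, 1], [10, 10]]))
```

In the usage example the middle row has an expected frequency below 1 and is combined with the third row, which has the smaller total of its two neighbours.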

(ii)

If m1 = n1 = 2 and T ≤ 40, the probabilities to enable Fisher's exact test to be made are computed.

The matrix of frequencies may be rearranged so that R_1 is the smallest marginal (i.e., column and row) total, and C_2 ≥ C_1. Under the assumption of no association between the classifications, the probability P_r of obtaining r entries in cell (1,1) is computed for r = 0,1,…,R_1.
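As an illustration of this step, the sketch below rearranges a 2 × 2 table and evaluates the probabilities from the hypergeometric distribution, which is the standard form for Fisher's exact test with fixed margins. It is an assumption that the routine evaluates the same expression. The function name fisher_probabilities is hypothetical, and the returned index is zero-based, whereas the routine's npos indexes p from 1.

```python
from math import comb


def fisher_probabilities(table):
    """Return (p, npos) for a 2 x 2 table of counts.

    p[r] is the probability of r entries in cell (1,1) with the margins
    fixed; npos is the zero-based index of the observed table.
    """
    t = [list(table[0]), list(table[1])]
    total = sum(t[0]) + sum(t[1])  # T

    # Rearrange so that R_1 is the smallest marginal total and C_2 >= C_1.
    col0, col1 = t[0][0] + t[1][0], t[0][1] + t[1][1]
    if min(col0, col1) < min(sum(t[0]), sum(t[1])):
        t = [[t[0][0], t[1][0]], [t[0][1], t[1][1]]]  # transpose
    if sum(t[1]) < sum(t[0]):
        t.reverse()                                   # swap rows
    if t[0][0] + t[1][0] > t[0][1] + t[1][1]:
        t = [row[::-1] for row in t]                  # swap columns

    r1 = sum(t[0])          # R_1
    c1 = t[0][0] + t[1][0]  # C_1

    # Hypergeometric probability of r entries in cell (1,1).
    p = [
        comb(c1, r) * comb(total - c1, r1 - r) / comb(total, r1)
        for r in range(r1 + 1)
    ]
    return p, t[0][0]


p, npos = fisher_probabilities([[3, 1], [1, 3]])
print(p[npos])
```

Because R_1 is the smallest marginal total and T ≤ 40, R_1 cannot exceed 20, so at most 21 probabilities are produced.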

References

Parameters

Compulsory Input Parameters

ldnob, the first dimension of the array, must satisfy the constraint ldnob ≥ m.

The elements nobs(i,j), for i = 1,2,…,m and j = 1,2,…,n, must contain the frequencies for the two-way classification.
The (m + 1)th row and the (n + 1)th column of nobs need not be set.

The elements pred(i,j), where i = 1,2,…,m1 and j = 1,2,…,n1, contain the expected frequencies, r_ij, corresponding to the observed frequencies nobs(i,j), except in the case when Fisher's exact test for a 2 × 2 classification is to be used, when pred is not used. No other elements are utilized.

4:
chis – double scalar

The value of the test statistic, χ², except when Fisher's exact test for a 2 × 2 classification is used, in which case it is unspecified.

5:
p(21) – double array

The first num elements contain the probabilities associated with the various possible frequency tables, P_r, for r = 0,1,…,R_1; the remainder are unspecified.

6:
npos – int64/int32/nag_int scalar

p(npos) holds the probability associated with the given table of frequencies.

7:
ndf – int64/int32/nag_int scalar

The value of ndf gives the number of degrees of freedom for the chi-square distribution, (m1 − 1) × (n1 − 1); when Fisher's exact test is used, ndf = 1.

8:
m1 – int64/int32/nag_int scalar

The number of rows of the two-way classification, after any ‘shrinkage’, m1.

9:
n1 – int64/int32/nag_int scalar

The number of columns of the two-way classification, after any ‘shrinkage’, n1.

Accuracy

The method used is believed to be stable.

Further Comments

The time taken by nag_stat_contingency_table (g01af) will increase with m and n, except when Fisher's exact test is to be used, in which case it increases with the size of the marginal and total frequencies.

If, on exit, num > 0, or alternatively ndf is 1 and nobs(m,n) ≤ 40, the probabilities for use in Fisher's exact test for a 2 × 2 classification will be calculated, and not the test statistic with approximately a chi-square distribution.
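The check described above could be written as a small Python predicate. This is only a sketch: the function name fisher_was_used is hypothetical, and its arguments are assumed to be the exit values of num and ndf and the value of nobs(m,n) that the text refers to.

```python
def fisher_was_used(num, ndf, nobs_mn):
    """Return True if the exit values indicate Fisher's exact test was used.

    num, ndf: exit values named in the text.
    nobs_mn:  the value of nobs(m,n) referred to in the text.
    """
    # Either probabilities were returned (num > 0), or the table has one
    # degree of freedom and nobs(m,n) does not exceed 40.
    return num > 0 or (ndf == 1 and nobs_mn <= 40)


print(fisher_was_used(num=5, ndf=1, nobs_mn=8))
print(fisher_was_used(num=0, ndf=2, nobs_mn=120))
```

When the predicate is true, the caller would read the probabilities in p rather than the statistic in chis, which the text says is unspecified in that case.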