I want to perform Pearson's $\chi^2$ test to analyse contingency tables, but because my counts are small, it is recommended to perform what is called Fisher's exact test instead.

This requires generating all non-negative integer matrices with the same row and column totals as the given one, then computing the probability of each such table under the corresponding distribution and summing those probabilities that are no larger than the probability of the observed data.
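For reference, under the null hypothesis the probability of any particular table with row sums $R_i$, column sums $C_j$, entries $n_{ij}$ and grand total $N$ is $\frac{\left(\prod_i R_i!\right)\left(\prod_j C_j!\right)}{N!\,\prod_{i,j} n_{ij}!}$. A minimal sketch of that formula in Mathematica (the helper name `tableProbability` is mine, not from the code discussed below):

    (* Probability of a table under the null, with margins fixed.
       Factorial is Listable, so Total[m, {2}]! gives the row-sum
       factorials and Total[m]! the column-sum factorials. *)
    tableProbability[m_?MatrixQ] :=
     (Times @@ (Total[m, {2}]!))*(Times @@ (Total[m]!))/
      (Total[m, 2]!*(Times @@ (Flatten[m]!)))

Because everything stays in exact integers and rationals, the resulting probabilities are exact as well.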

Apparently R offers this, but I couldn't find it in Mathematica, and after extensive searching I couldn't find an existing implementation either, so I wrote my own.

The examples in the links are for 2x2 matrices, but I wrote an n x m implementation and, at least for the MathWorld example, the numbers match.

I have one question: the code I wrote uses Reduce, although generating all the matrices seemed to me more of a combinatorial problem. I pondered using FrobeniusSolve, but that still seemed far from what is needed. Am I missing something, or is Reduce the way to go?

The essential part of the code, which I have made available on GitHub here, is that for a matrix like

it sets up the row- and column-sum equations subject to the constraints $x_{1,1}\geq 0$, $x_{1,2}\geq 0$, $x_{1,3}\geq 0$, $x_{2,1}\geq 0$, $x_{2,2}\geq 0$, $x_{2,3}\geq 0$ and feeds this system into Reduce to solve it over the integers. Reduce returns all the solutions, which is what we need to compute Fisher's exact p-value.
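As a sketch of that step (the variable names and the 2x3 margins here are illustrative, not the package's actual code), the margin equations and non-negativity constraints can be handed to Reduce directly:

    vars = Array[x, {2, 3}];
    rowSums = {3, 4}; colSums = {2, 2, 3};       (* illustrative margins *)
    Reduce[
     Join[Thread[Total[vars, {2}] == rowSums],   (* row totals *)
          Thread[Total[vars] == colSums],        (* column totals *)
          Thread[Flatten[vars] >= 0]],           (* non-negativity *)
     Flatten[vars], Integers]

One remark on the Reduce-vs-FrobeniusSolve question: Solve with the same arguments returns the solutions as rule lists, which can be easier to map over than the Or of conditions that Reduce produces; and FrobeniusSolve, as far as I know, handles a single linear Diophantine equation, whereas here the row and column equations are coupled, which is presumably why Reduce (or Solve over the integers) is the better fit.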

Note: I just found this advice on how to use GitHub better for Mathematica projects. For the time being, I leave it as is. I hope it is easy to use and test.

You can test the above-mentioned code like this:

FisherExact[{{1, 0, 2}, {0, 0, 2}, {2, 1, 0}, {0, 2, 1}}]

It has some debugging output via Print which shows all the generated matrices and their probabilities. The last part (the use of Select to process all the found matrices) didn't seem very Mathematica-like to me, but it was late and I was tired; feedback is welcome.
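For that last step, one possibility is simply to sum the table probabilities that do not exceed the observed one (`pAll` and `pObs` are hypothetical names for the list of all table probabilities and the observed table's probability, not identifiers from the repository):

    (* Fisher's exact p-value: total probability of all tables
       no more likely than the observed one *)
    pValue = Total[Select[pAll, # <= pObs &]]

An equivalent, arguably more idiomatic alternative is `Total[Pick[pAll, Thread[pAll <= pObs]]]`.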

I will accept the answer with the most votes after a couple of days, if anyone bothers to write me a couple of lines. :)

1 Answer

Maybe you are willing to consider a Bayesian approach to this perennial problem. Beware though: Bayesians have no random variables, no p-values, no null hypotheses, etc. They have probabilities, or ratios thereof.

The (out of print) book "Rational Descriptions, Decisions and Designs" by Miron Tribus (1969!) has an excellent chapter on contingency tables. From this book I have copied the solutions below. His solutions are exact and work for small counts as well as non-square tables.
He considers two mutually exclusive hypotheses: "the rows and columns are independent" vs. "the rows and columns are dependent", under a variety of different types of knowledge.

Here I give only two cases:

-- Knowledge type 1A, with no specific prior knowledge on the (in-)dependence and no controls,

-- Knowledge type 1B, also with no specific prior knowledge but with a control on the counts of the rows (see examples below).

Tribus computes the "evidence in favor of the hypothesis of independence of rows and columns" for these types. (The references in the code are to chapters and pages in his book.)

Thanks a lot @romke-bontekoe. This is for someone who expects/needs p-values as usually practiced in medical research, i.e. from a frequentist point of view. Do you know of any support for this Bayesian view in medicine/genetics? Interestingly enough, your examples are suitable for Pearson's $\chi^2$, although the test in this context is whether or not one has an expected matrix with the same row and column sums as the data. In a Bayesian view, that is prior knowledge of both row and column sums, right? Again, thanks, very interesting!
– caya, Jan 31 '14 at 18:03
