Statistical Tests of Independence

This package contains Matlab implementations
of three statistical hypothesis tests for independence:
a kernel test, as described in GreEtAl08a; and
tests based on the L1 distance and the log-likelihood statistic,
as described in GreEtAl08b and GreEtAl10.

We propose to test whether random variables X and Y are
independent based on a sample of observed pairs (x_i,y_i).
The software implements three test statistics.
The kernel test uses the Hilbert-Schmidt norm of the covariance
operator between RKHS mappings of X and Y: this is called the Hilbert-Schmidt
Independence Criterion (HSIC). The population HSIC is zero at independence,
so X and Y are unlikely to be independent when the empirical HSIC is large.
An intuitive explanation of HSIC and the associated test may be found in these talk slides.
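
As a rough illustration of the kernel statistic (not the package's Matlab
code), the biased empirical HSIC can be written as trace(KHLH)/n^2, where K
and L are kernel Gram matrices on the x_i and y_i, and H = I - (1/n)11^T is
the centering matrix. The Python sketch below assumes scalar observations and
Gaussian kernels with hand-picked bandwidths; the function names and the
parameters sigma_x and sigma_y are hypothetical.

```python
import math

def gaussian_gram(values, sigma):
    """Gram matrix of a Gaussian kernel on scalar observations (assumed kernel)."""
    return [[math.exp(-(a - b) ** 2 / (2.0 * sigma ** 2)) for b in values]
            for a in values]

def center(gram):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)] for i in range(n)]

def hsic_biased(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC, trace(K H L H) / n^2."""
    n = len(x)
    kc = center(gaussian_gram(x, sigma_x))
    lc = center(gaussian_gram(y, sigma_y))
    # trace(K H L H) equals the elementwise inner product of the centered matrices.
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / n ** 2
```

A constant y sample gives a statistic of zero, while a y sample that is a
function of x gives a strictly positive value. The double loop costs O(n^2)
time and memory, which is adequate for a sketch but not for large samples.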
The second test uses the L1 distance between the joint distribution and
the product of the marginals as its test statistic (computed on
a partitioning of the space), and the third test uses the mutual information
between X and Y computed on the same partition (the log-likelihood statistic).
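
The two partition-based statistics can be sketched as follows. This is an
illustrative Python version under stated assumptions, not the package's
Matlab code: observations are scalars, the partition is an equal-width grid
over the observed range, and the names cell_index, partition_statistics,
bins_x and bins_y are hypothetical. The mutual information is the plug-in
estimate on the grid; scaling conventions for the log-likelihood statistic
may differ between references.

```python
import math
from collections import Counter

def cell_index(value, low, high, bins):
    """Index of the equal-width cell of [low, high] that contains value."""
    if high == low:
        return 0
    k = int((value - low) / (high - low) * bins)
    return min(k, bins - 1)  # the maximum falls in the last cell

def partition_statistics(x, y, bins_x=4, bins_y=4):
    """Return (L1 distance, plug-in mutual information) on a product partition."""
    n = len(x)
    lo_x, hi_x = min(x), max(x)
    lo_y, hi_y = min(y), max(y)
    ix = [cell_index(v, lo_x, hi_x, bins_x) for v in x]
    iy = [cell_index(v, lo_y, hi_y, bins_y) for v in y]
    joint = Counter(zip(ix, iy))   # counts per joint cell
    marg_x = Counter(ix)           # counts per x cell
    marg_y = Counter(iy)           # counts per y cell
    l1 = 0.0
    mi = 0.0
    for a in range(bins_x):
        for b in range(bins_y):
            p_joint = joint[(a, b)] / n
            p_prod = (marg_x[a] / n) * (marg_y[b] / n)
            l1 += abs(p_joint - p_prod)
            if p_joint > 0:
                mi += p_joint * math.log(p_joint / p_prod)
    return l1, mi
```

Both values are zero when the empirical joint distribution on the grid equals
the product of the empirical marginals, and grow as the two diverge.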

The test software returns both the test statistic and a threshold, where the
latter is a user-specified quantile of the distribution of the test statistic
under independence. When the statistic exceeds this threshold, we reject the
independence hypothesis.
Three strategies are used to calculate the test threshold:

Moment matching to a Gamma distribution (HSIC)
fits a two-parameter Gamma distribution to the first two moments
of the empirical HSIC under independence.
Requires the Matlab Statistics Toolbox.
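
The moment-matching step itself is short. A Gamma distribution with shape k
and scale theta has mean k*theta and variance k*theta^2, so matching a given
mean and variance yields k = mean^2/variance and theta = variance/mean. The
Python sketch below shows only this step; the function name is hypothetical,
and how the null mean and variance are estimated is left to the cited papers.

```python
def gamma_from_moments(mean, variance):
    """Shape and scale of the Gamma distribution with the given mean and variance."""
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must both be positive")
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale

# Example: mean 2 and variance 4 correspond to shape 1 and scale 2.
shape, scale = gamma_from_moments(2.0, 4.0)
```

The test threshold is then the chosen quantile of the fitted distribution,
which needs an inverse Gamma CDF such as gaminv in the Matlab Statistics
Toolbox.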

Distribution-free consistent test (L1, log-likelihood)
uses a distribution-free test threshold, which is not computed
from the sample.

Shuffling (HSIC, L1, log-likelihood) repeatedly shuffles the y_i
relative to the x_i and recomputes the statistic on each shuffled sample,
giving a bootstrap estimate of the test threshold. Slower than the moment
matching and distribution-free approaches,
but can be more accurate in practice for small sample sizes.
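
A minimal Python sketch of a shuffling threshold is given below. It assumes
that shuffling means randomly permuting the y_i relative to the x_i, which
breaks any dependence while keeping both marginals fixed; the package's own
resampling scheme may differ in detail. The names shuffle_threshold, level
and num_shuffles are hypothetical, and abs_covariance is only a stand-in
statistic to keep the example self-contained.

```python
import math
import random

def shuffle_threshold(x, y, statistic, level=0.05, num_shuffles=200, seed=0):
    """Estimate the (1 - level) quantile of the statistic under shuffled pairings."""
    rng = random.Random(seed)
    y_perm = list(y)
    null_stats = []
    for _ in range(num_shuffles):
        rng.shuffle(y_perm)                    # break the pairing between x and y
        null_stats.append(statistic(x, y_perm))
    null_stats.sort()
    index = int(math.ceil((1.0 - level) * num_shuffles)) - 1
    index = min(max(index, 0), num_shuffles - 1)
    return null_stats[index]

def abs_covariance(x, y):
    """Stand-in statistic (assumption): absolute sample covariance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n)

# Usage: a perfectly dependent sample, for which the test should reject.
x = [i / 50.0 for i in range(50)]
y = list(x)
threshold = shuffle_threshold(x, y, abs_covariance)
reject = abs_covariance(x, y) > threshold
```

Each shuffle costs one full evaluation of the statistic, which is why this
strategy is slower than the other two.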