Abstracts

The past decade has seen steady and significant progress in efficient
algorithms for learning mixtures of Gaussians, hidden Markov models,
and various other probabilistic models with latent variables.
In contrast with typical local search methods (e.g., EM), these algorithms
have strong guarantees about recovering (globally) good models.

The common theme of all this work is that it heavily exploits linear
projections: random projections, principal component analysis, and
canonical correlation analysis. In this tutorial, I will give a broad
overview of these results: algorithms, theory, experimental findings,
and open problems. No particular background is required, other than
a familiarity with common latent variable models.
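As a concrete illustration of the linear-projection theme, here is a minimal sketch (a hypothetical toy example assuming numpy, not code from the tutorial) of how PCA separates a well-separated mixture of two spherical Gaussians: the top principal direction aligns with the direction of mean separation, so a one-dimensional projection already reveals the clusters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two spherical Gaussians in 10 dimensions with means +/- mu.
d, n = 10, 2000
mu = np.zeros(d)
mu[0] = 4.0
X = np.vstack([rng.normal(mu, 1.0, (n, d)),
               rng.normal(-mu, 1.0, (n, d))])

# PCA: the top eigenvector of the empirical covariance picks out the
# mean-separation direction, since between-cluster variance dominates.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / len(X)
eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
v = eigvecs[:, -1]                        # top principal direction

# Projecting onto v separates the two components almost perfectly.
proj = Xc @ v
labels = proj > 0                         # a simple threshold recovers the clusters
```

The same one-dimensional projection would fail for mixtures whose means differ only in low-variance directions; handling those cases is part of what the more refined spectral algorithms address.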

Manfred Jaeger
Learning and Reasoning With Incomplete Data: Foundations and Algorithms

The EM algorithm is one of the cornerstones of machine learning. It is well-known that the use
of this algorithm needs to be justified by the Missing at Random (MAR) assumption. But what exactly
is the MAR assumption, how do we find out whether it can realistically be made, and what should
we do if we can't? In this tutorial I will give an overview of recent results that have improved
our understanding of these issues. The tutorial consists of three parts:

Part 1 introduces the Missing at Random assumption and the crucial role it plays in learning from
incomplete data and in making probabilistic inferences by conditioning. Using the more general
concept of Coarsened at Random (CAR) it is shown that this modeling assumption comes in subtly different
versions, and that the different versions can have significantly different implications for statistical
learning and probabilistic reasoning.
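The practical stakes of this assumption can be seen in a small simulation (a hypothetical toy example assuming numpy, not taken from the tutorial): when values go missing independently of their magnitude, the naive mean of the observed data is fine, but when large values are preferentially hidden, the same estimator is badly biased.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)        # true mean is 0

# MCAR mechanism (the simplest case of MAR): each value is hidden with
# the same probability, independent of the value itself.
mcar_mask = rng.random(x.size) < 0.5
mcar_mean = x[~mcar_mask].mean()         # close to the true mean

# Non-MAR (MNAR) mechanism: larger values are more likely to be hidden,
# so the observed sample is skewed toward small values.
mnar_mask = rng.random(x.size) < 1.0 / (1.0 + np.exp(-2.0 * x))
mnar_mean = x[~mnar_mask].mean()         # biased well below 0
```

Ignoring the mechanism (as plain EM effectively does) is only justified in the first regime; the second regime is the setting Part 3 addresses.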

Part 2 is concerned with the question of whether (and how) one can construct canonical generative
models for data coarsening processes that satisfy the CAR assumption. Such models are interesting
because they can help to decide whether the CAR (or MAR) assumption can realistically be made for
a given dataset by investigating whether the process that caused the data to be
incomplete can be matched against one of the canonical models. Recent results have provided a fairly
comprehensive picture of the existence and nature of canonical CAR models.

Part 3 is devoted to alternative learning methods when the MAR/CAR assumption appears unreasonable.
It is shown that an algorithm similar to the EM algorithm (and in special cases equivalent to the
EM algorithm) can be used for incomplete data that is not CAR. An implementation of this algorithm
(christened AI&M by the speaker, but sometimes also less fancifully called Alternating
Kullback-Leibler Minimisation) for parameter learning in Bayesian networks shows that on non-CAR
data it obtains better accuracy than the EM algorithm. Furthermore, likelihood ratios of AI&M- and
EM-computed parameters can be used for statistical tests of the CAR assumption relative to a given
parametric model.
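For reference, the CAR-based baseline that AI&M departs from is ordinary EM for coarsened data. The sketch below (a hypothetical toy example assuming numpy; it is standard EM under CAR, not the AI&M algorithm itself) fits a three-outcome categorical distribution when some draws are reported only as "in {0, 1}": the E-step splits the coarse count in proportion to the current parameters, and the M-step renormalizes.

```python
import numpy as np

exact_counts = np.array([20.0, 10.0, 30.0])   # exactly observed outcomes 0, 1, 2
coarse_count = 40.0                            # draws reported only as "in {0, 1}"

theta = np.ones(3) / 3                         # initial parameter guess
for _ in range(200):
    # E-step: split the coarse count across {0, 1} in proportion to theta,
    # which is the expected completion under the CAR assumption.
    split = theta[:2] / theta[:2].sum()
    expected = exact_counts + np.array([coarse_count * split[0],
                                        coarse_count * split[1], 0.0])
    # M-step: the categorical MLE is just the normalized expected counts.
    theta = expected / expected.sum()
```

At convergence theta is the CAR maximum-likelihood estimate (here theta[2] = 0.3, and theta[0]:theta[1] keeps the 2:1 ratio of the exact counts). AI&M replaces this expected completion with a different choice of completion when CAR cannot be assumed.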

Linear structural equation models (linear SEMs) are widely applied in many empirical studies, including the social sciences, neuroinformatics, and bioinformatics. Estimation of linear SEMs for continuous variables typically uses the covariance structure of the data alone, and this poses serious identifiability problems: many important models, including path analysis models, are indistinguishable without prior knowledge of their structures.

A linear acyclic model, a special case of path analysis models, is typically used to learn data-generating processes. Covariance information alone is not sufficient to uniquely estimate such a linear acyclic model and in most cases cannot identify the full structure (path coefficients and path directions) of the model.

Bentler (1983, Psychometrika) proposed that non-Gaussian structure in the data could be useful for overcoming such identifiability problems of covariance-based estimation of SEMs. Recently it was shown that the use of non-Gaussianity allows the full structure of a linear acyclic model to be identified without pre-specifying any path directions between the variables (Shimizu et al., 2006, JMLR). The new method is based on a fairly recent statistical technique called independent component analysis (Hyvarinen et al., 2001, Wiley). The non-Gaussian framework has since been extended in various directions for learning a wider variety of SEMs. In this tutorial, we will first present an overview of non-Gaussian methods for learning SEMs and then turn to some recent advances.