
In the classic paper on the EM algorithm (Dempster, Laird, and Rubin, 1977), the extensive example section begins with a multinomial modeling example that is theoretically very similar to the warm-up problem on 197 animals:

We can think of the complete data as an $n \times p$ matrix $x$ whose $(i, j)$ element is unity if the $i$-th unit belongs in the $j$-th of $p$ possible cells, and is zero otherwise. The $i$-th row of $x$ contains $p - 1$ zeros and one unity, but if the $i$-th unit has incomplete data, some of the indicators in the $i$-th row of $x$ are observed to be zero, while the others are missing and we know only that one of them must be unity. The E-step then assigns to the missing indicators fractions that sum to unity within each unit, the assigned values being expectations given the current estimate of $\phi$. The M-step then becomes the usual estimation of $\phi$ from the observed and assigned values of the indicators summed over the units.

In practice, it is convenient to collect together those units with the same pattern of missing indicators, since the filled in fractional counts will be the same for each; hence one may think of the procedure as filling in estimated counts for each of the missing cells within each group of units having the same pattern of missing data.
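Concretely, if a unit's observed indicators rule out every cell except those in some set $S$, the E-step gives cell $j \in S$ the fraction $p_j / \sum_{k \in S} p_k$. Here is a toy sketch of that filling-in (my own illustration, not code from the paper):

```python
import numpy as np

def e_step_fractions(p, ruled_out):
    """Fractional indicators for one unit: zero for the cells observed
    to be zero, current cell probabilities renormalized over the rest."""
    frac = np.where(ruled_out, 0.0, np.asarray(p, dtype=float))
    return frac / frac.sum()  # fractions sum to unity within the unit

# a unit known not to be in the last two of four cells
print(e_step_fractions([0.5, 0.2, 0.2, 0.1], [False, False, True, True]))
# -> approximately [0.714, 0.286, 0.0, 0.0]
```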

Rao (1965, pp. 368-369) presents data in which 197 animals are distributed multinomially into four categories, so that the observed data consist of $y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 34)$.

A genetic model for the population specifies cell probabilities $\left(\tfrac{1}{2} + \tfrac{\pi}{4},\ \tfrac{1-\pi}{4},\ \tfrac{1-\pi}{4},\ \tfrac{\pi}{4}\right)$ for some $\pi$ with $0 \le \pi \le 1$.

Solving this analytically sounds very much like a 1960s pastime to me (answer: the root of $197\pi^2 - 15\pi - 68 = 0$ in $[0, 1]$, i.e. $\hat{\pi} \approx 0.6268$), and a modern computer with PyMC can make short work of numerically approximating this maximum likelihood estimate. It takes no more than this:
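(A minimal sketch of what that computation can look like, assuming PyMC v5; the flat prior makes `find_MAP` coincide with the maximum likelihood estimate.)

```python
import numpy as np
import pymc as pm

# Rao's data: 197 animals in four cells
y = np.array([125, 18, 20, 34])

with pm.Model():
    # flat prior on pi, so the posterior mode is the MLE
    pi = pm.Uniform("pi", lower=0.0, upper=1.0)
    # cell probabilities from the genetic model
    p = pm.math.stack([0.5 + pi / 4, (1 - pi) / 4, (1 - pi) / 4, pi / 4])
    pm.Multinomial("y", n=y.sum(), p=p, observed=y)
    mle = pm.find_MAP()

print(mle["pi"])  # approximately 0.6268
```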

But the point is to warm up, not to innovate. The EM way is to introduce a latent variable $x = (x_1, x_2, x_3, x_4, x_5)$, set $y = (x_1 + x_2, x_3, x_4, x_5)$, and work with a multinomial distribution for $x$, with cell probabilities $\left(\tfrac{1}{2},\ \tfrac{\pi}{4},\ \tfrac{1-\pi}{4},\ \tfrac{1-\pi}{4},\ \tfrac{\pi}{4}\right)$, that induces the multinomial distribution above on $y$. The art is determining a good latent variable, mapping, and distribution, and “good” is meant in terms of the computation it yields:
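For this choice both steps come out in closed form (a sketch of the standard calculation, with $t$ indexing iterations): conditional on $y_1 = x_1 + x_2 = 125$, the E-step splits the first observed cell in proportion to the two subcell probabilities, and the M-step is the complete-data maximum likelihood estimate:

$$x_2^{(t)} = 125 \cdot \frac{\pi^{(t)}/4}{\tfrac{1}{2} + \pi^{(t)}/4}, \qquad \pi^{(t+1)} = \frac{x_2^{(t)} + 34}{x_2^{(t)} + 34 + 18 + 20}.$$

In code the whole iteration is just a few lines (a sketch; $\pi^{(0)} = 0.5$ is the starting value used in the paper):

```python
pi = 0.5  # starting value, as in Dempster, Laird, and Rubin
for _ in range(10):
    # E-step: expected count for the latent subcell with probability pi/4
    x2 = 125 * (pi / 4) / (0.5 + pi / 4)
    # M-step: complete-data MLE of pi from the filled-in counts
    pi = (x2 + 34) / (x2 + 34 + 18 + 20)
print(pi)  # converges to ~0.6268
```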

EM (which stands for Expectation-Maximization) is something I have heard about a lot, and, as an algorithm, it is even something I understood when I read a blog post relating it to a clustering algorithm that I now cannot find (maybe on the Geomblog). The arguments from the EM to DA paper, which I read recently, make me think that there is something more than the algorithm going on here. In this later paper, Tanner and Wong identify the importance of Section 4 of the EM paper, which provides a collection of example applications of their EM algorithm. Is there something about this way of thinking that is even more important than the algorithm itself?