Conceptually, the expectation maximization algorithm is simple: you take a guess at the values you want to find, then run your calculations as though they were correct. Then, use those results to improve your guess.

That article does a good job of explaining the algorithm in more detail, but I couldn’t make sense of the example. So I wrote some code to help me figure out.

The example is like this: You have two coins. They’re not exactly balanced, and you want to know how heavily weighted they are. The only information you have, though, are five sequences of ten flips each. And, you don’t even know which coin is responsible for which sequence!

Okay, let’s write some code. I’ll be working in Python for this example.

Now what? One easy way to a solution would be to figure out which sequences were most likely to come from which coins, then credit all the heads and tails from those sequences to those coins. Using that method, rolls 1,2, and 4 are assigned to coin A, and rolls 0 and 3 are assigned to B. 24 of the 30 flips assigned to coin A are heads, and 9 of 20 flips for coin B. So with that strategy, our new value for theta becomes { “A”:.8, “B”:.45}.

But we can do better. Rather than assign an entire sequence to a single coin, we’ll use the calculated probabilities to weight the sequence. So – and this is a tricky bit – we have to compare the calculated probabilities to each other rather than looking at them as absolutes. So if one coin has a probability of 20% and the other has a probability of 30%, we’ll assign them weights of .4 and .6, respectively. Here’s how:

The variable aChance holds the weighted probability of the sequence coming from the coin. Next, we’ll assign that percentage of the heads in the sequence to that coin (We’ll keep a couple of dictionaries to hold the information):