Taught By

Tim Roughgarden

Transcript

To make sure that Huffman's greedy algorithm is clear, let's go through a slightly larger, more complicated example. So let's work with a six character alphabet. Let's call the letters A, B, C, D, E, F, and let's assume that we're given the weights 3, 2, 6 and 8, 2, 6 for these six characters. Remember, this problem is well-defined even if the weights don't add up to 1. If you prefer working with actual probabilities, feel free to divide these six numbers by 27. In the first step of Huffman's greedy algorithm, we find the letters that have the smallest weights, the smallest frequencies. So in this example, that would be the letters B and E, both of them have weight 2. Now what we do is we merge these two letters into a single meta letter, in effect committing right now to having B and E be siblings in the final tree. After this merger, our alphabet is down to five symbols, the symbols B and E being replaced by the merged symbol BE. And the weight of BE is the sum of the weights of B and E, namely, 4. We can imagine our tree slowly taking shape through these iterations. So after step 1, we know that B and E are going to be siblings and we know that just A, C, D, and F are going to be leaves, that's all we know thus far. In the next iteration, we again look for the two symbols that have the smallest weight. And here the smallest weight symbol is A. It has weight 3. And the runner-up is the merged symbol B sub E. Its combined weight is 4, and that's second overall for these five symbols. So in this step, we merge A with B E. Now our alphabet is down to four symbols, the merged symbol, A, B, E, which has cumulative weight 7, and then the original symbols, C, D, and F, which have their original weights, 6, 8, and 6. As far as our tree, we've now committed to the symbol A appearing as an uncle of the siblings B and E. And again, C, D, and F, we just know they're leaves somewhere in the final tree. In step three, we're going to again pick the two symbols that have the smallest weights. In this case, the two symbols with the smallest weight are C and F, each of weight 6. In our new alphabet, we still have our symbol ABE, it still has weight 7. We still have the symbol D, it still has weight 8. But now we have a new merged symbol CF, and its new weight is 12. As far as our tree, in addition to the information we already knew, we're now committing to having C and F be siblings in the final tree. In step four, we merge the two symbols with the smallest weight. So that would be ABE with its weight of seven and D with its weight of eight. So this leaves us with only two symbols, ABDE and CF. And now we know what both of these subtrees of the root of the final tree are going to look like. Now that we're down to two symbols, the only thing we can do is fuse these two symbols into one, fuse these two subtrees into a single one by uniting them under a common root. That gives us the following final output of Huffman's algorithm. What prefix-free code does this tree correspond to? Well, as usual, let's label all of the left branches with zero and all of the right branches with ones. And now, as usual, the encoding of a character is just the symbols of zeros and ones that you see when you traverse a path from the root to that leaf. So for example, A will be encoded with 000, B with 0010, C with 10, D with 01, E with 0011, and F with 11.

Explore our Catalog

Join for free and get personalized recommendations, updates and offers.