Visualizing Huffman Coding Trees

What would you do if you wanted to transfer a message, say one written in English, but you only had a limited set of characters? Suppose these characters are 0 and 1. The only way of doing this is to devise some procedure that translates from our 26-letter alphabet to the 0-1 binary alphabet. There are several ways of developing these encoding functions, but we will focus on those that translate each individual character into a sequence of 0s and 1s. One of the more popular such codes today is the ASCII code, which, as commonly stored, maps each character to a binary string (of 0s and 1s) of length 8. For example, here are the ASCII codes for the lowercase letters.

ASCII      English
01100001   a
01100010   b
01100011   c
01100100   d
01100101   e
01100110   f
01100111   g
01101000   h
01101001   i
01101010   j
01101011   k
01101100   l
01101101   m
01101110   n
01101111   o
01110000   p
01110001   q
01110010   r
01110011   s
01110100   t
01110101   u
01110110   v
01110111   w
01111000   x
01111001   y
01111010   z
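
To make the table concrete, here is a minimal Python sketch that produces the same 8-bit strings. The function name to_ascii_bits is made up for this illustration; it relies only on Python's built-in str.encode and format.

```python
def to_ascii_bits(text: str) -> str:
    """Encode text as a string of 0s and 1s, 8 bits per character.

    to_ascii_bits is a hypothetical helper name used for illustration.
    str.encode("ascii") raises UnicodeEncodeError for non-ASCII input.
    """
    # Each byte is an integer 0..127; "08b" pads its binary form to 8 digits.
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))


print(to_ascii_bits("a"))    # 01100001
print(to_ascii_bits("abc"))  # 011000010110001001100011
```

Every character costs exactly 8 bits here, no matter how often it appears in the message.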

What you notice from this is that each of these encodings begins with “011”, which amounts to a lot of wasted space. ASCII doesn’t care about this because the fixed length of each binary string allows for easy lookup of particular characters (i.e., you can start decoding almost anywhere in the string, as long as you start at a multiple of 8).
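
The easy lookup that a fixed length buys can be sketched in a few lines of Python. The helper names from_ascii_bits and char_at are assumptions made for this example, not part of any library.

```python
def from_ascii_bits(bits: str) -> str:
    """Decode a string of 0s and 1s, reading 8 bits per character."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    # Slice the string into consecutive 8-bit chunks and convert each one.
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


def char_at(bits: str, index: int) -> str:
    """Read only the character at position `index`, skipping everything else."""
    start = 8 * index  # fixed width: character k always starts at bit 8*k
    return chr(int(bits[start:start + 8], 2))


encoded = "011000010110001001100011"
print(from_ascii_bits(encoded))  # abc
print(char_at(encoded, 2))       # c
```

With a variable-length code there is no such shortcut: to find the k-th character you generally have to decode everything before it.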

But what if we were interested in minimizing the total number of bits used by the encoded string? This is where the Huffman coding algorithm gains its fame. Unlike the ASCII coding scheme, Huffman codes assign shorter codes to the more frequently occurring characters in your string. Huffman was able to prove that this tactic guarantees the shortest possible encoding among codes that, like ASCII, translate each character separately into its own binary string and can still be decoded unambiguously (so-called prefix codes).
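
A quick bit count shows why shorter codes for frequent characters pay off. In the Python sketch below, the code table is hand-picked for this illustration (it is not claimed to be the output of any particular algorithm); it is chosen so that no codeword is a prefix of another.

```python
from collections import Counter

text = "abracadabra"

# Hand-picked variable-length code for illustration: the most frequent
# character gets the shortest codeword, and no codeword is a prefix of another.
example_code = {"a": "0", "b": "10", "r": "110", "c": "1110", "d": "1111"}

frequencies = Counter(text)  # a: 5, b: 2, r: 2, c: 1, d: 1

fixed_bits = 8 * len(text)
variable_bits = sum(frequencies[ch] * len(example_code[ch]) for ch in frequencies)

print(fixed_bits)     # 88
print(variable_bits)  # 23
```

The same eleven characters need 88 bits at 8 bits each, but only 23 bits with this variable-length table.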

The Huffman Coding procedure operates as follows:

1. Take the string to be encoded as the input.
2. For each character in the input string, calculate the frequency of that character (i.e. the number of times it occurs in the input).
3. Sort the array of characters in the input by their decreasing frequencies.
4. Place the array of characters into the queue, with each one represented by a node.
5. While there are two or more nodes remaining in the queue:
   6. Remove the nodes representing the two characters with the lowest frequency from the queue.
   7. Create a node which points to the two nodes just removed from the queue (node -> left points to one node; node -> right points to the other).
   8. Insert this new node into the queue, with the frequency equal to the sum of the frequencies of the nodes it points to.
9. If the length of the queue is greater than 1, go to step 5; otherwise the single remaining node is the root of the Huffman tree.
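
The steps above can be sketched in Python as follows. This is a minimal sketch under a few assumptions. It uses a binary heap from the standard heapq module in place of the sorted array and queue, which serves the same purpose of always yielding the two lowest-frequency nodes. The function names build_huffman_tree and assign_codes are made up for this example, as is the convention that left branches are labeled 0 and right branches 1.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(text):
    """Return the root of a Huffman tree for `text`.

    A leaf is a single character; an internal node is a (left, right) tuple.
    """
    if not text:
        raise ValueError("cannot build a tree for an empty string")
    tiebreak = count()  # unique counter so the heap never compares nodes
    heap = [(freq, next(tiebreak), ch) for ch, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Remove the two nodes with the lowest frequency.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Insert a parent node whose frequency is the sum of its children's.
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    return heap[0][2]


def assign_codes(node, prefix="", codes=None):
    """Walk the tree, appending 0 for a left branch and 1 for a right branch."""
    if codes is None:
        codes = {}
    if isinstance(node, str):
        # A one-character input yields a lone leaf; give it the code "0".
        codes[node] = prefix or "0"
    else:
        left, right = node
        assign_codes(left, prefix + "0", codes)
        assign_codes(right, prefix + "1", codes)
    return codes


text = "abracadabra"
codes = assign_codes(build_huffman_tree(text))
encoded = "".join(codes[ch] for ch in text)
print(len(encoded))  # 23
```

For this input the encoded string is 23 bits long, compared with 88 bits at 8 bits per character. The exact codewords depend on how ties between equal frequencies are broken, but the total length does not.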