Error detecting and error correcting codes

Richard Hamming's classic paper Error detecting and error correcting codes appeared in April 1950 in the Bell System Technical Journal. Today we live in a digital age and it is well known that digital information is a sequence of 0s and 1s. When Hamming was thinking about digital information in the late 1940s, it was understood by only a few experts.

In performing calculations, computers transmit information internally as sequences of 0s and 1s. Digital information transmitted over distances is of the same form. Although at first glance it seems restrictive to use messages consisting only of 0s and 1s, in fact it places no restriction on what is sent. For example, numbers can be sent in binary, so that base 10 numbers 1, 2, 3, 4, 5, 6, 7, 8 are sent as 1, 10, 11, 100, 101, 110, 111, 1000. Letters can be sent as short binary codes.
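The binary representations listed above are easy to check; a one-line illustration (using Python's built-in `bin`, purely for demonstration):

```python
# Base-10 numbers 1..8 written in binary, as in the text.
print([bin(n)[2:] for n in range(1, 9)])
# ['1', '10', '11', '100', '101', '110', '111', '1000']
```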

It was an event in 1947 that prompted Hamming to undertake his most famous piece of work. One Friday, while working for Bell Laboratories, he set their pre-computer calculating machines to solving a complex problem and expected the result to be waiting for him when he began work on the following Monday. But when he arrived on Monday, he found that an error had occurred early on in the calculations and the relay-based calculators had been unable to proceed.

The machines had a mechanism known as a parity check for detecting errors. To send data, the sequence of 0s and 1s is broken into blocks of a certain fixed length, let us say blocks of length 5. A parity check digit is added to the left of the original 5-tuple, making the new block a 6-tuple (i.e. a sequence of 6 entries, each either 0 or 1). The parity check digit is chosen so that the sum of the digits in the 6-tuple is always even. Now if one of the elements of the 6-tuple is received incorrectly, the error will be detected because the sum of the received digits is odd. For example, if we wanted to send the 5-tuple 1,1,0,0,1, then we would make the parity check digit 1 so that the new 6-tuple, with the parity check digit in the first place, would be 1,1,1,0,0,1. If we send 1,1,1,0,0,1 but a single transmission error occurs and, say, 1,1,0,0,0,1 is received, then the sum of the digits is odd and we therefore know that an error has occurred. However, there is no way of knowing which digit is wrong, and hence no way of knowing what was originally sent. Hamming thought, "If the computer can tell when an error has occurred, surely there is a way of telling where the error is so that the computer can correct the error itself." He set himself the task of solving this problem.
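The parity-check scheme just described can be sketched in a few lines of Python (the function names `add_parity` and `parity_ok`, and the block length of 5, are illustrative choices, not anything prescribed by Hamming's paper):

```python
def add_parity(block):
    """Prepend a parity digit so the total number of 1s is even."""
    return [sum(block) % 2] + block

def parity_ok(word):
    """A received word passes the check iff its digit sum is even."""
    return sum(word) % 2 == 0

sent = add_parity([1, 1, 0, 0, 1])
print(sent)                          # [1, 1, 1, 0, 0, 1]
print(parity_ok(sent))               # True: no error detected

corrupted = [1, 1, 0, 0, 0, 1]       # a single digit flipped in transit
print(parity_ok(corrupted))          # False: an error is detected
```

Note that the check detects the error but, as the text says, gives no clue which digit to repair.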

Think about sending a single digit, either 0 or 1, with its parity check digit to the left. There are two possible valid words: 0,0 and 1,1. Call the valid words code words. Then if 0,1 or 1,0 is received, we know from the parity check that an error has occurred. In other words, 0,1 and 1,0 are not code words. However, if 0,1 is received, there is no way of knowing whether 0,0 or 1,1 was originally sent.

Let us define the Hamming distance between two words (of the same length) as the number of places in which they differ. (Hamming, of course, doesn't call this the Hamming distance in his paper, simply distance.) If a single error occurs in sending a code word, then the received word will be at Hamming distance 1 from the code word sent. For example, if 1,1 was sent and 0,1 received then the two words are at Hamming distance 1. But the code word 0,0 is also at Hamming distance 1 from 0,1 so we don't have enough information to correct the error.
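The Hamming distance is straightforward to compute; a minimal sketch (the function name is our own):

```python
def hamming_distance(u, v):
    """Number of positions in which two equal-length words differ."""
    assert len(u) == len(v)
    return sum(a != b for a, b in zip(u, v))

# The ambiguity from the text: 0,1 is at distance 1 from BOTH code words.
print(hamming_distance([1, 1], [0, 1]))  # 1
print(hamming_distance([0, 0], [0, 1]))  # 1
```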

Let us now add a further parity check digit, again to the left. We then have the code word 1,1,1 where the middle 1 gives even parity with the original right-most 1 and the left-most 1 also gives even parity with the original right-most 1. Putting in the same two parity checks for sending 0 gives the code word 0,0,0. Now suppose a single error occurs in sending a code word and 1,0,1 is received. Since the parity check on the second and third entries fails, the error occurred in one of those two entries. Since the parity check on the first and third entries is valid, it must be the second entry that is in error, and we can correct it. Thought of another way, 1,0,1 is at Hamming distance 1 from 1,1,1 and Hamming distance 2 from 0,0,0; if a single error occurred then 1,0,1 must have originally been 1,1,1.
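The "thought of another way" decoding rule above — pick whichever code word is nearer in Hamming distance — can be sketched as follows (function names are ours, and the tie-breaking choice in `decode` is arbitrary):

```python
def distance(u, v):
    """Hamming distance between two equal-length words."""
    return sum(a != b for a, b in zip(u, v))

def decode(word):
    """Decode a length-3 word to the nearer of the two code words."""
    zeros, ones = [0, 0, 0], [1, 1, 1]
    return zeros if distance(word, zeros) <= distance(word, ones) else ones

print(decode([1, 0, 1]))   # [1, 1, 1]: the single error is corrected
```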

This makes it appear that we need to send a lot of additional information to correct a single error. However, Hamming showed in his paper that for longer blocks a single error can be corrected with a relatively small number of parity check digits compared to the length of the original block. For example, any single error in a block of length 4 can be corrected with the addition of only 3 parity check digits. The original block of length 4 with 3 added parity checks gives code words of length 7, say x,y,a,z,b,c,d, where a,b,c,d is the word to be sent. First the digit x is chosen to give even parity to the sum of the entries in positions 1,3,5,7, i.e., x is chosen so that x+a+b+d is even. Next y is chosen to give even parity to the sum of the entries in positions 2,3,6,7, i.e., y is chosen so that y+a+c+d is even. Finally z is chosen to give even parity to the sum of the entries in positions 4,5,6,7, i.e., z is chosen so that z+b+c+d is even. We now give the example that Hamming gave in his paper. Suppose we want to send 1,1,0,0. We construct the code word x,y,1,z,1,0,0 as above. This gives x = 0, y = 1, z = 1 and the code word is 0,1,1,1,1,0,0. Suppose we send this word but, because a single error occurs, we receive 0,1,1,1,0,0,0. On receiving this word we make the parity checks. The check for positions 1,3,5,7 fails, so we write down 1. The check for positions 2,3,6,7 succeeds, so we write down 0 to the left of the 1 to get 01. The check for positions 4,5,6,7 fails, so we write down 1 to the left of the 01 to get 101. Since 101 is the binary representation of 5, we know that the error occurred in the fifth place and the correct original sequence was 0,1,1,1,1,0,0.
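The encoding and correction procedure just walked through can be sketched in Python (the function names `encode` and `correct` are ours; the positions and parity rules are exactly those in the text, with Python's 0-based indices shifted down by one from the 1-based positions):

```python
def encode(a, b, c, d):
    """Build the 7-digit code word x,y,a,z,b,c,d from data digits a,b,c,d."""
    x = (a + b + d) % 2          # even parity over positions 1,3,5,7
    y = (a + c + d) % 2          # even parity over positions 2,3,6,7
    z = (b + c + d) % 2          # even parity over positions 4,5,6,7
    return [x, y, a, z, b, c, d]

def correct(word):
    """Locate a single error via the three checks and flip that digit."""
    w = word[:]
    c1 = sum(w[i] for i in (0, 2, 4, 6)) % 2   # check on positions 1,3,5,7
    c2 = sum(w[i] for i in (1, 2, 5, 6)) % 2   # check on positions 2,3,6,7
    c4 = sum(w[i] for i in (3, 4, 5, 6)) % 2   # check on positions 4,5,6,7
    pos = c1 * 1 + c2 * 2 + c4 * 4             # failed checks spell out the
    if pos:                                    # error position in binary
        w[pos - 1] ^= 1                        # flip the erroneous digit
    return w

code = encode(1, 1, 0, 0)        # Hamming's own example
print(code)                       # [0, 1, 1, 1, 1, 0, 0]
received = [0, 1, 1, 1, 0, 0, 0]  # the fifth digit arrives flipped
print(correct(received))          # [0, 1, 1, 1, 1, 0, 0]: error corrected
```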

We have given a specific example, but we haven't explained in general how to determine which positions each of the parity checks covers. The first parity check covers positions 1,3,5,7,9, ... (in binary notation 1, 11, 101, 111, 1001, ...), namely those positions whose last binary digit is 1. Similarly, the second parity check covers positions 2,3,6,7,10, ... (in binary notation 10, 11, 110, 111, 1010, ...), namely those positions whose second-last binary digit is 1. The third parity check covers positions 4,5,6,7,12, ... (in binary notation 100, 101, 110, 111, 1100, ...), that is, those positions whose third-last binary digit is 1. The pattern should now be clear for code words of any length.
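The general rule can be stated compactly with a bitwise test (this sketch and the name `check_positions` are our own; check number i, counting from 0, covers exactly the positions whose binary representation has bit i set):

```python
def check_positions(i, n):
    """1-indexed positions up to n covered by parity check number i."""
    return [p for p in range(1, n + 1) if p & (1 << i)]

print(check_positions(0, 12))   # [1, 3, 5, 7, 9, 11]
print(check_positions(1, 12))   # [2, 3, 6, 7, 10, 11]
print(check_positions(2, 12))   # [4, 5, 6, 7, 12]
```

This is why the failed checks, read off as binary digits, spell out the position of the error directly.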

We have described Hamming’s method to correct a single error in a block of transmitted data. In his 1950 paper he also presented a method which corrects a single error while simultaneously detecting when two errors occur. He further showed that, in a mathematical sense, these error correcting codes are the best possible codes; there are none shorter.
