Appropriate
for CS2 or a later
course. A simplified version could be used in CS1.

Difficulty

This
is an intermediate assignment, taking 1 or 2 weeks for a CS2
student.

Strengths

Integrates
disparate topics from programming and discrete math into a cool
application.
Solves a classical problem in Computer Science.

Weaknesses

Like so many other topics in CS, there are some very inelegant ways to approach parts of the problem. For example, if encoding Integers, Java's Integer class provides the method static String toBinaryString(int i), which students can use to pull apart an Integer into its binary representation rather than using the more appropriate bitwise operators.
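For contrast, the bitwise approach the assignment intends can be sketched as follows (a minimal example; the variable names are illustrative):

```java
public class BitExtraction {
    public static void main(String[] args) {
        int value = 0xB1; // the byte 1011 0001

        // Pull out the 8 low-order bits, most significant first,
        // using shifts and masks instead of Integer.toBinaryString.
        int[] bits = new int[8];
        for (int i = 0; i < 8; i++) {
            bits[i] = (value >> (7 - i)) & 1;
        }

        for (int b : bits) {
            System.out.print(b);
        }
        System.out.println(); // prints 10110001
    }
}
```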

Dependencies

Requires
an understanding of matrix multiplication and binary representations of
data, including bitwise operators and binary I/O.

Variants

There are three steps to implementing Hamming codes, and each can be made easier or more difficult.

Make
the Data Bits Accessible
Easy - Give the students an array of bits.
Harder - Give the students an array of Integers and make them pull out
the bits of each using bitwise operators.
Most Difficult - Give the students an arbitrary file and make them read
it byte by byte pulling the bits out of each byte.

Calculate
the Parity Bits
Brute Force, but Easy to Understand - Add the appropriate data bits
together to
calculate each parity bit.
Easy, but Harder to Understand - Use matrix multiplication (mod 2) to
generate the code block.
Most Difficult - Have the students develop the matrices needed on their
own.

Storing
the Results
Easy - Have the students write the code bits to the screen.
Harder - Append a leading 0 to each code block and have the students
write the byte to a file.
Most Difficult - Have the students develop a BitStream class and write
the
results to a file using it.
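The matrix-multiplication variant above can be sketched as follows. This is a minimal Java sketch using one possible generator matrix for Hamming(7,4), assuming codeword bit order p1 p2 d1 p3 d2 d3 d4 (the ordering used in the review section):

```java
public class MatrixEncode {
    // One possible generator matrix for Hamming(7,4);
    // rows produce codeword bits in the order p1 p2 d1 p3 d2 d3 d4.
    static final int[][] G = {
        {1, 1, 0, 1}, // p1 = d1 + d2 + d4
        {1, 0, 1, 1}, // p2 = d1 + d3 + d4
        {1, 0, 0, 0}, // d1
        {0, 1, 1, 1}, // p3 = d2 + d3 + d4
        {0, 1, 0, 0}, // d2
        {0, 0, 1, 0}, // d3
        {0, 0, 0, 1}  // d4
    };

    // Multiply G by the 4-bit data vector, reducing each entry mod 2.
    static int[] encode(int[] data) {
        int[] code = new int[7];
        for (int row = 0; row < 7; row++) {
            int sum = 0;
            for (int col = 0; col < 4; col++) {
                sum += G[row][col] * data[col];
            }
            code[row] = sum % 2;
        }
        return code;
    }

    public static void main(String[] args) {
        int[] code = encode(new int[]{1, 0, 1, 1});
        for (int bit : code) System.out.print(bit);
        System.out.println(); // prints 0110011
    }
}
```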

Review of Hamming Codes

(The following discussion borrows, in part, from Wikipedia's Hamming code page.) Hamming codes are error-correcting codes. They are named after their inventor, Richard Hamming, who was working for Bell Labs in the 1940s. He grew frustrated with how often he had to restart his programs because a read error occurred, and he developed Hamming codes as a way of detecting and correcting errors. Hamming codes can detect and correct single-bit errors, or can detect (but not correct) up to two simultaneous bit errors.

Hamming codes require O(lg(n)) parity bits for n data bits. Each parity bit checks some (but not all) of the data bits. If an error occurs in a data bit, all the parity bits that checked that bit will show the error, allowing us to uniquely determine where the error occurred. The parity bit at position 2^(k-1) checks the bits in positions that have bit k (1-based) set in their binary representation. The following table summarizes the technique.

Bit position          1    2    3    4    5    6    7    ...
Encoded data bits     p1   p2   d1   p3   d2   d3   d4
Parity bit coverage:
  p1                  X         X         X         X
  p2                       X    X              X    X
  p3                            X    X    X    X    X

Shown are 7 encoded bits (3 parity, 4 data), but the pattern continues indefinitely. The key property of Hamming codes, easily seen from visual inspection, is that every bit position has a unique parity bit coverage. For example, the only data bit covered by p1 and p3 (and no other parity bits) is bit 5 (d2). It is this unique coverage that lets a Hamming code correct any single-bit error.
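The coverage rule can be checked mechanically with bitwise operators (a small Java sketch):

```java
public class Coverage {
    public static void main(String[] args) {
        // The parity bit at position p (a power of two) covers every
        // position n whose binary representation has that bit set.
        for (int p = 1; p <= 4; p <<= 1) {
            System.out.print("parity bit at position " + p + " covers:");
            for (int n = 1; n <= 7; n++) {
                if ((n & p) != 0) {
                    System.out.print(" " + n);
                }
            }
            System.out.println();
        }
        // prints:
        // parity bit at position 1 covers: 1 3 5 7
        // parity bit at position 2 covers: 2 3 6 7
        // parity bit at position 4 covers: 4 5 6 7
    }
}
```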

It also shows how a two-bit error can be detected but not corrected. For example, if bits 1 (p1) and 2 (p2) were flipped, this would be confused with bit 3 (d1) being flipped, since the parity bit coverage of bit 3 is exactly bits 1 and 2.

Hamming codes can be described in terms of the total number of bits per block and the number of data bits. Hamming(7,4) encodes 4 data bits into 7 bits by adding three parity bits, as in the table above.

Example

Suppose we want to use Hamming(7,4) to encode the byte 1011 0001.
The first thing we do is split the byte into two Hamming code
data blocks, 1011 and 0001.
We expand the first block to 7 bits, leaving gaps for the parity bits: _ _ 1 _ 0 1 1.
The first missing bit (bit 1) is 0, because adding bits 3, 5 and 7
gives an even number (2).
The second missing bit (bit 2) is 1, because adding bits 3, 6 and 7
gives an odd number (3).
The last missing bit (bit 4) is 0, because adding bits 5, 6 and 7
gives an even number (2).
This means our 7-bit block is: 0 1 1 0 0 1 1

We expand the second block to 7 bits using similar logic, giving: 1 1 0 1 0 0 1

7
bits do not a byte make, so we can add a leading 0 to each code block.
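The worked example above can be sketched in code. This is a minimal Java sketch of the brute-force variant, summing the data bits each parity bit covers:

```java
public class Hamming74 {
    // Encode 4 data bits (d1..d4) into a 7-bit block p1 p2 d1 p3 d2 d3 d4,
    // computing each parity bit as the mod-2 sum of the bits it covers.
    static int[] encode(int d1, int d2, int d3, int d4) {
        int[] c = new int[8]; // positions 1..7; index 0 unused
        c[3] = d1; c[5] = d2; c[6] = d3; c[7] = d4;
        c[1] = (c[3] + c[5] + c[7]) % 2; // p1 covers 3, 5, 7
        c[2] = (c[3] + c[6] + c[7]) % 2; // p2 covers 3, 6, 7
        c[4] = (c[5] + c[6] + c[7]) % 2; // p3 covers 5, 6, 7
        return c;
    }

    public static void main(String[] args) {
        // The two nibbles of 1011 0001 from the example above.
        int[][] nibbles = {{1, 0, 1, 1}, {0, 0, 0, 1}};
        for (int[] n : nibbles) {
            int[] c = encode(n[0], n[1], n[2], n[3]);
            for (int i = 1; i <= 7; i++) System.out.print(c[i]);
            System.out.println();
        }
        // prints 0110011 then 1101001
    }
}
```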

Correcting Errors

Suppose we send our first code block to a file, or out across the Internet, and accidentally bit 5 gets flipped, giving: 0 1 1 0 1 1 1.
We check our parity bits and find that p1 and p3 (bits 1 and 4) show an error has occurred.
1 + 4 = 5, so we know bit 5 is incorrect and we change it back to 0.
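The check above can be sketched in code (a minimal Java sketch; the error position is the sum of the positions of the failing parity bits):

```java
public class HammingDecode {
    // Check the three parity groups of a 7-bit block (positions 1..7).
    // Returns the position of a single-bit error, or 0 if none is detected.
    static int findError(int[] c) {
        int errorPos = 0;
        if ((c[1] + c[3] + c[5] + c[7]) % 2 != 0) errorPos += 1; // p1 fails
        if ((c[2] + c[3] + c[6] + c[7]) % 2 != 0) errorPos += 2; // p2 fails
        if ((c[4] + c[5] + c[6] + c[7]) % 2 != 0) errorPos += 4; // p3 fails
        return errorPos;
    }

    public static void main(String[] args) {
        // 0110011 with bit 5 flipped, as in the example above.
        int[] c = {0, 0, 1, 1, 0, 1, 1, 1}; // index 0 unused
        int pos = findError(c);
        System.out.println("error at position " + pos); // position 5
        if (pos != 0) c[pos] ^= 1; // flip the bad bit back
    }
}
```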

Commentary

Hamming codes are not new; they have been around since the 1950s. In the 1960s and 70s, CS curricula often included a Files course where it was common to cover topics like error-correcting codes. Modern hardware is much more reliable, and error-correcting codes are no longer considered part of the core CS curriculum. Students nonetheless find this assignment engaging because:

CS students still like to know how things work. Most students find a single parity bit easy to understand. The additional parity bits are something many of them haven't encountered before, but they can understand how they work quite easily.

The assignment integrates
matrix multiplication with binary
representations of data, two topics typically covered early in a CS
curriculum, but often not with an associated programming project.

CS students find alternative data representations cool. This assignment is in the same spirit as Huffman trees, LZW compression and RSA encryption. It stretches students' brains into different ways of thinking about encoding data, and they like it.