Programming Assignment 3 Computer Science 102

Spring 2005

Third draft due: to be announced in class

Introduction

As discussed in draft #1, we are studying an efficient scheme for
compressing text files called Huffman coding. We again exploit the
fact that not all characters appear with the same frequency in the text, by
encoding rarely used characters with long codes and frequently used ones with
short codes.

Given a set of characters and their corresponding frequencies, Huffman coding
produces an optimal coding scheme. It uses a binary tree for
encoding. In draft #3, however, we use a heap as a priority queue instead of a
linked list. The big-Oh cost of constructing a heap is
O(n log(n)), whereas it is O(n^2) for a list. Please use the
command line to execute your program and to read the two file names, namely
huff.txt and huffcode.txt. The command line can be obtained
from Run on the Start menu by typing cmd. The two file
names should be used as global variables in your program. Note that this draft
can be done either with static methods or with non-static ones if you wish.
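
One way to pick up the two file names is from the arguments typed after the program name on the command line. The sketch below assumes that reading of the instructions. The class name ArgsSketch and the field names firstFile and secondFile are made up for illustration, and static fields stand in for the global variables.

```java
// Hypothetical sketch: class and field names are illustrative only.
class ArgsSketch {
    // Static fields stand in for the "global variables" holding the file names.
    static String firstFile;
    static String secondFile;

    // Copies the two command-line arguments into the static fields.
    // Returns false when fewer than two arguments were supplied.
    static boolean readFileNames(String[] args) {
        if (args.length < 2) {
            return false;
        }
        firstFile = args[0];
        secondFile = args[1];
        return true;
    }

    public static void main(String[] args) {
        if (!readFileNames(args)) {
            System.err.println("usage: java ArgsSketch huff.txt huffcode.txt");
            return;
        }
        System.out.println("files: " + firstFile + " " + secondFile);
    }
}
```

After compiling, this sketch would be started from the cmd window with a line such as java ArgsSketch huff.txt huffcode.txt.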

Draft #3

We will use a heap implementation of a priority queue in this program. Note
that this project uses a min-heap instead of the max-heap we used in
class. Accordingly, deleteMin() will replace the deleteMax() that
appears in the lecture links on the web. Here is the skeleton of the class you will
be using:

In this skeleton, HuffRecord and TreeNode are embedded classes. The
keyword public is optional here: omitting it gives package-private access
(Java's default), which is sufficient for this program. The class
TreeNode must of course contain a compareTo method and should
also contain a toString() method.
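
To make the two required methods concrete, here is a minimal sketch of what TreeNode might look like. It is not the skeleton from class. The fields freq and letter come from this handout; the child links left and right, the constructor, the output format of toString(), and the outer class name TreeNodeSketch are assumptions. The sketch uses the generic Comparable<TreeNode>; if your compiler does not support generics, compareTo would take an Object and cast it.

```java
// Hypothetical sketch of the embedded TreeNode class; not the official skeleton.
class TreeNodeSketch {
    static class TreeNode implements Comparable<TreeNode> {
        int freq;        // frequency of the letter, or a sum for a combined node
        char letter;     // the letter itself
        TreeNode left;   // assumed links to the two subtrees
        TreeNode right;

        TreeNode(int freq, char letter) {
            this.freq = freq;
            this.letter = letter;
        }

        // Orders nodes by frequency, so a min-heap keeps the rarest node on top.
        public int compareTo(TreeNode other) {
            return Integer.compare(freq, other.freq);
        }

        // Called implicitly whenever a node is printed.
        public String toString() {
            return letter + ":" + freq;
        }
    }

    public static void main(String[] args) {
        TreeNode a = new TreeNode(3, 'a');
        TreeNode b = new TreeNode(5, 'b');
        System.out.println(a + " " + b + " " + (a.compareTo(b) < 0));
    }
}
```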
Here are the methods to be altered or added to your class of draft #2:

private static void pairs(String record, HuffRecord[] stream)
where the array stream is generated in the method and record
is the string of letters and frequencies read from the file.

public static void initialize(HuffRecord[] stream)
takes each element of stream[] and assigns its two fields,
freq and letter, to the corresponding
fields of a new instance of TreeNode, let's say aux. It then assigns
aux to an element of x[], an array of Comparable. This array
is an instance variable or a global variable that represents the heap.

Method print(int start, int end)
prints the elements of the array x.
To do this, you must either cast x[j] to
TreeNode and access its fields before you print them, or
rely on the implicit call to its toString() method.
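
The two methods above might be sketched as follows. This is only an illustration under several assumptions: the constructors, the extra slot allocated in x (to leave room for a later shift), and the choice to treat end as inclusive are not specified in this handout, and InitSketch is a made-up class name.

```java
// Hypothetical sketch of initialize and print; details are assumptions.
@SuppressWarnings("rawtypes")
class InitSketch {
    static class HuffRecord {
        char letter;
        int freq;
        HuffRecord(char letter, int freq) { this.letter = letter; this.freq = freq; }
    }

    static class TreeNode implements Comparable<TreeNode> {
        int freq;
        char letter;
        TreeNode(int freq, char letter) { this.freq = freq; this.letter = letter; }
        public int compareTo(TreeNode other) { return Integer.compare(freq, other.freq); }
        public String toString() { return letter + ":" + freq; }
    }

    static Comparable[] x;   // will become the heap

    // Copies each record into a new TreeNode stored in x[0..stream.length-1].
    static void initialize(HuffRecord[] stream) {
        x = new Comparable[stream.length + 1];   // one spare slot (assumed)
        for (int i = 0; i < stream.length; i++) {
            TreeNode aux = new TreeNode(stream[i].freq, stream[i].letter);
            x[i] = aux;
        }
    }

    // Prints x[start..end], relying on the implicit call to toString().
    static void print(int start, int end) {
        for (int j = start; j <= end; j++) {
            System.out.println(x[j]);
        }
    }

    public static void main(String[] args) {
        initialize(new HuffRecord[] { new HuffRecord('a', 3), new HuffRecord('b', 5) });
        print(0, 1);
    }
}
```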

Method shift(int m) shifts the elements of x up one position, so
that parent = child/2 has meaning.

Thus x[0] becomes
x[1] and, in general, x[i] becomes x[i+1].
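
A minimal sketch of the shift is given below. It assumes that m is the number of elements currently stored in x[0..m-1] and that x has at least m+1 slots; neither is stated above. Integers stand in for TreeNode objects to keep the example short, and ShiftSketch is a made-up class name.

```java
// Hypothetical sketch of shift; the meaning of m is an assumption.
@SuppressWarnings("rawtypes")
class ShiftSketch {
    static Comparable[] x = new Comparable[10];

    // Moves x[0..m-1] into x[1..m] and leaves slot 0 unused.
    // Copying from the high end down avoids overwriting elements not yet moved.
    static void shift(int m) {
        for (int i = m; i >= 1; i--) {
            x[i] = x[i - 1];
        }
        x[0] = null;
    }

    public static void main(String[] args) {
        x[0] = 5;
        x[1] = 7;
        x[2] = 9;
        shift(3);
        System.out.println(x[1] + " " + x[2] + " " + x[3]);
    }
}
```

With the elements in x[1..m], the parent of x[child] is x[child/2] and the root x[1] has no parent, which is why slot 0 is left empty.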

Method build creates a heap by calling shiftUp(j) in a
loop. To make the heap process easier to follow, we will call
shiftUp(j) by the name insert(j) instead.
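
For illustration, build and insert could look roughly like the sketch below for a min-heap stored in x[1..last]. It is a sketch under assumptions, not the version from the lecture links: build is assumed to take no parameters and to use the variable last as the number of elements, and Integers stand in for TreeNode objects. BuildSketch is a made-up class name.

```java
// Hypothetical sketch of build and insert for a min-heap in x[1..last].
@SuppressWarnings({"rawtypes", "unchecked"})
class BuildSketch {
    static Comparable[] x;   // x[0] is unused
    static int last;         // highest subscript in use

    // Moves x[j] up toward the root until its parent is no larger than it.
    static void insert(int j) {
        Comparable item = x[j];
        while (j > 1 && item.compareTo(x[j / 2]) < 0) {
            x[j] = x[j / 2];   // pull the larger parent down
            j = j / 2;
        }
        x[j] = item;
    }

    // Grows the heap one element at a time: x[1..j-1] is a heap before insert(j).
    static void build() {
        for (int j = 2; j <= last; j++) {
            insert(j);
        }
    }

    public static void main(String[] args) {
        x = new Comparable[] { null, 5, 3, 8, 1, 4 };
        last = 5;
        build();
        System.out.println("root = " + x[1]);
    }
}
```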

Method reduceIt() calls reduce() in a while loop
for as long as the instance variable last is greater than one. last
represents the value of the highest subscript of the heap as the heap
shrinks.

private static void reduce() is similar to the method of the same
name in draft #2. It

calls deleteMin twice, each time obtaining the element of the heap with
minimum frequency, i.e., the root. The first call should return a pointer
op1 of type
Comparable. The counter last is then decremented.
The second call should return a pointer op2. These pointers should be
cast to pointers p1 and p2 of type TreeNode.

calls combineTree with the above two pointers as parameters.

private static void combineTree(TreeNode p1, TreeNode p2) creates a
new instance aux of TreeNode as in draft #2 whose frequency
field is the sum of the frequencies in the objects pointed to by p1
and p2, and assigns aux to x[last]. It then places the new
element in its proper place in the heap by calling insert(last).

private static Comparable deleteMin(int n) swaps the top of the heap,
the first element in x[], with the last element in it. It then
restores the heap property by calling shiftDown(n-1) and returns the element
it removed. shiftDown is defined in the heap sort done in class.
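
The sketch below shows one way reduceIt, reduce, combineTree and deleteMin might fit together, following the descriptions above. It is an illustration, not the class version: the heap is assumed to occupy x[1..last], shiftDown(n) is assumed to sift x[1] down within x[1..n], the child links left and right are assumed, and the outer class name ReduceSketch is made up. The letter 'X' for combined nodes follows the convention mentioned in this handout.

```java
// Hypothetical sketch of the reduction step on a min-heap in x[1..last].
@SuppressWarnings({"rawtypes", "unchecked"})
class ReduceSketch {
    static class TreeNode implements Comparable<TreeNode> {
        int freq;
        char letter;
        TreeNode left, right;   // assumed child links
        TreeNode(int freq, char letter) { this.freq = freq; this.letter = letter; }
        public int compareTo(TreeNode o) { return Integer.compare(freq, o.freq); }
    }

    static Comparable[] x;   // x[0] is unused
    static int last;         // highest subscript of the heap

    // Moves x[j] up until its parent is no larger.
    static void insert(int j) {
        Comparable item = x[j];
        while (j > 1 && item.compareTo(x[j / 2]) < 0) {
            x[j] = x[j / 2];
            j = j / 2;
        }
        x[j] = item;
    }

    // Moves x[1] down within x[1..n] until both children are no smaller.
    static void shiftDown(int n) {
        if (n < 1) return;
        Comparable item = x[1];
        int parent = 1;
        while (2 * parent <= n) {
            int child = 2 * parent;
            if (child < n && x[child + 1].compareTo(x[child]) < 0) child++;
            if (item.compareTo(x[child]) <= 0) break;
            x[parent] = x[child];
            parent = child;
        }
        x[parent] = item;
    }

    // Swaps the root with x[n], restores the heap in x[1..n-1], returns the old root.
    static Comparable deleteMin(int n) {
        Comparable min = x[1];
        x[1] = x[n];
        x[n] = min;
        shiftDown(n - 1);
        return min;
    }

    // Builds the combined node, stores it in x[last], and sifts it into place.
    static void combineTree(TreeNode p1, TreeNode p2) {
        TreeNode aux = new TreeNode(p1.freq + p2.freq, 'X');
        aux.left = p1;
        aux.right = p2;
        x[last] = aux;
        insert(last);
    }

    // Removes the two smallest nodes and puts their combination back.
    static void reduce() {
        Comparable op1 = deleteMin(last);
        last--;                            // the heap is now x[1..last]
        Comparable op2 = deleteMin(last);  // leaves the heap in x[1..last-1]
        TreeNode p1 = (TreeNode) op1;
        TreeNode p2 = (TreeNode) op2;
        combineTree(p1, p2);               // refills x[last]
    }

    static void reduceIt() {
        while (last > 1) {
            reduce();
        }
    }
}
```

Note that last is decremented only once per call to reduce: two elements leave the heap and one combined element returns, so the heap shrinks by exactly one each time.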

Each procedure should contain preconditions and postconditions. The data for
this program is the same as for draft #2.

Your main method should:

Read the data from the file and form the heap.

Print the elements of the heap.

Reduce the heap to the final element.

Print the Huffman code for each letter and decode the input string.
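
As a reminder of how the last step can work once the heap has been reduced to a single tree, here is a small sketch that collects the code of each letter and decodes a bit string. It assumes that a left branch means 0 and a right branch means 1, that every combined node has two children, and that the tree has at least two letters; your draft #2 may use different conventions. The names CodeSketch, collectCodes and decode are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: left = 0 and right = 1 are assumed conventions.
class CodeSketch {
    static class TreeNode {
        int freq;
        char letter;
        TreeNode left, right;
        TreeNode(int freq, char letter, TreeNode left, TreeNode right) {
            this.freq = freq;
            this.letter = letter;
            this.left = left;
            this.right = right;
        }
    }

    // Walks the tree, recording the path to each leaf as that letter's code.
    static void collectCodes(TreeNode node, String prefix, Map<Character, String> codes) {
        if (node.left == null && node.right == null) {
            codes.put(node.letter, prefix);
            return;
        }
        collectCodes(node.left, prefix + "0", codes);
        collectCodes(node.right, prefix + "1", codes);
    }

    // Follows the bits from the root; each time a leaf is reached, emits its letter.
    static String decode(TreeNode root, String bits) {
        StringBuilder out = new StringBuilder();
        TreeNode node = root;
        for (int i = 0; i < bits.length(); i++) {
            node = (bits.charAt(i) == '0') ? node.left : node.right;
            if (node.left == null && node.right == null) {
                out.append(node.letter);
                node = root;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TreeNode a = new TreeNode(1, 'a', null, null);
        TreeNode b = new TreeNode(2, 'b', null, null);
        TreeNode c = new TreeNode(4, 'c', null, null);
        TreeNode root = new TreeNode(7, 'X', c, new TreeNode(3, 'X', a, b));
        Map<Character, String> codes = new TreeMap<>();
        collectCodes(root, "", codes);
        System.out.println(codes);
        System.out.println(decode(root, "01011"));
    }
}
```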

The output for forming the heap is on the web. In the output, X
is the character placed in the letter field for elements produced
by combining two other elements.