Writing a priority queue in C# (part 3)

published:
Tue, 17-Feb-2004
|
updated: Tue, 20-Apr-2004

In the previous installments of this occasional series (
here's part 1 and
here's part 2), we looked at
how to write a priority queue in C#, such that it followed the
standards set by the System.Collections namespace.

At the end of the last installment we ended up with a
PriorityQueue class that used the heap algorithm, that
stored object instances, that used integers as the
priority, and that implemented ICollection (so that it
could be viewed as a .NET collection).

Before we move on, I should point out that the code I published last
time has a subtle bug. Not that the priority queue doesn't enqueue and
dequeue items properly, you understand, but that it likes to keep hold
of objects well beyond their possible deaths.

To understand what the bug is, we need to quickly review some basic
.NET functionality. In .NET you don't have to free any of the objects
you create since the .NET Common Language Runtime includes a garbage
collector that does it for you. This garbage collector periodically
marks all objects that are reachable and sweeps up the remainder (the
unreachable ones) to reclaim their heap memory. (There's a little more
to it than that, of course, but this will suffice for now.)

So, what are unreachable objects? Well, when the garbage collector
(let's call it GC from now on) runs, it works out which objects have
active references. It looks at the currently executing method (so all
its local variables will be marked as reachable), as well as at the
chain of methods that called the current one. For some objects,
there will be references to other objects within them, so those also
get marked as reachable. The GC, in essence, creates a graph
of objects (a graph is a data structure consisting of vertices —
the reachable objects, in our case — and edges — the links
between the objects). After this analysis is complete, the GC knows
which objects are reachable from the currently executing code, and
therefore which objects are not. It then sweeps up (or compacts) the
unreachable memory.

So, let's go back to our priority queue and especially the
Dequeue() method.
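Roughly, the Dequeue() in question looks like the sketch below. This is a reconstruction from the series' description, not the author's exact code; the member names (heap, count, TrickleDown, BubbleUp) and the method bodies are assumptions.

```csharp
using System;

// A minimal sketch of the priority queue from this series, reconstructed
// from the article's description. Dequeue() is the pre-fix version whose
// subtle bug the article discusses.
struct HeapEntry
{
    public object Item;
    public int Priority;

    public HeapEntry(object item, int priority)
    {
        Item = item;
        Priority = priority;
    }
}

class PriorityQueue
{
    private HeapEntry[] heap = new HeapEntry[16];
    private int count;

    public int Count { get { return count; } }

    public void Enqueue(object item, int priority)
    {
        if (count == heap.Length)
            Array.Resize(ref heap, heap.Length * 2);
        heap[count] = new HeapEntry(item, priority);
        BubbleUp(count);
        count++;
    }

    public object Dequeue()
    {
        if (count == 0)
            throw new InvalidOperationException("queue is empty");
        object result = heap[0].Item; // save the object at the top of the heap
        count--;                      // one less item in the heap...
        TrickleDown(0, heap[count]);  // ...then trickle down the old last entry
        return result;
        // Note: heap[count] still holds a reference to the old last
        // entry at this point.
    }

    private void BubbleUp(int index)
    {
        HeapEntry entry = heap[index];
        while (index > 0 && heap[(index - 1) / 2].Priority < entry.Priority)
        {
            heap[index] = heap[(index - 1) / 2]; // pull the parent down
            index = (index - 1) / 2;
        }
        heap[index] = entry;
    }

    private void TrickleDown(int index, HeapEntry entry)
    {
        int child = 2 * index + 1;
        while (child < count)
        {
            // choose the higher-priority child
            if (child + 1 < count && heap[child].Priority < heap[child + 1].Priority)
                child++;
            if (entry.Priority >= heap[child].Priority)
                break;
            heap[index] = heap[child]; // pull the child up
            index = child;
            child = 2 * index + 1;
        }
        heap[index] = entry;
    }
}
```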

What's happening here? Well, we save the object at the top of the heap
(the zeroth item), decrement the count (there's now one less item in
the heap), and we trickle down the item at the last index of the heap.

Actually, the English language is masking the problem. Suppose we had
10 items in the heap to begin with, numbered 0 to 9. We make a copy of
object 0, decrement the count to 9 (so that the items are numbered
from 0 to 8), and then trickle down the item at position 9. This does
not cause an out-of-bounds exception, by the way; the array onto which
we're mapping the heap has many more slots available.

Anyway, after the trickle down operation, the old items from 1 to 8,
together with the old item 9, will have been rearranged in some way to
form a new ordering of the nine items. Any reference to the old item 0
will have disappeared in the process (luckily we have a copy). However
— and here's the bug — there will still be a reference to
the old item 9 at index 9 in the array.

In other words, if we were to enqueue ten items, dequeue them all, and
abandon these dequeued references, the ten objects will not die, at
least until the priority queue is allowed to die.

Here we make use of a new method Clear() of the
HeapEntry struct that resets its item reference and
priority.
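A sketch of how that fix might look, assuming the HeapEntry fields from earlier in the series (the field names are assumptions):

```csharp
// Clear() drops the item reference (and resets the priority) so that the
// stale copy at the old last index no longer keeps an object alive.
struct HeapEntry
{
    public object Item;
    public int Priority;

    public HeapEntry(object item, int priority)
    {
        Item = item;
        Priority = priority;
    }

    public void Clear()
    {
        Item = null;   // release the object reference for the GC
        Priority = 0;
    }
}
```

Because HeapEntry is a struct stored directly in the array, `heap[count].Clear()` mutates the array slot in place; the change to Dequeue() is then a single extra call after the trickle-down.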

Now that we've cleared up this anomaly, let's move on to some extra
functionality.

We've so far agreed to make the priority an integer value: higher
values have higher priority. I can imagine other types of priority as
well, perhaps integers where lower values have higher priority,
perhaps floating point values, perhaps a full-blown priority class. We
should change our priority queue so that the priorities it uses are
not assumed to be simple ascending integers.

The easiest way to do this is to make the priority an
IComparable instance instead (that is, an instance of a
class that implements IComparable.CompareTo()). As it
turns out, apart from the various method signatures that accept a
priority, we only have to change two lines of code, where we compared
the integer values.

Nothing too surprising here, as I've already said. In essence the
priority is an IComparable instance, and the comparisons
between two priorities are done with CompareTo().
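For concreteness, here is the shape of the change as I'd sketch it (the names are assumptions): the priority field becomes an IComparable, and a comparison such as `a < b` becomes `a.CompareTo(b) < 0`.

```csharp
using System;

// The priority is now any IComparable — an int, a double, a string,
// or a full-blown priority class.
struct HeapEntry
{
    public object Item;
    public IComparable Priority;

    public HeapEntry(object item, IComparable priority)
    {
        Item = item;
        Priority = priority;
    }
}

static class PriorityComparisonDemo
{
    // Before: a.Priority < b.Priority
    // After:  a.Priority.CompareTo(b.Priority) < 0
    public static bool HasLowerPriority(HeapEntry a, HeapEntry b)
    {
        return a.Priority.CompareTo(b.Priority) < 0;
    }
}
```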

The next thing we should consider for our PriorityQueue
class is the serialization of an instance. On the one hand this is
really simple (just add the Serializable attribute to the
PriorityQueue and the HeapEntry classes),
but on the other hand the simple answer is very wasteful, at least in
terms of space. Let's see why.

The PriorityQueue class contains an array of
HeapEntry items. This array is deliberately created too
large so that we have room to add new items to the priority queue
without constantly growing the array. If we just add the
Serializable attribute to the PriorityQueue
class, the serialization code will save all of our members (good), but
will, in serializing the array, serialize a whole set of null entries
(not so good). For a little extra effort, we can serialize only the
non-empty elements in the array.
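Sketched out, the simple-but-wasteful approach is nothing more than a pair of attributes (the field names here are assumptions):

```csharp
using System;

// Mark both types serializable and let the runtime do the rest —
// including serializing the empty tail of the heap array.
[Serializable]
struct HeapEntry
{
    public object Item;
    public IComparable Priority;
}

[Serializable]
class PriorityQueue
{
    private HeapEntry[] heap = new HeapEntry[16]; // all 16 slots serialized,
    private int count;                            // even when count is small
}
```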

The test passes; however, we've already discussed that the size of the
serialized priority queue will generally be larger than it should be.

Let's refactor. Mark the PriorityQueue class as
implementing ISerializable. By doing so, we are telling
the serialization engine that we'll be doing the serializing. In
adding the ISerializable interface, we must implement the
GetObjectData() method:
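A sketch of how that implementation might look — the serialization key names ("capacity", "count", "heapEntries") and the surrounding scaffolding are my assumptions:

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
struct HeapEntry
{
    public object Item;
    public IComparable Priority;
}

[Serializable]
class PriorityQueue : ISerializable
{
    private HeapEntry[] heap = new HeapEntry[16];
    private int count;

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        // Save the two sizes we'll need to reconstitute the queue.
        info.AddValue("capacity", heap.Length);
        info.AddValue("count", count);

        // Serialize only the occupied slots, not the empty tail.
        HeapEntry[] heapCopy = new HeapEntry[count];
        Array.Copy(heap, 0, heapCopy, 0, count);
        info.AddValue("heapEntries", heapCopy, typeof(HeapEntry[]));
    }
}
```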

What's happening here? Well, first of all, we shall serialize the
capacity and the count values. We need these later when we deserialize
the object, so that we can reconstitute the queue. Now we need to
serialize the heap array, but as we've stated, it could have a lot of
empty entries. So we create a temporary array of exactly the right
size, copy all the entries into it and then serialize this temporary
array.

What about deserialization? Well, here we need to write a protected
constructor for the deserialization engine.
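That constructor might be sketched like this, mirroring the keys written by GetObjectData() (again, the key names and scaffolding are assumptions):

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
struct HeapEntry
{
    public object Item;
    public IComparable Priority;

    public HeapEntry(object item, IComparable priority)
    {
        Item = item;
        Priority = priority;
    }
}

[Serializable]
class PriorityQueue : ISerializable
{
    private HeapEntry[] heap;
    private int count;
    private int version;

    public int Count { get { return count; } }

    // The deserialization engine calls this protected constructor.
    protected PriorityQueue(SerializationInfo info, StreamingContext context)
    {
        int capacity = info.GetInt32("capacity");
        count = info.GetInt32("count");

        HeapEntry[] heapCopy = (HeapEntry[])
            info.GetValue("heapEntries", typeof(HeapEntry[]));

        heap = new HeapEntry[capacity];          // full capacity, not just count
        Array.Copy(heapCopy, 0, heap, 0, count); // restore the live entries
        version = 0;                             // not serialized, so reset it
    }

    // The save side, as described in the text.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("capacity", heap.Length);
        info.AddValue("count", count);
        HeapEntry[] heapCopy = new HeapEntry[count];
        Array.Copy(heap, 0, heapCopy, 0, count);
        info.AddValue("heapEntries", heapCopy, typeof(HeapEntry[]));
    }
}
```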

First we read the capacity and the count. Then we read the array of
entries, count items in all. At this point, we can create
our internal array (of size capacity, not
count), and then copy the entries from the temporary
array into the internal array. Finally we set version to
0 (we didn't serialize it, though we could have done).

Finally, after all this refactoring, we run the tests again to show
that they still pass.

(By the way, notice that we're repeating the names of the serialized
values in both the constructor and GetObjectData(). A definite code
smell that we should remove by using some constants. I won't show that
code here.)

This time around in our series on implementing a priority queue in C#,
we've discussed a subtle bug in the previous version, one that wasn't
(and couldn't really be) brought out by the unit tests, and then
solved it. Then we changed the definition of a priority from an
integer to an instance of IComparable, so that we could
do some more interesting things with priorities. Finally, we showed
how to implement efficient serialization for the priority queue class.