JavaScript STL (Standard Template Library), Part 4

JavaScript STL (Standard Template Library), Part
4

In previous articles, I've presented a programming library
based on the C++ Standard Template Library. In this article I will introduce
four new collections from the stdext namespace and show how easily many of the collections
may be interchanged within a simple JavaScript unit test.

The stdext Namespace

As explained in part 1, the C++ STL is designed to be enclosed
within a namespace called std. This namespace
contains most of the STL code and allows it to use programming names independently
of other libraries. There were some parts of the library deemed not suitable
to be included in the std namespace, but too useful to throw away, so the
stdext namespace was created as a kind of
code orphanage for all of these misfits. In this article I present the hash-type
collections, namely hash_map, hash_multimap,
hash_set and hash_multiset.

A New Code File

Since the code in stdext
is held in a separate namespace, it's only proper that the stdext
namespace should be contained within a separate file. The advantage is that
the extra code need not be downloaded if it's not necessary. Since the main
std namespace is contained within a file called
JSstl.js, a new file JSstlext.js will contain the stdext
namespace.

function STDEXT()

{

// namespace
class STDEXT

}

STDEXT.prototype.hash_map = function(hash,less)

{

return new
STDEXTHashMap(hash,less);

}

STDEXT.prototype.hash_multimap = function(hash,less)

{

return new
STDEXTHashMultiMap(hash,less);

}

STDEXT.prototype.hash_set = function(hash,less)

{

return new
STDEXTHashSet(hash,less);

}

STDEXT.prototype.hash_multiset = function(hash,less)

{

return new
STDEXTHashMultiSet(hash,less);

}

STDEXT.prototype.hash =

{

// standard hash behaviour is to
assume value is already an integer

fn:function(a)

{

return a;

}

};

// declare namespace stdext

var stdext = new STDEXT();

Like the std namespace defined
in the JSstl.js file, the stdext namespace
is defined using a class called STDEXT. The four new collections can be accessed
by calling member functions on the stdext
instance. A new standard functor is introduced to provide a default hash
implementation. Along with std.less, this functor may be used in any of the
hash-type collections where the key is already an integer.

A Bit About Hashing

Hashed collections provide an extremely efficient means of
storing and retrieving data. So long as the hash function is well designed,
inserts and retrievals should execute in a consistent time frame, regardless
of the number of elements in the collection. Hashed collections work by first
converting the keys into integers by using a hash function. These integers are
then used to index into an array of buckets where the items are stored.
The hashed collection works well when the hash function returns a different index
for every item. When two different keys hash to the same integer, then the
two different items will be stored within the same bucket. This is called
a collision. A hashed collection must have a strategy for coping with collisions;
the usual solution is to store the items in a linked list.

Imagine that we need to store a number of names in a hashed
collection. Since the key-type, string, is not an integer, it must be converted
into one using a supplied hashing function. For the sake of this example,
let's use the string length; "John" would hash to 4, "Jan" would hash to 3
and so on. Adding the names John, Jan and Jeremy will result in a collection
similar to the one in figure 1...

Figure 1. A hashed collection. Each element is accessible
in constant time.

Notice the links connecting the elements will allow the elements
to be iterated through even though they're not all adjacent within the buckets.
An important point to understand that distinguishes hashed collections from
other indexed collections is that the list of items is not stored in any predictable
order. Typically new elements will be added to the list at the tail, but if
a collision occurs, they will be inserted somewhere in the middle. If speed
of insertion and retrieval is the only concern then a hashed collection is
a good choice. If ordering of elements is important, then a map or a set would
be better.

Adding Jane to the collection exercises the collision logic
as both Jane and John hash to a value of 4. The result looks like this...

Figure 2. Jane has been added to the hash collection and
now heads bucket 4. John now takes longer to find.

With the addition of Jane to the hash collection, Jane becomes
the new head of bucket 4 as alphabetically, Jane comes before John. To find
John now requires two steps; first to Jane, and then to John.