2. Minimizing The Number Of Collisions

There are two strategies for minimizing the number of collisions. The
first is simply to choose a hashing function that spreads the possible
key values evenly across all the different positions in the
hash table. A commonly used form of hashing function, when keys are
integers (or easily converted to integers), is:

H(Key) = (P * Key) mod TABLE_SIZE

where P and TABLE_SIZE are different prime numbers. Unless the key
values have some unusual properties - for instance, they are all
multiples of TABLE_SIZE - a hashing function of this form will
distribute them uniformly across all the different array positions.
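A hashing function of this form can be sketched in a few lines of Python. The particular primes chosen here (P = 31, TABLE_SIZE = 101) are illustrative only; any pair of distinct primes would do:

```python
P = 31           # a prime multiplier
TABLE_SIZE = 101 # a prime table size, different from P

def h(key: int) -> int:
    """Map an integer key to a position in range(TABLE_SIZE)."""
    return (P * key) % TABLE_SIZE
```

Note that keys differing by a multiple of TABLE_SIZE necessarily hash to the same position, which is why the quality of the spread depends on the key values not sharing structure with the table size.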

In the above example, the table should have been of size 101 (a
prime) not 100. If the table size is 100, and all the keys are even
numbers, none of them will be mapped to an odd hash value.
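The even-keys pitfall is easy to demonstrate. The helper below (our own, not from the text) collects the set of positions a family of keys occupies; with a table of size 100 the even keys land only on even positions, while a prime size of 101 spreads the same keys over the whole table:

```python
def positions(table_size: int, keys, p: int = 7) -> set:
    """Return the set of table positions hit by the given keys."""
    return {(p * key) % table_size for key in keys}

even_keys = range(0, 200, 2)  # 100 even keys

# Size 100: every position reached is even, so half the table is wasted.
wasted = positions(100, even_keys)

# Size 101 (prime): the same keys occupy 100 distinct positions.
spread = positions(101, even_keys)
```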

The second strategy is simply to make the table larger, either by
allowing several values to be stored in an array position (the
`bucket' method), or by having more positions available. Doubling the
size of the table will halve the expected number of collisions.
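The `bucket' method can be sketched as a table whose positions each hold a list of (key, value) pairs; colliding keys simply share a bucket. The class and method names below are our own illustration, not from the text:

```python
class BucketHashTable:
    """A minimal hash table using the `bucket' (chaining) method."""

    def __init__(self, size: int = 101, p: int = 31):
        self.size = size
        self.p = p
        self.buckets = [[] for _ in range(size)]  # one list per position

    def _h(self, key: int) -> int:
        return (self.p * key) % self.size

    def insert(self, key: int, value) -> None:
        bucket = self.buckets[self._h(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # replace an existing entry
                return
        bucket.append((key, value))      # otherwise add to the bucket

    def lookup(self, key: int):
        for k, v in self.buckets[self._h(key)]:
            if k == key:
                return v
        return None
```

A lookup walks only the one bucket the key hashes to, so its cost is governed by the bucket's length rather than by the total number of stored values.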

The latter strategy gives rise to an important property of hash
tables that we have not seen in any other data structure. With a hash
table, the efficiency of the operations is under our control
to a significant extent. We directly control the size of the table,
and thereby control the efficiency of our operations. We can easily
make the expected number of collisions as small as we like: all we
have to do is increase the table's size.

The load factor is defined to be the ratio:

    (number of values stored in the table) / (size of the table)
When the load factor is small - i.e. when the table is relatively
empty - the chance of a collision is small and the operations are
`almost' constant time. When the load factor is high, the operations
degrade to at least log time, and can even become linear time (if
collisions are resolved in an unsophisticated way). The load
factor is directly under our control (because we can choose the size
of the table) and this gives us control over the efficiency of our
algorithms.
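The arithmetic behind this control is simple, as the sketch below shows (the function name is ours, not from the text): for a fixed number of stored values, doubling the table size halves the load factor, and with it the expected rate of collisions.

```python
def load_factor(num_values: int, table_size: int) -> float:
    """Ratio of stored values to available table positions."""
    return num_values / table_size

# 50 values in a table of 100 positions: load factor 0.5.
# Doubling the table to 200 positions halves it to 0.25.
```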