Hash Tables Tutorial for Complete Beginners

This is an article on Hash Tables Tutorial for Complete Beginners in C.

What are Hash Tables?

Hash table is a data structure to store key value pairs for all situations. So it is like a table with each entry has a key and a value corresponding to it. So, it is an efficient way of mapping and accessing data.

An abstract illustration of hash table:

Code:

[key 1] -> Data 1
[key 2] -> Data 2
[key 3] -> Data 3

The key can be of any type, i.e. integer, float, char, string, etc.
A common example of hash table could be - A directory storing the name of states with its std code which we are going to implement in this article.

How Hash Tables work?

When coming to how the hash tables work, we talk about three main following things:

Hash Function - Hash Function is a function which leverages accessing and searching data in the hash table. It could be any algorithm customised and optimised as per a particular scenario. It should be a logic which takes our key as the input, and generally gives out the value corresponding to it in the hash table. However, with the same input key to the logic, output should be same every time.

Key Value Pair - The Key-Value pair is the basic essence of the hash table data structure. For the suited situations, it makes searching really efficient and optimised.

Collisions - Collision is when for one key, we have more than one data value to be stored. A hash table would never guarantee that for every key, there would be just one data value corresponding to it. This is because generally a hash table would have a fixed number of keys. Any new entry after the size of hash table has to be accommodated causing the collision. However, a good hash table would always try to avoid collisions and use efficient ways to handle collisions, and at the same time optimize the size of the table. This introduces the trade off.

This surface outs another important feature of a good hash table which is uniform distribution.

Hash Tables Implementation in C

Here I have tried to implement a simple hash table in C. The scenario is: A telephone directory storing std codes for each state.

All the states would be used as key and its std code as its value. Here is the general depiction of our hash table:

Code:

[PB] -> 176
[JK] -> 172
[HP] -> 177
...

To think of, basic implementation data structure taken is an array of std codes. For each state, a two character code is defined. For each state two character code, we use a hash function to compute the index, which can be used to determine/store the std code in the array.

To start with, here is our basic data structure of a fixed size i.e. an array

Code:

int code[TOTAL];

We need a hash function for computing index out of the state. The algorithm used is as follows:

Each state is a key

Each key is stored as a two uppercase character string code.

Compute a value corresponding to each character by subtracting value '65' from its ASCII value. In this article, this value for each character will be called as its character number.

Now multiply the character number of the second character by 26 and add it to the character number of the first character. The formulae is:

Code:

val = (state[1] *26 ) + state[0]

Note, the value 26 is a magic number taken to reduce the collisions. However, the trade off is, it will increase the size of the hash table.

Now this value is the index in our std code array which will store the std code for that particular state.

However, this program is just a simple implementation of a hash table to give a general idea to a beginner. There is a huge scope of improvement and optimisation. Once, you understand it, a few limitations in context of hash table, which can be worked upon are:

A better magic number, which can reduce the fixed size of hash table considerably.

To handle collisions if any

A better algorithm to have the states as key in a more optimised way.

Conclusion

For this article, I would say we do not have any conclusion, instead it is the beginning. A beginning to learn an entirely new data structure which has a really high scope, can be complicated and optimised at the same time for many scenarios.