Data Structures

Hello people…! In this post, I will talk about Graph Theory Basics, which are its terminologies, types and implementations in C. Graphs are difficult to code, but they have the most interesting real-life applications. When you want to talk about the real-life applications of graphs, you just cannot resist talking about the Facebook’s Graph Search! Now, I don’t really know that algorithm, but it uses graphs to find out your closest friends, or any other associations you have with the other users.

Other applications include finding the shortest paths, and by shortest path, I mean in the “universal sense”, it could be the shortest path of a Traveling Salesman, or the shortest path to win Snake and Ladder or even the shortest way to solve the Rubik’s Cube…! Amazing, right…?! So, let’s get started…!

Graphs are much clear when defined in mathematical terms. A Graph G (V, E), consists of two sets,

Set of Vertices, V

Set of Edges, E

As the name suggest, V, is the set of vertices or, the set of all nodes in a given graph and E, is the set of all the edges between these nodes, or the associations between them. So, |V| denotes the number of nodes in a graph and |E| denotes the number of edges. Let us take up a small example to understand these terms –

Undirected and Directed Graph

As you can see, we have two types of graphs, Directed and Undirected Graphs. In Undirected Graphs,the direction for an edge is not defined. This is generally used to indicate that the edge is actually bi-directional in nature, i.e, one can go from V1 → V2, as well as from V2 → V1. This is used to represent the graph where the states (nodes) are re-doable, such as, in a Rubik’s Cube, you can go from one configuration of the cube to the other as well as the vice-versa.

The other type, the Directed Graph restricts the… “traversal”, if you say… to only one direction. For example, in the Snakes and Ladders game, you can play dice and go from Position 5 → Position 10, but you can’t roll the dice such that it gets you from Position 10 ← Position 5… That’s obvious…! So, it really depends on the requirement of the situation, which graph you choose. Generally, we only have to deal with graphs which are purely Directed or purely Undirected. We do not talk about the hybrid type. Some of the other type of graphs are shown below –

Types of Graphs

All the graphs which we have discussed till now are Simple Graphs, they do not contain any loops. The ones which do contain loops are Non-Simple. A given graph G can be drawn in any way as long as the sets V and E remain the same. Such a graph is shown in the figure. The same graph is just drawn differently, they both have the same set of vertices and edges. Such graphs are called as Isomorphic Graphs, as the name suggests “iso” means “same”, “morphic” means “shape”, the graphs which have the same shape.

All these Graphs are Connected Graphs, i.e., for any given pair of vertices V1 and V2 ∈ V, there always exists a path between these two vertices. The last graph is the Weighted Graph. Here, the edges are given “weights”. To understand a Weighted Graph, you can think of the vertices as cities and the edges as the distance between them (so they will have some value). You could be asked the shortest path between two cities. So in the context of a Weighted graph, the shortest path may not be the one with least edges. One must keep that in mind.

Now, we will look at the way the graphs are implemented. There are mainly two ways to implement graphs, they are –

Adjacency Matrix

Adjacency List

Adjacency Matrix

It is a very simple representation where we take a two-dimensional matrix of the size |V| × |V|, i.e., the declaration in C would look like,

int AdjacencyMatrix[V + 1][V + 1];

As the arrays are zero-indexed, we generally prefer to keep the sizes V + 1.Now, initially all the values would be zeroes. We put ‘1’ in an element of the two-dimensional array, AdjacencyMatrix[i][j], if there exists an edge between Vi → Vj. Remember, arrays are zero-indexed. Let us take an example to understand this,

Adjacency Matrix

I hope it is clear from the example, how we can represent the graph using an Adjacency Matrix. It is very easy to code. All you have to do is create a two-dimensional matrix and assign the values, so, I won’t post the code, but if you have any doubts regarding the code, feel free to comment them. The Adjacency Matrix has its pros and cons which we will discuss shortly.

Adjacency List

It is an array of pointers of size |V| + 1, where each pointer points to a linked list. The linked list holds the nodes which are adjacent to the ith vertex. By adjacent, we mean those vertices that can be accessed from ith node by making a single move. It is a little complex implementation if you are uncomfortable with linked lists. But it is this implementation that we will use for Graph Search Algorithms. This is because we can easily and efficiently know which of the vertices of V are neighbours of a given vertex. And that is what you want, if your are to do a Graph Search. Now, let us take an example to understand the concept of Adjacency Lists.

Adjacency List

I hope the sketch has made it clear as to how we use the Adjacency List to implement a Graph. As I mentioned before, it is one of the most versatile implementations of the Graph Data Structure. Now, try to code the implementation in C, or any language you like. Please try this at least once by yourself so that you can get brain deep into the Graph Data Structure. It is only a linked list. And one more thing, don’t get confused between pointer-to-an-array and array-of-pointers…! So, if we put the value of |V| + 1 in a variable ‘vertices‘, now the declaration of our Adjacency List would be,

struct node * adjacency_list[vertices];

Remember this is an “array of pointers”, the declaration for “pointer to an array” would look like,

struct node (* adjcency_list)[vertices];

This is due to operator precedence. Some people make mistakes here, so, watch out! If you want to code in C++, we can have a vector where each element of the vector can hold another vector or a list, so, the declaration would be, like,

vector< list< int > > adjacencyList(vertices + 1);

Try to code for an hour or two… If you don’t get the code, it’s ok…! I’ve put my C code below. Those who got the code right… You are fabulous…! 😉 … But please take a minute to go through my code, I might have written a slightly better code than yours.

CC++ STLJavaC#

Other implementations of Adjacency List

Comparison between Adjacency Matrix and Adjacency List –

Adjacency Matrix

Adjacency List

For a Directed Graph, it consumes O(|V|2) space which is often under-utilized in the implementation. For an Undirected Graph also, it consumes O(|V|2) space which is also under-utilized as the generated matrix is symmetric about diagonal and values just repeat.

For a Directed Graph it consumes O(|V| + |E|) space which is less, and is utilized optimally. For an Undirected Graph it consumes O(|V| + 2 ✗ |E|) space, which is O(|V| + |E|), which is less.

As the memory allocation is contiguous, data access is slightly faster.

It is basically a Linked List, so memory allocation is not contiguous, hence traversal is slow as as compared to the traversal in an array. However, this can be eliminated if we use a Vector of C++ STL Library in the place of the Linked List. But C++ STL Vector is not recommended for graphs that keep growing.

Inserting an edge can take O(|E|) in the worst case, which is, there is an edge between every pair of vertices, one must traverse the whole Linked List over-and-over causing insertion of |E| nodes to take O(|E|2) time. However, if one follows Head Insertion, inserting a node will take O(1) time, and inserting |E| nodes will take O(|E|) time.

Deleting an edge is very simple, we just set the corresponding element value to 0, which takes O(1) time.

Deleting an edge is time-taking as, one would have to traverse the Linked List, which is slow (non-contiguous). In the worst case, it takes O(|E|) time.

With the knowledge of Adjacency Matrix and Adjacency List alone you won’t be able to do the competitive programming questions, because you do not know how to explore your graph to get the desired result. For this, you need to know the Breadth First Search (BFS) Algorithm and the Depth First Search (DFS) Algorithm. This post is only to give an introduction. Feel free to comment your doubts..! Keep practising…. Happy Coding…! 🙂

Every programmer needs a suitable data structure to compute some complex function or to do a time-taking task quickly. In this post, I will explain the Binary Indexed Tree (BIT) data structure also known as the Fenwick Tree in post-soviet countries. This data structure is very easy to code in competitions. So, a programmer who knows BIT will have a massive edge over others. But the problem with BIT is that it is NOT easy to understand! A beginner may go nuts in trying to understand the working of a Binary Indexed Tree. I will explain this data structure step-by-step using as many images, tables and examples as I can.

Binary Indexed Tree becomes handy in manipulating Cumulative Frequency Tables. Now, I assume you know what Frequency and Cumulative Frequency is! Now, consider we have 12 objects with us of the respective frequencies –

Item No.

Frequency

Cumulative Frequency

1

4

4

2

2

6

3

7

13

4

5

18

5

1

19

6

3

22

7

6

28

8

4

32

9

6

38

10

6

44

11

3

47

12

3

50

Now, if we were to increase the quantity of Item 5 by 1. We would have to change the whole table. The change would be like,

Item No.

Frequency

Cumulative Frequency

1

4

4

2

2

6

3

7

13

4

5

18

5

1 + 1

19 + 1

6

3

22 + 1

7

6

28 + 1

8

4

32 + 1

9

6

38 + 1

10

6

44 + 1

11

3

47 + 1

12

3

50 + 1

If we were to code this manipulation using an array of integers, the implementation would go like,

Array Element

Element Value

arr[0]

4

arr[1]

6

arr[2]

13

arr[3]

18

arr[4]

19 + 1

arr[5]

22 + 1

arr[6]

28 + 1

arr[7]

32 + 1

arr[8]

38 + 1

arr[9]

44 + 1

arr[10]

47 + 1

arr[11]

50 + 1

Clearly, implementing the simple increment would cost O(n). This is because in Cumulative Frequency, every element holds the “prefix-sum” of the elements, i.e., the sum of elements till that index. Now what if we hold only partial sums? Say, there are three variables that hold the partial sums and the final sums can be evaluated by summing up the partial sum. Now, the question arises is, “How does this help the increment manipulation?”. This is explained by the following diagram.

Partial Sums

So, we can see how by using the Partial Sum Technique, we managed to decrease the number of operations to perform. This is the raw underlying idea of the Binary Indexed Tree. But the Binary Indexed Tree uses this partial sum technique far more efficiently and quite amazingly actually! The BIT is constructed on the principle that “all numbers can be represented as the sums of powers of 2”. This is more or less similar to the principle behind the binary numbers. Now the working of the BIT is that, at indices
of the values of powers of 2, they contain the total sum of the frequencies till that index. Which means if we have an element of an array, arr[7], it will have the sum of elements from arr[0] to arr[7] (including the value of arr[7], i.e, frequency of the 8th element). Thats ok! But what will the other elements contain…?! The Partial Sums…! I make this clear with the help of my next sketch where I show a simple Binary Search Tree (BST) where the nodes have the values of the frequencies and the nodes are constructed in such a way that the value of the “index” is taken into consideration while constructing the BST but the nodes hold the frequencies as the values. Then I will give a super simple step-by-step explanation.

The Initial Binary Search Tree

We all know that the Binary Search Tree is such that the left-subtree holds all the values less than the current node. Now, the Binary Indexed tree is such that, the current node holds the sum of all values in it’s left-subtree + the value of the current node. For example, in the first tree in the above image,

So, now we see that the Node 4 will be the sum of Nodes 1 to 4, similar is the case with all the nodes that are powers of 2. These nodes are somewhat special, as they contain the “prefix-sum”. Hence, I marked them in my previous image. Now we will apply the concept to our second tree in the above image.

The Binary Indexed Tree !

Wait…! What…?! The “array” is the “tree”…?! Yes…! The array we obtained is called the Binary Indexed Tree. For whatever applications a BIT is used for, it is used as this array. Pretty cool, uh…! So, this means you don’t have to get messed up creating structures, pointers and doing tree travels! All you’ll need is this simple array. Now, before we get into the coding part of the BIT lets observe this BIT “array” a little more closely so that we can understand it more clearly. Again, I use an image because we all know, “A picture speaks more than a thousand words!”.

Partial Sums in a Binary Indexed Tree

The picture clearly shows how the Fenwick Tree (or BIT) stores the partial sums and that at indices of powers of two it will have the complete sum of all the elements behind it. What really allows the Fenwick Tree to have this feature is the very structure of the Binary Search Tree (BST). In a BST, the nodes with index in powers of two lie at the left-most part of the tree (recall our very first tree of the first image), now, we know for a Binary Indexed Tree, the value of a node is the sum of the components of the left-subtree + the value that is supposed to be in that node. But what will the left-subtree of a Binary Search Tree have?… “Some” elements surely less than it. Now, what will the left-subtree of the left-most wing of a tree have? “All” the elements surely less than it. If you ponder on this statement for a while you will understand why the indices of the powers of 2 have the whole “prefix-sum”. Now I will depict you the effect of increment of an Item in the Binary Indexed Tree, for this I use the tree representation once again.

Incrementing an item in BIT

As we can clearly see, in a worst case scenario, such as the above, we’d have to update all the nodes beginning from the lowest level and go up till the root node. So, in a worst case scenario, we’d have to update the “height of the tree” number of nodes, which is log n as we know it. So, the increment operation here costs only O(log n), which is very fast! Recall that our brute force or the naive approach costed us O(n), so now we know how using BIT speeds up the manipulation. But how do we do this in an array? For a given frequency table, how do we construct the BIT? We will see this coding part now, which is also difficult to understand but easy to implement.

For a given frequency array freq[] of size ‘n’, create another array bit[] of size ‘n’ and initialize bit[] to zero. Now, run a loop where we take every element of freq[] one-by-one and put it into bit[]. What actually are we doing is that we are doing the increment operation over and over again. Look over the code closely –

You might feel uncomfortable with the “j = j + (j & (-j))” thing, for now, forget that, we’ll come back to it in a minute. Look what is logically happening in the code. Being able to read raw code is a gift, if you can, you’re awesome! Be proud of it! For those who can’t, here is what happening. Listen carefully!… First we take the collection of items to be empty, so, all items have zero frequency. Then, we take the first item, it’s frequency is 4, so we “increment” the frequency of that item from 0 → 4. Now, recall the picture of the effect of increment in Fenwick Trees. I drew a Binary Search Tree and incremented the leaf node and up till the root by 1. Here, we will increment by 4, all the corresponding log n nodes. For better understanding watch the picture below closely –

Working of a Fenwick Tree

It is clear from the explanation how the Fenwick Tree gets going. But how does one traverse so strangely, I mean, the first Item we had to go like, 1 → 2 → 4 → 8, then for the next Item we had to go like, 2 → 4 → 8, and then like 3 → 4 → 8…! How does one put it in a code? This is exactly what “j = j + (j & (-j))” does! It takes you to the next location where you are to increment the value. It is a standard expression for traversing a Fenwick Tree. If you are really eager to understand the logic behind the standard expression, you have to read this wonderful blog by Devendra Agarwal. It was where I was able to start my BIT journey. Lastly, I will just show you an output of the same above code but with little modification so that it prints the values of variables and BIT array step-by-step.

If you have understood till here, you will be able to understand the output without breaking a sweat. I will conclude this post here. It has become too large already, and I guess my reader’s would also like a break. Thanks a lot for being so patient with me! If you really are eager to do some questions on this topic, go for Insertion Sort Advanced Analysis of HackerRank. I enjoyed solving that question, and I hope you will too! Feel free to comment your doubts..! Keep practising…! Happy Coding…! 🙂