Category: Friday Fun Session

Dijkstra’s algorithm cannot work with negative edge. Also, we cannot trivially add a constant to each of the edge weights and make them non-negative to proceed further.

Why Does Dijkstra’s Algorithm not Work with Negative Edge?

In the above figure, we are trying to get shortest paths from source node 1 to all other nodes (node 2 and node 3). Since Dijkstra’s algorithm works by employing a greedy process, it outputs 20 as the shortest path cost to node 2.

As we can see, from node 1, we can go to two nodes – node 2 and node 3, at a cost of 20 and 40 respectively. Hence, going to node 2 is cheaper. And that is why, it outputs 20 to be the cheapest cost to reach node 2.

However, we know that the cheapest cost to reach node 2 is through node 3. And the associated cost is: 40 + (-30) = 10. So Dijkstra’s algorithm gets it wrong. It gets it wrong because it cannot foresee that later, a negative edge can bring down the total cost to below 20.

If we carefully observe, we see that the wrong calculation by Dijkstra’s algorithm happens due to the negative edge. Had cost from node 3 to node 2 not been a negative value, it could never bring down the total cost to lower than 20, after getting added to 40.

Why Does Adding a Constant Cost to Each Edge not Work?

Now that we realize, Dijkstra’s algorithm fails due to the negative edge from node 3 to node 2, having the value -30, we might be tempted to add 30 to each of the edges. We might think, this way we can remove the negative edge. And doing so would be fair; after all, we are adding the same value to each of the edges. Let’s do it and see what happens.

After updating the edge costs, the graph looks as shown above. So what is the cheapest path from node 1 to node 3 now?

Well, now the cheapest cost is 50, which uses the direct edge from node 1 to node 2. But this is not supposed to be the cheapest path, right? The cheapest path was node 1 -> node 3 -> node 2, before we adjusted the edge cost. Adjusting edge cost should not change the graph. It must not change the cheapest path, right?

So why does that happen? Well, if we observe, we find that path node 1 -> node 3 -> node 2 uses two edges/segments – node 1 to node 3 and node 3 to node 2. On the other hand, path node 1 -> node 2 uses just one edge/segment. The way we have updated the edge cost – adding a constant to each path segment – is not fair to a path using more path segments. For the path that uses two path segments, which was originally the cheapest path, we have added the constant 30 twice. On the other hand, for the path that uses just one path segment, we have added 30 only once. That way, we are unfair to the path using more path segments.

We must add a constant to each of the paths, not to each of the path segments.

Solution

Johnson’s algorithm does this – add a constant cost to each path. It does so, by finding a special set of offset values to remove the negative edges from a graph. Once that is done Dijkstra’s algorithm can work. But that works in absence of a negative cycle in the graph.

Negative Cycle can be identified by looking at the diagonals of the dist[][] matrix generated by Floyd-Warshall algorithm. After all, diagonal dist[2][2] value is smaller than 0 means, a path starting from 2 and ending at 2 results in a negative cycle – an arbitrage exists.

However, we are asked to incrementally compute the same, at cost of O(n2) for each new vertex.

Floyd-Warshall algorithm takes O(n3) time to compute All-Pairs Shortest Path (APSP), where n is the number of vertices. However, given that it already computed APSP for n nodes, when (n+1)th node arrives, it can reuse the existing result and extend APSP to accommodate the new node incrementally at a cost of O(n2).

Converting Rates

If USD to SGD rate is r1 and SGD to GBP rate is r2, to get the rate from USD to GBP, we multiply the two rates and get the new rate that is r1*r2. Our target is to maximize rate, that is maximizing r1*r2.

While giving output, say the best rate from the solution, the rate as used in the dist[][] matrix should be multiplied by -1 first and then raised to the bth power, where b is the base (say one of 2, 10 etc.) of the log as used earlier.

Visualizing Floyd-Warshall

We have seen the DP algorithm that Floyd-Warshall deploys to compute APSP. Let us visualize to some extent as to how it is done for 4 vertices.

What Cells Are Used to Optimize

The computation will be done using k = 1 to 4, in the following order – starting with cell 1-1, 1-2, . . . . .2-1, 2-2, …….3-1, ……. 4-3, 4-4.

At first, using k = 1.

Let us see how the paths are improving using the following two examples.

dist[2][3] = min (dist[2][3], dist[2][1] + dist[1][3])

and dist[3][4] = min (dist[3][4], dist[3][1] + dist[1][4])

We see that for k = 1, all paths are optimized using paths from 1st (kth) row and 1st (kth) column.

Kth Row and Column do not Change

What about paths on kth row and kth column?

dist[1][2] = min(dist[1][2], dist[1][1] + dist[1][2]) – well, there is no point in updating dist[1][2] by adding something more to it.

So we see, at a certain kth iteration, kth row and kth column used to update the rest of the paths while they themselves are not changed.

At k = 1

At k = 2

At k = 3

At k = 4

Consider Only 3X3 Matrix Was Computed

Now assume that we did not consider that we had 4 vertices. Rather we considered that we had 3 vertices and completed APSP computations for all paths in the 3X3 matrix. We ignored the 4th row and column altogether.

So we have APSP computed for the following matrix using k = 1, 2 and 3.

Add 4th Vertex

Let’s say, 4th vertex arrives now. First, we can compare the computations used for the above 3X3 matrix with the same for the 4X4 matrix as shown earlier and find out what all computations need to be done now to extend this 3X3 matrix to 4X4 matrix to accommodate the new 4th vertex.

We will find that at first we have to optimize the 4th row and column using k = 1, 2 and 3. Let us do that.

Note that at this point, 4th row and column are not used to optimize paths for the older 3X3 matrix. So now that we have the 4th row and column optimized using k = 1, 2 and 3, we have to optimize that 3X3 matrix using k = 4.

This way, we don’t miss out any computation had we considered all the 4 vertices at one go. And thus we are done with optimizing all the paths in the 4X4 matrix.

Complexity

The complexity for this incremental building for a new vertex is clearly O(n2). That makes sense. After all, for n vertices the cost is O(n3) that is the cost of Floyd-Warshall, had all n vertices were considered at one go.

But this incremental building makes a huge difference. For example, consider that we have 1000 vertices for which we have already computed APSP using 1 billion computations. Now that 1001st vertex arrives, we can accommodate the new vertex with a cost of 1 million (approx.) computations instead of doing 1 billion+ computations again from the scratch – something that can be infeasible for many applications.

Printing Arbitrage Path

We can find the first negative cycle by looking (for a negative value) at the diagonals of the dist[][] matrix, if exists and then print the associated path. For path reconstruction, we can follow the steps as described here.

Dijkstra’s algorithm is a Single-Source Shortest Path (SSSP) algorithm developed by Edsger Wybe Dijkstra. It uses a greedy process and yet finds the optimal solution. It looks similar to Breadth-first search.

Compare to Bellman-Ford

It is asymptotically the fastest SSSP algorithm, at a cost O(|E|+|V|log|V|), when min-priority queue implemented by Fibonacci heap is used.

That is quite cheap, given Bellman-Ford’s complexity of O(|V||E|) to find the same, something that can become prohibitively expensive for a dense graph having |V|2 edges.

However, while Bellman-Ford can work with negative edge and can detect negative cycle, Dijkstra’s algorithm cannot work with negative edge. Since it cannot work with negative edge, there is no question of detecting negative cycle at all.

Given source vertex s, it finds the shortest distance from s to all other vertices. At first, it initializes dist[] vector to infinite; to mean that it cannot reach any other vertex. And sets dist[s] = 0; to mean it can reach itself at a cost of 0, the cheapest. All vertices including itself are added to the vertex set Q.

Then, it chooses the vertex with min dist[] value. At first, s (set to u) would be chosen. Then using each of the outgoing edges of u to v, it tries to minimize dist[v] by checking whether v can be reached via u using edge (u, v). If yes, dist[v] is updated. Then again it retrieves vertex u with the cheapest dist[u] value and repeats the same. This continues till Q is not empty. Whenever, a vertex u is removed from Q, it means that the shortest distance from s to u is found.

Since we are retrieving |V| vertices from Q, and for each vertex, trying with all its edges (=|V|, at max), to minimize distance to other vertices, the cost can be |V|2.

So, here we see a greedy process where it is retrieving the vertex with min dist[] value. Since retrieving a vertex u from Q means that we found the minimum distance from s to u, if we are solving shortest path from a single source s to a single destination d, then when u matches the destination d, we are done and can exit. It can also be noted that from source s, we find the shortest distances to all other vertices, in the ascending order of their distances.

Finally, we see that dist[] vector is continuously changing. And each time when we retrieve a vertex u, we choose the one with min dist[] value. That indicates using min-priority queue might be the right choice of data structure for this algorithm.

In the above algorithm, we have used a function called decrease_priority(), something that is absent in standard priority queue but present in Fibonacci heap. So the above algorithm is implemented using Fibonacci heap.

Fibonacci heap is a special implementation of a priority queue that supports decrease key(decrease_priority()) operation. Meaning, we can decrease the value of a key while it is still inside the priority queue. And this can be achieved by using constant amortized time for insert, find min and decrease key operation and log (n) time for delete min operation.

As for cost, since we have called delete operation for each of the v vertices, and we have treated each of the |E| edges once, the cost here is O(|E|+|V|log|V|), as mentioned at the beginning of this post, as the cost of Dijkstra’s algorithm.

Using Standard Priority Queue

Standard priority queue implementation takes log (n) time for both insert and delete operation and constant time for find min operation. But there is no way to change the key value (decrease key) while the item is still in the priority queue, something Dijkstra’s algorithm might need to do quite frequently as we have already seen.

If standard priority queue is used, one has to delete the item from the priority queue and then insert into it again, costing log (n) each time, or an alternative to that effect. However, as long as standard priority queue is used, it is going to be slower than Fibonacci heap. With standard priority queue, the algorithm would look like below:

First, we have not inserted all vertices into the standard priority queue at first, rather inserted the source only.

Second, instead of decreasing priority that we cannot do using standard priority queue, we have kept on inserting vertex v when dist[v] decreases. That might mean, inserting a vertex v again while it is already there inside the queue with a higher priority/dist[v].

That is another way of pushing aside the old entry (same v but with higher priority) out of consideration for the algorithm. When shortest distances from source s to all other vertices v are found, those pushed aside vertices will be pulled one by one from the priority queue and removed. They will not affect dist[] vector anymore. And thus the queue will be emptied and the algorithm will exit.

Floyd-Warshall, also known as Roy-Warshall is an All-Pairs Shortest Path (APSP) algorithm developed by Robert Floyd, Bernard Roy, and Stephen Warshall. It is an example of dynamic programming that uses 3 nested loops. At a cost O(|V|3), it is quite impressive, given that Bellman-Ford might encounter the same cost (O(|V||E|)) to find only Single Source Shortest Path (SSSP) for dense graph having |V|2 edges. Floyd-Warshall can work with negative edges just like Bellman-Ford. After all, both are based on dynamic programming. As for detecting negative cycle, once again, both can detect it. However, in presence of negative cycle, results from both are invalid.

To compute the shortest path between any pair (s, t), we have considered each of the |V| vertices as intermediate points k, and chosen the cheaper between i) existing (s, t) and ii) the sum of s to k and then from k to t, meaning s to t via k.

Short-circuiting an SSSP?

Does it mean that we can derive a SSSP solution for any pair (s, t), at a cost of O(|V|2)? To be precise, can we do the following?

After all, we have relaxed via all the intermediate nodes. Well, that will not work! Why?

Dynamic Programming

If we want to get the shortest path between (i, j) using k (1 to k) intermediate nodes then we have to choose the cheaper between the below paths:

Without using k: dist[i][j] using intermediate nodes 1 to k-1.

Using k: dist[i][k] + dist[k][j], where both dist[i][k] and dist[j][k] should make use of intermediate nodes 1 to k-1.

At k = 0, dist[][] is initialized using edge weights where exists, 0 for diagonals (dist[v][v]) and infinite for the rests.

An Example

Suppose, we want to compute dist[2][3] when k = 5.

Then, dist[2][3] = min { dist[2][3], dist[2][5] + dist[5][3] }

Here, all three distances – dist[2][3], dist[2][5] and dist[5][3] must already use intermediate nodes 1 to 4. Meaning, dist[2][5] is not the static cost set at k=0; possibly the edge cost, 0 or infinite. Rather, dist[2][5] is already computed using k from 1 to 4. Similarly, dist[5][3] (and dist[2][3] as well) is also computed using k from 1 to 4.

In other words, we cannot compute a certain dist[s][t] alone, using the intermediate nodes 1 to k. Rather for each intermediate node k, we need to compute dist[i][j] progressively, using the 3 nested loops, as shown earlier.

Obviously we can use recursion without the loops. That will not save any work for us. In fact, while using recursion, if we are not reusing existing solutions for the sub-problems, we will repeat the computation – something very expensive.

Path Reconstruction

The predecessor matrix p, keeps track of the shortest path. If we have to find the best path from s to t, we know for sure that we start with s. We print s. To know where we went from there, we have to look at p[s][t]. If that is t, we are done as that is the destination. However, if that is not the case, that means we find another node r. Then we know from s we went to an intermediate node r. So this becomes the new start s for the rest of the path. However, destination remains the same t. Again we look at p[s][t] and continue the same till we reach t, all along printing r (=p[s][t]).

Incremental Node Addition

Suppose as of now, we have 4 nodes and APSP is computed. At this point 5th node arrives, along with some edges connecting the existing nodes. Instead of computing APSP from the scratch, at a cost of O(|V|3) = O(125), we can use the already computed APSP and extend that to complete it for 5 nodes, at a cost of O(|V|2) = O(25).

Adjusting Edge Weight Changes

What if weight for an edge changes (increases or decreases)? Do we need to re-compute APSP from scratch? Or we can adjust the existing results using some partial computations?

Using priority queue

Suppose we have n scores and the window length is m. We can simply move the window from left to to right and take (consecutive) m scores within the window and each time compute the max of them, and output it, if it is already not outputted. Finding max from m scores would take O(m) and as we do it n times (n-m+1 times, to be precise), the complexity would be O(mn). However, it was expected that the complexity would be better than this.

Alternatively, we can use a max-heap, where we push each score as we encounter it. At the same time, we retrieve the top of the max-heap and if all is fine – output it. By if all is fine, we mean to say, we need to make sure that the orange has not been already outputted and that it belongs to the current window. At the same time, if the top item is out-dated, we can pop it, meaning take it out of the heap. Note that, max-heap is a data structure that retains the max element at the top.

Let us walk through an example

Let us take the first example as mentioned here. For the scores 1 3 5 7 3 5 9 1 2 5 with window size 5, let us walk through the process.

At first, we push the first 4 items (4 is one less than the window size 5). The max-heap would look like: 1 3 5 7 where 7 is the top element.

Then for each of the remaining items, we do the following:

If the (new) item is greater than or equal to the top item in the max-heap, pop it (out) and push the new item into it. Output the new item (if the same orange is not already outputted). We do it because the new item is the max in the present window. And the existing top one is of no further use. We can now move to the next item.

Keep on popping the top as long it is not one belonging to the current window. We do it, as we are interested to find the max within the window, not the out-dated ones those are no longer inside the window.

Output the top (if the same orange is not already outputted). We do it as it the max within the current window.

If the top item is the oldest (left-most/first/earliest/starting one) in the current window, pop it. We do it because this item is going to go out of the window as the next item gets in.

Score 3:

3 is not >= 7 (top in heap)

Existing top (7) is within the current window. Output it.

Push 3; max-heap looks like: 1 3 3 5 7

Score 5:

5 is not >= 7 (top in heap)

Existing top (7) is within the current window (3 5 7 3 5). But this orange is already outputted (we can use index of the item to track it, meaning instead of just pushing the score, retain the index along with it). No output this time.

Push 5; max-heap looks like: 1 3 3 5 5 7

Score 9:

9 >= 7 (top in heap)

Pop 7, push 9, output 9.

New max-heap: 1 3 3 5 5 9

Score 1:

1 is not >= 9 (top in heap)

Existing top (9) is within the current window. But this orange is already outputted. No output this time.

Push 1; max-heap looks like: 1 1 3 3 5 5 9

Score 2:

2 is not >= 9 (top in heap)

Existing top (9) is within the current window. But this orange is already outputted. No output this time.

Push 2; max-heap looks like: 1 1 2 3 3 5 5 9

Score 5:

Existing top (9) is within the current window. But this orange is already outputted. No output this time.

Push 5; max-heap looks like: 1 1 2 3 3 5 5 5 9

No item left. We are done!

Finally, output is 7, 9.

Complexity

If we closely observe, we see that the size of the max-heap would be always around m. Because, if the new item is greater or equal we are popping the top – hence the max-heap size is not increasing. If new item is smaller, we are pushing it and the size of the max-heap is increasing – true; but then soon the top would be out-dated and then we would pop that. So the max-heap size remains around m. Pushing (in) or popping (out) an item would cost log m, and since we would do it n times – the complexity would be O(n log m). Please note that getting the top of the max-heap costs O(1).