Introduction

The main objective of this article is to propose a way to use Dijkstra's algorithm without the memory limitations imposed by a huge, statically allocated two-dimensional array, allowing it to handle a large set of nodes.

By "large set" I mean something from 100 to 1000 nodes. With fewer than 100 nodes, you can use any of the array-based solutions found all around the web.

The code proposed here is not limited to 1000 nodes (I have not yet found an upper limit), but beyond 1000 nodes the time to find a solution is no longer acceptable.

The second part of this article will propose a solution able to handle millions of nodes with very little processing.

Background

Dijkstra's is a very generic greedy algorithm, and its solution covers the worst-case scenario: a graph where every node can be connected to all other nodes.

But after studying many problems, I found that most of them are based on "2D map like" sets of nodes. In this scenario the connectivity between the nodes is very low. Nodes are normally connected only to the nodes immediately around them, creating a very sparse link matrix.

The basic implementations of Dijkstra's algorithm found on the web do not take that into consideration and allocate resources to map all possible connections between nodes, even from a node to itself.

Let's analyze this sample grid:

It represents a 15x12 "2D map like" grid. This grid is the one used in the sample source code available to download above.

Now... if we consider it as a fully connected matrix, we would need 180 (15x12) nodes to represent it. That would lead to a link matrix with 32,400 (180^2) decimal (or integer) values to feed Dijkstra. All the code I downloaded from the web crashed ("memory overflow") when I tried to produce a link matrix with more than 100 nodes.

If the program does not crash, the allocated memory will make the code (and the whole machine) slow.

The matrix size is always the square of its number of nodes. That means that the problem grows quadratically. So even if you get a very large and powerful machine, you will soon reach a point where you cannot even process it.

So... let's solve this problem first...

Preparing the Data

The first thing to do is to notice that not every point in the grid is used. There are voids. So instead of starting from the grid dimensions, we will build a list of nodes. This reduces the problem a little: from 180 to (in this sample, of course) 120 nodes.

That didn't help much, but it's a start...

We also notice that the nodes are not completely connected and not every connection is bidirectional. We can reduce the problem to the links we really use. That means reducing the entire problem to only 246 links (again, in this sample).

Now we are talking…

In my studies I found that the total number of links normally stays between two and three times the number of nodes. Surprise! The problem became linear!
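To put numbers on that difference, here is a minimal sketch that compares the number of cells in a full link matrix with the number of entries in a link list. The class and method names are hypothetical, and the "links per node" factor is only the rough 2x to 3x estimate mentioned above, not a measured constant.

```csharp
using System;

public static class StorageEstimate
{
    // Cells in a full link matrix: one entry per ordered pair of nodes.
    public static long MatrixCells(long nodes) => nodes * nodes;

    // Entries in a link list, assuming a fixed average number of links per node.
    public static long LinkEntries(long nodes, double linksPerNode) =>
        (long)Math.Round(nodes * linksPerNode);
}
```

For the sample grid, MatrixCells(180) is 32,400 and MatrixCells(120) is 14,400, against the 246 links actually used. For 1000 nodes, MatrixCells(1000) is 1,000,000, while LinkEntries(1000, 2.5) is 2,500.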

At this point we realize that the main problem is not Dijkstra's algorithm itself but preparing the input data for it. But we can still improve the code to avoid too much memory consumption. We will talk about that later.

With that in mind, let's see an extract from the input file I used in the source code sample:

The first part is the list of nodes. It describes the coordinates of each point in the grid. It is only necessary if you want to plot the grid; otherwise you need only the second part. That one is a "table like" list that shows the first node, the second node and any amount of data that connects the two. In this sample, I'm giving a distance in miles and a time in minutes. So, the line...

0 1 0.0625 4.5

... means that the node 0 is connected to the node 1 and their distance is 0.0625 miles and it takes 4.5 minutes to go from 0 to 1.

The next line...

1 0 0.0625 4.5

... says that there is also a connection from 1 to 0, making it bidirectional. Here the values are the same, but that is not necessarily so: you could have different values for each direction.
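Reading the link section can be sketched as follows. This is only an illustration under assumptions: the LinkParser class is hypothetical (it is not the parser from the downloadable sample), it expects exactly the four space-separated columns shown above, and it silently skips anything shorter.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class LinkParser
{
    // Parses lines of the form "first second miles minutes" into a dictionary
    // keyed by the (first, second) pair. Each direction is a separate entry.
    public static Dictionary<(int From, int To), (decimal Miles, decimal Minutes)> Parse(IEnumerable<string> lines)
    {
        var links = new Dictionary<(int From, int To), (decimal Miles, decimal Minutes)>();
        foreach (var line in lines)
        {
            var parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length < 4) continue; // skip blank or incomplete lines

            var fromNode = int.Parse(parts[0], CultureInfo.InvariantCulture);
            var toNode = int.Parse(parts[1], CultureInfo.InvariantCulture);
            var miles = decimal.Parse(parts[2], CultureInfo.InvariantCulture);
            var minutes = decimal.Parse(parts[3], CultureInfo.InvariantCulture);

            links[(fromNode, toNode)] = (miles, minutes);
        }
        return links;
    }
}
```

Parsing the two lines above yields two entries, (0, 1) and (1, 0), each holding 0.0625 miles and 4.5 minutes. Only the links that exist are stored, so memory grows with the number of links rather than with the square of the number of nodes.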

The last part of the input just gives a list of cases we want to run to get some results from the code. So... the section that really matters in the input is the second one. That data can come from anywhere: a database, a GPS batch file, Google Maps.

But it holds the main feature of this solution. By passing the values to the class as a delegate function, no static matrix is required at all.

The values are fetched "on demand". And you have total control over the input and therefore over what is served to Dijkstra.

You just need to keep in mind that Dijkstra is a cumulative algorithm, so all valid values must be non-negative. A value of -1 means that those nodes are not connected, but in our way of representing the nodes the better thing to do is simply to leave them out of the list.
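As an illustration of the "on demand" idea, the sketch below wraps a sparse table of links in a Func<int, int, decimal> delegate. The LinkLookup class and the FromTable name are hypothetical; the only convention taken from the article is that -1 means "not connected".

```csharp
using System;
using System.Collections.Generic;

public static class LinkLookup
{
    // Wraps a sparse link table in a delegate. Pairs missing from the table
    // (and any negative stored value) are reported as -1, "not connected",
    // instead of throwing an exception.
    public static Func<int, int, decimal> FromTable(IReadOnlyDictionary<(int, int), decimal> table)
    {
        return (fromNode, toNode) =>
            table.TryGetValue((fromNode, toNode), out var value) && value >= 0 ? value : -1m;
    }
}
```

A possible use, with a hypothetical getDistance delegate:

```csharp
var miles = new Dictionary<(int, int), decimal> { [(0, 1)] = 0.0625m, [(1, 0)] = 0.0625m };
Func<int, int, decimal> getDistance = LinkLookup.FromTable(miles);
var connected = getDistance(0, 1);     // 0.0625
var notConnected = getDistance(0, 5);  // -1
```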

That control gives some important advantages:

1 - All validation can be done inside the function. No exceptions need to be thrown during the iterations of the solution; just return -1.

2 - The values do not need to be static. You can build functions that alter their output depending on some extra information.

For instance, if you want the GetTime output to change along the day (due to traffic, for example), maybe the GetTime function could have this signature: Func<int, int, DateTime, decimal>.

Imagine that you would normally take 2 hours to go from one place to another, but in the middle of the way (after 1 hour) the rush hour starts. The time values you started with are not valid anymore and have to be adjusted. That 2-hour trip (in normal traffic) becomes a 2.5-hour trip (with part of it during rush hour). That's a more precise output.
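A minimal sketch of such a function is shown below. Everything in it is an assumption made for illustration: the RushHour class, the 17:00 to 19:00 window and the 50% penalty are invented, and the whole step is charged at the rate in force when the step starts.

```csharp
using System;

public static class RushHour
{
    // Hypothetical rule: a step that starts between 17:00 and 19:00
    // takes 50% longer than its base time.
    public static Func<int, int, DateTime, decimal> WithRushHour(Func<int, int, decimal> baseMinutes)
    {
        return (fromNode, toNode, departure) =>
        {
            var minutes = baseMinutes(fromNode, toNode);
            if (minutes < 0) return -1m; // keep the "not connected" convention

            var inRush = departure.Hour >= 17 && departure.Hour < 19;
            return inRush ? minutes * 1.5m : minutes;
        };
    }
}
```

With two 60-minute steps and a start at 16:00, the first step costs 60 minutes and the second, starting at 17:00, costs 90 minutes: the 2-hour trip becomes a 2.5-hour trip. Whether the algorithm still finds the best path with a function like this depends on how the time dependence is modelled.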

The possibilities are endless.

The most important part of this code is not exposed: it is the Iteraction function.

This is where the Dijkstra magic occurs. Its logic was almost not changed, just improved with LINQ and without any explicit static arrays.

The only list that is always allocated during the life span of the solution is the internal _nodes list. That list has the maximum dimension of NumberOfNodes. So it's really tiny.

Parallelism may be an alternative here but I will talk about that in the next part of this article.

One final thing I'd like to mention here is the return value of the Solve function.

public IEnumerable<int> Solve(int startNode, int endNode)
{
    // Both nodes must be valid indexes into the node list.
    if (startNode < 0 || startNode >= NumberOfNodes) throw new ArgumentException("The start node must be between zero and the number of nodes");
    if (endNode < 0 || endNode >= NumberOfNodes) throw new ArgumentException("The end node must be between zero and the number of nodes");
    _startNode = startNode;
    Reset();
    // Run NumberOfNodes - 1 iterations.
    for (var i = 1; i < NumberOfNodes; i++) Iteraction();
    GetNode(endNode).AddStep(endNode, 0);
    // Return only the node numbers of the calculated path.
    return GetNode(endNode).Path.Select(p => p.Node);
}

I've tried many outputs but I found out that the best one (for me) was the list of nodes holding the calculated path.

Since I have the function(s) that give me the values between two nodes, if I receive the chain of nodes that connects the start to the end, I can zip them and fetch the values for each step individually. I can even apply a different function to compare results.
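The zipping step could look like the sketch below. The PathCost class is hypothetical, and it assumes that the sequence returned by Solve lists every node of the path in order, from the start node to the end node.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PathCost
{
    // Pairs each node with its successor and sums the value of every step.
    public static decimal Total(IEnumerable<int> path, Func<int, int, decimal> getValue)
    {
        var nodes = path.ToList();
        return nodes
            .Zip(nodes.Skip(1), (fromNode, toNode) => getValue(fromNode, toNode))
            .Sum();
    }
}
```

The same path can then be priced twice, for example once with a distance function and once with a time function, to compare the shortest path against the fastest one.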

The Output

The solution available to download shows how to use two different functions: one to fetch the shortest path and the other to fetch the fastest path. You will see that they are (almost always) not the same.

Before running the code in DEBUG mode, change your console output to 160 characters per line for a better view. It will show a representation of the link matrix, just to demonstrate how sparse it is. Each character is one connection between two nodes. It is a 120x120 character matrix with a blank where there is no connection and a dot where a connection exists. Notice how the connections run alongside the main diagonal: nodes almost always connect to other nearby nodes. Also, the main diagonal itself has no dots, meaning that no node is connected to itself.

That clearly shows how much memory is wasted if we allocate a full static two-dimensional array.

If you run it in RELEASE mode, the output will be sent only to a file. That will show how fast this code can be.

What is Next?

This article deals with an imaginary grid with only 120 nodes, which gives us around 240 links. In the real world, even a small neighborhood in a small city can have around 10,000 nodes. Any simple game map or maze will have many more than that.

And what happens when we need to cover an even greater area? For instance, how do we connect a place in New York with one in Los Angeles? Imagine the number of nodes. After a very rough and superficial calculation, I got a number in the order of 10^10 (!!) nodes.

Not even the best machine could do that in a reasonable time with the code above. We know that Google does that. So... Where is the trick?

In the next article I will talk about one possible approach to solve that kind of problem.

About the Author

André Vianna is a Software Engineer in Rio de Janeiro, Brazil.
He is a PMP and MCP and has worked as a Development Project Manager, Senior Software Developer and Consultant for more than 15 years.
He has worked with several major companies in Brazil, the USA, Canada and Mexico.
He has been working with the cutting-edge .NET platform since 2001 and is a C# specialist.

After considering the comments you've all posted (I really appreciate them... please continue), I've decided to change the article level to Intermediate.
When I was writing it, Intermediate was my first choice, but after viewing other equivalent articles classified as advanced I marked it that way to be consistent.
But you are right... This article is really not advanced, and there is no excuse for propagating an error.

It was an informative article with good experimental data. It's not an advanced article in my opinion either. Also, just to give you some good news, your guess of 10^10 nodes is not true. In all of the USA and Canada, there are about 26 million nodes and about 50 million edges in the road network graph. In your example from LA to NY, using the A* algorithm, one will end up expanding/discovering about 6 million nodes.

Maybe I'm being harsh, but you have placed this article in the "advanced" category, when it contains nothing that is not already widely known -- and also an error.

While you are right to notice that it's unnecessary to record all O(n^2) edge weights in an "adjacency matrix", all you have rediscovered here is the "adjacency list" representation for storing a graph -- an equally common graph representation. It's surprising to hear that versions of Dijkstra you found on the web all use an O(n^2) "adjacency matrix" representation, since as you noticed this is terribly wasteful for sparse graphs where most vertices are not linked by an edge, and Dijkstra is often run on these types of graphs. Just googling "Dijkstra's algorithm" takes you to a Wikipedia article whose "Running Time" section explains how the "adjacency list" representation is faster for sparse graphs. That article also suggests using binary heaps for more quickly finding minimum-weight edges, a strategy which (a) has been known for decades and (b) is not (as far as I can tell) used in your LINQ code. The algorithm described there using regular binary heaps (often called "priority queues" in programming language libraries) runs in O(m*log(n)) for n nodes and m edges, which will quickly solve instances with millions of edges on a modern desktop PC.
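For readers who want to see the heap-based variant mentioned above, here is a minimal sketch over an adjacency list. It is not the article's LINQ code. It assumes .NET 6 or later for PriorityQueue<TElement, TPriority>, and the HeapDijkstra class and its method name are hypothetical. Instead of decreasing keys in place, it pushes duplicate entries and skips the stale ones, which keeps the running time at O(m*log(n)).

```csharp
using System;
using System.Collections.Generic;

public static class HeapDijkstra
{
    // adjacency[u] lists (neighbour, weight) pairs; weights must be non-negative.
    // Returns the shortest distance from start to every node
    // (decimal.MaxValue marks unreachable nodes).
    public static decimal[] ShortestDistances(List<(int To, decimal Weight)>[] adjacency, int start)
    {
        var dist = new decimal[adjacency.Length];
        Array.Fill(dist, decimal.MaxValue);
        dist[start] = 0m;

        var queue = new PriorityQueue<int, decimal>();
        queue.Enqueue(start, 0m);

        while (queue.TryDequeue(out var node, out var d))
        {
            if (d > dist[node]) continue; // stale entry: a shorter route was already found

            foreach (var (to, weight) in adjacency[node])
            {
                var candidate = d + weight;
                if (candidate < dist[to])
                {
                    dist[to] = candidate;
                    queue.Enqueue(to, candidate);
                }
            }
        }
        return dist;
    }
}
```

Memory use is proportional to the number of nodes plus the number of links, so no n x n matrix is ever allocated.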

Finally, the error: You claim that the delegate could incorporate further information (such as the current time) into the calculation that returns an edge's weight. But this is not true in general, since if the delegate is allowed to return different edge weights for the same edge (e.g. due to different values of a current time parameter) then Dijkstra's algorithm is not guaranteed to find a shortest path. Here is a counterexample: an edge from start to finish that initially takes time 100, but after 1 time unit drops to taking time 0. This will be the shortest path if all other paths take 2 time units or more, but Dijkstra will miss it because it will be assigned a cost of 100 when it is first seen and it will never be revisited.

About the "advanced" category... I first classified this article as Intermediate, but then I searched for similar articles here and saw that they were classified as "advanced", so I changed the classification to follow them. I agree that there is nothing really advanced in it.

I also agree that these implementations are not perfect yet. But this is only the first part. In the second part of this article I will talk about some of the topics you mentioned. And I'm not claiming to bring anything new, just a way to code that in C#.

About the error you mentioned, I respectfully disagree. Maybe I was not very clear. I mean using a time of day as the starting point of the calculation. Each step adds the accumulated time to that value, which gives the actual time at which you pass through that node.
For instance (just a very simple sample), if you start at 4pm and take 6 minutes to walk each step of the path, after 10 steps it will be 5pm, and maybe in your model 5pm changes the time of each step to 8 minutes. So the weight from that step forward must be changed. Another example: maybe every time an event happens during the path (independently of what node you are at) the weight changes.
The bottom line is that you can add as many parameters as you want to the value function, not only the start and end nodes.

Firstly thanks for your reasonable and measured response. Although I don't have any articles on CodeProject, I'm familiar with being downvoted on StackOverflow and I know it stings...

I can follow your reasoning for classifying this article as "advanced." While I don't agree 100%, I can see what you mean -- especially with another article to follow that will address some of the shortcomings of this one. I'll change my vote.

Regarding the error however, I believe that still stands. Dijkstra's algorithm is only guaranteed to find an optimal path if the weights of edges cannot change. It may chance on finding a good path, or even the optimal path, for some graphs whose edge weights do change over time, but it is not guaranteed to do so. That's because it only considers each edge once; so if an edge's weight changes after it has been considered, that edge (and all paths containing it) must be reconsidered, but they are not.

Now if you restrict either the type of graph, or the types of transformations of edge weights that can be performed, it may be that the resulting class of graphs can be provably solved optimally by Dijkstra. For example, if the graph is actually a tree, then there is only one way to reach the destination vertex and thus there is never any need to reconsider different ways of getting to it. In that case, every edge's weight can be determined in any way -- it won't affect which edges are in the final path chosen.

I sketched a counterexample for you before but it was a bit abstract. Hopefully the following will make things clear:

              if t < 4 then 100 else 1
                  |
        1         |
    A-------B-------C
     \     /
    2 \   / 2
       \ /
        D

Suppose edge weights are "time in hours" and we're looking for the shortest-time route from A to C. Note that the time taken for edge BC is dependent on the arrival time at B. Dijkstra will favour getting to B directly from A, meaning we will arrive at B at t=1. Now, because t < 4 when we get to B, the edge BC takes 100 hours so we arrive at C at time 101. If you "run Dijkstra by hand" on this you'll find that the algorithm does not consider getting to B via D, since that would take 4 hours and we already have a path to B that takes only 1 hour. The consequence is that the algorithm cannot find the optimal path ADBC, which, because we get to B at t=4, would only take 5 hours.
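"Running Dijkstra by hand" on this example can also be done in a few lines. The sketch below is an illustration only: the Counterexample class is hypothetical, nodes A, B, C and D are numbered 0 to 3, and the edges are treated as directed in the direction each route uses them. It runs a plain Dijkstra in which the cost of B to C is evaluated at the arrival time at B, and separately evaluates an explicit route edge by edge.

```csharp
using System;
using System.Linq;

public static class Counterexample
{
    // Nodes: A=0, B=1, C=2, D=3. Weights are hours; the cost of B->C
    // depends on the arrival time t at B.
    static readonly (int From, int To, Func<decimal, decimal> Hours)[] Edges =
    {
        (0, 1, t => 1m),
        (0, 3, t => 2m),
        (3, 1, t => 2m),
        (1, 2, t => t < 4m ? 100m : 1m),
    };

    // Plain Dijkstra: each node is settled once, at its earliest arrival time.
    public static decimal DijkstraArrivalAtC()
    {
        var arrival = new[] { 0m, decimal.MaxValue, decimal.MaxValue, decimal.MaxValue };
        var settled = new bool[4];
        for (var i = 0; i < 4; i++)
        {
            // Pick the unsettled node with the earliest arrival time.
            var u = Enumerable.Range(0, 4).Where(n => !settled[n]).OrderBy(n => arrival[n]).First();
            settled[u] = true;
            if (arrival[u] == decimal.MaxValue) continue; // unreachable so far

            foreach (var edge in Edges.Where(x => x.From == u))
            {
                var candidate = arrival[u] + edge.Hours(arrival[u]);
                if (candidate < arrival[edge.To]) arrival[edge.To] = candidate;
            }
        }
        return arrival[2];
    }

    // Arrival time at the end of one explicit route, evaluated edge by edge.
    public static decimal RouteArrival(params int[] route)
    {
        var t = 0m;
        for (var i = 0; i + 1 < route.Length; i++)
        {
            var edge = Edges.First(x => x.From == route[i] && x.To == route[i + 1]);
            t += edge.Hours(t);
        }
        return t;
    }
}
```

DijkstraArrivalAtC() returns 101, because B is settled at t=1 and the later arrival via D is discarded, while RouteArrival(0, 3, 1, 2), the route A-D-B-C, returns 5.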

I'm really enjoying our conversation. And I loved the sample you've sent. I will use it to show how I see the problem.

Let's consider that the distances between all nodes are constant and equal to 100 miles.

As you said, the values represent the time in hours between the nodes. So from A to B we travel at 100 mph, and from A to D and from D to B at 50 mph. From B to C, during the first 4 hours we travel at 1 mph (maybe due to a traffic accident); after that we travel at 100 mph again.

We never really stopped moving.

So... if we come from A to B, we do not stop at B and wait 3 hours. We continue towards C for 3 hours at 1 mph; after 3 miles the traffic clears and we cover the remaining 97 miles at 100 mph (0.97 hours, or 58 minutes and 12 seconds). The total time is 4 hours, 58 minutes and 12 seconds.

Now... if we come to B from D, we have spent 4 hours to get there and we spend another hour to go to C, for a total of 5 hours.

So the fastest path is A -> B -> C with a final result of 4.97 hours (which is less than the 5 hours of the other path). Remember that time and speed are continuous functions, so the value function must be adapted to account for that and serve the right values to the Dijkstra engine.
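The arithmetic above can be checked with a small sketch. The VariableSpeed class and its parameter names are hypothetical, and it models only the simple case used here: one speed until the traffic clears and another speed afterwards.

```csharp
using System;

public static class VariableSpeed
{
    // Hours needed to cover `miles` when leaving at time `depart` (in hours),
    // travelling at `slowMph` until `clearsAt` and at `fastMph` afterwards.
    public static decimal TravelHours(decimal depart, decimal miles,
                                      decimal slowMph, decimal fastMph, decimal clearsAt)
    {
        if (depart >= clearsAt) return miles / fastMph;

        var slowHours = clearsAt - depart;     // time spent in slow traffic
        var slowMiles = slowHours * slowMph;   // distance covered in that time
        if (slowMiles >= miles) return miles / slowMph; // arrived before the traffic clears

        return slowHours + (miles - slowMiles) / fastMph;
    }
}
```

Leaving B at t=1, TravelHours(1, 100, 1, 100, 4) is 3.97, so A -> B -> C takes 1 + 3.97 = 4.97 hours. Leaving B at t=4, TravelHours(4, 100, 1, 100, 4) is 1, so the route through D takes 4 + 1 = 5 hours.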

Now. If you want to express that the subject will stop at the B node until the proposed time has passed it will be necessary to change the set of rules to:

I see now what you mean. Yes, it's a difference in how the problem is modelled. You are considering speed on a given road to be a function of time, whereas I was considering edge lengths to be a function of time. For convenience I'll call your type of graphs "variable-speed graphs" and mine "variable-edge-length graphs".

I haven't been able to come up with an example of a variable-speed graph that Dijkstra fails to find the optimal path for. As far as I can tell, for variable-speed graphs, just like for plain ordinary constant-edge-length graphs, it is always best to arrive at a vertex as early as possible, just as Dijkstra does. In fact I suspect this could be (or maybe has been) formally proved for this class of graphs. My reasoning: it seems to me that if you take a slower-than-optimal path to some intermediate vertex, someone who took the optimal path would already be ahead of you on whatever path you would continue to take to get to the destination, and can never be "overtaken." Alas I'm not smart enough to attempt a rigorous proof :-/

The second point I want to make is that the type of problem you are modelling with your variable-speed graphs is a subset of the type of problem I was modelling with variable-edge-length graphs. That's because every graph having speeds on edges as a function of time can be converted to an equivalent graph whose edge lengths are functions of time -- but not vice versa (actually the "but not vice versa" part is just a conjecture at this point, but note that disproving it would mean that in fact Dijkstra can fail on variable-speed graphs too). I.e. every graph you describe according to your approach can be converted to a graph I describe according to my approach, but the reverse is not true.

What this all comes down to is that the situation you are modelling with your variable-speed graphs is not the full class of graphs whose edge weights are arbitrary functions of time, but rather a more restricted class of graphs. Which is fine (and I happen to think your approach is a lot more realistic for e.g. modelling traffic flow), but I would nevertheless like to emphasise that there are instances in the larger class of variable-edge-weight graphs for which Dijkstra fails (my counterexample from before is one such example). So I would like you to update your article to clarify that Dijkstra may be able to solve to optimality certain restricted classes of graphs in which edge weights vary as a function of time, but not all such graphs. In effect Dijkstra places restrictions on the type of time-dependence that an edge's weight can have.

In terms of the population of programmers, the percentage of people with computer science degrees (and hence the predominant exposure to Dijkstra) is about 40% (according to Steve McConnell: http://www.stevemcconnell.com/psd/04-SoftwareEngineeringNotCS.pdf). And my experience with most Computer Science majors is that unless they have taken a course covering Dijkstra in the last year or used the algorithm, they don't even remember any details about it.

So that said: this is advanced for most of those who read Code Project.

The follow up discussion between you two is excellent, just wanted to throw that out.

Well, I do not even want to mention the amount of literature one can read about how to implement Dijkstra. Anyway, just in case, here is a link to a page of mine where I collected three C# implementations using different data structures for the PQ: http://astarte.csr.unibo.it/Csharpsnippets.htm. Better ones exist.