Shortest Paths in Weighted Networks

Shortest paths or distances among nodes has long been a key element of network research. While the shortest paths often are not of interest in themselves, they are the key component of a number of measures. First, a popular question has been what is the average distance among people? The question became famous following Milgram’s six-degrees of separation experiment in 1967 that found that people in the US were on average 6-steps from any other person (see the introduction of my thesis for a more complete story and criticism). Second, the shortest distances have been used as an underlying metric in a number of measures, such as centrality ones (Freeman, 1978; Opsahl et al., 2010). One of these measures is betweenness, which is roughly equal to the number of shortest paths between others that pass through a node. This page is aimed at showing a generalisation of shortest path calculations for weighted networks that has the potential of being more accurate.

The shortest distance among nodes in a network is quite easy to calculate if you only have present or absent ties: you simply count the ties along the shortest path. If two nodes are directly connected: distance=1; and if they are not directly connected, but are connected through intermediaries, then it is the lowest number of intermediary nodes +1. For example, if the ties in this network did not have weights, the distance between node A and F would be 3. Below is a matrix showing the shortest distances among the nodes in the sample network.

The average shortest distance in this network is 1.8 (the mean of the matrix, diagonal excluded).

Difficulty occurs when ties are differentiated, as they are in a weighted network. For example, the tie between node A and B has twice the strength of the tie between node A and node C. This could mean that node A has more frequent contact with node B than with node C. In turn, this could imply that node A might give node B a piece of information (or a disease) twice as likely as node C. If we are looking at the diffusion of information or diseases in a network, then the speed that it travels, and routes that it takes, are clearly affected by the weights.

Let us look at the shortest path from node C to node B. The direct connection carries a weight of 1 only; however, the indirect connection through node A is composed of stronger ties. Therefore, a piece of information (or a disease) might be transmitted quicker through node A than directly.

Dijkstra (1959) proposed an algorithm that sum the cost of connections and find the path of least resistance. For example, GPS devices uses this algorithm by assigning a time-cost to each leg of the road, and then find the route that cost least in terms of time. This algorithm can also be used in social network analysis. Newman (2001) applied it to a collaboration network of scientists by inversing the tie weights (dividing 1 by the weight). This implies that a stronger tie gets a lower cost than a weaker tie. Below is the network from above with the inverted weights:

By applying Dijkstra’s algorithm, we find that the direct connection between node C and node B has a cost of 1, whereas the indirect connection via node A has a cost of 0.75 (1/2 + 1/4). Therefore, according to this algorithm, the information will travel faster through the indirect connection. Below is a matrix showing the cost of the least costly routes among the nodes in the sample network.

A unit of distance then refers to one step with the average weight in the network. The average of the matrix suggests that on average nodes are 2.29 steps with average tie weight away from each other. This measure would be comparable across networks with different ranges of tie weights.

If anyone has any suggestions for improvements, let me know!

Want to test it with your data?

This distance matrix can be calculated using tnet. First, you need to download and install tnet in R. Then, you need to create an edgelist of your network (see the data structures in tnet for weighted one-mode networks). The commands below show how the edgelist for the sample network here can manually be entered, and how to calculate the distance matrix.

If you use any of the information in this post, please cite: Opsahl, T., Agneessens, F., Skvoretz, J., 2010. Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks 32 (3), 245-251

First of all, thank you very much for providing all these helpful SNA resources on your blog!

I have a question concerning the tnet package.
While the command “closeness_w” returns the “raw” closeness score as well as the normalized score, the command “betweenness_w” only gives one kind of score (not normalized, I guess).
Is there a specific reason why normalization is only available for closeness centrality?

Thank you for your comment. I am not a big fan of normalized indices as the only purpose of them is to compare nodes’ scores across networks where the number of nodes and ties vary, and to do so, it makes a couple of assumptions (e.g., scaling with n*(n-1)). This might not be appropriate. If you would like to scale the network metrics, you can simply apply the method you would like.

Thank you for your reply! So I guess if I’m just looking at one single network (no comparisons) I can just go with the non-normalized scores of betweenness and closeness.
BTW I’m sorry for my redunant comments. The first one wasn’t displayed to me until just now, so I thought something had gone wrong and posted a second time. Please feel free to delete one of them.

5.Ali | February 16, 2014 at 4:42 pm

very good article.
Do you know any software/method can calculate the closeness centrality for a weighted directed network based on the Dijkstra’s algorithm? the commands you presented here is for unweighted network.

First, let me thank you for this very helpful blog; I found many of the entries to be highly informative and useful.
I am using iGraph for my calculations and I find the weights issue to be a bit confusing, especially after reading this entry and trying to calculate average shortest path length for networks. I assumed that in igraph the weight of an edge is a measurement of tie strength. Therefore, higher values indicate stringer connection. However, is seems that when using Dijkstra’s algorithm in Igraph, the package treats the weights of the edges as distances, looking for paths that minimize the sum of all paths between two nodes. This seems to indicate that the algorithm sees lower weights as stronger ties (ie shorter distances). My solution will be, just for this calculation to convert every edge weight x in my edge weights list to 1/x (as with the Newman reference you provided).

Did the problem I observed is right and do you think that solution a good approach?
Additionally, is this a problem only in this case (and centrality measures) but not an issue in the calculation of transitivity and modularityin igraph (for example, when using random walks).

Indeed the assumption that tie weights represent costs instead of strength is often leading to some incorrect findings. The distance_w-function in tnet takes the inverse of the tie weights and then utilize the igraph implementation of Dijkstra’s algorithm. I also scale the weights with the mean tie weight so that one can more easily interpret the values. I am not sure how igraph implements transitivity and modularity, so you might reach out to them directly. All the tnet functions work on the assumption of high values equal stronger connections.

First thanks for a lot of useful information provided here. I had 2 questions regarding the average shortest path in weighted graph, particluary if there’s a similar way to compute Diameter of graph and also to display a distribution of shortest paths (in a way like “what is the probability of choosing a path with a certain distance if picking randomly?”)

The diameter of a graph is simply the longest path in the network, and the probability is found by studying the non-diagonal values in the distance matrix. The code below shows how to compute both (including a plot of the probability of randomly picking a path with a particular length or less).

I’ve recently come about 2 uncertaineties. I’ve analyzed a graph consisting of 157 nodes. When i compute a matrix of weighted distances (shorthest paths) between nodes my matrix is 141×141 not 157×157 (as I understand it, this is mainly because of existing separated dyads (pairs of 2 nodes connected only to each other and disconnected from all other nodes in graph). There are 7 of these separated pairs in my graph. Could this be a reason for recieving 141×141 matrix and not the 157×157 matrix?

Secondly. If i’m looking at the specific value of distance in matrix lets say row = 28 and collumn = 31, is this value a distance between nodes (28-31) that were originaly inputed in the analyzed dataframe? In other words, is row number 28 representing node number 28 in original data?

1) The distance matrix is only from the main / giant component of the network. If you would like it for all nodes, you must specify gconly=FALSE in the distance_w-function.

2) The node ids of the rows and columns in the distance matrix when gconly=TRUE is saved as an attribute of the output from the distance_w-function. For example, if the output object is called dmat, the node list can be found by typing attributes(dmat)$nodes.

Here’s my problem. Nodes in my graph represent animal names, produced in category/semantic fluency test. 2 most strongly connected nodes are “cat” and “dog”. I cannot think of any other shortest path between these two nodes than the direct one, as this is the path with highest weight in graph. Node “cat” was numericaly labeled as 1 and node “dog” as 2. When i lookup shorthest path between 1 and 2 in dmat matrix the value is 2.74 and this doesn’t make any sense to me.

I’m definitely missing something, would be glad for any further guidance.

Please send me an email with your data and code, and I will have a look at what is going on.

Tore

16.Drahomir | April 17, 2016 at 10:47 am

Hi, I’m really appreciating your willingness to help, but I’ve already figured it out. Error was in my data input, which didn’t include raw weights but already normalized, so they were normalized once again by the package.

17.Elio Bartos | April 1, 2016 at 2:33 pm

Hello,
thanks for this post, I’m just getting started with SNA and tnet package.
1. Is there a way to calculate shortest distances like in example 1. (without normalisation) with tnet package?
2. Does measures like closeness use normalised shortest distances or just shortest distances?

1) The “normalization” of ties weights is a constant applied to all weights. You can undo this normalization with the mean time weight. Have a look at the function by typing distance_w without the (). You can then copy this code and comment out the normalization line.

2) The functions rely on the normalized distances. But it’s key to remember that the same paths are identified as the shortest ones with and without normalization.

Hello Tore
I am using tnet package for my weighted network analysis. I am little bit confuse for average path length of disconnected graphs. After breakdown my network divided into 10 part, where i am considering the sub graph have more than 5 nodes, most of the algorithm is to calculate the mean path length by considering the number of nodes from giant component only. But in a power grid network , each sub graph is a part of the network even if there is no link for a certain period of time, so i wish to consider all the nodes present in disconnected sub graph, and wants to calculated average path length based on total nodes, am i right or wrong?

First of all, many thanks for tnet and all the information in your blog.

Sorry, I am having a problem when trying to get distances in a weighted directed network. I am inputting data that contains 304 nodes and 710 edges. 300 nodes are connected to the giant component and only 4 are not. I was expecting to get a matrix of 300*300 (or 304*304) but I am only getting 66*66 (when using “gconly=TRUE”). I tried to write the “distance_w” function with “gconly=FALSE” but I get a message saying that R has encountered a fatal error and it needs to shut down. Any clue of what I might be doing wrong? (My weights take the values of 1, 2, 2.5 and 3 – So not sure if the ‘.5’ is creating an issue)

Hi, thanks for your good article. I just have question. Suppose there is an tie with weight 2 between two nodes A, B. However, there is another path from A to B like this: ACDEB with weight 3 for each tie [(A,C)=3, (C,D)=3, (D,E)=3, (E,B)=3)]. As you said the strongest path would be better path. So the algorithm must return second path as the solution but it does not, because the first path will have lower value 1/2 while the second one has the value (1/3+1/3+1/3+1/3=4/3). I was confused by this. I will appreciate you if give me an answer.

The distance is the sum of inverse tie weights, so an indirect path consisting of two ties with weights of 10 would be preferred over a direct path with a tie weight of less than 5 as 1/10 + 1/10 < 1/4.

Thank you so much for these posts! I’ve found them very helpful as I am doing research for my senior thesis. I was hoping to calculate in-closeness and out-closeness centrality and was wondering if there was a way, either in tnet or some other package, that you were aware of for calculating these measures? I’ve been reading a paper by De Benedictis et al., called “Network Analysis of World Trade using the BACI-CEPII Dataset” and they were able to implement it in the tnet library. I’d appreciate any ideas you may have.

Specifically, you can compute the standard closeness measure (ie inverted sum of distances to all other nodes) by running closeness_w-function on the network. If you want the reverse closeness measure (ie inverted sum of distances from all other nodes), then you can precompute the distance matrix using distance_w, transpose the matrix, and use this matrix in the closeness_w-function (precomp.dist-parameter).

Additionally, I think I am still a bit confused as to how closeness_w uses distance_w in its calculations. I tried calculating closeness_w by hand for the example network given here (https://toreopsahl.com/tnet/weighted-networks/shortest-paths/) but am not getting the same numbers that are computed in R. Would you mind sharing how exactly closeness_w is calculated?

first of all I’d like to also thank you for your very helpful and interesting approaches on SNA.
If have one question and I’d very grateful if you could help me in this case.
I applied the distance_w function on two different network as described above. Unfortunately, in one case the result was NaN but I was not able to find out why. Both networks are significantly different in terms of their size. The network with the result NaN has very strong betweeness nodes. Could that be a reason for the result? Or is there a way how to interpret NaN as a result in this specific case?