Closeness centrality in networks with disconnected components

A key node centrality measure in networks is closeness centrality (Freeman, 1978; Opsahl et al., 2010; Wasserman and Faust, 1994). It is defined as the inverse of farness, which, in turn, is the sum of distances to all other nodes. As the distance between nodes in disconnected components of a network is infinite, this measure cannot be applied to networks with disconnected components (Opsahl et al., 2010; Wasserman and Faust, 1994). This post highlights a possible work-around that allows the measure to be applied to these networks while maintaining the original idea behind the measure.

The sample network gives a concrete example of the closeness measure. The distance between node G and node H is infinite because no direct or indirect path exists between them (i.e., they belong to separate components). As long as at least one node is unreachable from the others, the sum of distances to all other nodes is infinite. As a consequence, researchers have limited the closeness measure to the largest component of nodes (i.e., measured intra-component). The distance matrix for the nodes in the sample network is:

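| Node | A | B | C | D | E | F | G | H | I | J | K | Farness (all inclusive) | Closeness (all inclusive) | Farness (intra-component) | Closeness (intra-component) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | … | 1 | 1 | 2 | 2 | 3 | 3 | Inf | Inf | Inf | Inf | Inf | 0 | 12 | 0.08 |
| B | 1 | … | 1 | 2 | 1 | 2 | 3 | Inf | Inf | Inf | Inf | Inf | 0 | 10 | 0.10 |
| C | 1 | 1 | … | 1 | 2 | 2 | 2 | Inf | Inf | Inf | Inf | Inf | 0 | 9 | 0.11 |
| D | 2 | 2 | 1 | … | 2 | 1 | 1 | Inf | Inf | Inf | Inf | Inf | 0 | 9 | 0.11 |
| E | 2 | 1 | 2 | 2 | … | 1 | 3 | Inf | Inf | Inf | Inf | Inf | 0 | 11 | 0.09 |
| F | 3 | 2 | 2 | 1 | 1 | … | 2 | Inf | Inf | Inf | Inf | Inf | 0 | 11 | 0.09 |
| G | 3 | 3 | 2 | 1 | 3 | 2 | … | Inf | Inf | Inf | Inf | Inf | 0 | 14 | 0.07 |
| H | Inf | Inf | Inf | Inf | Inf | Inf | Inf | … | 1 | 2 | Inf | Inf | 0 | 3 | 0.33 |
| I | Inf | Inf | Inf | Inf | Inf | Inf | Inf | 1 | … | 1 | Inf | Inf | 0 | 2 | 0.50 |
| J | Inf | Inf | Inf | Inf | Inf | Inf | Inf | 2 | 1 | … | Inf | Inf | 0 | 3 | 0.33 |
| K | Inf | Inf | Inf | Inf | Inf | Inf | Inf | Inf | Inf | Inf | … | Inf | 0 | 0 | Inf |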
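The farness and closeness columns of the table above can be reproduced with a short sketch. It is written in plain Python rather than with tnet, and the tie list `EDGES` is an assumption: it is read off the cells of the table that hold a distance of 1. The helper names (`distances_from`, `farness`, `closeness`) are made up for this illustration.

```python
from collections import deque
import math

# Assumption: ties are read off the distance-1 cells of the table above.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "E"), ("C", "D"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("H", "I"), ("I", "J")]
NODES = list("ABCDEFGHIJK")  # K has no ties (an isolate)


def distances_from(source):
    """Breadth-first search; unreachable nodes keep distance math.inf."""
    neighbours = {n: set() for n in NODES}
    for u, v in EDGES:
        neighbours[u].add(v)
        neighbours[v].add(u)
    dist = {n: math.inf for n in NODES}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbours[u]:
            if dist[v] == math.inf:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def farness(source, intra_component=False):
    """Sum of distances to all other nodes, or only to the reachable ones."""
    d = distances_from(source)
    others = [d[n] for n in NODES if n != source]
    if intra_component:
        others = [x for x in others if x != math.inf]
    return sum(others)


def closeness(far):
    """Inverse of farness; a farness of 0 (an isolate) maps to infinity."""
    return math.inf if far == 0 else 1 / far


for node in NODES:
    far_all = farness(node)
    far_intra = farness(node, intra_component=True)
    print(node, far_all, closeness(far_all), far_intra, round(closeness(far_intra), 2))
```

With every node included, a single unreachable node makes the farness infinite and the closeness 0, which is why the all-inclusive columns carry no information.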

Although the intra-component closeness scores are not infinite for all the nodes in the network, it would be inaccurate to use them as a closeness measure. This is because the sums of distances contain different numbers of distances (e.g., there are two distances from node H to other nodes in its component, while there are six distances from node G to other nodes in its component). In fact, nodes in smaller components would generally be seen as being closer to others than nodes in larger components. Thus, researchers have focused solely on the largest component. However, this leads to a number of methodological issues, including sample selection.

To develop this measure, I went back to the original equation:

$$\text{closeness}(i) = \left[ \sum_{j \neq i} d_{ij} \right]^{-1}$$

where $i$ is the focal node, $j$ is another node in the network, and $d_{ij}$ is the shortest distance between these two nodes. In this equation, the distances are inverted after they have been summed, and when the sum includes an infinite distance, the sum itself is infinite. To overcome this issue while staying consistent with the existing measure of closeness, I took advantage of the fact that the limit of a number divided by infinity is zero. Although infinity is not an exact number, the inverse of a very high number is very close to 0. In fact, 0 is returned if you enter 1/Inf in the statistical programme R. By taking advantage of this feature, it is possible to rewrite the closeness equation as the sum of the inverted distances to all other nodes instead of the inverse of the sum of distances to all other nodes. The equation would then be:

$$\text{closeness}(i) = \sum_{j \neq i} \frac{1}{d_{ij}}$$

To illustrate this change, the inverted distances and closeness scores for the example network above are:

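| Node | A | B | C | D | E | F | G | H | I | J | K | Closeness (sum) | Closeness (normalized) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | … | 1.00 | 1.00 | 0.50 | 0.50 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 3.67 | 0.37 |
| B | 1.00 | … | 1.00 | 0.50 | 1.00 | 0.50 | 0.33 | 0 | 0 | 0 | 0 | 4.33 | 0.43 |
| C | 1.00 | 1.00 | … | 1.00 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4.50 | 0.45 |
| D | 0.50 | 0.50 | 1.00 | … | 0.50 | 1.00 | 1.00 | 0 | 0 | 0 | 0 | 4.50 | 0.45 |
| E | 0.50 | 1.00 | 0.50 | 0.50 | … | 1.00 | 0.33 | 0 | 0 | 0 | 0 | 3.83 | 0.38 |
| F | 0.33 | 0.50 | 0.50 | 1.00 | 1.00 | … | 0.50 | 0 | 0 | 0 | 0 | 3.83 | 0.38 |
| G | 0.33 | 0.33 | 0.50 | 1.00 | 0.33 | 0.50 | … | 0 | 0 | 0 | 0 | 3.00 | 0.30 |
| H | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 1.00 | 0.50 | 0 | 1.50 | 0.15 |
| I | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 | … | 1.00 | 0 | 2.00 | 0.20 |
| J | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 1.00 | … | 0 | 1.50 | 0.15 |
| K | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 0 | 0 |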

As can be seen from this table, a closeness score is attained for all nodes, taking into consideration an equal number of distances for each node irrespective of the size of the node’s component. Moreover, nodes belonging to a larger component generally attain a higher score. This is deliberate, as these nodes can reach a greater number of others than nodes in smaller components. The normalized scores are bounded between 0 and 1: a score is 0 if a node is an isolate, and 1 if a node is directly connected to all others.
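The sum-of-inverse-distances scores in the table can be checked with a small sketch. As before, this is plain Python rather than tnet, the tie list is an assumption read off the distance-1 cells of the first table, and the function names are invented for the example. It relies on `1 / math.inf` evaluating to `0.0` in Python, mirroring the `1/Inf` behaviour in R mentioned earlier.

```python
from collections import deque
import math

# Assumption: ties are read off the distance-1 cells of the first table.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "E"), ("C", "D"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("H", "I"), ("I", "J")]
NODES = list("ABCDEFGHIJK")  # K is an isolate


def distances_from(source):
    """Breadth-first search; unreachable nodes keep distance math.inf."""
    neighbours = {n: set() for n in NODES}
    for u, v in EDGES:
        neighbours[u].add(v)
        neighbours[v].add(u)
    dist = {n: math.inf for n in NODES}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbours[u]:
            if dist[v] == math.inf:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_sum_inverse(source):
    """Sum of inverted distances; unreachable nodes contribute 1/inf == 0.0."""
    d = distances_from(source)
    return sum(1 / d[n] for n in NODES if n != source)


scores = {n: closeness_sum_inverse(n) for n in NODES}
# Normalize by the theoretical maximum n - 1 (a node tied directly to all others).
normalized = {n: s / (len(NODES) - 1) for n, s in scores.items()}

for n in NODES:
    print(n, round(scores[n], 2), round(normalized[n], 2))
```

Every node sums over the same ten other nodes, so scores are comparable across components of different sizes.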

This post is the explanation of a footnote in the node centrality paper. If you use any of the information in this post, please cite: Opsahl, T., Agneessens, F., Skvoretz, J., 2010. Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks 32 (3), 245-251.

What you are talking about is the normalisation of closeness scores. A normalisation procedure simply ensures that scores are bounded between 0 and 1. If you divide positive scores by their theoretical maximum, you will achieve this.

I am not a fan of normalisation as (1) it does not increase the variance among scores if you only analyse one network or networks of similar size (i.e., it multiplies all scores by a constant), and (2) it is questionable whether the sum of all distances scales linearly with the number of nodes (see the small-world literature on this topic). As a result, I have not used normalised scores.

Hi Tore,
tnet outputs the normalized closeness as well; however, the tutorial mentions that the output is a data.frame with two columns, node ids and closeness scores. Can you please just indicate in the tutorial that a third column (n.closeness) is output as well?

The third column is the normalised closeness scores (i.e., the closeness scores divided by n-1). This column is only added when gconly=FALSE. But there is no reason why it should not be computed when gconly=TRUE. I will add this in the upcoming version of tnet, and change the manual. Thanks for noticing.

I am comparing two networks of slightly different sizes (n=21 & n=19) and would like to normalize the closeness scores to facilitate this comparison. Since the networks are very similar in size, I don’t think I have to worry about small world scaling issues. My question has to do with the normalized closeness data. When tnet outputs closeness alpha=0, the normalized values are bounded between 0 and 1 as expected. However, if I run closeness with alpha=0.5 or 1, the normalized values exceed 1 (I get values up to 1.29). This is driven by nonnormalized closeness values that exceed n-1. For example, in one case I have n-1=20 and one node with a closeness score of 24.5 (when alpha=0.5). Does your normalization procedure only apply to closeness when using alpha=0? Could you suggest a way to normalize closeness for alpha=0.5?

The measures with alpha other than 0 do not have a fixed maximum. As such, it is difficult to normalise them. Unfortunately, I do not know of a way to normalise the non-binary scores. If you find one, do let me know!
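A rough sketch of why the scores can exceed n-1 once tie weights matter. It assumes, following the definition in Opsahl et al. (2010), that a tie of weight w costs 1 / w^alpha to traverse; any additional rescaling of weights that tnet may apply is deliberately left out, and the function name `weighted_closeness` and the star network are invented for the illustration.

```python
import heapq
import math


def weighted_closeness(nodes, weighted_edges, alpha):
    """Sum of inverted shortest-path costs (Dijkstra).

    Assumption: a tie of weight w costs 1 / w**alpha to traverse.
    """
    adjacency = {n: [] for n in nodes}
    for u, v, w in weighted_edges:
        cost = 1 / w ** alpha
        adjacency[u].append((v, cost))
        adjacency[v].append((u, cost))

    scores = {}
    for source in nodes:
        dist = {n: math.inf for n in nodes}
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, cost in adjacency[u]:
                if d + cost < dist[v]:
                    dist[v] = d + cost
                    heapq.heappush(heap, (dist[v], v))
        scores[source] = sum(1 / dist[n] for n in nodes if n != source)
    return scores


# Hypothetical star network: a hub tied to three leaves, each tie of weight 4.
nodes = ["hub", "x", "y", "z"]
edges = [("hub", "x", 4), ("hub", "y", 4), ("hub", "z", 4)]

for alpha in (0, 0.5, 1):
    hub_score = weighted_closeness(nodes, edges, alpha)["hub"]
    print(alpha, hub_score, hub_score / (len(nodes) - 1))
```

With alpha=0 every tie costs 1, so the hub reaches the maximum of n-1. With alpha above 0, strong ties cost less than 1, their inverted distances exceed 1, and the sum is no longer capped at n-1.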

Best,
Tore

7.sadia shah | April 19, 2011 at 9:29 am

Tore,

I am using this approach for a directed network, and I come across cases where a node X cannot be reached from another node Z because, although connections between intermediate nodes (say Y) exist, they do not run in both directions. Shall I consider the distance between X and Z to be infinite?
I am waiting for a quick reply :-)

The traditional closeness measure requires all nodes to be mutually reachable. The above procedure does not have this requirement.

The distance from one node to another in a directed network might be different from the distance from the latter to the former node. The distance calculation in a directed network generally assumes that paths follow the direction of ties (e.g., if a has a tie with b, and b has a tie with c, then there is a path from a to c, but not from c to a). The distance_w and closeness_w-functions in tnet use this procedure.
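The a, b, c example can be made concrete with a minimal sketch in plain Python (not the tnet implementation). The function name `directed_distances` is made up for the illustration; paths are only followed in the direction of the ties.

```python
from collections import deque
import math


def directed_distances(nodes, arcs):
    """Shortest path lengths that follow tie direction; inf if unreachable."""
    successors = {n: [] for n in nodes}
    for u, v in arcs:
        successors[u].append(v)  # a tie u -> v can only be travelled from u to v

    dist = {}
    for source in nodes:
        d = {n: math.inf for n in nodes}
        d[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in successors[u]:
                if d[v] == math.inf:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[source] = d
    return dist


nodes = ["a", "b", "c"]
dist = directed_distances(nodes, [("a", "b"), ("b", "c")])

print(dist["a"]["c"])  # distance from a to c
print(dist["c"]["a"])  # distance from c to a

# Sum of inverted distances along outgoing paths; 1/inf contributes 0.0.
closeness = {s: sum(1 / dist[s][t] for t in nodes if t != s) for s in nodes}
print(closeness)
```

Because unreachable nodes simply contribute 0, node c still receives a score (0) even though it cannot reach anyone.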

Thank you for noticing this comment and replying to it so quickly :) Yes, it did help.

I need some further guidance related to the dataset I am using. It is an email network that is weighted, directed, and has disconnected components. I have some email sender nodes whose recipients are missing.
For example, node X sent two or three very important emails, but I do not know who the recipients were. Of course, I cannot deny their existence. What could be done?

An always interesting, but sometimes forgotten, concept in network analysis is the boundary of the network. Unfortunately, few, if any, network measures are able to incorporate missing nodes. Let me know how you deal with this issue.

I have a small issue. While calculating the average closeness of all nodes, can I remove nodes that have 0 closeness with the rest of the network by treating them as isolated nodes? E.g., in the above network, can I remove node K when finding the average?

If you save the output from the closeness_w-function as an object called out, then you can extract the rows of out where closeness is greater than 0, and calculate the mean of the closeness column. Below is some sample code that could replace the last line in the code in the blog post.

By removing the nodes with a score of 0, you will increase the mean. However, this is more a question of the boundary of the analysis/network. Should isolates be included? If yes, then the 0 scores should be included. If not, then they should be removed.
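The effect of dropping the zero scores can be sketched in plain Python (this is an illustration, not the tnet code referred to in the reply). The closeness sums are copied from the second table in the post; the variable names are invented.

```python
from statistics import mean

# Closeness sums copied from the second table in the post (K is the isolate).
closeness = {"A": 3.67, "B": 4.33, "C": 4.50, "D": 4.50, "E": 3.83, "F": 3.83,
             "G": 3.00, "H": 1.50, "I": 2.00, "J": 1.50, "K": 0.0}

# Mean over every node, isolates included.
mean_all = mean(closeness.values())

# Mean over nodes with a score above 0 only (isolates dropped).
nonzero = [score for score in closeness.values() if score > 0]
mean_without_isolates = mean(nonzero)

print(round(mean_all, 2), round(mean_without_isolates, 2))
```

Which of the two means is appropriate depends on whether isolates fall inside the boundary of the network being analysed.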

Thank you for guiding me to this article. It is very interesting how they created a unifying small-world measure. This is something I have been thinking about for quite some time.

In this post, I focused on centrality, or more specifically, node closeness scores. You are absolutely right that the inverse of geodesic distances was also taken in Latora and Marchiori (2001); however, they did so from a different background (small-world literature) to reach a very different outcome (i.e., understanding the overall function of the network). The path of research that I was following originated with Freeman’s (1978) work on centrality. In fact, it is worth noting that the terms closeness and centrality are not even mentioned in Latora and Marchiori (2001).

The measure proposed by Latora and Marchiori (2001) enables an assessment of the connectedness of a network. Although I don’t think that the normalisation using n*(n-1) is appropriate, as the small-world literature has told us that geodesic distance does not scale with n-squared, it does show how a measure to test for the existence of a backbone in networks could be created. In fact, this is exactly where I believe the paper contributes to the literature.

Hi Tore,
In R’s {sna} package, closeness centrality offers the formula you suggest, obtaining the inverse of the distance to other nodes before summing them. They attribute this formula to Gil and Schmidt (1996); see http://www.inside-r.org/packages/cran/sna/docs/closeness
Thought you might like to know.

Thanks for this reference! There are many implementations of similar work-arounds for this issue. I am unable to get hold of Gil and Schmidt’s Sunbelt presentation from 1996, but the formula does not seem to be proposed in the similarly titled paper by Gil, Schmidt, Castro, and Ruiz in Connections in 1997, as they do not deal specifically with disconnected networks. Glad to attribute them here.

Right. I can’t access the 1996 conference paper either; I just based my comment on the R {sna} package documentation. I searched for Gil and Schmidt closeness centrality and came upon Sinclair’s article: http://www.sciencedirect.com/science/article/pii/S0378873306000116 (not sure if you have access to it). He describes G & Sch’s power centrality index as “comparable with the closeness centrality index in that it uses distances from the indexed vertex to other vertices in the calculation” (p. 81-82).
So, hard to tell whether perhaps in their presentation, G & Sch more explicitly made a connection between their index and closeness centrality, or whether the R sna alternative for closeness was inspired by G & Sch.
cheers,
Peyina

24.Tyler Creech | August 18, 2012 at 12:55 am

Hi Tore,

I have a question about the closeness_w function. I am trying to use this function to assess the relative influence of edges in a weighted, disconnected network, by removing one edge at a time and calculating the mean weighted closeness across all network nodes. Presumably, the edges whose removal results in the largest decrease in mean closeness are the most influential.

I have found that there are a couple edges in my network whose deletion actually causes a slight increase in the mean weighted closeness (without any changes to nodes). Do you know how this could be possible? I am using the gconly=FALSE option and alpha=1 for Dijkstra’s algorithm. I can’t see how removing any edge could increase closeness – at worst, it seems like it would have no impact, if the deleted edge wasn’t part of any shortest paths. Is this perhaps some sort of scaling issue? It makes no difference whether I use the normalized values (i.e., divided by N-1) or not, but maybe there is some additional standardization within the function that I’m not aware of?

Thanks for your help, and for developing a great R package and website. I have found both to be tremendously helpful.

I have a suspicion that this might be due to changing network size (i.e., isolates at the end of the node id sequence are removed as the network is stored as an edgelist). If you email me the code and data, I will have a look.

This shouldn’t be the case. As you can see from the example, node 8 is missing in the edgelist, and gets a closeness score of 0 in the output when gconly is set to TRUE. Using the closeness-function requires an N by N distance matrix to be calculated. This will be a memory issue when you have 88,000 nodes…

Glad you are using tnet. It is possible to use a five-digit identifier; however, this will create much larger output objects. You might want to run the compress_ids-function first on the data. If this doesn’t help, please email me the code and data that you are using, and I will have a look.

Sorry, somehow the code I inputted disappeared during posting. I tried your example as below:
It seems that the algorithm is doing the inverse(sum(distance)) instead of the sum(inverse(distance))
Could it be that the function was changed at some point? Thanks!

Thank you for discovering this bug. There seems to have been a recent update that broke it. I have updated the code, and will publish a new version of tnet. In the meantime, send me an email, and I can send you the code.

Hi Tore,
I sent my data to your personal address. You may have received it as “spam”. It doesn’t matter. I have just one question. In the case of directed networks, how can I use the option type “in” or type “out” for the closeness indicators? While this option works for the degree, it does not work for the closeness.
Do you have an idea how to solve this problem?

I want to use your closeness centrality in networks with disconnected components. Do you have an article published with it or should I cite this website? I checked your 2010 paper but the algorithm is different.
Thanks,
Tania

Hi Tore,
I got kind of confused about reading the closeness outputs.
I used different alpha values to compare outputs.
Which one is the weighted closeness score for each alpha: closeness or n.closeness?
Depending on the alpha value, these two scores keep changing.
Thank you!

Hi Tore,
I have the same problem as Leila:
“In the case of directed networks, how can I use the option type “in” or type “out” for the closeness indicators? While this option works for the degree, it does not work for the closeness.
Do you have an idea how to solve this problem?”
Your function calculates the outgoing paths of a node. What can I do if I am interested in closeness centrality as an “incoming” measure?

This would require transposing the distance matrix. You can calculate the distance matrix using the distance_w-function (e.g., dmat <- distance_w(net)), transpose it (tdmat <- t(dmat)), and then supply this matrix as the precomputed distance matrix to the closeness_w-function.
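The idea behind the transposition can be sketched in plain Python without tnet. The 3-by-3 distance matrix below is a made-up example for the directed ties a to b and b to c, with rows as the starting node and columns as the destination; `closeness_from_matrix` is a hypothetical helper, not a tnet function.

```python
import math

INF = math.inf
nodes = ["a", "b", "c"]

# Hypothetical distance matrix for ties a -> b and b -> c.
# Row = starting node, column = destination node.
dmat = [
    [0,   1,   2],    # from a
    [INF, 0,   1],    # from b
    [INF, INF, 0],    # from c
]


def closeness_from_matrix(matrix):
    """Row-wise sum of inverted distances, skipping the diagonal."""
    n = len(matrix)
    return [sum(1 / matrix[i][j] for j in range(n) if j != i) for i in range(n)]


# Transposing swaps "distance from i to j" with "distance from j to i".
tdmat = [list(column) for column in zip(*dmat)]

out_closeness = dict(zip(nodes, closeness_from_matrix(dmat)))
in_closeness = dict(zip(nodes, closeness_from_matrix(tdmat)))

print(out_closeness)
print(in_closeness)
```

Node a can reach everyone but is reached by no one, so its outgoing score is the highest and its incoming score is 0; the reverse holds for node c.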

What you said was something that I almost knew. I have changed the code and found the result I wanted. I just wondered if there was a way to do this by simply changing one variable in the closeness_w function.

Thank you so much for replying
Giannhs90

48.Sean Everton | December 12, 2014 at 6:59 pm

Hello:

I just ran across this, so I apologize for coming a bit late to the party, but Borgatti (2006), “Identifying sets of key players in a social network”, uses average reciprocal distance as an alternative closeness measure. It has also been implemented in UCINET for some time, possibly dating back to 2006 or earlier, but I don’t know for sure.

Thanks for this reference, Sean. There is a whole host of centrality metrics, and this site does not attempt to be a complete source. This post is simply highlighting that it’s possible to calculate closeness centrality on disconnected networks.

Hi Tore, I am using tnet for one of my projects. While implementing it, I was unclear about the interpretation of closeness for a weighted network. I was wondering: if the normalised closeness is close to 1, does it mean that the node is more central than a node with a lower value? How does it correlate with degree, i.e., if a node has a very high degree, will its closeness also be high? Thanks for the clarification.

I am new to the area of large networks. I have been reading on centrality measures. And came across your article. I was wondering if the closeness centrality measure is similar to median computation in graph theory.
Median problem is a very important facility location model. Have you come across any paper on median computation in large networks? If you have, please suggest some references.

How can I calculate the “intra-component” closeness centrality (of networks with two components) in R?
R returned “[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0” when I tried to calculate the closeness centrality of a network with two components.

You can extract the components using a package like igraph, and then run the closeness_w-function. I would advise against it though. The nodes in a smaller component will on average appear closer than those in a large component simply because there are fewer paths. I would rather suggest setting the gconly parameter to FALSE instead. See the post above.