Centrality and AIDS

Stephen P. Borgatti
University of South Carolina

Centrality measures are commonly described as indices of prestige, prominence,
importance, and power -- the four Ps. However, this sort of interpretation seems
inappropriate in the case of sexual networks. In this column, I consider the
interpretation of centrality measures in sexual networks, and more generally in the
context of any kind of network diffusion.

For simplicity, I will assume that the data consist of a discrete(1)
symmetric social relation such as "has sex with", which we can represent as a
binary adjacency matrix X in which x_ij = x_ji = 1 if actor i has sex with actor j and
x_ij = x_ji = 0 otherwise. [In general, the same interpretations will hold for
nonsymmetric data if we consider only incoming ties.]

Four measures of centrality are commonly used in network analysis: degree, closeness,
betweenness, and eigenvector centrality. The first three were described in modern form by
Freeman (1979) while the last was proposed by Bonacich (1972). Let us begin with degree.

Degree centrality may be defined as the number of ties that a given node has. More
precisely, the degree of node i is given by:

    d_i = Σ_j x_ij
In the case of the "has sex with" relation, the degree of an actor is simply
the number of people she has sex with.
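Since degree is just a row sum of the adjacency matrix, it takes only a line of code to compute. A minimal sketch in Python (the four-actor matrix is a hypothetical illustration, not data from this column):

```python
def degree_centrality(X):
    """Degree of each node: d_i = sum over j of x_ij (a row sum)."""
    return [sum(row) for row in X]

# Hypothetical four-actor network: actor 0 has ties to actors 1, 2, and 3.
X = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

print(degree_centrality(X))  # actor 0 has degree 3; the others have degree 1
```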

Suppose the probability of something occurring to an actor is a function of the number
of exposures she has to it. For example, suppose the probability of adoption of an
innovation is a function of the number of friends that a person witnesses adopting. Then
we can interpret an actor's degree centrality in a friendship network as an index of the
probability that she will adopt the innovation. All else being equal, then, we can describe degree
centrality as measuring the risk (or opportunity) of receiving whatever is flowing through
the network.

Eigenvector centrality is best understood as a variant of simple degree. Alexander
(1963) had the idea that it wasn't just how many people a person knew that counted, but
how many people the people that they knew knew. In the context of HIV transmission, we can
see that all else being equal, a person A with one sex partner has a better chance of
escaping the infection than a person B with many. But if that one sex partner that A has
is having sex with most of the network, A's chances of getting infected are nearly as good
as if she were having sex with all those others herself. This idea was anticipated by Katz
(1953) and further developed by Hubbell (1965) and many others, finally culminating with
Bonacich (1972) who defined centrality as the principal eigenvector of the adjacency
matrix. An eigenvector of a symmetric square matrix X is any vector e which satisfies the
equation

    λe = Xe,   that is,   λe_i = Σ_j x_ij e_j

where λ is a constant (known as the eigenvalue) and e_i gives the centrality of node i.
The formula implies (recursively) that the centrality of a node is proportional to the sum
of centralities of the nodes it is connected to. Hence, an actor that is connected to many
actors who are themselves well-connected is assigned a high score by this measure, but an
actor who is connected only to near isolates is not assigned a high score, even if she has
high degree.

In the context of the risks or opportunities that a node faces to receive
that-which-diffuses through the network, Bonacich's eigenvector centrality can be
interpreted as a refined version of degree. Indeed, it can (generally) be computed by an
iterative degree calculation, known as the accelerated power method (Hotelling 1936). The
algorithm is as follows. Starting with e = (1, 1, ..., 1):

1. Compute e* = Xe; that is, e*_i = Σ_j x_ij e_j.
2. Normalize e*, for example by dividing each entry by the largest entry.
3. If e* differs from e (beyond some small tolerance), set e = e* and return to step 1;
otherwise stop.

Note that after executing step 1 the first time, e* is equal to simple degree.
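The iterative degree calculation can be sketched in a few lines of Python. The example network, tolerance, and iteration cap below are illustrative assumptions:

```python
def eigenvector_centrality(X, tol=1e-10, max_iter=1000):
    """Power-method sketch: repeatedly set e* = Xe and normalize.

    Starting from e = (1, ..., 1), the first pass yields simple degree;
    later passes refine it toward the principal eigenvector of X.
    (On bipartite graphs plain iteration can oscillate, hence the cap
    on the number of passes.)
    """
    n = len(X)
    e = [1.0] * n
    for _ in range(max_iter):
        e_star = [sum(X[i][j] * e[j] for j in range(n)) for i in range(n)]
        top = max(e_star)
        if top == 0:
            return e_star  # network with no ties
        e_new = [v / top for v in e_star]
        if max(abs(a - b) for a, b in zip(e, e_new)) < tol:
            return e_new
        e = e_new
    return e

# Hypothetical network: a triangle 0-1-2, with actor 3 tied only to actor 0.
X = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
e = eigenvector_centrality(X)
print(e)  # actor 0 scores highest; actor 3, tied only to actor 0, scores lowest
```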

Closeness centrality may be defined as the total graph-theoretic distance of a given
node from all other nodes. More precisely,

    c_i = Σ_j d_ij

where d_ij is the number of links in the shortest path from actor i to
actor j. Closeness is an inverse measure of centrality in that a larger value indicates a
less central actor while a smaller value indicates a more central actor.
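The farness total can be computed with one breadth-first search per node. A pure-Python sketch (the star network below is again a made-up example):

```python
from collections import deque

def closeness_centrality(X):
    """Farness c_i = sum over j of d_ij, with d_ij obtained by
    breadth-first search; larger totals mean a less central actor."""
    n = len(X)
    totals = []
    for s in range(n):
        dist = [None] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if X[u][v] and dist[v] is None:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        totals.append(sum(d for d in dist if d is not None))
    return totals

# Hypothetical star: actor 0 is one link from everyone; each leaf is two
# links from the other leaves.
X = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(closeness_centrality(X))  # [3, 5, 5, 5]: the center is most central
```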

In the context of network diffusion, I interpret closeness as an index of the expected
time-until-arrival at a given node of whatever is flowing through the network. To
illustrate, suppose an infection enters a network at node p. Suppose also that it takes
one unit of time to traverse a link. If we assume that the infection always travels along
the shortest possible route(2), it will reach a given node q in d_pq
units of time, where d_pq is the number of links in the shortest path from p to q. Now if
the infection is equally likely to originate with any node in the network, then closeness
centrality c_i is proportional to the expected number of time units
until arrival of the infection at a given node.

Admittedly, few scientists view viruses as smart bombs programmed to find the shortest
path to specific victims. In fact, the only way that a virus could take the shortest path
to every node in any network is if, from any given node, it replicated and moved
simultaneously to all adjacent nodes (something like news being relayed by ham radio
broadcasts). However, my task here is not to find the best possible model of disease
spread, but to interpret an existing model. Besides, in practice, the length of the
shortest path between two nodes is highly correlated with the average length of all paths
between the nodes. If the virus is equally likely to trace each possible path, closeness
will not usually be far off the mark.

Betweenness centrality is more difficult to understand in this context. It may be
defined loosely as the number of times that a node needs a given node in order to reach
another node. More precisely (but not quite correctly), it is the number of shortest paths
(between all pairs of nodes) that pass through a given node. It is exactly defined as

    b_k = Σ_{i<j} g_ikj / g_ij

where g_ij is the number of shortest paths from node i to node j, and g_ikj is the number
of shortest paths from i to j that pass through k. The purpose of the denominator is to
provide a weighting system so that node k is given a full centrality point only when it
lies along the only shortest path between i and j. If there is another equally short path
that k is not on, k is given only half a point, on the theory that the path that k is
along has only a 0.5 chance of being chosen.
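The sum over pairs can be computed by counting shortest paths with one breadth-first search per source and accumulating the g_ikj / g_ij fractions (a Brandes-style scheme; one of several possible implementations, and the line network below is hypothetical):

```python
from collections import deque

def betweenness_centrality(X):
    """Accumulate g_ikj / g_ij over all pairs: each BFS counts the
    shortest paths from one source, then dependency scores are pushed
    back from the farthest nodes toward that source."""
    n = len(X)
    bc = [0.0] * n
    for s in range(n):
        sigma = [0] * n          # number of shortest s-v paths
        sigma[s] = 1
        dist = [-1] * n
        dist[s] = 0
        preds = [[] for _ in range(n)]
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in range(n):
                if X[u][v]:
                    if dist[v] < 0:
                        dist[v] = dist[u] + 1
                        queue.append(v)
                    if dist[v] == dist[u] + 1:
                        sigma[v] += sigma[u]
                        preds[v].append(u)
        delta = [0.0] * n
        for v in reversed(order):
            for u in preds[v]:
                delta[u] += sigma[u] / sigma[v] * (1 + delta[v])
            if v != s:
                bc[v] += delta[v]
    return [b / 2 for b in bc]   # each undirected pair was seen twice

# Hypothetical line of actors 0-1-2-3: each inner actor lies on the only
# shortest path between two pairs of other actors.
X = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(betweenness_centrality(X))  # [0.0, 2.0, 2.0, 0.0]
```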

Betweenness indexes the extent to which a node's presence facilitates the flow of
that-which-diffuses. If a node that is high on betweenness centrality is removed from the
network, the speed and certainty of transmission between arbitrary pairs of nodes are
degraded more than if a node low on betweenness is removed.

Granovetter (1973) makes the same point about bridges and local bridges (which are ties
rather than nodes). When a local bridge is removed, the nodes on either side of the bridge
become reachable from each other only via very long paths. It is because Granovetter
believes that only weak ties can be bridges that he makes his famous claim asserting the
strength of weak ties. Actors with many weak ties are more important than others because
removing those actors would do the most damage to transmission possibilities throughout
the network.

To my mind, betweenness would clearly be a better way than counting weak ties to assess
an actor's importance in the diffusion process -- if only that-which-diffuses moved in
such a way as to always travel along the best possible route.

Discussion.

While my main purpose has been simply to interpret existing centrality measures in the
diffusion context, it is hard to ignore the fact that some of these existing measures
(e.g., closeness, betweenness) do not seem up to the task. They work well if
that-which-diffuses moves in all directions at once (thereby reaching every node via the
shortest possible path), but not if it moves in just one direction at a time (e.g., the
actors have sex with just one person at a time).

The obvious solution is to construct variants of closeness and betweenness that count
all paths rather than just shortest paths. I am persuaded by Friedkin (1991) that
off-the-shelf centrality measures are not as useful as measures custom-made for the
particular empirical phenomenon of interest.

In constructing such measures, we might want to consider counting trails rather than
strictly paths. A graph-theoretic path is a sequence of nodes in which no nodes are ever
repeated. A trail is a sequence in which individual nodes may be repeated, but no adjacent
pairs of nodes are repeated. For example, we might use trails to model the movement of a
bit of gossip, because the same news may reach a given actor more than once from different
sources, yet it is unlikely (Alzheimer's aside!) that she would hear the story from the
same person again.
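The difference between the two kinds of route can be made concrete with a brute-force count, workable only for tiny graphs. The five-node network below is a made-up illustration in which trails outnumber paths:

```python
def count_routes(X, s, t, kind="path"):
    """Count routes from s to t by exhaustive search.

    kind='path':  no node may be repeated.
    kind='trail': no edge (adjacent pair of nodes) may be repeated,
                  but individual nodes may be.
    Every arrival at t counts as one distinct route ending there."""
    n = len(X)
    count = 0

    def dfs(u, nodes, edges):
        nonlocal count
        if u == t and edges:
            count += 1
        for v in range(n):
            if not X[u][v]:
                continue
            edge = frozenset((u, v))
            if kind == "path" and v in nodes:
                continue
            if kind == "trail" and edge in edges:
                continue
            dfs(v, nodes | {v}, edges | {edge})

    dfs(s, {s}, frozenset())
    return count

# Hypothetical network: edge 0-1, a triangle 1-2-3, and edge 1-4.
X = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
]
print(count_routes(X, 0, 4, "path"))   # 1: only 0-1-4
print(count_routes(X, 0, 4, "trail"))  # 3: gossip may also loop around the
                                       # triangle (either way) before 1-4
```

The extra two trails (0-1-2-3-1-4 and 0-1-3-2-1-4) revisit node 1 without reusing any edge, which is exactly what the gossip model allows and the path model forbids.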