Abstract

A network structure metric is herein suggested for the investigation of the behaviour of epidemic spreading processes in general network-structured populations. This simple measure, based on the algebraic powers of the adjacency matrix associated with the network in question, is shown to admit a heuristic interpretation as a representation of a spreading process similar to standard epidemic models. It is further shown that the values of this metric may be of use in understanding the dynamic pattern of epidemic spread on networks of greatly varying structural properties (e.g. the degree distribution, the assortativity/dissortativity and the clustering).

1. Introduction

The last decade has seen an explosion of interest in the properties of ‘complex networks’ (Albert & Barabasi 2002; Boccaletti 2006; Newman et al. 2006). The origin of the new focus may be traced essentially to two papers presenting the canonical examples of complex networks: first, the small-world (SW) networks of Watts & Strogatz (1998), and second, the scale-free network model of Barabasi & Albert (1999). These networks are considered ‘complex’ particularly in comparison with the Erdos–Renyi (ER) random graphs (Bollobas 2001), which have a somewhat longer history in the literature.

Nevertheless, the explicit incorporation of an underlying ‘microscopic’ network structure into epidemic models has yielded some very interesting connections and results. Grassberger (1983) compared epidemic spreading on a lattice network with percolation processes of interest to statistical physicists. More recently, Pastor-Satorras & Vespignani (2001a) showed that the standard picture of a minimum infectivity threshold required for epidemic outbreak does not hold for the models of infectious disease spreading on scale-free network topologies (this result has been recently extended to scale-free networks with clustering as well; Serrano & Boguna 2006).

The strategy for many theoretical network epidemiology papers is to suggest a specific (set of) algorithm(s) (or dynamics) for constructing the underlying networks, and then, either through simulation or analysis, to see how the process plays out on the resulting graphs (Andersson 1998; Watts & Strogatz 1998; Pastor-Satorras & Vespignani 2001b; Gross et al. 2006). Various structural metrics may be employed to offer insights into how varying some part of the network construction algorithm leads to a particular change in the pattern of spread of disease, but this is often not done in any systematic manner. That is, it is frequently possible to imagine other networks that yield the same values for the particular metric(s) employed, but which will yield dramatically different ‘spreading dynamics’. (Consider, for example, two networks with the same degree distribution but very different patterns of clustering.)

Alternatively, some papers develop an analytical framework for dealing with certain restricted classes of networks (generally those described either by the degree distribution only (Moreno et al. 2002; Newman 2002) or by the degree distribution and the degree–degree correlations, but no other features (Moreno et al. 2003)) without the need to specify exactly how such networks might be constructed. Still, the results connecting the input values of the structural parameters to the output measures of epidemic spreading on such networks are in these cases technically restricted to those special networks whose structures may be described entirely by the specific metrics considered (i.e. the degree distribution and the degree–degree correlations).

By contrast, we propose and explore a structure metric that is more directly relevant to spreading processes on general networks than the most commonly available alternatives. Also, it satisfies other useful properties, such as being computationally cheap to evaluate and admitting an intuitive interpretation.

2. Methods

2.1 Requirements for proposed metric

In order to give more precise meaning to our claim of defining a structure metric directly relevant to the behaviour of epidemic spreading processes on networks, it is necessary to consider what specific requirements must be imposed on such a hypothetical metric.

First, the metric should mimic the time course of spreading processes by consisting of a sequence of increasingly global values, loosely interpretable as the number of nodes (individuals) infected at successive time steps. In a finite network, this sequence should be subject to saturation at what is expected to be the upper limit to the size of the epidemic. Second, a versatile metric should be able to deal with complexities beyond the binary states of susceptible (S) or infected (I), for example, by including recovered/resistant states (R) that may or may not revert to susceptibility again (i.e. it would be useful if the measure could discriminate among the so-called SI, SIS, SIR and SIRS processes). Here, as a start, we focus our attention on a two-state model of the SIS type, in the hope of pointing the way to the development of measures useful for more complex problems.

Even restricting to SIS processes, one must still consider the continuum of the possible parameter values characterizing different SIS models. We confine our attention to two parameters: the transmission probability pI that a susceptible node is infected by an infected neighbour in any one time step, and the recovery probability per time step pS that an infected node recovers and reverts to susceptibility. Thus, we seek a metric that incorporates these two parameters, pI and pS, in an appropriate manner.

Finally, of course, the most important requirement must be the predictive power of the proposed metric when applied to epidemic spreading processes. For this paper, the processes to be predicted will be discrete time SIS simulations, as described in §2.6.

What sort of metric might satisfy these criteria? Given that the spreading phenomena must propagate along the paths present in the network, one might consider quantities derived from the numbers of distinct paths connecting all possible pairs of nodes. Unfortunately, these quantities are not particularly easy to calculate. However, a slight generalization from paths to walks—which differ from paths only in that they allow repetition of the same node at different steps—leads to significant mathematical simplification, as discussed in §2 in the electronic supplementary material.

2.2 Wellness and unwellness metrics

We hypothesize that the k-walk counts $(A^k)_{ij}$ from node i to j (for more details see §2 in the electronic supplementary material) provide useful information regarding the likelihood of transmission of infection from individual i to j over a k- and $p_S$-dependent time period. Here, $(A^k)_{ij}$ signifies the (ij)th entry of the adjacency matrix A raised to the kth power (i.e. the matrix product of k copies of A). As discussed in §2 in the electronic supplementary material, $(A^k)_{ij}$ is also equal to the number of distinct walks of exactly k steps starting at node i and ending at node j: hence, the description ‘k-walk counts’.
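As a minimal illustration of the k-walk counts, the following sketch raises a small adjacency matrix to successive powers using plain Python lists. The helper names `mat_mul` and `walk_counts` and the three-node path graph are assumptions made for this example only.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][q] * Y[q][j] for q in range(n)) for j in range(n)]
            for i in range(n)]


def walk_counts(A, k):
    """Return A**k for k >= 1; entry [i][j] counts walks of exactly k steps from i to j."""
    result = A
    for _ in range(k - 1):
        result = mat_mul(result, A)
    return result


# Undirected path graph 0 - 1 - 2 (hypothetical example network).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

print(walk_counts(A, 2))  # [[1, 0, 1], [0, 2, 0], [1, 0, 1]]
print(walk_counts(A, 3))  # [[0, 2, 0], [2, 0, 2], [0, 2, 0]]
```

The entry 2 in position (0, 1) of the third power corresponds to the two three-step walks 0-1-0-1 and 0-1-2-1, which illustrates that walks, unlike paths, may revisit nodes.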

Noting that $p_S^{-1}$ defines the mean time for an infected individual to remain infectious before returning to susceptible status, it is natural to relate $(A^k)_{ij}$ to the level of disease transmission over k such ‘generation times’ $p_S^{-1}$. Thus, we seek an estimate for the probability $p_i(j;k)$ of individual j being infected at time step $T_k(p_S)=\mathrm{ROUND}[k\,p_S^{-1}]$ (ROUND[x] denotes rounding x to the nearest integer) of the form (the motivation for the introduction of f=(1−g) will be made clear below)

$$p_i(j;k) \approx g\big((A^k)_{ij}\big) = 1 - f\big((A^k)_{ij}\big). \tag{2.1}$$

Obviously, this requires that $f((A^k)_{ij})\in[0,1]$. We seek also to impose the condition that the $f((A^k)_{ij})$ satisfy

$$f\big((A^{k+1})_{ij}\big) = F\Big(\big\{ f\big((A^k)_{iq}\big) : A_{qj}=1 \big\}\Big) \tag{2.2}$$

for some function F, i.e. the predicted infection status of j at time (k+1) depends only on the predicted infection statuses of those individuals q neighbouring j at the previous time k. This condition amounts to demanding that the quantities $f((A^k)_{ij})$ form a localized dynamical system with discrete time index k.

Note that the interval [0,1] is closed with regard to binary multiplication (i.e. when both the numbers to be multiplied are taken from [0,1]) and to being raised to any positive power (but not with regard to addition). Thus, an obvious manner in which to simultaneously satisfy the saturation condition $f((A^k)_{ij})\in[0,1]$ and the localized dynamics condition described by equation (2.2) would be to impose the stronger condition

$$f\big((A^{k+1})_{ij}\big) = \prod_{q:\,A_{qj}=1} \Big[f\big((A^k)_{iq}\big)\Big]^{c(p_I,p_S)}, \tag{2.3}$$

where $c(p_I,p_S)$ is a constant (i.e. independent of $(A^k)_{ij}$) incorporating information regarding the transmission rate of the infection. Note that the presence of uninfected neighbours at time k does not influence the infection status of node j at time (k+1) in the dynamics resulting from equation (2.3), since a definitely uninfected node q must have $f((A^k)_{iq})=1$. This is the motivation for using f as defined in equation (2.1) instead of g, since the latter unduly complicates the expression of equation (2.3).

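Is there a simple choice for a function f satisfying equation (2.3)? In order to answer this question, consider the following equivalent form of equation (2.3):

$$f\Big(\sum_q (A^k)_{iq}A_{qj}\Big) = \prod_q \Big[f\big((A^k)_{iq}\big)\Big]^{c(p_I,p_S)\,A_{qj}}. \tag{2.4}$$

That is, in algebraic terminology, f must be a homomorphism from the additive real numbers to the (positive) multiplicative real numbers. This essentially requires that f be an exponential function,

$$f\big((A^k)_{ij}\big) = \exp\!\big[-c(p_I,p_S)^{k}\,(A^k)_{ij}\big]. \tag{2.5}$$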
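The following short check is a sketch under two stated assumptions: that the exponential form is $f((A^k)_{ij})=\exp[-c^{k}(A^k)_{ij}]$ with $c=c(p_I,p_S)$, and that A is an unweighted (0/1) adjacency matrix. Under those assumptions, the identity $(A^{k+1})_{ij}=\sum_q (A^k)_{iq}A_{qj}$ gives the product-over-neighbours recursion directly:

```latex
\begin{align*}
f\big((A^{k+1})_{ij}\big)
  &= \exp\!\Big[-c^{\,k+1}\sum_q (A^k)_{iq}A_{qj}\Big] \\
  &= \prod_q \Big(\exp\!\big[-c^{\,k}(A^k)_{iq}\big]\Big)^{c\,A_{qj}} \\
  &= \prod_{q:\,A_{qj}=1} \Big[f\big((A^k)_{iq}\big)\Big]^{c}.
\end{align*}
```

Every factor lies in [0,1] whenever c > 0 and the walk counts are non-negative, so the saturation condition is preserved at each step.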

In this manner, we are led to consider what we refer to as the ‘wellness metric’,

$$w_{ij}(k) = \exp\!\left[-\left(\frac{p_I}{p_S}\right)^{k}(A^k)_{ij}\right], \tag{2.6}$$

where $c(p_I,p_S)$ has been specified as $(p_I/p_S)$. This is the simplest possible form for c satisfying the requirement that for a large (both in the number of nodes and average number of connections $\langle d\rangle$ per node) random graph, the wellness metric should be able to accurately predict the epidemic threshold condition (in this case, corresponding to the more commonly written $R_0>1$)

$$\frac{p_I\langle d\rangle}{p_S} > 1. \tag{2.7}$$

That equation (2.6) does indeed reproduce this result for large random graphs can be demonstrated by considering the fact that the dominant eigenvalue of the adjacency matrix A of such a network must be approximately $\langle d\rangle$ (Farkas 2001). Since A is a symmetric matrix, it can be diagonalized by an orthogonal matrix O, $A=O^{T}DO$, where D is the diagonal matrix of eigenvalues of A. Then $A^k=O^{T}D^kO$, so that $(A^k)_{ij}=\sum_m \lambda_m^k O_{mi}O_{mj}$, where $\lambda_m=D_{mm}$ is the mth eigenvalue of A. For large k then, $(A^k)_{ij}=\langle d\rangle^k\times[O_{1i}O_{1j}+\text{terms of order }(\lambda_2/\langle d\rangle)^k]$, where $\lambda_2$ is the second largest magnitude eigenvalue of A, which is generally significantly smaller than $\langle d\rangle$ for a large random graph. Approximating $(A^k)_{ij}\approx\langle d\rangle^k O_{1i}O_{1j}$, it is then apparent that, if $(p_I\langle d\rangle/p_S)<1$, $w_{ij}(k)$ from equation (2.6) will converge to 1 as k grows, while if instead $(p_I\langle d\rangle/p_S)>1$, $w_{ij}(k)$ will decrease to 0.

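Defining the ‘unwellness’ $u_{ij}(k)$ by (so that $u_{ij}(k)=g((A^k)_{ij})$ as defined in equation (2.1))

$$u_{ij}(k) = 1 - w_{ij}(k) = 1-\exp\!\left[-\left(\frac{p_I}{p_S}\right)^{k}(A^k)_{ij}\right], \tag{2.8}$$

the sum $\sum_j u_{ij}(k)$ is then taken as a crude indicator of the expected total level of infection after $T_k=\mathrm{ROUND}[k\,p_S^{-1}]$ time units resulting from an initial infection of individual i. Thus, the quantity

$$I_0(k) = n\,\langle u(k)\rangle, \tag{2.9}$$

where n is the number of nodes and

$$\langle u(k)\rangle = \frac{1}{n^{2}}\sum_{i}\sum_{j} u_{ij}(k),$$

is here interpreted as affording a prediction of the expected total level of infection after $T_k$ time units as a result of the initial infection of a single individual when that individual's identity is undetermined.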
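The unwellness values and the resulting crude prediction can be sketched in a few lines. The sketch assumes that the unwellness takes the form $u_{ij}(k)=1-\exp[-(p_I/p_S)^k (A^k)_{ij}]$ and that $I_0(k)=n\langle u(k)\rangle$; the function names and the triangle example network are hypothetical.

```python
import math


def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][q] * Y[q][j] for q in range(n)) for j in range(n)]
            for i in range(n)]


def unwellness_matrix(A, k, p_I, p_S):
    """Assumed form: u_ij(k) = 1 - exp(-(p_I/p_S)**k * (A**k)_ij), for k >= 1."""
    Ak = A
    for _ in range(k - 1):
        Ak = mat_mul(Ak, A)
    scale = (p_I / p_S) ** k
    return [[1.0 - math.exp(-scale * x) for x in row] for row in Ak]


def predicted_infection_I0(A, k, p_I, p_S):
    """I_0(k) = n * <u(k)>, i.e. (1/n) times the sum of all u_ij(k)."""
    u = unwellness_matrix(A, k, p_I, p_S)
    return sum(sum(row) for row in u) / len(A)


def generation_to_time_step(k, p_S):
    """T_k = ROUND[k / p_S]. Python's round() sends exact .5 ties to the even integer."""
    return round(k / p_S)


# Hypothetical three-node triangle network and illustrative parameter values.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
for k in (1, 2, 3):
    print(k, generation_to_time_step(k, 0.025),
          predicted_infection_I0(triangle, k, 0.05, 0.025))
```

Because every $u_{ij}(k)$ lies in [0,1], the value returned by `predicted_infection_I0` can never exceed n, which is the saturation behaviour required of the metric.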

2.3 Unwellness connectivity variance versus node variance

In equation (2.9), bracket notation was introduced for the average of the quantities uij(k) over the indices i and j. That is, 〈u(k)〉 was introduced as an average over the degree to which all of the nodes are ‘connected’ to each other by k-walks filtered through the (pI/pS)-dependent exponential saturation function defining the unwellness metric. In this sense, one could think of 〈u(k)〉 as the average value of the ‘unwellness-connectivities’, or for short, u-connectivities uij(k).

One could think of $\langle u(k)\rangle$ equally accurately, however, as the average

$$\langle u(k)\rangle = \frac{1}{n}\sum_{i} u_i(k), \tag{2.10}$$

where

$$u_i(k) = \frac{1}{n}\sum_{j} u_{ij}(k).$$

In this interpretation, $\langle u(k)\rangle$ represents the average over all nodes i of what we will term the ‘unwellness node value’ (or u-node value) $u_i(k)$. This u-node value then represents the average degree to which a particular node i is connected to the other nodes of the network by k-walks (again filtered by the saturation function of the unwellness metric). Heuristically speaking, the u-node value $u_i(k)$ may be thought of as a measure of the average unwellness of any other node in the network as a result of the initial infection of the single node i.

The reason for introducing this alternative interpretation of $\langle u(k)\rangle$ becomes clear if one focuses on the second moments, instead of the mean or first moment, of the distribution of the values $u_{ij}(k)$. That is, the u-connectivity second moment,

$$\langle u(k)^2\rangle = \frac{1}{n^{2}}\sum_{i}\sum_{j}\big[u_{ij}(k)\big]^{2}, \tag{2.11}$$

is a very different quantity from the u-node second moment,

$$\{u(k)^2\} = \frac{1}{n}\sum_{i}\big[u_i(k)\big]^{2}, \tag{2.12}$$

where the u-connectivity second moment $\langle u(k)^2\rangle$ relates to the variation in walk connectivity over all node pairs (i, j) and the u-node second moment $\{u(k)^2\}$ relates instead to the variation over single nodes i of overall connectivity to the remainder of the network.
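The distinction between the two second moments can be seen with a toy calculation. The sketch below assumes the averages are taken over all n² ordered pairs (for the u-connectivity moments) and over all n nodes (for the u-node moment); the function name `moments` and the two 2×2 unwellness matrices are hypothetical.

```python
def moments(u):
    """Return (<u>, <u^2>, {u^2}) for a square matrix u of unwellness values."""
    n = len(u)
    u_node = [sum(row) / n for row in u]                     # u-node values u_i
    mean_u = sum(u_node) / n                                 # first moment <u>
    conn_second = sum(x * x for row in u for x in row) / n ** 2   # <u^2>
    node_second = sum(x * x for x in u_node) / n             # {u^2}
    return mean_u, conn_second, node_second


# Two hypothetical unwellness matrices with the same mean.
uniform = [[0.5, 0.5],
           [0.5, 0.5]]
concentrated = [[1.0, 0.0],
                [0.0, 1.0]]

print(moments(uniform))       # (0.5, 0.25, 0.25)
print(moments(concentrated))  # (0.5, 0.5, 0.25)
```

Both matrices share the same mean and the same u-node second moment, yet the u-connectivity second moment doubles when the unwellness is concentrated on a few pairs, so the two second moments carry different structural information.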

The manner in which the different types of network structures vary with respect to these measures is discussed further in §2.6.

2.4 Modified unwellness metric

In order to construct an improved epidemic metric, it should be noted that the dynamical system described by equation (2.6) (‘dynamic’ in the light of its equivalence to equation (2.3), discussed in §2.2) does differ qualitatively from the usual epidemic models in one very important way: the ‘wellness’ of individual j at time (k+1) may be arbitrarily close to 0 even if only one of its neighbours at time k has wellness differing from 1. That is, the interpretation of wij(k+1) as the probability that individual j is not infected at time (k+1) suffers from the unrealistic feature that this probability may be arbitrarily small, given even just one neighbour q with a sufficiently small value of wiq(k).

We therefore seek to modify equation (2.9) for the predicted level of infection based on the unwellness metric to address this problem. Following the discussion above, it seems reasonable to suspect that the u-connectivity second moment 〈u(k)2〉 (equation (2.11)) might be a particularly relevant quantity to consider. Since uij(k)≤1, it follows that 〈u(k)2〉 must be less than or equal to 〈u(k)〉, though typically 〈u(k)2〉 is much less than 〈u(k)〉 unless a substantial number of the quantities uij(k) are sufficiently close to unity (requiring wij(k)∼0). Note that this is precisely the condition in which the unwellness metric is expected to overpredict the transmission.

We also incorporate the u-node second moment {u(k)2} (equation (2.12)) related to the variance in the network unwellness resulting from different choices of the initial infected node. Particularly, in the case where these sorts of variations in the unwellness quantities uij(k) are of similar or greater magnitude to their average value (e.g. in the scale-free networks with large degree variance), it seems highly relevant to consider this quantity.

We find that both of these quantities may be profitably incorporated into the modified unwellness metric,

$$I_1(k) = n\Big[\alpha\,\langle u(k)\rangle + \beta\,\langle u(k)^2\rangle + \gamma\,\{u(k)^2\}^{1/2}\Big]. \tag{2.13}$$

The parameters α, β and γ appearing in equation (2.13) were fit by linear regression, as described in §2.8. The specific form of equation (2.13) (particularly, the choice of taking the square root of $\{u(k)^2\}$ but not $\langle u(k)^2\rangle$) was chosen based on the resulting quality of the regression fit.

2.5 Application of unwellness metric to disease persistence

In §2.2–2.4, we develop the unwellness metric as a predictor of the dynamic progression of infection on networks for those cases in which the disease does not go extinct within a few generations of introduction. It is perhaps even more important, however, to gain insights into how the network structure influences the likelihood of such early extinction as opposed to persistence. This motivates us to apply a similar regression strategy to that used in obtaining the ‘modified unwellness metric’ to assess the degree to and manner in which the unwellness metric may be used as a predictor of the persistence of infection.

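Consider first the logistic model

$$\ln\!\left[\frac{P_{\mathrm{persistence}}}{1-P_{\mathrm{persistence}}}\right] = \kappa + \lambda\,\langle u(k_{\mathrm{early}})\rangle + \mu\,\frac{\langle u(k_{\mathrm{early}})^2\rangle}{\langle u(k_{\mathrm{early}})\rangle} + \nu\,\frac{\{u(k_{\mathrm{early}})^2\}^{1/2}}{\langle u(k_{\mathrm{early}})\rangle}, \tag{2.14}$$

where $P_{\mathrm{persistence}}$ is the probability that an infection initiated at a single randomly chosen node will persist for several (here, 25) generation times and $k_{\mathrm{early}}$ is a fixed (low) value of the generation time index k. From the results of the regression analysis performed (as described in §2.8) on simulated SIS networks, the optimal value of $k_{\mathrm{early}}$ appears to be 2; this value is assumed below. This regression analysis also indicates that the u-connectivity second moment term $\langle u(k_{\mathrm{early}})^2\rangle/\langle u(k_{\mathrm{early}})\rangle$ is not a significant predictor of persistence rates. Thus, we consider instead the reduced logistic model

$$\ln\!\left[\frac{P_{\mathrm{persistence}}}{1-P_{\mathrm{persistence}}}\right] = \kappa + \lambda\,\langle u(k_{\mathrm{early}})\rangle + \nu\,\frac{\{u(k_{\mathrm{early}})^2\}^{1/2}}{\langle u(k_{\mathrm{early}})\rangle}. \tag{2.15}$$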
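As an illustration of how a reduced logistic model of this kind would be evaluated, the sketch below assumes the log-odds of persistence are linear in $\langle u(k_{\mathrm{early}})\rangle$ and in the ratio $\{u(k_{\mathrm{early}})^2\}^{1/2}/\langle u(k_{\mathrm{early}})\rangle$. The function name and all numerical values are hypothetical and are not fitted values from this work.

```python
import math


def persistence_probability(mean_u, node_second, kappa, lam, nu):
    """Inverse-logit of kappa + lam*<u> + nu*sqrt({u^2})/<u> (assumed model form)."""
    log_odds = kappa + lam * mean_u + nu * math.sqrt(node_second) / mean_u
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical inputs: <u(2)> = 0.02 and {u(2)^2} = 0.0009, with made-up coefficients.
p = persistence_probability(mean_u=0.02, node_second=0.0009,
                            kappa=1.0, lam=50.0, nu=-0.5)
print(round(p, 3))
```

With a positive `lam` and a negative `nu`, the predicted probability rises with the mean unwellness and falls as the node-to-node variation grows relative to that mean.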

2.6 Network construction

Four different classes of networks are considered in this paper, as follows.

ER random graphs with n nodes of average degree d were constructed by including each edge i↔j with (independent) probability (d/(n−1)).
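A minimal sketch of this construction follows, using only the standard library. The function name `er_graph` and the seeded `random.Random` generator are choices made for the example.

```python
import random


def er_graph(n, d, rng):
    """Adjacency matrix of an ER random graph: each edge present with probability d/(n-1)."""
    p = d / (n - 1)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):          # visit each unordered pair once
            if rng.random() < p:
                A[i][j] = A[j][i] = 1      # undirected edge, no self-loops
    return A


rng = random.Random(0)
A = er_graph(200, 10, rng)
mean_degree = sum(sum(row) for row in A) / len(A)
print(mean_degree)  # close to 10 on average, varying between realizations
```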

SW networks with n nodes of (even) average degree d and rewiring probability p. These networks were constructed as in Watts & Strogatz (1998): after arranging the n nodes in a ring and connecting each node to the (d/2) nearest nodes on either side of it, each edge is randomly rewired with probability p to have one of its ends connected to a new node chosen with uniform probability from the set of all nodes in the system that are not already neighbours of the edge's fixed end.
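The ring-plus-rewiring procedure can be sketched as follows. This is an illustrative reading of the description above, not the code used for the simulations: the function name `sw_graph` is hypothetical, the lower-indexed end of each ring edge is taken as the fixed end, and the fixed end itself is excluded from the candidates so that no self-loops arise.

```python
import random


def sw_graph(n, d, p, rng):
    """Neighbour sets of a small-world network built by rewiring a ring lattice."""
    assert d % 2 == 0 and n > d + 1
    nbrs = [set() for _ in range(n)]
    ring_edges = []
    for i in range(n):
        for step in range(1, d // 2 + 1):   # d/2 nearest nodes on one side;
            j = (i + step) % n              # the other side arises by symmetry
            nbrs[i].add(j)
            nbrs[j].add(i)
            ring_edges.append((i, j))
    for i, j in ring_edges:
        if rng.random() < p:
            # Candidate new ends: not the fixed end i, not already its neighbour.
            candidates = [q for q in range(n) if q != i and q not in nbrs[i]]
            if candidates:
                new_j = rng.choice(candidates)
                nbrs[i].discard(j)
                nbrs[j].discard(i)
                nbrs[i].add(new_j)
                nbrs[new_j].add(i)
    return nbrs


rng = random.Random(0)
lattice = sw_graph(20, 4, 0.0, rng)    # p = 0 leaves the ring lattice intact
rewired = sw_graph(20, 4, 1.0, rng)    # p = 1 rewires every edge
print(sorted(len(s) for s in rewired))
```

Each rewiring removes one edge and adds one edge, so the total number of edges (and hence the average degree d) is preserved while the degree of individual nodes may change.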

Barabasi–Albert scale-free networks with n nodes of average degree d. These networks were constructed as in Barabasi & Albert (1999). Starting with (d+1) fully connected nodes, new nodes are sequentially added to the network, with each new node bringing with it (d/2) new edges. Each of these (d/2) new edges has one terminus fixed at the new node j, with the other terminus selected to be previously existing node i (assuming i is not already a neighbour of j) with relative probability $d_i(j)$, where $d_i(j)$ is the degree of node i with respect to the network made up of the first (j−1) nodes only. The network is completed after the addition of the nth node (and its corresponding edges).
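A minimal sketch of the preferential attachment construction described above is given here. The function name `ba_graph` is hypothetical, and duplicate targets are avoided by redrawing, which is one possible reading of the condition that i not already be a neighbour of j.

```python
import random


def ba_graph(n, d, rng):
    """Neighbour sets of a scale-free network grown by preferential attachment."""
    assert d % 2 == 0 and d >= 2 and n >= d + 1
    m = d // 2
    nbrs = [set() for _ in range(n)]
    for i in range(d + 1):                  # seed: (d+1) fully connected nodes
        for j in range(i + 1, d + 1):
            nbrs[i].add(j)
            nbrs[j].add(i)
    for new in range(d + 1, n):
        # Degrees with respect to the network of previously added nodes only.
        degrees = [len(nbrs[i]) for i in range(new)]
        targets = set()
        while len(targets) < m:             # redraw until m distinct targets
            targets.add(rng.choices(range(new), weights=degrees)[0])
        for t in targets:
            nbrs[new].add(t)
            nbrs[t].add(new)
    return nbrs


rng = random.Random(0)
g = ba_graph(50, 4, rng)
print(max(len(s) for s in g), min(len(s) for s in g))
```

The seed contributes d(d+1)/2 edges and each later node contributes d/2, so the average degree approaches d as n grows.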

Adjustable networks with n nodes of average degree d, described by several additional parameters (table 1) controlling the degree distribution, the community structure/clustering and the assortativity of the networks. The parameter C controls the number of distinct communities into which the network will be divided, with the level of interconnection between distinct communities increasing with the parameter Ncluster-swap (see the electronic supplementary material). In conjunction with these clustering parameters, the parameter dsf adjusts the variance of the degree distribution (increasing dsf increases this variance). Finally, the sign of the parameter psort controls whether the network will be assortative (psort>0) or dissortative (psort<0), with the magnitude of psort adjusting the magnitude of the degree–degree correlations. The parameters Nsort–swap and Ncluster–sort–swap govern the number of iterations used in the assortativity adjustment algorithm (see the electronic supplementary material); as long as these are set to sufficiently high values, the desired psort-dependent assortativity patterns should be generated. The details of the algorithm used to construct these adjustable networks are described in §3 in the electronic supplementary material. Figure 1 shows how the standard deviation of the degree distribution, the transitivity and the assortativity (as measured by the Pearson correlation coefficient of the degree of the node at one end of an edge with that of the degree of the node at the other) vary with regard to the parameters in table 1 for the adjustable networks described here. (The values shown in figure 1 are averaged over 10 networks generated at each distinct combination of control parameters.) It is immediately apparent that the various parameters used to adjust the network structure do not work entirely independently of one another (as judged by these structural metrics). 
For instance, the variance of the degree distribution depends on both dsf and on the number of communities the network is divided into, while the network transitivity depends very strongly on the assortativity rewiring parameter psort as well as the number of communities.
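The assortativity measure referred to above, the Pearson correlation between the degrees of the nodes at either end of an edge, can be sketched as follows. The function name `degree_assortativity` is hypothetical; each undirected edge is counted once in each direction so that the measure is symmetric, and a NaN is returned when all degrees are equal and the correlation is undefined.

```python
import math


def degree_assortativity(nbrs):
    """Pearson correlation of end-point degrees over all edges (both directions)."""
    xs, ys = [], []
    for i, neighbours in enumerate(nbrs):
        for j in neighbours:
            xs.append(len(nbrs[i]))
            ys.append(len(nbrs[j]))
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / count
    var_x = sum((x - mean_x) ** 2 for x in xs) / count
    var_y = sum((y - mean_y) ** 2 for y in ys) / count
    if var_x == 0 or var_y == 0:
        return float("nan")     # undefined for regular graphs
    return cov / math.sqrt(var_x * var_y)


# Star graph: a hub (node 0) joined to three leaves, maximally dissortative.
star = [{1, 2, 3}, {0}, {0}, {0}]
print(degree_assortativity(star))   # -1.0
```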

Table 1. Values and descriptions of the control parameters used in generating the structures of adjustable networks. (As defined in §2.6. Also, the SIS simulations (see §2.7) for the regression fits discussed in §2.8 were done on each of 10 randomly generated networks from each of the (4×3×3)=36 combinations of the values cited here.)

Figure 1. Properties ((a) the standard deviation of the node degree, (b) the network transitivity (clustering) and (c) the correlation of degree of neighbouring nodes (assortativity)) of ‘adjustable networks’ (§2.6) for various values of control parameters (table 1). The parameter dsf increases through the sequence 0, 2, 6, 10 from front to back in each plot. The value of psort increases moving from left to right within each group of three similarly coloured bars through the sequence (−1, 0, 1). Moving across the three groups of bars (again from left to right): grey bars indicate no community structure; blue bars indicate moderate clustering into communities; and green bars indicate severe clustering into communities. All networks considered here are composed of 1000 nodes of average degree 10.

2.7 SIS simulations

The parameters defining the discrete-time network SIS models simulated to generate the data appearing in figures 2–4 are presented in table 2. For all of these simulations, the state of the dynamic system may be described as a network of labelled nodes. The network structure itself is here always taken to be constant (i.e. the network edges are fixed in time), but the ‘labels’ associated with the nodes—in this case, susceptible (S) or infected (I)—are functions of the discrete time index. The stochastic dynamics of the infection status of the nodes may then be described by giving the (independent) probabilities per time step of the various types of state-variable transitions available. There are only two basic types of transitions for the network SIS systems simulated herein.

S→I transitions. If node j is in the state S and has exactly m infected neighbours at time step t, node j will become infected at time step (t+1) with probability $1-(1-p_I)^m$. Otherwise, node j will remain in state S.

I→S transitions. If node j is in the state I at time t, it will transition to state S at time (t+1) with probability $p_S$. Otherwise, node j will remain in the state I.

Table 2. Results of the regression fit (±s.e.) for the modified unwellness metric. (See §2.4.)

It should be emphasized that the discrete time step used for these simulations is not the same as the ‘generation time’ (which is equal to $p_S^{-1}$ times the time step employed within these simulations) discussed in the context of the wellness metrics.
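The two transition rules of the discrete-time SIS process can be sketched as a synchronous update. The function names `sis_step` and `simulate` and the three-node path network are choices made for this example; the state at time (t+1) is computed entirely from the state at time t.

```python
import random


def sis_step(nbrs, infected, p_I, p_S, rng):
    """Advance the SIS process by one time step; returns the new set of infected nodes."""
    new_infected = set()
    for j in range(len(nbrs)):
        if j in infected:
            # I -> S with probability p_S; otherwise the node stays infected.
            if rng.random() >= p_S:
                new_infected.add(j)
        else:
            # S -> I with probability 1 - (1 - p_I)**m, m = number of infected neighbours.
            m = sum(1 for q in nbrs[j] if q in infected)
            if m > 0 and rng.random() < 1.0 - (1.0 - p_I) ** m:
                new_infected.add(j)
    return new_infected


def simulate(nbrs, seed_node, p_I, p_S, steps, rng):
    """Return the number of infected nodes at time steps 0..steps from one seed node."""
    infected = {seed_node}
    counts = [len(infected)]
    for _ in range(steps):
        infected = sis_step(nbrs, infected, p_I, p_S, rng)
        counts.append(len(infected))
    return counts


# Path network 0 - 1 - 2 with illustrative parameter values.
path = [{1}, {0, 2}, {1}]
print(simulate(path, 0, 0.05, 0.025, 80, random.Random(0)))
```

With $p_S=0.025$, one generation time corresponds to 40 of these time steps, so a run of 1000 steps spans 25 generation times.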

2.8 Regression fits

In order to fit the parameters α, β and γ of the modified unwellness metric (see §2.4), 360 networks were constructed according to the adjustable network method described in §2.6 with systematically varied values of the network structure control parameters (table 1; figure 1). For each of the networks thus constructed, 50 SIS simulations were performed as described in §2.7. Those trajectories resulting from these stochastic simulations that did not result in rapid extinction (i.e. within 1000 time steps, or 25 generation times) of the infection were then used to estimate the parameters α, β and γ of the modified unwellness metric. This was done by linear regression using the model

$$\bar{I}_{\mathrm{SIS}}(k) = n\Big[\alpha\,\langle u(k)\rangle + \beta\,\langle u(k)^2\rangle + \gamma\,\{u(k)^2\}^{1/2}\Big] + \epsilon,$$

with random error term ϵ and with $\bar{I}_{\mathrm{SIS}}(k)$ denoting the number of infected nodes at time step $(k/p_S)$ averaged over those trajectories with persistent infection. The results of this fit are shown in table 2.
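A least-squares fit of three coefficients without an intercept can be sketched with the normal equations. This sketch assumes the predictors are the three structural terms $n\langle u(k)\rangle$, $n\langle u(k)^2\rangle$ and $n\{u(k)^2\}^{1/2}$, one row per network and generation index; the function name `fit_linear` and the synthetic data are hypothetical and do not reproduce the fitted values of table 2.

```python
def fit_linear(X, y):
    """Least-squares coefficients b minimizing |X b - y|^2 (no intercept term)."""
    p = len(X[0])
    # Augmented normal equations [X^T X | X^T y].
    M = [[sum(row[a] * row[b] for row in X) for b in range(p)]
         + [sum(row[a] * t for row, t in zip(X, y))]
         for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[a][p] / M[a][a] for a in range(p)]


# Synthetic, noise-free data generated from made-up coefficients (1.0, -0.5, 2.0).
X = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0],
     [2.0, 1.0, 0.0]]
y = [1.0 * a - 0.5 * b + 2.0 * c for a, b, c in X]
alpha, beta, gamma = fit_linear(X, y)
print(alpha, beta, gamma)
```

In practice a statistics package would also report the standard errors quoted alongside such fits; the elimination above only recovers the point estimates.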

The parameters κ, λ and ν of the logistic model for the probability of disease persistence were estimated similarly from the same simulation data. The results of this fit are indicated in table 3.

Table 3. Results of the regression fit (±s.e.) for logistic model for persistence probability based on the unwellness metric. (See §2.5.)

3. Results

Note that in all of figures 2–4, the simulation data presented resulted from running 250 trajectories on a single representative network of each type indicated and averaging the infection levels at each time step only over those trajectories exhibiting disease persistence through the entire 1000 time unit window run. The networks and simulations used to generate figures 2–4 were not used in the regression fits described in §§2.4 and 2.5.

3.1 Unmodified unwellness metric results

Figure 2a–c compares the results of SIS simulations (described in more detail in §2.7) on a variety of networks with the quantities I0(k) (§2.2). These results indicate that while the unwellness metric given by equation (2.9) does capture some of the qualitative variation in epidemic progression on distinct network structures, structural features beyond 〈u(k)〉 must be important to take into account as well.

The quantities I0(k)=n〈u(k)〉 track the average results of the SIS simulations well for some of the network types at early times, but overshoot the SIS simulations at later times. Particularly, with regard to the fact that the unwellness metric results asymptotically saturate at uij=1 for all i and j for those (connected) networks pictured, figure 2a–c demonstrates the consequences of the difference in the dynamics of the unwellness metric from the SIS simulations described in §2.7. The best match between the unwellness metric results and the SIS simulations is on the relatively featureless random graphs. The unwellness metric tends to show less difference in the epidemic spreading rate (in either direction) between the random graphs and other types of network structure than do the SIS simulations (figure 2a). This discrepancy is especially pronounced for the SW networks: while clustering appears to have the same qualitative effect of slowing down spreading in both the unwellness metric and the SIS simulations, the degree of slowdown in the simulations is considerably greater than it is for the unwellness metric.

3.2 Modified unwellness metric results

Figure 2d–f shows that the modified unwellness metric I1(k) (§2.4) fits the SIS simulations rather well for a variety of networks, particularly the ER random graph, regular lattice, scale-free and assortative scale-free networks. On the SW network featuring both a high degree of clustering and the ‘SW’ property of low characteristic path length, the modified unwellness metric predicts the SIS simulation data very well over the first four to five generation times, but overpredicts the epidemic spread beyond this point. Finally, on the dissortative scale-free network, the prediction develops an interesting (but pathological) oscillatory behaviour starting at the fifth generation.

Figure 3 compares the results of the modified unwellness metric model of equation (2.13) with networks as the density of network connections is varied (in these cases, with the disease transmissibility parameter varying inversely so as to hold the expected number of secondary infections resulting from a randomly selected initially infected node constant). Two features of the results shown in figure 3 merit special attention. First, it is apparent that the model predictions generally fare better in comparison with the simulation data in denser networks than sparser networks. This trend is most noticeable for the highly clustered SW and lattice networks. Second, the predicted values for the first-generation time index k=1 overshoot the simulated values on the high degree-variance scale-free network topology at higher connectivity densities. Interestingly, as the generation time index k increases past 1, this behaviour rapidly vanishes.

Robustness of the fit parameters α, β and γ with respect to the ratio $(p_I\langle d\rangle/p_S)$ (which becomes equal to the epidemic threshold R0 in the limit of large random graphs with high average connectivity) was also studied through variation of the disease transmissibility $p_I$ (table 2; figure 4). For the values of transmissibility considered, all three model parameters were significant, with β consistently negative and γ consistently positive. The parameter γ associated with the u-node second moment term appears to be more important for lower values of the transmissibility. This may reflect the larger importance of the bias of persistent disease trajectories towards those initiated from more well-connected nodes when disease extinction is a greater possibility (i.e. when transmissibility is lower). The quantitative value of the fit parameter β is considerably more robust to the disease transmissibility (table 2).

Finally, it is interesting to note that the results for the modified unwellness metric predictions with increased transmissibility show a qualitatively similar, albeit smaller, overshoot for the scale-free networks at the first-generation time index k=1 as appeared in the increased edge density situation.

3.3 Results of unwellness-based logistic model for persistence

The results of the regression fit of the model specified by equation (2.15) in §2.5 are shown in table 3. Table 4 displays the resulting predictions contrasted with the simulated data for various network types (again, these simulations were not used in the regression analysis determining the values of κ, λ and ν). The assortativity/dissortativity of the network structure exhibits a very large impact on the probability of persistence, which the model does a good job of capturing, largely through the u-node second moment term. Smaller effects seem to be associated with the clustering and the degree distribution.

Table 4. Comparison of the SIS simulation data (±s.e. of estimate) and the unwellness metric-based logistic model predictions for disease persistence probability. (See §2.5. Note that the base case networks have average degree 10, sparse networks have average degree 4 and dense networks have average degree 20.)

Especially in combination with high clustering, reduced network density (again varying the infection transmissibility inversely to hold the product of connectivity and transmissibility constant) can lead to sharply decreased persistence probability. This is demonstrated by the simulation results in table 4. Also apparent in this table is the breakdown of the logistic model of equation (2.15) when applied to such sufficiently sparse (e.g. average degree 4 in the data presented in table 4) and clustered networks. Conversely, at higher network densities (e.g. networks of average degree 20 in table 4), the logistic model for persistence probability appears to perform much more reliably.

The parameter λ associated with the 〈u(kearly)〉 (recall from §2.5 that kearly=2 here) term is found to be positive, suggesting that the networks for which the unwellness metric predicts faster epidemic spread tend to be more likely to sustain disease persistence. Interestingly, from the robustness with respect to transmissibility (and hence, with respect to the ratio (pI〈d〉/pS)) results displayed in table 5, it is apparent that the impact of this unmodified unwellness metric term is much larger for relatively lower transmissibility.

Table 5. Comparison of the SIS simulation data (±s.e. of estimate) and the unwellness metric-based logistic model predictions for disease persistence probability. (See §2.5. Note that the base case networks have transmissibility pI=0.05; all the networks indicated have average degree 10. The parameters for logistic model fits for each transmissibility case are indicated in table 3.)

The parameter ν associated with the u-node second moment term {u(kearly)2}1/2/〈u(kearly)〉 is found to be negative (and fairly robust to variation in the disease transmissibility). While considered in isolation, this might be taken to suggest that the networks in which there is especially large variation in the secondary infections produced as a result of infecting different initial nodes tend towards decreased propensity for disease persistence, examination of the results of simulations on (non-assortative) scale-free networks shows that this is not the whole story. In this case, in particular, the effects of the relatively large u-node second moment term are offset by a coinciding increase in the u-connectivity first moment term 〈u(kearly)〉.

However, shifting attention to the assortative scale-free network simulation results, it becomes apparent that the combination of a highly variable degree distribution and assortative degree correlations can result in even larger {u(k_early)^2}^{1/2}/〈u(k_early)〉 values without proportionally increased 〈u(k_early)〉 values, and in this case there is indeed a dramatically lowered likelihood of disease persistence.
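The offsetting of the two terms can be illustrated with a toy calculation. The sketch below assumes a generic logistic form with a linear predictor in the two quantities; it is not the paper's equation (2.15), and the intercept, coefficients and input values are hypothetical, chosen only to respect the reported signs (λ positive, ν negative).

```python
import math


def persistence_probability(mean_u, node_ratio, intercept, lam, nu):
    """Generic logistic form assumed for illustration (not equation (2.15))."""
    z = intercept + lam * mean_u + nu * node_ratio
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients: lam > 0 and nu < 0, as reported in the text.
INTERCEPT, LAM, NU = 0.0, 2.0, -1.0

# Hypothetical reference network.
baseline = persistence_probability(1.0, 1.0, INTERCEPT, LAM, NU)
# Both terms increase together: the contributions cancel.
offset = persistence_probability(1.5, 2.0, INTERCEPT, LAM, NU)
# Only the u-node second moment ratio increases: persistence drops.
unbalanced = persistence_probability(1.0, 3.0, INTERCEPT, LAM, NU)

print(f"baseline   {baseline:.3f}")
print(f"offset     {offset:.3f}")
print(f"unbalanced {unbalanced:.3f}")
```

In this toy setting the second case mimics the non-assortative scale-free networks (a larger ratio compensated by a larger mean), while the third mimics the assortative case, in which the ratio grows without a matching increase in the mean.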

4. Discussion and conclusions

The ‘unwellness metric’, incorporating u-connectivity, the u-connectivity second moment and the u-node second moment, is introduced in this paper as a way of directly relating network structure data to epidemic spreading processes. This metric may be calculated in an entirely straightforward manner from the adjacency matrix of a network and is shown herein to contain a great deal of information regarding the rate of spreading of infection, as well as the likelihood of persistence of infection, at least for networks with a sufficiently high edge density (approx. 10 edges per node).

It is worthwhile to pause for a moment to consider the relationship of the unwellness metric to a few more familiar measures of network structure. The networks with more variability in the distribution of node degrees (e.g. the scale-free networks) tend to lead to larger average values of the u-connectivity 〈u(k)〉 (figure 2a), while more highly clustered networks tend towards smaller average u-connectivity. These particular trends in u-connectivity appear to qualitatively reflect the different speeds with which the infections spread on such networks (figure 2a).

Somewhat surprisingly, the scale-free networks with either assortative or dissortative degree correlations were here found to exhibit lower u-connectivity values than those with no such correlations (figure 2b–c). While this phenomenon does seem to match the infection simulation results qualitatively at times following the initial epidemic explosion, it does not capture the greater speed of outbreaks during the early transient phase in assortative networks (figure 2c). For this case, it appears that the increased variation, not just the decreased mean, of the distribution of u-connectivities u_ij(k) must be taken into account in understanding the transmission dynamics.

Two distinct measures of such variation were proposed in §2.4, the u-connectivity second moment 〈u(k)^2〉 and the u-node second moment {u(k)^2}. The u-connectivity second moment is associated with the variation in the strength of coupling of nodes taken in pairs, while the u-node second moment corresponds with the variation in connectivity of individual nodes to their surroundings. Large variation in the degree distribution tends to lead to relatively large values of the u-connectivity second moment compared with random graph structure, but at early generation times k, the network clustering leads to an even larger increase in this quantity (figure 5a,b). On the other hand, more heterogeneous degree distributions lead, as one might expect, to much larger values of the u-node second moment (figure 5c,d; again, for low k), whereas clustering tends to depress this particular measure. Assortative degree correlations tend to further increase both the measures of variation for low k (figure 5). Dissortativity tends to lead to larger values of the u-connectivity second moment at low k, but exhibits a somewhat more complex pattern for the u-node second moment (figure 5).
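The distinction between pairwise and node-level variation can be made concrete with a small Python sketch. It takes as given a square array of pairwise values u_ij(k), computed as in §2, and assumes particular normalizations that may differ from those of §2.4: pairwise averages are taken over off-diagonal entries, and the node-level second moment is taken as the mean over nodes of the squared row average. The function names and example values are hypothetical.

```python
import math


def variation_measures(u):
    """Return (mean, pairwise second moment, node-level second moment).

    u is a square list of lists of pairwise values u_ij(k). The
    normalizations are assumptions made for this sketch only.
    """
    n = len(u)
    pairs = [u[i][j] for i in range(n) for j in range(n) if i != j]
    mean_u = sum(pairs) / len(pairs)
    pair_second = sum(x * x for x in pairs) / len(pairs)
    # Average coupling of each node to all the other nodes.
    node_means = [
        sum(u[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)
    ]
    node_second = sum(m * m for m in node_means) / n
    return mean_u, pair_second, node_second


def node_ratio(u):
    """Root of the node-level second moment divided by the mean."""
    mean_u, _, node_second = variation_measures(u)
    return math.sqrt(node_second) / mean_u


# Hypothetical values: every node equally coupled to its surroundings.
homogeneous = [[0.0, 0.2, 0.2], [0.2, 0.0, 0.2], [0.2, 0.2, 0.0]]
# Same mean, but node 0 is far more strongly coupled than the others.
hub_like = [[0.0, 0.5, 0.5], [0.05, 0.0, 0.05], [0.05, 0.05, 0.0]]

print(node_ratio(homogeneous))
print(node_ratio(hub_like))
```

With these normalizations the ratio equals one when all nodes are equally coupled to their surroundings and exceeds one as the coupling becomes concentrated on a few nodes, which is the sense in which heterogeneous degree distributions inflate the u-node second moment.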

The unwellness metric, modified to incorporate the variation measures as discussed, was used to predict both the probability that a single initial infection would lead to a persistent endemic infection (tables 4 and 5) and the transient time course of initial epidemic outbreak in persistent trajectories (figures 2–4). Both infection persistence and speed of outbreak tend to increase with the average u-connectivity. The speed of outbreak tends to be reduced for networks with larger u-connectivity second moments, but these quantities exhibited no significant effect on the persistence probability. Finally, networks for which the u-node second moment was particularly pronounced tended to show increased outbreak speed but decreased likelihood of persistence.
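For readers wishing to reproduce persistence estimates of this kind, the following Python sketch shows one way a single-seed persistence probability might be estimated by simulation. It assumes a discrete-time SIS process with synchronous updates, in which p_i is the per-contact infection probability per time step and p_s the per-step probability that an infected node returns to the susceptible state. These semantics, the finite-horizon definition of persistence and all the names are assumptions of the sketch, not a description of the simulations reported here.

```python
import random


def sis_step(adjacency, infected, p_i, p_s, rng):
    """One synchronous update of an assumed discrete-time SIS process."""
    next_infected = set()
    for node in infected:
        # An infected node stays infected unless it recovers this step.
        if rng.random() >= p_s:
            next_infected.add(node)
        # It may infect each currently susceptible neighbour.
        for neighbour in adjacency[node]:
            if neighbour not in infected and rng.random() < p_i:
                next_infected.add(neighbour)
    return next_infected


def persists(adjacency, seed_node, p_i, p_s, steps, rng):
    """True if infection is still present after the given number of steps."""
    infected = {seed_node}
    for _ in range(steps):
        infected = sis_step(adjacency, infected, p_i, p_s, rng)
        if not infected:
            return False
    return True


def persistence_fraction(adjacency, p_i, p_s, steps, runs, seed=0):
    """Fraction of runs, each from one random seed node, that persist."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    hits = sum(
        persists(adjacency, rng.choice(nodes), p_i, p_s, steps, rng)
        for _ in range(runs)
    )
    return hits / runs


# Hypothetical example network: a ring of 10 nodes as an adjacency dict.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
print(persistence_fraction(ring, p_i=0.3, p_s=0.2, steps=200, runs=500))
```

Survival over a fixed number of steps is only a proxy for endemic persistence, so the horizon would need to be chosen long relative to the transient phase of the outbreak.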

The results presented herein pertain to the SIS-type epidemic spreading processes for which recovery from infection provides negligible immunity to re-infection. The extension of the unwellness metric approach to the SIRS-type epidemic models poses some interesting challenges. The simplest option for such extension would be to apply the regression strategy employed in the modified unwellness metric discussed here to a set of SIRS models. In this manner, one might get a sense of how the structural features introduced here (the average u-connectivity, the u-connectivity second moment and the u-node second moment) influence the dynamics of infection in such models. One complication with this approach is that it would require multiple regressions to be done for situations involving different characteristic time scales for the loss of immunity (i.e. R→S).

A more complex approach to extending the unwellness metric to SIRS processes would be to try to replace the basic definition of w_ij(k) (equation (2.6)) with a quantity naturally incorporating the R→S rate parameter. For instance, one might try to modify w_ij(k) by subtracting out a quantity representing a measure of the number of immune individuals who might, in the absence of immunity, have been expected to become infected in the interval between time index (k−1) and time index (k). Such a modification of w_ij(k) would have to be done with care in order to preserve the various properties discussed in §§2.1 and 2.2 (especially boundedness, i.e. 0<w_ij(k)<1).

It is hoped that the results of this study will prove useful in translating data on social network structures into practicable predictions regarding the effectiveness of proposed intervention strategies, such as vaccination, isolation and quarantine, that alter network structures in ways that reduce transmission rates, possibly leading to the eradication of particular infectious diseases. In particular, it is worth emphasizing that the unwellness metric may be easily applied to networks with arbitrary clustering and degree correlation patterns. This property is useful not only for application to empirical network data, but also for application to the network structures produced by dynamical network models that may not lend themselves to simple analytical solution.

Acknowledgments

This work was supported by a James S. McDonnell Foundation 21st Century Science Innovation Award and NIH grant no. GM83863 to W.M.G.