Abstract

Network theory is finding applications in the life and social sciences for ecology, epidemiology, finance and social–ecological systems. While there are methods to generate specific types of networks, the broad literature is focused on generating unweighted networks. In this paper, we present a framework for generating weighted networks that satisfy user-defined criteria. Each criterion hierarchically defines a feature of the network and, in doing so, complements existing algorithms in the literature. We use a general example of ecological species dispersal to illustrate the method and provide open-source code for academic purposes.

1. Introduction

Network theory is gaining traction in the life and social sciences, finding applications in the fields of ecology, economics, epidemiology, finance and more recently, in analysing social–ecological systems [1–4]. In ecology, there is growing interest in the role of connectivity between spatial locations on the stability and resilience of those systems [5,6]. Numerous other applications exist for the study of infectious disease spread [7,8], optimal management of spatially distributed species [9], neural communication [10], bank failure [11] and the diffusion of ideas [12]. While there are methods to generate specific types of networks (table 1), these are primarily limited to unweighted networks [1]. In this paper, we present a general framework for generating suites of weighted networks that meet user-defined criteria, e.g. symmetry, clustering or other metrics measuring network connectivity [3,23]. The framework makes use of a hierarchy of metrics, where each metric specifies a distinct feature that describes the structural properties of the network; the hierarchy gives a natural order in which to consider the addition of new metrics. Our method uses optimization theory to create adjacency matrices that define the relationship between different entities such as spatial locations or the interactions between species, people and institutions. Our work is most akin to the network generation algorithms of Carayol et al. [15] and Small et al. [16], who use genetic algorithms and pay-off functions to optimize the generation process. However, we specifically generate weighted networks and optimize a suite of network metrics rather than node degree. We demonstrate our method using a general example of ecological species dispersal and provide open-source code of our algorithm for academic purposes.1

2. Terminology

For the purposes of illustrating the algorithm, we consider a ‘habitat graph’ network composed of spatially distinct habitats or patches that are connected by species dispersing among them. However, our method is not limited to spatial networks and can be used to generate weighted networks in other applied fields such as those of species interaction or social networks. See [1,3] for reviews of different types of networks across disciplines.

In our study, each patch is referred to as a node; each connection is referred to as an edge, which allows for the flow of species between patches. An adjacency matrix describes the connections between patches.

Let us construct an adjacency matrix that represents the connectivity between n spatial locations or patches. Patches are connected by the movement of species, people or any flow between spatial locations. For illustrative purposes, we consider the case of a single species dispersing between patches.

Let the entries of the adjacency matrix correspond to the cost of movement between patches where the dispersal of species is symmetric, i.e. species move bidirectionally at the same rate. We assume a weighted network, where the off-diagonal entries of the adjacency matrix are constrained to be real numbers greater than or equal to zero. Increasing the cost of movement between patches to infinity is equivalent to assuming that two patches are not connected.2

Mathematically we define a randomly generated adjacency matrix describing the cost of species movement between patches as:
A=[a11⋯a1n⋮⋱⋮an1⋯ann],2.1
where A is a square n × n matrix and aij is the degree of connectivity between patches i and j. Restrict aij > 0 for all i ≠ j. We assume that a patch cannot be connected to itself, i.e. A is a zero-diagonal matrix.3

We constrain the adjacency matrix to possess a spectral radius of r, along with a specific variance (v) and skewness (s) of the associated dominant eigenvector xr⇀. We focus on these three for the following reasons. First, because we are primarily interested in weighted networks, metrics concerning node degree distributions (e.g. node or link density) are less appropriate. With weighted networks, our focus is the node connectivity distribution. Second, our metrics are particularly useful for illustrating a hierarchical approach to network construction and identification. Combining multiple network metrics—starting with general summary measures and then moving to more specific metrics (the hierarchy of ordering)—gives a more holistic understanding of network structure. For instance, spectral radius is a good proxy of connectivity [26] but there exist an infinite number of potential networks for a given spectral radius. Combining spectral radius with the dominant eigenvector provides a more complete picture of network structure. For any given spectral radius there exists a dominant eigenvector that can be described as a distribution of node connectivity scores. The statistical moments of such a distribution are a natural (hierarchical) method of discerning differences between networks. In other words, any two networks can be differentiated as higher statistical moments are calculated.

The spectral radius is the largest (dominant) eigenvalue of the adjacency matrix.4 In network terms, it represents the average distance it takes to traverse across the entire landscape and is therefore a proxy of network connectivity [26]. A low (high) spectral radius indicates a network that is highly (poorly) connected. We formally define the spectral radius as:
r=max{λ1,λ2,λ3,…,λn},2.2
where {λk}k=1…n are the eigenvalues associated with A. Let xr⇀ represent the eigenvector associated with the dominant eigenvalue.

Spectral radius alone is insufficient to guarantee a particular network structure. For a specific spectral radius there exists an infinite number of potential network configurations (figure 1). This makes it difficult to confidently generalize conclusions drawn from network process outcomes. Therefore, we further constrain the adjacency matrix (of spectral radius, r) to possess a defined variance and skewness in the associated dominant eigenvector, xr⇀.

Comparison of networks with different properties. Note that each network has six nodes. (a,b) Networks have the same spectral radius (r = 80 km) but different variances (0 and 0.026). In a, all nodes contribute evenly to the overall connectivity of the network; in b, node contribution is not homogeneous. Nodes 1 and 4 are more connected than the others. (c,d) Networks have the same spectral radius and variance (r = 65 km, v = 0.0086) but different skewness (−1.79 and 1.086). In c, only node 3 is a strong contributor to connectivity; in d, nodes 1–4 contribute strongly. Adapted from Salau et al. [28].

In a network context, the dominant eigenvector represents a collection of node connectivity scores or ‘eigenvector centralities’ of the network [29]. Each component in the dominant eigenvector measures the relative contribution of each node to the overall connectivity of the network. Each network consists of nodes with varying degrees of contribution. Some nodes may be high contributors to connectivity while other nodes may be isolated and contribute less. We define highly (weakly) connected nodes within a given network as nodes with a contribution score greater (less) than the mean contribution; where, the mean contribution score is the average of the dominant eigenvector components.

Variance of the dominant eigenvector measures the spread in node contribution across the network and thus quantifies node heterogeneity for a given network. In statistical terms, variance is the mean squared deviation of node contribution to connectivity. A low variance indicates that each node contributes similarly to network connectivity; a high variance is indicative of a wide distribution in node contribution.

Skewness of the dominant eigenvector measures the net proportion of highly to weakly connected nodes in a network. A skewness of zero implies an even proportion of highly and weakly connected nodes. A positive (negative) skewness indicates a greater (lower) number of highly connected to weakly connected nodes. Taken together, spectral radius, variance and skewness of a network highlight unique topological properties of the desired network (figure 1).

Variance (v) and skewness (s) of xr⇀ take the forms:
v=1n∑k=1n(xr,k−⟨xr⇀⟩)22.3
and
s=(1/n)∑k=1n(xr,k)3−3⟨xr⇀⟩v−⟨xr⇀⟩3v3/2,2.4
where xr,k denotes the kth element of xr⇀ and ⟨xr⇀⟩ represents the mean of the elements of xr⇀.

Alternatively, one may use other sets of network metrics to constrain the desired properties of the network. Many established network metrics are highly correlated but a clear description of how they hierarchically build upon each other is lacking in the literature [3,30]. By using the moments approach to summarize a network's node connectivity scores, each of our metrics highlights a distinct feature of networks and make hierarchy a non-issue [28].

3. The algorithm

In this section, we outline an algorithm to generate networks that conform to specific values for all three metrics [28]. Let r*, v* and s* serve as user-defined criteria that indicate the desired spectral radius and variance and skewness of the associated dominant eigenvector.5

We generate a random adjacency matrix, A0, to serve as initial conditions. Denote the spectral radius and variance/skewness of the associated eigenvector of A0 as r0, v0 and s0, respectively. In our algorithm, A0 and its associated network metrics are updated according to a numerical hill-climb method.

We choose the entries of A0 such that they minimize the weighted sum of squared differences between the properties of our ‘new’ adjacency matrix and our user-defined criteria.6 This results in the following minimization problem:
minaij[ω1(r∗−r0)2+ω2(v∗−v0)2+ω3(s∗−s0)2].3.1

The values of ωk are the weights associated with the preferred degree of accuracy of the estimated entry to the predefined user criteria. The larger the weight, the greater the variable contributes to the sum of squares and the closer the estimated value must be to the desired theoretical value. The values of r0, v0 and s0 are determined directly from the entries of A0 and are, therefore, functions of aij.

The minimization problem in (3.1) is constrained by (subject to) a set of equations defining the structural properties of the desired adjacency matrix. The constraints form an n2 × n2 system of equations such that:
[B][a11⋮ann]=[E].3.2

Collectively, the above equation is referred to as the equality constraints. On the left-hand side, the matrix B refers to the boundary equality conditions matrix. The matrix B is an n2 × n2 matrix that captures the characteristics of the matrix A0. Each row corresponds to an equation describing the connectivity between patches. Each column corresponds to a variable representing an entry in A0. The matrix B is multiplied by an n2 × 1 vector of variables, one for each entry of A0. The solutions are given by an n2 × 1 vector E, often referred to as the equality conditions. Together (3.2) forms a system of at most n2 equations and n2 unknowns that capture the desired structural properties of the adjacency matrix. An illustration of the construction of (3.2) for a three-patch system is found in the electronic supplementary material, A. We present the general method below.

Initially define B and E to be a zero matrix and vector, respectively. For the entries of B corresponding to the diagonal and zero off-diagonals of A0, the following must hold:

— A patch cannot be connected to itself. For each diagonal entry of A0 there exists a row in B such that the corresponding entry of aii in B is equal to one with all other entries in the row equal to zero. (After matrix multiplication in (3.2) this results in aii = 0).

— To specify that two patches are not connected in the adjacency matrix, set the entries of B corresponding to the respective off-diagonal entries of A0 equal to zero. In other words, if aij = aji = 0 in A0, then there exist two rows in B such that the corresponding entries for aij and aji in B should be one while all other entries in both rows are equal to zero.

For the entries of B corresponding to the non-zero off-diagonals of A0, the following must hold:

— For each pair of connected patches in A0, there exists a row in B such that the corresponding entries for aij and aji are equal to 1 and −1, respectively. All other entries in the row are 0. This implies symmetry. (After matrix multiplication we obtain aij = aji.)

The minimized values of A0 are fed into a generic hill-climbing numerical algorithm to solve for the optimal solution. The pseudo-code for the method is outlined below:

— Define a new adjacency matrix consisting of the minimized values of A0 as A0∼. Let the values of the network metrics associated with A0∼ be given by r0∼, v0∼ and s0∼.

— Compare the values of each network metric derived from A0∼ to r*, v* and s*. One method to do so is to calculate the absolute value of the difference between the desired and derived metrics (e.g. |r∗−r0∼|, |v∗−v0∼|, |s∗−s0∼|). Another is to calculate the per cent difference between the desired and derived metrics.

— If all three metric differences fall below a pre-determined tolerance level, the algorithm concludes.

— However, if the difference between any of the metrics exceeds a pre-determined tolerance level:

(i) Randomize one off-diagonal element of A0∼ subject to the assumptions discussed above.

(ii) Define the perturbed A0∼ as new initial conditions for the minimization problem in (3.2). In other words, update A0, r0, v0 and s0 to be the perturbed adjacency matrix A0∼. Minimize the sum of squared differences.

(iii) Update A0∼.

The resulting output is an n × n adjacency matrix that is symmetric and possesses a spectral radius, variance and skewness of its associated eigenvector of r*, v* and s*. Figure 2 illustrates available network configurations generated using our method. Other configurations are found in electronic supplementary material, A, including network configurations of 50, 100 and 250 patches. While small compared with neural networks or the World Wide Web, these network sizes are suitable for many empirical social and ecological networks [4,31–33].

Available network configurations that can be generated using the algorithm. Each network consists of 10 patches. (a) Each combination of variance (v*) and skewness (s*) of the dominant eigenvector assumes a desired spectral radius (λ*) of 20. The minimum (wmin) and maximum (wmax) distance between nodes are set to 1 and 50, respectively. A red dot indicates convergence; a black dot indicates that the algorithm did not converge to a solution. Other potential configurations can be found in electronic supplementary material, A. (b–d) Visual representation of the networks indicated by circles in a, where: v* = 0.005, s* = 0.4 (b); v* = 0.005, s* = −0.5 (c); and v* = 0.025, s* = −0.5 (d).

Convergence of the algorithm is dependent on several factors. First, for a given spectral radius and number of patches, there exists a lower and upper bound on the feasible values of the variance and skewness of the dominant eigenvector. These are derived in electronic supplementary material, A. Second, while minimizing the weighted sum of squared differences guarantees a concave function, the value of the desired properties of the adjacency matrix (and their associated weights) in (3.1) determine the specific shape of the function. The flatter the function to be minimized, the more difficult it is for the algorithm to converge to a solution. Finally, we cannot rule out the possibility of convergence to a local minima, in contrast with a global minima. However, this is a general problem characteristic of optimization theory [34,35], and formal analysis is left for future work.

Our algorithm was coded in Matlab 2016a. Source code is available in the electronic supplemental material and is free for academic use.7

4. Discussion

By varying initial conditions one may generate an infinite number of weighted networks with specific structural properties. Defining multiple criteria in the generation process allows one to fine-tune the structure of the resulting network, and explicitly test the effect of network structure on system properties such as stability or resilience.

Further, our method is not limited to spectral radius and eigenvector centrality, but may be applied using any network metric deemed appropriate to describe the connectivity of the system. For example, it is possible to use clustering and other metrics that are employed to analyse social and biological networks (i.e. clustering, degree centrality and betweenness) [1,2,22]. It is feasible to use our algorithm to generate networks with combinations of distances between nodes, node contributions to connectivity and/or clustering.

Nor is our method limited to symmetric, bidirectional dispersal. By changing the system of equations in (3.2), one may allow for unidirectional or asymmetric dispersal between patches. The method can also be extended to multiple species and unweighted networks. For the former, each species has its own adjacency matrix that describes the flow of that species within the network. The latter is intrinsically more difficult. Intuitively one would constrain the entries of the adjacency matrix to zero or one, and (if desired) solve for an adjacency matrix with a particular average, variance and skewness in the number of connections per node. In this case, the minimization problem in (3.1) becomes a nonlinear binary integer problem, which is difficult to solve and is left for future work.8

Like other network generation methods (table 1), there is the possibility that the networks we generate reflect only a small subset of the entire suite of networks that meet our specified criteria. While we know of no way to explicitly test this, we can combine our method for generating weighted networks with those of unweighted networks to more fully ‘paint the picture’ of a network's structure. We may, for example, create a weighted network that has a particular node degree distribution. To do so, one would first generate an unweighted network describing the structural connections between nodes (e.g Erdõs-Réyni [17], scale-free [20], small world [22]). This would have a particular node degree distribution. The binary connected/not connected coefficients could then be fed into the equality constraints of (3.2) to create a weighted network with a specific degree distribution that also meets the desired properties of the weights.

Our method has applications across disciplines. We briefly discuss several examples. In epidemiology, disease propagation is often faster in structured population networks than random ones [20]. More specifically, the structure of the network of contacts has implications for the management of human and animal disease spread [8,36]. In ecology, the dispersal of species plays a distinct role in the persistence and survival of species. Examples include mass and rescue effects [37,38], source–sink dynamics [39,40] and spatial insurance [41,42]. At the core of each of these principles is the underlying structure describing the movement of species between spatial locations. More recent work has identified ‘keystone patches’—patches that contribute strongly to diversity or stability of the entire system [43]. Finally, the management of spatially distributed resources has been a topic in the field of economics [44]. Though much of the literature focuses on fisheries management [9,44,45], applications exist for the spatial management of other renewable resources [46] and invasive species [47].

While we focus the application of our method on spatial networks, it is by no means limited to them. Our method is suitable for generating matrices of species interactions such as competition, parasitism or mutualism. In ecology, one common measure of ecosystem stability is the dominant eigenvalue of the interaction matrix or ‘asymptotic resilience’ [48]. A negative (positive) value indicates a stable (unstable) system, with the magnitude being a loose indicator of how stable the system is. Recent advances have demonstrated that for random interaction matrices the dominant eigenvalue can be a poor indicator of stability compared to other metrics [49]. It would be useful to explicitly test under which species interaction structures the dominant eigenvalue is a good or poor indicator of stability.

Our method complements existing algorithms for generating networks (table 1) while giving researchers precision in the structure of networks they wish to construct. We hope that our algorithm will aid researchers in testing the effects of explicit network structure on model outcomes.

Data accessibility

We include source code for the algorithm in electronic supplementary material, B. Alternatively, source code and convergence data can be retrieved from the Open Science Framework (https://osf.io/4sd65).

Authors' contributions

D.W.S. commented and optimized the algorithm and drafted the manuscript. K.R.S. solved the bounding proofs for the algorithm. K.R.S. and J.A.B. conceived the study, designed the algorithm, and helped draft the manuscript. All authors gave final approval for publication.

Competing interests

We have no competing interests.

Funding

D.W.S. was supported by the TULIP Laboratory of Excellence (ANR-10-LABX-41) and by the BIOSTASES Advanced Grant, funded by the European Research Council under the European Union's Horizon 2020 Research and Innovation Programme (grant agreement no. 666971). K.R.S. was supported by the NSF Alliance for Building Faculty Diversity Postdoctoral Fellowship (NSF 733 grant no. DMS-0946431). J.A.B. was supported by the NSF Advanced Cyberinfrastructure (NSF grant no. 1639529).

Acknowledgements

We thank Matthieu Barbier, Yuval R. Zelnik, Michel Loreau and Bart Haegeman for stimulating discussions, and two anonymous reviewers for their contributions to improve the manuscript.

Footnotes

↵1 Source code and convergence data can be retrieved from the Open Science Framework (https://osf.io/4sd65).

↵2 This is in contrast with an unweighted network where the off-diagonal entries of the adjacency matrix are constrained to be one or zero—i.e. patches either are or are not connected. Unweighted networks often reflect the structural connections between patches. Weighted networks contain additional information such as distance between patches, patch size and quality, or the probability of dispersal between patches [3,24]. In our case, the weights correspond to the cost of movement between patches. By setting a threshold to determine if a pair of nodes are connected or not, one can convert weighted networks into unweighted ones [3,25].

↵3 Conditional on the model framework, the diagonals of the adjacency matrix do not necessarily need to equal zero. For example, if one was concerned with species interactions or unweighted spatial networks, then the diagonals of the adjacency matrix could correspond to the growth rate(s) of species.

↵4 Because we assume a weighted network, the off-diagonal of the adjacency matrix are composed of positive real numbers. Therefore, we are guaranteed at least one real, positive eigenvalue where the elements in the associated eigenvector are strictly positive [27]. Alternatively, one may use other derivations of spectral radius, which are equivalent for weighted and unweighted networks [28].

↵5 For some network with a given number of patches (n) and spectral radius (r*), it follows that bounds for the variance and skewness of the associated eigenvector are given by:
v∗≤1n−[(n−1n)(wminr∗)]2−3g(v)v∗−[g(v)]3(v∗)3/2<s∗<(1/n)−3g(v)v∗−[g(v)]3(v∗)3/2
where wmin is the user-defined minimum value of an entry of A0. The function g(v) represents the mean of the dominant eigenvector, which exhibits a one-to-one relationship with the variance where g(v)=((1/n)−v∗)1/2 [28]. These inequalities represent ‘weak’ bounds for v* and s*. Proofs for these inequalities are found in electronic supplementary material, A. Examples of available network configurations generated using our method can be found in figure 2; electronic supplementary material, A.

↵6 The sum of squared differences is a well-known method to compare theoretical and predicted outcomes in ecology and statistics. Further, because it is squared, it guarantees concavity.

↵7 The algorithm was coded in Matlab for applicability and readability to a broad audience. Computational efficiency could be increased by coding the algorithm in another platform such as C++ or Python, allowing for the generation of larger networks. Explicitly accounting for the assumptions of the desired adjacency matrix will also improve computation efficiency. For example, by imposing symmetry in (3.2) one reduces the number of unknown variables by half.

↵8 Alternatively, one could generate a weighted network with specific properties and convert it to an unweighted network by assigning a weight threshold that defines the presence or absence of a connection [3,25].

Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.