

Abstract

In the past 15 years, statistical physics has been successful as a framework for modelling complex networks. On the theoretical side, this approach has unveiled a variety of physical phenomena, such as the emergence of mixed distributions and ensemble non-equivalence, that are observed in heterogeneous networks but not in homogeneous systems. At the same time, thanks to the deep connection between the principle of maximum entropy and information theory, statistical physics has led to the definition of null models for networks that reproduce features of real-world systems but that are otherwise as random as possible. We review here the statistical physics approach and the null models for complex networks, focusing in particular on analytical frameworks that reproduce local network features. We show how these models have been used to detect statistically significant structural patterns in real-world networks and to reconstruct the network structure in cases of incomplete information. We further survey the statistical physics models that reproduce more complex, semilocal network features using Markov chain Monte Carlo sampling, as well as models of generalized network structures, such as multiplex networks, interacting networks and simplicial complexes.

Key points

Statistical physics is a powerful framework to explain properties of complex networks, modelled as systems of heterogeneous entities whose degrees of freedom are their interactions rather than their states.

The statistical physics of complex networks has brought theoretical insights into physical phenomena that are different in heterogeneous networks than in homogeneous systems.

From an applied perspective, statistical physics defines null models for real-world networks that reproduce local features but are otherwise as random as possible.

These models have been used, on the one hand, to detect statistically significant patterns in real-world networks and, on the other, to infer the network structure when information is incomplete.

These applications are particularly useful in the current information age to make consistent inference from huge streams of continuously produced, high-dimensional, noisy data.

The statistical mechanics approach has also been extended using numerical techniques to reproduce semilocal network features and, more recently, to encompass structures such as multilayer networks and simplicial complexes.

Introduction

The science of networks has exploded in the information age, thanks to the unprecedented production and storage of data on almost all human activities. This is because networks are a simple yet effective way to model a large class of technological, social, economic and biological systems that can be described as a set of entities (nodes) with interactions between them (links). These interactions represent the fundamental degrees of freedom of the network and can be of different types — undirected or directed, binary or valued (weighted) — depending on the nature of the system and the resolution used to describe it.

Notably, most of the networks observed in the real world fall within the domain of complex systems, because they exhibit strong and complicated interaction patterns and feature collective emergent phenomena that do not follow trivially from the behaviours of the individual entities1. For instance, many networks are scale-free2, meaning that the number of links incident to a node (the node’s degree) follows a power-law distribution, so that most nodes have few links, but a few of them (the hubs) are highly connected. The distribution of the total weight of connections incident to a node (the node’s strength) likewise follows a power law in many cases3,4. In addition, most real-world networks are organized into modules or display a community structure5,6, and they possess a high level of clustering, because nodes tend to create tightly linked groups. However, real-world networks are also small worlds7,8,9, that is, the mean distance (in terms of the number of connections) between two nodes scales logarithmically with the system size. The observation of these universal features in complex networks has stimulated the development of a unifying mathematical language to model their structure and understand the dynamical processes taking place on them, such as the flow of traffic on the Internet or the spreading of either diseases or information in a population10,11,12.

Two different approaches to network modelling can be pursued. The first one consists of identifying one or more microscopic mechanisms driving the formation of the network and using them to define a dynamic model that can reproduce some of the emergent properties of real systems. The small-world model7, the preferential attachment model2, the fitness model13,14,15, the relevance model16 and many others follow this approach, which is akin to kinetic theory. These models can handle only simple microscopic dynamics and thus, although they provide good physical insights, they need refinement to give quantitatively accurate predictions.

The other approach consists of identifying a set of characteristic static properties of real systems and then building networks that have the same properties but are otherwise maximally random. This approach is akin to statistical mechanics and therefore is based on rigorous probabilistic arguments that can lead to accurate and reliable predictions. The mathematical framework is that of the exponential random graph (ERG) model, which was first introduced in the social sciences and statistics17,18,19,20,21,22,23,24,25 as a convenient formulation that relies on numerical techniques, such as Markov chain Monte Carlo (MCMC) algorithms. The interpretation of the ERG model in physical terms is due to Park and Newman26, who showed how to derive the ERG model from the principle of maximum entropy and the statistical mechanics of Boltzmann and Gibbs.

As formulated by Jaynes27, the variational principle of maximum entropy states that the probability distribution that best represents the current state of knowledge of a system is the one that maximizes the Shannon entropy, subject, in principle, to any prior information on the system itself. This means making self-consistent inference while assuming maximal ignorance about the unknown degrees of freedom of the system28. The maximum entropy principle is conceptually powerful and finds numerous applications in physics and in science in general29. In the context of network theory, the maximum entropy approach is used to obtain ensembles of random graphs with given aggregated macroscopic or mesoscopic properties. These ensembles have two related applications. On the one hand, when the microscopic configuration of a real network is not accessible, the random graph ensemble describes the most probable network configuration: as is the case in traditional statistical mechanics, the maximum entropy principle makes it possible to gain maximally unbiased information in the absence of complete knowledge. On the other hand, when the microscopic configuration of the network is known, the ensemble defines a null model that enables assessment of the significance of empirical patterns found in the network against the null hypothesis that the network structure is determined solely by its aggregated structural properties.

This Review presents theoretical developments and empirical applications for the statistical physics of real-world complex networks. We start by introducing the general mathematical formalism and, then, we focus on the analytic models obtained by imposing mesoscopic, that is, local, constraints, highlighting the novel physical concepts that can be learned from such models. After that, we present the two main fields of application for these models: detection of statistically significant patterns in empirical networks and reconstruction of network structures from partial information. We then discuss the models obtained by imposing semilocal network features and, finally, the most recent developments on generalized network structures and simplices.

Statistical mechanics of networks

Exponential random graphs

The statistical physics approach defined by the ERG consists of modelling a network system G* using an ensemble Ω of graphs with the same number N of nodes and type of links as G* (Fig. 1). The model is specified by P(G), the occurrence probability of a graph G∈ Ω. According to statistical mechanics and information theory27,30, the probability distribution that gives the most unbiased expectation of the microscopic configuration of the system under study is the one maximizing the Shannon entropy.
$$S=-\sum _{{\boldsymbol{G}}\in \Omega }P({\boldsymbol{G}})\,\mathrm{ln}\,P({\boldsymbol{G}})$$

(1)

Fig. 1: Construction of the microcanonical and canonical ensemble of networks from local constraints.

The real network G* is the source of constraints on the ensemble. In this case, the constraints are the degrees k* of G*. The microcanonical approach relies on the link rewiring method to numerically generate several network configurations, each with exactly the same degree sequence as G*. The probability P(G) of a network G in the ensemble is non-zero only for the subset of graphs that realize the enforced constraints exactly, as indicated by the schematic probability distribution. Provided the sampling is unbiased, P(G) is uniform for these graphs. The canonical approach obtains P(G) by maximizing the Shannon entropy S while constraining the expected degree values within the ensemble and then maximizing the likelihood \({\mathscr{L}}\) of P(G*) to find the ensemble parameters θ*, such that the expectation values of the degrees match the observations in G*. Thus, P(G) is non-zero for any graph, ranging from the empty to the complete one, as indicated by the schematic probability distribution.

This maximization is subject to the normalization condition \({\sum }_{{\boldsymbol{G}}\in \Omega }P({\boldsymbol{G}})=1\) and a collection of constraints c* that represent the macroscopic properties enforced on the system. These constraints define the sufficient statistics of the problem, that is, the model parameters depend only on the values of the constraints.

Imposing hard constraints, that is, assigning a uniform P(G) to each of the graphs G that satisfy c(G) = c* and zero probability to graphs that do not, leads to the microcanonical ensemble. Typically, this ensemble is not amenable to analytical treatment beyond steepest descent approximations31 and is thus sampled numerically (see Box 1).

The canonical ensemble is instead obtained by imposing soft constraints, that is, by fixing the expected values of the constraints over the ensemble, \({\sum }_{{\boldsymbol{G}}\in \Omega }{\boldsymbol{c}}({\boldsymbol{G}})P({\boldsymbol{G}})={{\boldsymbol{c}}}^{* }\). Introducing the set of Lagrange multipliers θ, the constrained entropy maximization returns
$$P\left({\boldsymbol{G}}|{\boldsymbol{\theta }}\right)=\frac{{{\rm{e}}}^{-H({\boldsymbol{G}},{\boldsymbol{\theta }})}}{Z({\boldsymbol{\theta }})}$$

(2)

where \(H({\boldsymbol{G}},{\boldsymbol{\theta }})={\boldsymbol{\theta }}\cdot {\boldsymbol{c}}({\boldsymbol{G}})\) is the Hamiltonian, \(Z({\boldsymbol{\theta }})\)\(={\sum }_{{\boldsymbol{G}}\in \Omega }{{\rm{e}}}^{-H({\boldsymbol{G}},{\boldsymbol{\theta }})}\) is the partition function and · represents the scalar product. Thus, the canonical \(P\left({\boldsymbol{G}}| {\boldsymbol{\theta }}\right)\) depends on G only through c(G), which implies that graphs with constraints of the same value have equal probability. This means that the canonical ensemble is maximally non-committal with respect to the properties that are not enforced on the system32.
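As a toy illustration of the canonical ensemble just defined, one can enumerate the full ensemble for a very small system. The sketch below (with the total number of links as the single constraint and an arbitrary multiplier θ, both illustrative choices) computes the partition function over all binary undirected graphs on three nodes and checks that graphs with equal constraint values are equiprobable:

```python
from itertools import product
from math import exp

# Toy canonical ensemble: all 2**3 binary undirected graphs on N = 3 nodes
# (three possible links), with the total number of links L(G) as the single
# constraint, so that H(G, theta) = theta * L(G). theta is illustrative.
theta = 0.5

graphs = list(product([0, 1], repeat=3))        # each tuple is one graph
weights = {g: exp(-theta * sum(g)) for g in graphs}
Z = sum(weights.values())                       # partition function
P = {g: w / Z for g, w in weights.items()}      # canonical probabilities

assert abs(sum(P.values()) - 1.0) < 1e-12       # normalization
# graphs with the same value of the constraint are equiprobable
assert P[(1, 0, 0)] == P[(0, 1, 0)] == P[(0, 0, 1)]
```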

Remarkably, for models of networks with an extensive number of constraints, the microcanonical and canonical ensembles turn out to be inequivalent in the thermodynamic limit N→∞33,34,35. This is in contrast to the case of traditional statistical physics (except possibly at phase transitions), in which there is typically a finite number of constraints, such as total energy and total number of particles. In this sense, complex networks not only provide an important domain of application for statistical physics but can also expand our fundamental understanding of statistical physics itself and lead to new theoretical insights. Additionally, from a practical point of view, the breaking of ensemble equivalence in networks implies that the choice between microcanonical and canonical ensembles cannot be based solely on mathematical convenience, as is usually done, but rather should follow from a principled criterion. In particular, because the canonical ensemble describes systems subject to statistical fluctuations, it is the more appropriate ensemble to use when the observed values of the constraints can be affected by measurement errors, missing and spurious data or simply stochastic noise. Fortunately, as we shall see, the canonical ensemble is more analytically tractable than the microcanonical ensemble.

The definition of the canonical ensemble in equation 2 specifies the functional form of \(P\left({\boldsymbol{G}}| {\boldsymbol{\theta }}\right)\) but leaves the Lagrange multipliers as parameters to be determined by the equations describing the constraints, \({\sum }_{{\boldsymbol{G}}\in \Omega }{\boldsymbol{c}}({\boldsymbol{G}})P({\boldsymbol{G}}| {\boldsymbol{\theta }})={{\boldsymbol{c}}}^{* }\). In practical applications, the average values of the constraints are seldom available, but one possible strategy is to draw the Lagrange multipliers from probability densities chosen to induce archetypal classes of networks, such as regular graphs, scale-free networks and so on26,31,36. When instead the task is to fit the model to the observations \({{\boldsymbol{c}}}^{* }\equiv {\boldsymbol{c}}({{\boldsymbol{G}}}^{* })\) for a given empirical network G*, the optimal values θ* are those that maximize the likelihood functional37,38.
$${\mathscr{L}}({\boldsymbol{\theta }})=\mathrm{ln}\,P\left({{\boldsymbol{G}}}^{* }|{\boldsymbol{\theta }}\right)$$

(3)

This procedure results in a match between the ensemble average and the observed value of each constraint: \({\sum }_{{\boldsymbol{G}}\in \Omega }{\boldsymbol{c}}({\boldsymbol{G}})P\left({\boldsymbol{G}}| {{\boldsymbol{\theta }}}^{* }\right)\equiv {\boldsymbol{c}}\left({{\boldsymbol{G}}}^{* }\right)\).
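In practice, this maximum-likelihood fit reduces to solving the constraint equations numerically. The following sketch does so for the binary configuration model introduced below, whose connection probability factorizes as p_ij = x_i x_j/(1 + x_i x_j); the degree sequence and the damped fixed-point scheme are illustrative choices, not a prescribed algorithm:

```python
# Sketch of the maximum-likelihood fit of the binary configuration model:
# the (exponentiated) Lagrange multipliers x_i solve <k_i> = k_i*, with
# the fermionic connection probability p_ij = x_i x_j / (1 + x_i x_j).
# Damped fixed-point iteration; the degree sequence is illustrative.
k_star = [3, 2, 2, 2, 1]
n = len(k_star)
x = [1.0] * n
for _ in range(5000):
    x_new = [k_star[i] / sum(x[j] / (1 + x[i] * x[j])
                             for j in range(n) if j != i)
             for i in range(n)]
    x = [(a + b) / 2 for a, b in zip(x, x_new)]   # damping for stability

expected_k = [sum(x[i] * x[j] / (1 + x[i] * x[j])
                  for j in range(n) if j != i) for i in range(n)]
assert all(abs(ek - ks) < 1e-5 for ek, ks in zip(expected_k, k_star))
```

At convergence, the ensemble average of every degree matches its observed value, which is exactly the matching condition stated above.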

Box 1 Alternative ensemble constructions

Various methods have been proposed to define ensembles of graphs with local constraints, as alternatives to maximum entropy. Here we briefly present them for the case of binary undirected graphs, which is by far the simplest and most frequently explored situation.

Computational methods explicitly generate several random networks with the desired degree sequence. The bottom-up approach initially assigns to each node a number of link stubs equal to its target degree, and pairs of stubs are then randomly matched without allowing the formation of self-loops and multilinks157,158,159. Unfortunately, this procedure very often gets stuck in configurations in which nodes requiring additional connections have no more eligible partners, leading to an unacceptably large number of sample rejections45. The top-down approach instead starts from a realized network and generates a set of randomized variants by iteratively applying a link rewiring algorithm that preserves the degree distribution42,58,160,161. The drawback of the top-down approach is that the number of rewirings needed to generate a single configuration is very large and not rigorously specified162. Additionally, the algorithm may fail to sample the ensemble uniformly unless it employs a rewiring acceptance probability that depends on the current network configuration126,127,128,163. Other computational methods rely on theorems that set necessary and sufficient conditions for a degree sequence to be graphic, that is, realized by at least one graph, and exploit such conditions to define biased sampling algorithms and sampling probabilities164,165,166. These approaches are, however, rather costly, especially for highly heterogeneous networks (see ref.38 and references therein).
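The top-down rewiring step described above can be sketched as follows; this toy version simply rejects invalid swaps and makes no attempt at the configuration-dependent acceptance probabilities needed for uniform sampling:

```python
import random

# Sketch of the top-down link-rewiring construction: repeatedly pick two
# links (a, b) and (c, d) and swap them to (a, d) and (c, b), rejecting
# moves that would create self-loops or multilinks. Degrees are preserved
# exactly; this toy version makes no claim of sampling uniformly.
random.seed(0)
edges = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)}     # a 5-cycle

def degree_sequence(edge_set):
    deg = {}
    for u, v in edge_set:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sorted(deg.values())

before = degree_sequence(edges)
for _ in range(1000):
    (a, b), (c, d) = random.sample(sorted(edges), 2)
    e1, e2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
    if a == d or c == b or e1 in edges or e2 in edges:
        continue                                      # reject invalid swap
    edges -= {(a, b), (c, d)}
    edges |= {e1, e2}

assert degree_sequence(edges) == before               # degrees preserved
```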

Analytic methods instead define explicit expressions for expected values of network properties as a function of the imposed constraints. A standard approach relies on the generating function \(g(z)={\sum }_{k}{z}^{k}P(k)\) for the degree distribution P(k)157,167. The generating function is rigorously defined only for infinite and locally tree-like networks, although it often works surprisingly well for real networks168. A popular alternative approach is based on the explicit expression for the connection probability between any two nodes i and j in the randomized ensemble, \({p}_{ij}=\frac{1}{2}{k}_{i}^{* }{k}_{j}^{* }{\rm{/}}{E}^{* }\), where E* is the total number of links67. This model defines systems with self-loops as well as multilinks37,40,43. Indeed, because \({p}_{ij}\) may exceed 1 for pairs of high-degree nodes, the model requires that i and j should be connected by more than one link to realize the imposed constraints. The model predicts multilinks with non-negligible frequency in scale-free networks with \(P(k)\approx {k}^{-\gamma }\) for which the natural cut-off \(\sim {N}^{1/(\gamma -1)}\) is larger than the structural cut-off (\(\sim {N}^{1/2}\) for uncorrelated networks)169,170, where N is the number of nodes in the network and γ is the exponent of the power law.
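The two properties of the Chung–Lu connection probability mentioned above, exact reproduction of the expected degrees and values of p_ij exceeding 1 for hub pairs, can be verified directly (the degree sequence is illustrative):

```python
# Sketch of the Chung-Lu connection probability p_ij = k_i k_j / (2E):
# summing p_ij over all j (self-pair included) returns k_i identically,
# but p_ij can exceed 1 for pairs of hubs, which is why the model must
# allow multilinks. The degree sequence is illustrative.
k = [4, 4, 2, 1, 1]
two_E = sum(k)                                   # 2E equals the sum of degrees

p = [[k[i] * k[j] / two_E for j in range(len(k))] for i in range(len(k))]

expected_k = [sum(row) for row in p]
assert all(abs(e - ki) < 1e-12 for e, ki in zip(expected_k, k))
assert p[0][1] > 1                               # hub pair: requires a multilink
```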

Imposing local constraints

Unlike most alternative approaches (briefly outlined in Box 1), the maximum entropy method is general and works for networks regardless of size, directedness, weighting, density, clustering and other properties. However, determining the occurrence probability of a graph in the ensemble can be a challenging task. In a handful of cases, an analytical calculation of the partition function is feasible, so expectation values and higher moments of any quantity in the ensemble can be analytically derived. As in conventional equilibrium statistical mechanics, whether such an analytical computation is possible depends on the particular constraints imposed. Analytical computation of the partition function is possible in very simple models, such as the well-known Erdős–Rényi random graph39, as well as when the constraints are the degrees k* and strengths s* that describe the local network structure from the viewpoint of each node26. In fact, imposing local constraints separately for each node is a minimum requirement to construct ERG ensembles that are both theoretically sound and practically useful, in the sense that they accurately replicate the observed heterogeneity of real-world networks. This is because degree and strength distributions in many real-world systems are scale-free — a property that distinguishes many networks from systems typically studied in physics, such as gases, liquids and lattices — and scale-free degree and strength distributions cannot be obtained from simple models with global constraints. For instance, the Erdős–Rényi model is obtained in the ERG formalism by constraining the expected total number of links, and this global constraint leads to an ensemble in which each pair of nodes is connected with fixed probability p, implying that the degree distribution follows a binomial law, rather than a power law.

Local constraints in turn make the ERG model analytical because of the independence of dyads: P(G) factorizes into link-specific terms, whose contribution to the partition function sum can be evaluated independently of the rest of the network. Note that local constraints lie at the mesoscopic level, between the microscopic degrees of freedom of the network (the individual links) and the macroscopic aggregation of all degrees of freedom into global quantities, such as the total number of links, which corresponds, for instance, to the total energy of the system in traditional statistical physics.

The ERG model obtained by constraining the degrees \({{\boldsymbol{c}}}^{* }\equiv {{\boldsymbol{k}}}^{* }\) is known as the (canonical) binary configuration model (BCM). In the simplest undirected case, the entropy maximization procedure returns an ensemble connection probability between any two nodes i and j:
$${p}_{ij}=\frac{{x}_{i}{x}_{j}}{1+{x}_{i}{x}_{j}}$$

(4)

The weighted configuration model (WCM)40 is instead obtained by constraining the strengths \({{\boldsymbol{c}}}^{* }\equiv {{\boldsymbol{s}}}^{* }\). In the simpler undirected case and considering integer weights, the connection probability between any two nodes i and j is given by \({p}_{ij}={y}_{i}{y}_{j}\), where y are the (exponentiated) Lagrange multipliers. The probability distribution and the ensemble average for the weight of the link between i and j (or, equivalently, for how many links are established between the two nodes) are \({q}_{ij}(w)={({y}_{i}{y}_{j})}^{w}(1-{y}_{i}{y}_{j})\) and

$$\langle {w}_{ij}\rangle =\frac{{y}_{i}{y}_{j}}{1-{y}_{i}{y}_{j}}$$

(5)

These two models recall traditional statistical mechanics for systems of non-interacting particles, if connections are interpreted as particles in a quantum gas and pairs of nodes as single-particle states. Indeed, in binary networks, each pair of nodes can be connected by at most one link, or equivalently, each single-particle state can be occupied by at most one particle. Therefore, for binary networks, equation 4 results in fermionic statistics. However, weighted networks correspond to particle systems for which single-particle states can be occupied by an arbitrary number of particles, so equation 5 describes a system of bosons. For these systems, Bose–Einstein condensation can occur between very strong nodes for which \({y}_{i}{y}_{j}\to 1\) (ref.26).
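The bosonic character of the WCM can be checked numerically: sampling link weights from the geometric distribution q_ij(w) reproduces the mean of equation 5. A minimal sketch, with an illustrative value of the product y_i y_j:

```python
import random
from math import log

# Sketch of the bosonic weight statistics of the WCM: the weight of a given
# link is geometrically distributed, q_ij(w) = (y_i y_j)^w (1 - y_i y_j),
# with mean y_i y_j / (1 - y_i y_j) as in equation 5. The multiplier
# product below is an illustrative value.
random.seed(42)
yi_yj = 0.6
mean_w = yi_yj / (1 - yi_yj)                        # = 1.5

# inverse-transform sampling of the geometric law with support w = 0, 1, 2, ...
samples = [int(log(random.random()) / log(yi_yj)) for _ in range(200_000)]
empirical_mean = sum(samples) / len(samples)

assert abs(empirical_mean - mean_w) < 0.05
```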

Notably, a mixed Bose–Fermi statistics is obtained when degrees and strengths are imposed simultaneously36, as in the enhanced configuration model (ECM)41. In the simplest undirected case, using (exponentiated) Lagrange multipliers x and y, respectively, for degrees and strengths, one gets \({p}_{ij}\,=\,({x}_{i}{x}_{j}{y}_{i}{y}_{j}){\rm{/}}({x}_{i}{x}_{j}{y}_{i}{y}_{j}-{y}_{i}{y}_{j}\,+\,1)\) and \({q}_{ij}(w\, > \,0)\,=\,{p}_{ij}{({y}_{i}{y}_{j})}^{w-1}(\,1-{y}_{i}{y}_{j})\). Hence, the ECM differs from the WCM in the way the first link established between any two nodes is treated: the processes of creating a connection from scratch and of reinforcing an existing one obey intrinsically different rules. The connection creation process serves to satisfy the degree constraints, and the reinforcement process serves to fix the values of the strengths. Like the ensemble non-equivalence mentioned above, this mechanism and the resulting mixed statistics constitute new physical phenomena that the statistical physics approach to networks can unveil.
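The two-rule mechanism of the ECM can likewise be sketched by simulation: a link is first created with probability p_ij and, only if created, reinforced geometrically. The multiplier values below are illustrative; the expected weight p_ij/(1 − y_i y_j) follows from summing the geometric series:

```python
import random
from math import log

# Sketch of the two-rule mechanism of the ECM: a link is first created with
# probability p_ij (fixed by the degree multipliers x) and, only if created,
# its weight is reinforced geometrically (governed by the strength
# multipliers y). All parameter values are illustrative.
random.seed(1)
xi_xj, yi_yj = 2.0, 0.5
p_ij = (xi_xj * yi_yj) / (xi_xj * yi_yj - yi_yj + 1.0)

def draw_weight():
    if random.random() >= p_ij:
        return 0                                    # no connection created
    # reinforcement: w - 1 further units, geometrically distributed
    return 1 + int(log(random.random()) / log(yi_yj))

samples = [draw_weight() for _ in range(200_000)]
mean_w = p_ij / (1 - yi_yj)                         # expected weight
empirical = sum(samples) / len(samples)
assert abs(empirical - mean_w) < 0.02
```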

Pattern validation

Null models for networks

Validating models, that is, comparing their statistical properties with measurements of real-world systems, is an essential activity of theoretical physics. In the context of networks and complex systems, apart from looking for what a model is able to explain, much research has been devoted to identifying properties in real-world networks that deviate from a benchmark model42,43,44,45,46,47,48,49,50. This is because possible deviations are likely to contain important information about the unknown formation process of a network or one of its functions.

Maximum entropy models are perfectly suited for this task. Starting from a real network G*, they are used to derive the null hypothesis (that is, the benchmark model) using the set of properties c(G*) as constraints. Otherwise, no other information about the system is assumed. In other words, the null hypothesis is that the chosen constraints are the only explanatory variables for the network at hand. The other properties of G* can then be statistically tested and validated against this null hypothesis. For instance, a null model that is derived by imposing the total number of links as a (macroscopic) constraint is typically used to reject a homogeneity hypothesis for the degree distribution. Instead, when imposing local constraints, the aim is to check whether higher-order patterns of a real network (such as reciprocity, clustering, assortativity, motifs and so on) are statistically significant beyond what can be expected from the heterogeneity of the degrees or strengths themselves. Because an ensemble given by local constraints can be analytically characterized, the expectation values and standard deviations of most quantities of interest can be explicitly derived, and hypothesis testing based on standard scores can therefore be easily performed38. When the ensemble distribution of the considered quantity is not normal, sampling of the configuration space using explicit formulas for P(G) makes it straightforward to perform statistical tests based on P values.
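A standard-score test of this kind amounts to a one-line computation once the ensemble mean and standard deviation are known analytically; the numbers below are illustrative, not taken from any real network:

```python
from math import erf, sqrt

# Sketch of hypothesis testing with standard scores: the observed value of
# a network property is compared against the analytical ensemble mean and
# standard deviation of the null model. All numbers are illustrative.
def z_score(observed, mean, std):
    return (observed - mean) / std

z = z_score(0.42, 0.30, 0.04)          # e.g. an observed reciprocity
# two-sided p-value under a normal approximation of the null distribution
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

assert abs(z - 3.0) < 1e-9
assert p_value < 0.01                  # the pattern is significant
```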

In the context of pattern validation, one of the most studied systems, which we also discuss here as an illustrative example, is the World Trade Web (WTW), which is the network of trade relationships between countries in the world51,52. The network exhibits disassortativity, in that countries with many trade partners are connected on average to countries with few partners53,54. This pattern is statistically explained to good approximation by the degree sequence38. An analogous situation also exists for the clustering coefficient38. These two observations exclude the presence of meaningful indirect economic interactions on top of the direct economic interactions in the WTW.

The situation changes, however, when the weighted version of the WTW network is analysed44,55,56. The network is still disassortative, in that countries with large export volumes are connected on average to countries with small export volumes. However, this pattern is not compatible with that of the WCM null model. Concerning the weighted clustering coefficient57, the agreement between the empirical network and the model is only partial. These findings point to the fact that, unlike in the binary case, knowledge of the strengths conveys only limited information about the higher-order weighted structure of the network. Together with the fact that basic topological properties, such as the link density, are not reproduced by the WCM (as discussed below), this suggests that even in weighted analyses, the binary structure plays an important role, irreducible to what local weighted properties can explain.

Network motifs and communities

Motifs58,59 are patterns of interconnections involving a small subset of nodes in the network, thus generalizing the concept of clustering. Typically, the null model used to study motifs in directed networks is obtained by constraining, in addition to the degrees, the number of reciprocal links per node60,61,62. Indeed, reciprocity is meaningful in many contexts. For instance, in food webs, the presence of bidirected predator–prey relations between two species strongly characterizes an ecosystem63. In interbank networks, the presence of mutual loans between two banks is a signature of trust between them, and in fact, the appearance of the motif corresponding to three banks involved in a circular lending loop with no reciprocation provides important early warnings of financial turmoil64.

Community structure instead refers to the presence of large groups of nodes that are densely connected within the group but sparsely connected to nodes outside the group6. Most methods to find communities in networks are based on optimization of a functional. The most prominent example of such a functional is the modularity5, which compares the number of links falling within and between groups with the expectation of such numbers under a null network model. The comparison with a null model is a fundamental step in the procedure, because even random graphs possess an intrinsic yet trivial community structure65,66. Indeed, in its original formulation, the modularity is defined on top of the configuration model by Chung and Lu67 (see Box 1). This model is fast and analytic but generates self-loops and multilinks, and thus it gives an accurate benchmark only when these events are rare, such as in very large and sparse networks. More generally, the maximum entropy approach (for example, the configuration model of equation 4), although more demanding from a practical viewpoint, can provide an appropriate null model that discounts the degree heterogeneity as well as other properties of the network38,68.

The ERG framework can be directly used to generate networks with a community structure by specifying the average number of links within and between each community. In this way, the ensemble becomes equivalent to the stochastic block model, in which each node is assigned to one of B blocks (communities) and links are independently drawn between pairs of nodes, with probabilities that are a function of only the block membership of the nodes69. This means that equation 4 becomes \({p}_{ij}\,={q}_{{b}_{i}{b}_{j}}\), where bi denotes the block membership of node i, or \({p}_{ij}\,=({x}_{i}{x}_{j}{q}_{{b}_{i}{b}_{j}}\,){\rm{/}}({x}_{i}{x}_{j}{q}_{{b}_{i}{b}_{j}}\,+\,1\,-\,{q}_{{b}_{i}{b}_{j}})\) for the degree-corrected block model70,71,72, in which the node degrees are also constrained.
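A minimal sketch of sampling from the (non-degree-corrected) stochastic block model, with two illustrative communities and an affinity matrix q chosen to be dense within blocks and sparse between them:

```python
import random

# Sketch of sampling from the stochastic block model: each node carries a
# block label b_i, and every pair (i, j) is linked independently with
# probability q[b_i][b_j]. The sizes and affinities are illustrative.
random.seed(7)
blocks = [0] * 50 + [1] * 50                 # two communities of 50 nodes
q = [[0.30, 0.02],                           # dense within blocks,
     [0.02, 0.30]]                           # sparse between them

edges = [(i, j) for i in range(100) for j in range(i + 1, 100)
         if random.random() < q[blocks[i]][blocks[j]]]

within = sum(1 for i, j in edges if blocks[i] == blocks[j])
between = len(edges) - within
assert within > between                      # the planted communities are visible
```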

Bipartite networks and one-mode projections

In bipartite networks, nodes can be divided into two disjoint sets, such that links exist only between nodes belonging to different sets73. Typical examples of these systems include affiliation networks, in which individuals are connected with the groups of which they are a member, and ownership networks, in which individuals are connected with the items they own. The bipartite configuration model (BiCM)74 extends the BCM to this class of networks. The BiCM method has been used, for instance, to study the network of countries and products they export, that is, the bipartite representation of the WTW75,76, and to detect temporal variations related to the occurrence of global financial crises77. More recently, the BiCM has been applied to show that the degree sequence of interacting species in mutualistic ecological networks is sufficient to induce a certain amount of nestedness of the interactions78.

To directly show the structure of relationships among one of the two sets of nodes, a bipartite network can be compressed into its one-mode projection. This is a network containing only the nodes of the considered set, connected with weights that depend on how many common neighbours the nodes have in the other set79. The problem of building a statistically validated one-mode projection of a bipartite network is similar in spirit to that of extracting the backbone from standard weighted networks80,81,82,83.

The typical approach involves determining which links are significant using a threshold, which is either unconditional or dependent on the degree of nodes in the projected set84,85,86,87,88. However, unlike weighted networks, one-mode projections should be assessed against null models that constrain the local information of both sets of the original bipartite network. Unfortunately, these models are difficult to derive. For instance, the use of computational link rewiring methods for one-mode projections89,90 is even more impractical and biased91 than for standard networks. Other null models require multiple observations of the empirical network92.

The null model for bipartite network projections derived from the maximum entropy principle is instead obtained by one-mode projecting the BiCM, that is, by computing the expected distribution for the number of common neighbours between nodes on the same layer93,94 (Fig. 2). This null model has been used, for instance, to analyse the one-mode projection of the bipartite WTW. This analysis allows detection of modules of countries that have similar industrial systems, construction of a hierarchical structure of products94 and tracing of specializations that emerge from the baseline diversification strategy of countries95. The same null model has also been used to study the patterns of asset ownership by financial institutions. In this case, the analysis allows identification of significant portfolio overlaps that bear the highest risk of fire-sale liquidation and forecast of market crashes and bubbles93. More recently, a null model obtained by pairwise projection of multiple bipartite networks has been successfully applied to identify significant innovation patterns involving the interplay of scientific, technological and economic activities96.
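Under the BiCM null model, the number of common neighbours of two nodes on the same layer is a sum of independent Bernoulli variables, one per node of the opposite layer, with success probabilities p_iα p_jα; a p-value then follows from this Poisson-binomial distribution. A sketch with illustrative probabilities:

```python
# Sketch: validating one link of a one-mode projection against the BiCM.
# Under the null model, the number of co-occurrences V_ij of nodes i and j
# is Poisson-binomial with per-product probabilities p_i,a * p_j,a.
# The probabilities below are illustrative, not fitted to a real network.
probs = [0.9 * 0.8, 0.5 * 0.4, 0.3 * 0.6, 0.2 * 0.1]

# exact Poisson-binomial distribution by iterative convolution
dist = [1.0]
for p in probs:
    new = [0.0] * (len(dist) + 1)
    for k, dk in enumerate(dist):
        new[k] += (1 - p) * dk      # product not co-exported by both
        new[k + 1] += p * dk        # product co-exported by both
    dist = new

v_obs = 3                           # observed number of co-exported products
p_value = sum(dist[v_obs:])         # P(V >= v_obs) under the null
link_validated = p_value < 0.05     # keep the link only if significant

assert abs(sum(dist) - 1.0) < 1e-12
assert link_validated
```

Repeating this test over all node pairs, with a suitable multiple-testing correction, yields the validated projection described in the text.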

Fig. 2: One-mode projection of the network of countries and products they export and its statistical validation against a null hypothesis derived from the BiCM.

In this example, the existence of a link between the US and Italy (red boxes in the networks) in the one-mode projection is determined. Step 1: the real-world bipartite network is compared with the bipartite configuration model (BiCM) ensemble (dashed box), which is derived by constraining the degrees of both countries and products. The ensemble gives a connection probability for any country–product pair analogous to that of equation 4. Step 2: for the pair of countries under consideration, the actual number of products they both export (red links on the real-world network) is assessed using the probability distribution obtained from the BiCM. In this example, the observed value is statistically significant, that is, the null hypothesis that the value is explained by the degrees of both countries and products is rejected, and thus a link connecting the two countries is placed in the one-mode validated projection. This procedure can be repeated for each pair of countries to obtain the projection on the country set of the real-world trade web. This projection identifies communities of countries that have similar industrial systems. Instead, the projection performed on the product set highlights classes of products that require similar technological capabilities. Adapted from ref.94, CC-BY-4.0.
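The validation step described in this caption can be sketched in a few lines of code. In the sketch below, the BiCM link probabilities are taken as given (here a uniform placeholder value rather than the actual solution of the BiCM for the observed degrees); under the null hypothesis, the number of common neighbours of two countries is a sum of independent Bernoulli variables, whose exact Poisson–binomial distribution yields the p-value:

```python
import numpy as np

def poisson_binomial_pmf(probs):
    """Exact PMF of a sum of independent Bernoulli variables,
    built by repeated convolution."""
    pmf = np.array([1.0])
    for q in probs:
        pmf = np.convolve(pmf, [1.0 - q, q])
    return pmf

def validate_pair(biadj, p, i, j, alpha=0.01):
    """One-sided p-value for the observed number of common neighbours of
    nodes i and j, under a null where link (i, a) exists independently
    with probability p[i, a] (as would be provided by the BiCM)."""
    observed = int(np.sum(biadj[i] * biadj[j]))
    null_pmf = poisson_binomial_pmf(p[i] * p[j])  # co-occurrence probabilities
    p_value = null_pmf[observed:].sum()           # P(common neighbours >= observed)
    return p_value, p_value <= alpha

# toy bipartite network: 3 countries x 5 products
biadj = np.array([[1, 1, 1, 1, 0],
                  [1, 1, 1, 0, 0],
                  [0, 0, 1, 1, 1]])
# placeholder link probabilities (a real analysis would first solve the BiCM)
p = np.full((3, 5), 0.3)
pval, significant = validate_pair(biadj, p, 0, 1)
```

A full implementation would solve the BiCM for the probabilities from the observed degrees and correct the p-values for multiple hypothesis testing across all country pairs, as done in the cited works.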

Network reconstruction

The problem of partial information

Many dynamical processes of critical importance, such as disease spreading or information diffusion, are sensitive to the topology of the network of interactions on which they occur97. However, in many situations, the structure of the network is at least partially unknown. A classic example is that of financial networks. Financial institutions publicly disclose their aggregate exposures in their balance sheets, but individual exposures (who is lending how much to whom) remain confidential98,99,100. Another example is that of social networks, which are too large in scale to allow exhaustive crawling and for which only aggregate information is typically released owing to privacy issues101,102. For natural and biological networks, measuring all possible interactions is difficult because of technological limitations or high experimental costs103,104. Thus, reconstruction of the network structure when only limited information is available is a problem that is relevant across several domains and represents one of the major challenges for complexity science.

When the task is to predict individual missing connections in partially known networks, one talks about link prediction105. Here, we instead discuss the fundamentally different task of reconstructing a whole network from partial information on the system, aggregated at the mesoscopic and macroscopic level106. As with the other problems discussed in this Review, the key to success is to make optimal use of what is known about the system and to make the most unbiased estimate of what is not known. This is naturally achieved using techniques based on the maximum entropy principle: the probability distribution that best represents the current state of knowledge on the network is the one with the largest uncertainty that also satisfies the constraints corresponding to the available information. Note that the ERG approach has the additional advantage of defining not a single reconstructed network instance but instead an ensemble of plausible configurations with related probabilities. In this way, it can handle spurious or fluctuating data and obtain robust confidence intervals for the outcomes of a given dynamical process on the (unknown) network.

However, different kinds of local constraints lead to substantially different outcomes for the reconstruction process. For a variety of networks of different nature (such as economic, financial, social or ecological networks), constraining the degrees, as in the BCM, typically returns a satisfactory reconstruction of the binary network features. However, constraining the strengths, as in the WCM, almost always leads to a poor weighted reconstruction38,41. This is because the entropy maximization procedure is unbiased: it assumes no relationship between the strength of a node and the number of connections that node can establish. Hence, out of the many possible ways to redistribute the strength of each node over all possible links, the method chooses the most even one, so the probability of assigning zero weight to a link is extremely small; the reconstructed network becomes almost fully connected, regardless of the link density of the original network. This phenomenon shows that degrees and strengths carry different kinds of information and constrain the network in fundamentally different ways. To reconstruct sparse weighted networks, both are required47,49, as in the ECM41. This is quantitatively shown using information-theoretic criteria (see ref.41 and Box 2).
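To make the role of the degree constraints concrete, the BCM multipliers can be obtained numerically by iterating the constraint equations. This is a minimal sketch using a simple fixed-point scheme, not necessarily the solver used in the cited works:

```python
import numpy as np

def solve_bcm(degrees, n_iter=5000, tol=1e-12):
    """Fixed-point iteration for the BCM multipliers x_i, which must satisfy
    k_i = sum_{j != i} x_i x_j / (1 + x_i x_j)."""
    k = np.asarray(degrees, dtype=float)
    x = k / np.sqrt(k.sum() + 1.0)                # rough initial guess
    for _ in range(n_iter):
        xx = np.outer(x, x)
        D = x[None, :] / (1.0 + xx)               # D[i, j] = x_j / (1 + x_i x_j)
        s = D.sum(axis=1) - x / (1.0 + x * x)     # exclude the j = i term
        x_new = k / s
        if np.max(np.abs(x_new - x)) < tol:
            x = x_new
            break
        x = x_new
    p = np.outer(x, x) / (1.0 + np.outer(x, x))   # connection probabilities
    np.fill_diagonal(p, 0.0)
    return x, p

x, p = solve_bcm([2, 2, 2, 2])    # toy degree sequence on 4 nodes
```

By construction, the expected degrees `p.sum(axis=1)` reproduce the imposed sequence, while all higher-order structure remains maximally random.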

Box 2 Comparing models from different constraints

Competing models that result from different choices of constraints can be compared using various likelihood-based statistical criteria. This, in turn, means assessing the informativeness of different network properties. Consider two models, \({{\mathscr{M}}}_{1}\) and \({{\mathscr{M}}}_{2}\). When these models are nested, meaning that \({{\mathscr{M}}}_{2}\) contains extra parameters θ* with respect to \({{\mathscr{M}}}_{1}\) or equivalently that \({{\mathscr{M}}}_{1}\) is a special case of \({{\mathscr{M}}}_{2}\), a comparison can be made through the likelihood ratio test (LRT)171; if \({{\rm{LRT}}}_{{{\mathscr{M}}}_{1}/{{\mathscr{M}}}_{2}}=-2[{{\mathscr{L}}}_{{{\mathscr{M}}}_{1}}({{\boldsymbol{\theta }}}_{1}^{* })-{{\mathscr{L}}}_{{{\mathscr{M}}}_{2}}({{\boldsymbol{\theta }}}_{2}^{* })]\) falls below the critical value corresponding to a given significance level, \({{\mathscr{M}}}_{2}\) should be rejected even though its likelihood is, by definition, higher than that of \({{\mathscr{M}}}_{1}\). This is because \({{\mathscr{M}}}_{2}\) overfits the data by using redundant parameters that have intercorrelations and that thus provide spurious information on the system172.

When \({{\mathscr{M}}}_{1}\) and \({{\mathscr{M}}}_{2}\) are not nested, the Akaike information criterion (AIC)173 ranks them in increasing order of \({{\rm{AIC}}}_{{{\mathscr{M}}}_{r}}=2[{K}_{{{\mathscr{M}}}_{r}}-{{\mathscr{L}}}_{{{\mathscr{M}}}_{r}}{\rm{(}}{{\boldsymbol{\theta }}}_{r}^{* }{\rm{)}}]\), where \({K}_{{{\mathscr{M}}}_{r}}\) is the number of parameters of model \({{\mathscr{M}}}_{r}\). For R competing models, Akaike weights \({\omega }_{{{\mathscr{M}}}_{r}}={{\rm{e}}}^{-({{\rm{AIC}}}_{{{\mathscr{M}}}_{r}}/2)}/{\sum }_{r=1}^{R}{{\rm{e}}}^{-({{\rm{AIC}}}_{{{\mathscr{M}}}_{r}}/2)}\) quantify the probability that each model is the most appropriate to describe the data174.

The Bayesian information criterion (BIC)175 is similar to the AIC but accounts for the number n of empirical observations as \({{\rm{BIC}}}_{{{\mathscr{M}}}_{r}}={K}_{{{\mathscr{M}}}_{r}}{\rm{ln}}n-2{{\mathscr{L}}}_{{{\mathscr{M}}}_{r}}({{\boldsymbol{\theta }}}_{r}^{* })\). Bayesian weights can also be defined, analogously to Akaike weights. The BIC is believed to be more restrictive than the AIC, but which criterion performs best and under which conditions is still debated (see ref.172 and references therein).

We note that other model selection methods have also been proposed, such as multimodel inference, in which a sort of average over different models is performed172.
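The criteria in this Box translate directly into a few lines of code. The log-likelihood values and parameter counts below are hypothetical placeholders for two competing models:

```python
import numpy as np

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2 (K - logL)."""
    return 2.0 * (n_params - log_likelihood)

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: K ln(n) - 2 logL."""
    return n_params * np.log(n_obs) - 2.0 * log_likelihood

def akaike_weights(aic_values):
    """Probability that each candidate model is the most appropriate one."""
    a = np.asarray(aic_values, dtype=float)
    delta = a - a.min()                  # rescaling avoids numerical underflow
    w = np.exp(-delta / 2.0)
    return w / w.sum()

# hypothetical maximized log-likelihoods and parameter counts
# for two competing models (e.g. a BCM-like and an ECM-like model)
logL = [-1000.0, -995.0]
K = [50, 100]
weights = akaike_weights([aic(logL[0], K[0]), aic(logL[1], K[1])])
```

In this toy case, the extra 50 parameters buy only 5 units of log-likelihood, so the first model receives essentially all the weight: penalizing parameters guards against the overfitting discussed above.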

The fitness ansatz

Unfortunately, in many situations — and typically for financial networks — the node degrees are also unknown. A possible solution comes from the observation that in many real networks, the nodes can be assigned fitnesses, from which connection probabilities between nodes and node degrees can be obtained14,52,107,108,109. The strengths themselves work well as fitnesses in many cases44. The fitness ansatz assumes that the strength of a node is a monotonic function of the Lagrange multiplier x controlling the degree of that node. Assuming the simplest linear dependence s* ∝ x (other functional forms are, in principle, allowed), equation 4 becomes110,111,112
\({p}_{ij}=\frac{z{s}_{i}^{* }{s}_{j}^{* }}{1+z{s}_{i}^{* }{s}_{j}^{* }}.\qquad (6)\)

The proportionality constant z can be easily found using maximum likelihood arguments together with a bootstrapping approach to assess the network density113, which relies on the hypothesis that subsets of the network are representative of the whole system, regardless of the specific portion that is observed. Node degrees estimated from the fitness ansatz are then used as inputs to the ECM to obtain the ensemble of reconstructed networks111 (Fig. 3). Alternatively, heuristic techniques can be employed to reduce the complexity of the method by replacing the construction of the ECM ensemble with a density-corrected gravity model that obtains weights as \({w}_{ij}\propto {z}^{-1}+{s}_{i}^{* }{s}_{j}^{* }\) with probability pij (ref.112). The method is also readily extended to bipartite networks114.
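As a sketch of how z can be fixed in practice, the snippet below matches the expected number of links, \(\sum_{i<j} z s_i^* s_j^* / (1 + z s_i^* s_j^*)\), to a target link count by bisection. The strengths and target density are toy values, and the cited works instead determine z by maximum likelihood on a partially observed subset of nodes:

```python
import numpy as np

def solve_z(strengths, n_links, lo=1e-12, hi=1e12, n_iter=200):
    """Geometric bisection for z such that the expected number of links
    sum_{i<j} z s_i s_j / (1 + z s_i s_j) matches the target n_links."""
    s = np.asarray(strengths, dtype=float)
    ss = np.outer(s, s)[np.triu_indices(len(s), k=1)]
    for _ in range(n_iter):
        mid = np.sqrt(lo * hi)           # z can span many orders of magnitude
        if np.sum(mid * ss / (1.0 + mid * ss)) < n_links:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

s = np.array([10.0, 8.0, 5.0, 2.0, 1.0])   # toy strengths (e.g. total exposures)
z = solve_z(s, n_links=6.0)                 # target density: 6 of 10 possible links
p = z * np.outer(s, s) / (1.0 + z * np.outer(s, s))
np.fill_diagonal(p, 0.0)
```

The resulting probabilities p can then be used to estimate degrees and, in a second step, to feed the ECM as described in Fig. 3.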

Fig. 3: Statistical reconstruction of an interbank network given by bilateral exchanges among banks.

The method takes as input the total exposure of each bank, that is, its total interbank assets A and liabilities L, obtained from publicly available balance sheets (upper panel), which correspond to the out-strength and in-strength s* of the node. Two entropy maximization steps are performed. The first step is to estimate the node degrees \(\widetilde{k}\), that is, the number of loans for each bank. These are unknown because the balance sheets list only the total amount of assets and liabilities. The estimate is performed using the fitness assumption that the Lagrange multipliers x controlling the out-degree and in-degree k* of a node scale with the total interbank assets and liabilities, respectively (middle panel), which results in a connection probability pij between nodes i and j of the form of equation 6. Node degrees can then be estimated using these probabilities. The second step reconstructs the full weighted topology of the network. One method for doing this is to use an enhanced configuration model with constraints on both the degrees \(\widetilde{k}\) estimated from the first step and empirical strengths s* (ref.111). Another method is to use a heuristic density-corrected gravity approach112 to place weights on realized connections. The outcome of the reconstruction process is a probability distribution over the space of networks compatible with the constraints (lower panel). Scatter plots adapted from ref.112, CC-BY-4.0.

Note that despite having only the strengths as the input, the reconstruction method described above is different from the WCM because it uses these strengths not to directly reconstruct the network but to estimate the degrees first and only then to build the maximum entropy ensemble. In this way, it can generate sparse and non-trivial topological structures and can be used to faithfully reconstruct complex network systems100,106.

Beyond local constraints

Approximate and numerical methods

Whether an ERG model is analytically tractable depends on whether a closed-form expression of its partition function Z can be derived. As we have seen, this is indeed possible in the case of local linear constraints, for which Z factorizes into link-specific terms. In some other cases, it may be possible to obtain approximate analytic solutions using a variety of techniques, such as mean-field theory, saddle-point approximation, diagrammatic perturbation theory and path integral representations. These situations have been explored in the literature and include the degree-correlated network model115, the reciprocity model and the two-star model116,117, the Strauss model of clustering118, models of social collaboration119, models of community structure31, hierarchical topologies120, models with spatial embedding121 and rich-club features122, and models constraining both the degree distribution and degree–degree correlations, which are known under the name of tailored random graphs123,124,125. If analytic approaches for computing Z are intractable, the ensemble can still be populated using Monte Carlo simulations. These simulations can be used either to explicitly sample the configuration space — taking care to avoid sampling biases by using ergodic Markov chains fulfilling detailed balance126,127,128 — or to derive approximate maximum likelihood estimators — taking care to avoid degenerate regions of the phase space, which often lead to trapping in local minima129,130,131,132,133,134,135. Such a variety of techniques is what makes the ERG model an extremely flexible and powerful framework for complex network modelling.

Markov chain Monte Carlo

In the MCMC method for ERG models, a new network \({{\boldsymbol{G}}}^{^{\prime} }\in \Omega \) is proposed by taking a network \({\boldsymbol{G}}\in \Omega \) and shuffling two links chosen at random. The shuffling is performed so as to preserve a given network property, such as the degree sequence of the network. The proposed network is accepted with the Metropolis–Hastings probability \({Q}_{{\boldsymbol{G}}\to {{\boldsymbol{G}}}^{^{\prime} }}={\rm{\min }}\{1,{{\rm{e}}}^{H({\boldsymbol{G}})-H({{\boldsymbol{G}}}^{^{\prime} })}\}\)136, where H is the Hamiltonian containing the Lagrange multipliers that take the role of inverse temperatures used for simulated annealing. The process is repeated from \({{\boldsymbol{G}}}^{^{\prime} }\) if the proposal is accepted and from G if the proposal is rejected. The moves fulfil ergodicity and detailed balance, and thus, for sufficiently long times, the values of the constraints in the sampled networks are distributed according to the prescription of the canonical ensemble.
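A minimal sketch of this sampler is given below for the illustrative Hamiltonian H(G) = −θ × (number of triangles), which biases the degree-preserving swaps towards clustered configurations (a Strauss-like choice made here purely for illustration; production implementations must also treat proposal symmetry carefully and face the degeneracy issues discussed next):

```python
import itertools
import math
import random

def triangles(adj):
    """Count triangles in an undirected graph stored as {node: set_of_neighbours}."""
    return sum(1 for a, b, c in itertools.combinations(adj, 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

def mcmc_step(adj, edges, theta, rng):
    """One degree-preserving double-edge swap (a,b),(c,d) -> (a,d),(c,b),
    accepted with probability min(1, exp(H(G) - H(G'))) for the
    illustrative Hamiltonian H(G) = -theta * #triangles(G)."""
    (a, b), (c, d) = rng.sample(edges, 2)
    if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
        return                               # would create a self-loop or multi-edge
    h_old = -theta * triangles(adj)
    adj[a].remove(b); adj[b].remove(a); adj[c].remove(d); adj[d].remove(c)
    adj[a].add(d); adj[d].add(a); adj[c].add(b); adj[b].add(c)
    h_new = -theta * triangles(adj)
    if rng.random() < min(1.0, math.exp(h_old - h_new)):
        edges.remove((a, b)); edges.remove((c, d))
        edges.extend([(a, d), (c, b)])
    else:                                    # reject: undo the swap
        adj[a].remove(d); adj[d].remove(a); adj[c].remove(b); adj[b].remove(c)
        adj[a].add(b); adj[b].add(a); adj[c].add(d); adj[d].add(c)

rng = random.Random(42)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]   # a 6-cycle
adj = {i: set() for i in range(6)}
for u, v in edges:
    adj[u].add(v); adj[v].add(u)
for _ in range(200):
    mcmc_step(adj, edges, theta=1.0, rng=rng)
```

Whatever the Hamiltonian, the degree sequence is preserved exactly at every step, while the soft constraints enter only through the acceptance probability.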

However, the time to reach the correct distribution grows exponentially with system size, leading to failure of the MCMC method in practice. This happens whenever P(G) possesses more than one local maximum, which happens for instance when using ERG models to generate ensembles of networks with desired degree distribution, degree–degree correlations and clustering coefficient (within the so-called dk-series approach)137,138. Indeed, rewiring methods that are biased by aiming at a given level of clustering display strong hysteresis phenomena: cluster cores of highly interconnected nodes do emerge during the process, but once formed, they are very difficult to remove in realistic sampling timescales, leading to a breaking of ergodicity139. Multicanonical sampling has been proposed to overcome this issue of phase transitions140. The idea is to explore the original canonical ensembles without being restricted to the most probable regions, which requires sampling networks uniformly on a predefined range of constraint values. This is achieved using Metropolis–Hastings steps based on a microcanonical density of states estimated using the Wang–Landau algorithm141.

Generalized network structures

Networks of networks

Many complex systems are not simply isolated networks but are better represented by networks of networks142,143,144 (see Fig. 4). The simplest and most studied situation is the so-called multiplex (Fig. 4a), in which the same set of nodes interacts on several layers of networks. For example, in social networks, each individual has different kinds of social ties; in urban systems, locations can be connected by different means of transportation; and in financial markets, institutions can exchange different kinds of financial instruments.

Fig. 4: Generalized network structures.

a | A multiplex (or ‘link-coloured’) network consists of various layers with the same set of nodes and specific interaction patterns in each layer. b | An interacting (or multilayer) network consists of various layers, each with its own nodes, with interactions existing within and between layers. c | A simplicial complex represents the different kinds of interactions (simplices) between groups of nodes: isolated nodes (d = 0), pairwise interactions (d = 1, grey lines), triangles (d = 2, blue) and tetrahedra (d = 3, red).

Mathematically, a multiplex \(\overrightarrow{{\boldsymbol{G}}}\) is a system of N nodes and M layers of interactions. Each layer, labelled α = 1, …, M, consists of a network Gα. When modelling such a system, the simplest hypothesis is that the various layers are uncorrelated. In ERG terms, this means that the probability of the multiplex factorizes into the probabilities of each network layer: \(P(\overrightarrow{{\boldsymbol{G}}})={\prod }_{\alpha =1}^{M}{P}_{\alpha }({{\boldsymbol{G}}}^{{\boldsymbol{\alpha }}})\)145,146. This happens whenever the constraints imposed on the multiplex are linear combinations of constraints on individual network layers, which includes the situation in which local constraints are imposed separately for each network layer. For instance, imposing the degrees in each layer leads to M independent BCMs, one for each layer. The connection probability between any two nodes i and j in layer α reads \({p}_{ij}^{\alpha }=({x}_{i}^{\alpha }{x}_{j}^{\alpha }){\rm{/}}(1+{x}_{i}^{\alpha }{x}_{j}^{\alpha })\), where the \({x}_{i}^{\alpha }\) are node- and layer-specific Lagrange multipliers. As a result, the existence of a link is independent of the presence of any other link in the multiplex. In this situation, the overlap of links between pairs of sparse network layers vanishes in the large N limit.

In more realistic models of correlated multiplexes, the existence of a link in one layer is correlated with the existence of a link in another layer. Such models can be generated by constraining the multilink structure of the system145. A multilink is an M-dimensional binary vector m indicating a given pattern of connections between a generic pair of nodes in the various layers. The multidegree k(m) of a node in a given graph configuration is then the total number of other nodes with which the multilink m is realized. Constraining the multidegree sequence of the network, which requires imposing 2M constraints per node, leads to a probability for a multilink m between node i and node j given by \({p}_{ij}^{{\boldsymbol{m}}}=({x}_{i}^{{\boldsymbol{m}}}{x}_{j}^{{\boldsymbol{m}}}){\rm{/}}({\sum }_{{\boldsymbol{m}}}{x}_{i}^{{\boldsymbol{m}}}{x}_{j}^{{\boldsymbol{m}}})\). This method can be used to build systems made up of sparse layers with non-vanishing overlap.
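The bookkeeping behind multilinks is straightforward: the M-dimensional binary pattern of each node pair can be encoded as an integer and tallied. A minimal sketch on a hypothetical two-layer multiplex:

```python
import numpy as np

def multidegrees(layers):
    """Multidegree k_i(m): for node i and each binary pattern m over the M
    layers, the number of nodes j whose multilink with i is exactly m
    (m is encoded as an integer in [0, 2**M))."""
    A = np.asarray(layers)                       # shape (M, N, N), binary, symmetric
    M, N = A.shape[0], A.shape[1]
    code = sum(A[alpha].astype(int) << alpha for alpha in range(M))
    k = np.zeros((N, 2 ** M), dtype=int)
    for i in range(N):
        for j in range(N):
            if i != j:
                k[i, code[i, j]] += 1
    return k

# hypothetical two-layer multiplex on 3 nodes
layer1 = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
layer2 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
k = multidegrees([layer1, layer2])
```

In this toy example, node 0 forms the overlapping multilink m = (1, 1) with node 1 and the layer-1-only multilink m = (1, 0) with node 2; constraining these counts, rather than the layer degrees separately, is what preserves link overlap.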

For weighted multiplexes, it is simple to impose either the strengths alone or both degrees and strengths on each layer. Doing this leads to uncorrelated layers147. However, the situation is far more complicated if layers are correlated: no closed-form solution for \(P(\overrightarrow{{\boldsymbol{G}}})\) has been obtained to date, and thus a sampling procedure has been devised148. An interesting related case is provided by systems of aggregated multiplexes, which are simple networks given by the sum of the various layers of the multiplex. For aggregated multiplexes, constraining the aggregated local structure leads to the traditional canonical ensemble. For instance, constraining the sum of the weights of connections incident to each node in each layer leads to the standard WCM.

However, if the number of layers of the original multiplex is known, the model can be built using a layer-degeneracy term for each link, which counts the number of ways its weight can be split across the layers. Doing this leads to a WCM(M) model, for which the weight distribution is a negative binomial, with the geometric distribution of the standard WCM being the special case M = 1.

Furthermore, if it is known that links in each layer have a different nature, a good modelling framework is equivalent to that of multi-edge networks149,150, for which links belonging to different layers can be distinguished (Fig. 4b). To deal with this distinguishability, it is necessary to introduce another degeneracy term that counts all the network configurations giving rise to the same P(G). The solution of the model is then obtained using a mixed ensemble with a hard constraint for the total network weight and soft constraints for node strengths. This leads to Poisson statistics (independent of M) for link weights and \({\langle w\rangle }_{ij}=M{y}_{i}{y}_{j}\propto {s}_{i}^{* }{s}_{j}^{* }\), a rather different situation than the outcome of either WCM or WCM(M)151.

We note that it can be tricky to decide which statistics to use in a specific case. For instance, mobility or origin–destination networks, consisting of the numbers of trips between locations (nodes) aggregated over observation periods (layers), are better modelled by multi-edge networks, as long as each trip is distinguishable. For the aggregated WTW, the situation is less clear: whereas commodities are, in principle, distinguishable, trade transactions are much less so, and in fact, none of the WCM, the WCM(M) or the multi-edge model reproduces it well. A possible solution here is again to constrain both strengths and degrees simultaneously152.

Simplicial complexes

Simplicial complexes are generalized network structures that describe interactions between more than two nodes. They can be used to describe a wide variety of complex interacting systems, such as collaboration networks in which works result from two or more actors working together, protein interaction networks in which complexes often consist of more than two proteins, economic systems of financial transactions often involving several parties and social systems in which groups of people are united by common motives or interests. Simplicial complexes can involve any number of nodes. For instance, simplices of dimension d = 0, 1, 2 and 3 are nodes, links, triangles and tetrahedra, respectively, and d-dimensional simplices are their d-dimensional generalizations. The δ-dimensional faces of a d-dimensional simplex (δ < d) are all the simplices formed by subsets of δ + 1 nodes. A simplicial complex represents the different kinds of interactions within groups of nodes. It is a collection of simplices of different dimensions that are connected such that every face of a simplex in the complex is in the complex and the intersection of every pair of simplices in the complex is a face of both simplices (Fig. 4c).
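The closure (face) condition in this definition is easy to verify programmatically for an abstract simplicial complex. The sketch below builds a complex from a set of maximal simplices and checks that every face is present:

```python
from itertools import combinations

def close_downward(maximal_simplices):
    """Build the simplicial complex generated by a set of maximal simplices:
    include every non-empty subset (face) of each simplex."""
    S = set()
    for s in maximal_simplices:
        s = sorted(s)
        for d in range(1, len(s) + 1):
            S.update(frozenset(f) for f in combinations(s, d))
    return S

def is_simplicial_complex(simplices):
    """Check the closure property: every face of every simplex is present."""
    S = {frozenset(s) for s in simplices}
    return all(frozenset(f) in S
               for s in S for d in range(1, len(s))
               for f in combinations(sorted(s), d))

# a triangle (d = 2) glued to an extra edge (d = 1) along node 2
K = close_downward([{0, 1, 2}, {2, 3}])
```

The example complex contains four vertices, four edges and one triangle; a bare triangle without its edges and vertices would fail the check.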

Exponential random simplicial complexes have recently been introduced as a higher-dimensional generalization of ERGs. They enable generation of random simplicial complexes in which each simplex has its own independent probability of appearance, conditioned on the presence of simplex boundaries, which become additional constraints on the model153. Explicit calculation of the partition function is possible when considering random simplicial complexes formed exclusively by d-dimensional simplices154. Indeed, by constraining the generalized degree (the number of d-dimensional simplices incident to a given δ-dimensional face), the graph probability becomes a product of marginal probabilities for individual d-dimensional simplices. Alternatively, an appropriate MCMC sampling scheme can be used to populate a microcanonical ensemble of simplicial complexes formed by simplices of any dimension155.

Perspectives and conclusion

Complex networks are different from the systems traditionally studied in equilibrium statistical physics in two key ways. One is that the microscopic degrees of freedom of networks are the interactions between the nodes making up the system and not the states of the nodes themselves. The other is that in real-world networks, nodes are typically heterogeneous, both in terms of intrinsic characteristics and connectivity features, such that networks cannot be assigned a typical scale. Maximum entropy models of networks based on local constraints are founded on these two facts, because they define probability distributions on the network connections and do not distinguish nodes beyond their local features. As we reviewed here, these models have found a wide range of practical applications. This is because they can often be analytically characterized; they are able to include higher-order network features, such as assortativity, clustering and community structure, using stochastic sampling; and they can model even more complex structures such as networks of networks and simplicial complexes.

However, there are limitations to the maximum entropy model approach. In the models discussed here, the constraints that can be imposed must be static topological properties of the network. Dynamical constraints have been considered only recently, using either the principle of maximum caliber — which is to dynamical pathways what the principle of maximum entropy is to equilibrium states29,156 — or by using definitions of the entropy functional alternative to Shannon’s definition (see Box 3). Another limitation of maximum entropy network models is that the possibility of considering semilocal network properties relies heavily on numerical sampling, which becomes unfeasible or strongly biased for non-trivial patterns involving more than two or three nodes. However, these patterns can be important in situations in which the network structure is determined by complex optimization principles, such as subunits of an electrical circuit, biochemical reactions in a cell or neuron firing patterns in the brain. The field of statistical physics of networks will need to face the challenge of developing more sophisticated network models for these kinds of structures. Nevertheless, maximum entropy models based on local constraints provide effective benchmarks to detect and validate such structures.

We remark that, from a practical point of view, the possibility of quantifying the relevance of a set of observed features and extracting meaningful information from huge streams of continuously produced, high-dimensional, noisy data is particularly relevant in the present era of big data. On the one hand, the details and facets of information that can now be extracted have reached levels never seen before, which means that increasingly complex data structures and models are needed in order to represent and comprehend them. On the other hand, the quantity of information available requires effective and scalable ways to let the signal emerge from the noise originating from the large variety of sources. The theoretical framework of statistical physics stands as an essential instrument to make consistent inference from data, whatever the level of complexity faced.

Box 3 Boltzmann, von Neumann and Kolmogorov entropies

Alternatives to Shannon entropy are used in statistical physics of networks for various purposes.

The Boltzmann entropy is the logarithm of the number of network configurations belonging to a microcanonical ensemble. Like the Shannon entropy for a canonical ensemble, the Boltzmann entropy can be used to quantify the complexity of the ensemble, that is, to assess the role of different constraints in terms of information they carry about the network structure31,120. Indeed, the more informative the constraints in shaping the network structure are, the smaller the effective number of graphs with the imposed features, and thus the lower the Boltzmann entropy of the corresponding ensemble. Using these arguments, it is possible to show that homogeneous degree distributions are the most likely outcome when the Boltzmann entropy of the microcanonical configuration model ensemble is large, whereas scale-free degree distributions naturally emerge for network ensembles with minimal Boltzmann entropy121.

The von Neumann entropy describes the amount of information encoded in a quantum system composed of a mixture of pure quantum states33,176,177,178. It is given by −Tr(ρ ln ρ), where ρ is the density matrix of the system; this quantity equals the Shannon entropy of the eigenvalue distribution of ρ. For undirected binary networks, a formulation of this entropy that satisfies the sub-additivity property can be given in terms of the combinatorial graph Laplacian L = diag(k) − G, by defining \({\boldsymbol{\rho }}={{\rm{e}}}^{-\tau {\boldsymbol{L}}}{\rm{/\; Tr}}({{\rm{e}}}^{-\tau {\boldsymbol{L}}})\)179. The resulting von Neumann entropy thus depends on the eigenvalue spectrum of the Laplacian and can be viewed as the result of constraining the properties of diffusion dynamics on the network180,181.
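This spectral definition can be evaluated directly: the entropy reduces to the Shannon entropy of the Boltzmann-like distribution over the Laplacian eigenvalues. A minimal sketch (τ = 1, toy graphs):

```python
import numpy as np

def von_neumann_entropy(G, tau=1.0):
    """-Tr(rho ln rho) with rho = exp(-tau L) / Tr(exp(-tau L)),
    computed from the spectrum of the combinatorial Laplacian L."""
    G = np.asarray(G, dtype=float)
    L = np.diag(G.sum(axis=1)) - G
    mu = np.linalg.eigvalsh(L)               # L is symmetric
    p = np.exp(-tau * (mu - mu.min()))       # shift the spectrum for stability
    p /= p.sum()                             # eigenvalues of rho
    return float(-np.sum(p * np.log(p)))

empty = np.zeros((4, 4))                 # 4 isolated nodes
K4 = np.ones((4, 4)) - np.eye(4)         # complete graph on 4 nodes
```

For the empty graph the Laplacian spectrum is fully degenerate and the entropy attains its maximal value ln N, whereas the spectral gap of the complete graph lowers it.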

Finally, the Kolmogorov entropy generalizes the Shannon entropy by describing the rate at which a stochastic process generates information. Consider for simplicity a Markovian ergodic stochastic process described by the matrix \({\boldsymbol{\Phi }}=\{{\phi }_{ij}\}\) of transition rates {i→j} and by the stationary asymptotic probability distribution {πi}. In this case, the dynamical entropy is defined as \(-{\sum }_{ij}{\pi }_{i}{\phi }_{ij}{\rm{log}}{\phi }_{ij}\), that is, the average of the Shannon entropies of rows of \({\boldsymbol{\Phi }}\), each weighted by the stationary distribution of the corresponding node182. The Kolmogorov entropy is related, on the one hand, to the capacity of the network to withstand random structural changes182 and, on the other hand, to the Ricci curvature of the network183. This connection is particularly intriguing because the Ricci curvature has been used to differentiate stages of cancer from gene co-expression networks184, as well as to give hallmarks of financial crashes from stock correlation networks185.
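As a concrete example of this definition, the snippet below computes the entropy rate of the unbiased random walk on a 3-cycle, where two equiprobable moves per step give a rate of ln 2:

```python
import numpy as np

def kolmogorov_entropy(Phi):
    """Entropy rate -sum_ij pi_i phi_ij log(phi_ij) of an ergodic Markov chain
    with row-stochastic transition matrix Phi and stationary distribution pi."""
    Phi = np.asarray(Phi, dtype=float)
    vals, vecs = np.linalg.eig(Phi.T)        # pi: left eigenvector for eigenvalue 1
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    pi /= pi.sum()
    logPhi = np.log(np.where(Phi > 0, Phi, 1.0))   # log(1) = 0 kills phi = 0 terms
    return float(-np.sum(pi[:, None] * Phi * logPhi))

# unbiased random walk on a 3-cycle: two equiprobable moves per step
Phi = np.array([[0.0, 0.5, 0.5],
                [0.5, 0.0, 0.5],
                [0.5, 0.5, 0.0]])
h = kolmogorov_entropy(Phi)
```

For a walk on a network, Phi would be the adjacency matrix with rows normalized by the node degrees, linking this entropy rate directly to the network structure.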


Park, J. & Newman, M. E. J. Statistical mechanics of networks. Phys. Rev. E70, 066117 (2004). In this paper, ERGs are interpreted for the first time as the statistical physics framework for complex networks.

Bianconi, G. The entropy of randomized network ensembles. Europhys. Lett.81, 28005 (2008). This paper derives the Boltzmann entropy of a variety of network ensembles to assess the role of structural network properties.