Thesis: 6 Concluding Remarks

We are living in an interconnected world where people can make use of technologies to expand their personal networks beyond the boundaries that existed just a decade ago. This has prompted interest in networks from a wide range of disciplines, such as sociology and psychology as well as statistical physics and mathematics. There has also been a surge in the development of a diverse range of methods that can be applied to the study of a variety of networks, from neural networks to social networks. Most of these methods are only applicable to single snapshots of the binary network structure. This is a major limitation as, in most empirical network datasets, ties can be differentiated by attaching a weight to them and are formed, reinforced, weakened, and severed over time. When the weights are discarded, the analysis is limited to the presence or absence of ties (Freeman, 1978). Moreover, when the evolution of the network is not known or recorded, modelling growth mechanisms becomes more difficult (Snijders, 2002; Wasserman and Pattison, 1996).

The chapters within this thesis represent a step forward for the analysis of weighted and longitudinal networks. In Chapter 2, we proposed a generalisation of the clustering coefficient to weighted networks. The clustering coefficient examines the tendency of nodes to form triangles. Often when the coefficient is applied to a weighted network, the network is first made binary by using a subjective cut-off: ties with weights above the cut-off are set to present, whereas ties with weights below are removed. This reduces the richness of the data as some ties are removed and the remaining ones cannot be differentiated. Instead of changing the data, the clustering coefficient should be generalised to take weights into account. Chapter 2 was devoted precisely to developing such a generalised coefficient.

The second project explored associations between prominence and control over the strongest ties in three real-world networks, namely the US airport network, a scientific collaboration network, and an online social network (Chapter 3). We built on the topological rich-club perspective that assesses whether the highly connected nodes (the prominent ones) form a club with more ties than expected by chance (Colizza et al., 2006). This framework was extended in three ways. First, we explored multiple definitions of prominence. This enabled us to detect novel and different results. Second, instead of limiting the analysis to the network topology, we examined whether the prominent nodes shared stronger ties than we would expect by chance. By exchanging the strongest ties, the prominent nodes secure control over the majority of resources flowing in the network. Third, in the rich-club framework, the coefficient obtained for the observed networks was compared to the average of coefficients found on a large number of random networks. When there are few prominent nodes, the coefficients obtained for the random networks might vary considerably. In fact, sometimes a striking result might be replicated in a non-negligible number of the random networks. Therefore, we measured the 95% confidence interval of the coefficients found on the random networks. This enabled us to test whether or not the observed coefficient was replicated in a large proportion of the random networks. If the coefficient was above or below the interval, we argued that the prominent nodes preferentially directed their strongest ties towards or away from each other, respectively.
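The logic of comparing an observed value with an interval obtained from random networks can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure of Chapter 3: the statistic is simply the sum of the weights of ties among the prominent nodes, the random networks are created by reshuffling weights across ties while keeping the topology fixed, and the interval is taken from the empirical 2.5th and 97.5th percentiles. All function names are hypothetical.

```python
import random


def weight_among(ties, club):
    """Sum of the weights of ties whose two endpoints are both in `club`."""
    return sum(w for (i, j), w in ties.items() if i in club and j in club)


def null_interval(ties, club, n_random=1000, level=0.95, seed=0):
    """Empirical interval of `weight_among` over weight-reshuffled networks.

    Assumption: the null model keeps the topology and permutes the weights.
    """
    rng = random.Random(seed)
    pairs = list(ties)
    weights = list(ties.values())  # a copy, so `ties` itself is not modified
    samples = []
    for _ in range(n_random):
        rng.shuffle(weights)
        samples.append(weight_among(dict(zip(pairs, weights)), club))
    samples.sort()
    tail = (1 - level) / 2
    lower = samples[int(tail * (n_random - 1))]
    upper = samples[int(round((1 - tail) * (n_random - 1)))]
    return lower, upper


def classify(observed, lower, upper):
    """Interpret the observed value relative to the random-network interval."""
    if observed > upper:
        return "strongest ties directed towards each other"
    if observed < lower:
        return "strongest ties directed away from each other"
    return "not distinguishable from chance"


# Hypothetical network: nodes 0, 1, 2 are prominent and share the three
# strongest ties; twenty further ties of weight 1 link them to other nodes.
ties = {(0, 1): 10, (0, 2): 10, (1, 2): 10}
ties.update({(i % 3, 3 + i): 1 for i in range(20)})
club = {0, 1, 2}
lower, upper = null_interval(ties, club)
print(classify(weight_among(ties, club), lower, upper))
```

Reporting the interval rather than only the mean of the random networks makes it visible when a seemingly striking observed value is in fact reproduced by a sizeable share of the random networks.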

In Chapter 4, we offered a new approach to the study of the evolution of networks. Instead of limiting ourselves to conventional single snapshots of the network structure, we proposed to apply a regression framework often used in epidemiological studies to investigate the evolution of networks where the exact chronological order of ties is known. We applied this framework to an online social network and tested six growth mechanisms that might guide people’s communication choices. These were: triadic closure, preferential attachment, reciprocity, homophily, focus constraints, and reinforcement. Most of the results were in line with expectations; however, we found that the number of common contacts (triadic closure) was not a significant predictor of future ties in a multivariate analysis. This might be a reflection of the one-to-one online communication where an individual’s contacts do not observe each other as is the case in offline social settings. Moreover, we found that popularity (preferential attachment) was mitigated when tested with other mechanisms, and further mitigated when ties could be reinforced (i.e., when considering the weighted network). These findings have critical implications for understanding the structure and function of networks.
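To make the idea of exploiting the exact sequence of ties more concrete, the following Python sketch replays a time-ordered list of ties and records, just before each tie is formed, a few covariates for the node that was actually chosen and for a small number of randomly sampled alternatives. The case-control layout, the three covariates (common contacts for triadic closure, in-degree for preferential attachment, and an existing tie in the opposite direction for reciprocity), and all names are illustrative assumptions; the sketch covers the data preparation only and not the regression model itself.

```python
import random


def case_control_rows(events, n_controls=2, seed=0):
    """Replay time-ordered (sender, receiver) events and emit covariate rows.

    Each row is (event_index, is_case, common_contacts, in_degree,
    reciprocates). Simplification: controls are sampled from all nodes that
    appear anywhere in `events`, even if they have not yet joined the network.
    """
    rng = random.Random(seed)
    nodes = sorted({n for event in events for n in event})
    out_nbrs = {n: set() for n in nodes}
    in_nbrs = {n: set() for n in nodes}
    rows = []
    for index, (sender, receiver) in enumerate(events):
        others = [n for n in nodes if n not in (sender, receiver)]
        controls = rng.sample(others, min(n_controls, len(others)))
        contacts_sender = out_nbrs[sender] | in_nbrs[sender]
        for candidate in [receiver] + controls:
            contacts_candidate = out_nbrs[candidate] | in_nbrs[candidate]
            rows.append((
                index,
                candidate == receiver,                      # case or control
                len(contacts_sender & contacts_candidate),  # triadic closure
                len(in_nbrs[candidate]),                    # popularity
                sender in out_nbrs[candidate],              # reciprocity
            ))
        # Update the network only after the covariates have been recorded,
        # so that each row reflects the network as it was before the tie.
        out_nbrs[sender].add(receiver)
        in_nbrs[receiver].add(sender)
    return rows


# Hypothetical sequence of four messages among three people.
events = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")]
for row in case_control_rows(events, n_controls=1):
    print(row)
```

Rows of this kind could then be passed to a regression routine that compares the chosen node with the alternatives within each event, which allows several mechanisms to be tested jointly.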

It is my hope and intention that the work within this thesis has an impact on the community of researchers interested in networks. Currently, there is a lack of software programmes that can handle weighted and longitudinal networks. Therefore, the fourth project was devoted to providing researchers with a platform called tnet for easily conducting an analysis of these types of networks. Functions to calculate the methods proposed in the previous chapters as well as others proposed in the literature (e.g., the generalisation by Newman, 2001c, of Freeman’s, 1978, closeness measure) have been programmed in the statistical software R. These functions have been incorporated into a software package, which is now publicly available as open source. Other researchers will thus have the opportunity to easily implement generalisations of measures to weighted and longitudinal networks. In turn, this might prompt the development of interesting new measures.

The work within this thesis forms part of a wider research agenda concerned with the development of more sophisticated methods for analysing network data. Because most of the existing methods are defined for binary static networks only, it is often necessary to reduce data so that it fits these methods. In this process, some of the richness contained within the data is removed. While there is an abundance of network features that can be analysed in conventional binary static networks, this thesis covered only a few structural properties of weighted networks, and only one method for studying networks’ evolution. Thus, two areas of future research are concerned with the development of methods for studying weighted networks and methods for analysing networks where the exact sequence of ties is known.

In addition, there are other directions within this research agenda. First, directed networks have been an integral part of methods developed by sociologists. However, the same is not the case for most methods developed by physicists (Albert and Barabasi, 2002). Thus, future research is likely to be concerned with this issue as the two disciplines within network research converge (see Chapter 1 for more details on the different groups of network researchers). Second, two-mode networks, such as the scientific collaboration network used in Chapter 3, are often projected onto one-mode networks. This introduces a number of biases in the network, which might invalidate the results. For example, the clustering coefficient (Chapter 2) of the one-mode projection of a randomly reshuffled two-mode network is generally higher than the coefficient found in a random one-mode network with the same number of nodes and ties as the one-mode projection (Newman, 2001b). This is due to the fact that the two-mode structure introduces clusters in the one-mode projection. Although some elements of the two-mode structure can be maintained by creating a weighted one-mode network (Newman, 2001c, and Projecting Two mode Networks onto Weighted One mode Networks), it would be of interest to redefine the clustering coefficient for two-mode networks. In turn, this might yield a coefficient with fewer biases. Similarly, it might be more appropriate to reshuffle the two-mode structure before projecting it onto a one-mode network when applying the weighted rich-club effect as suggested in Section 3.2.1. Thus, two main areas of future research within this research agenda are concerned with extending and developing methods for analysing datasets where ties are directed and where the two-mode structure is maintained.
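The projection of a two-mode network onto a weighted one-mode network can be sketched in Python as follows. Two weighting schemes are assumed here for illustration: a simple count of shared secondary nodes (e.g., co-authored papers), and a discounted scheme in the spirit of Newman (2001c) in which each paper with n authors contributes 1/(n - 1) to every pair of its authors. The function name and data layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project(memberships, newman=False):
    """Project a two-mode network onto a weighted one-mode network.

    `memberships` maps each secondary node (e.g., a paper) to the primary
    nodes attached to it (e.g., its authors). Returns a dict that maps a
    sorted pair of primary nodes to the weight of their tie.
    """
    weights = defaultdict(float)
    for members in memberships.values():
        members = sorted(set(members))
        if len(members) < 2:
            continue  # a single-author paper creates no ties
        # Either count the paper once per pair, or discount by its size.
        share = 1.0 / (len(members) - 1) if newman else 1.0
        for i, j in combinations(members, 2):
            weights[(i, j)] += share
    return dict(weights)


# Hypothetical data: A and B wrote one paper alone and one with C.
papers = {"p1": ["A", "B"], "p2": ["A", "B", "C"]}
print(project(papers))
print(project(papers, newman=True))
```

Note that the three-author paper creates a triangle among A, B, and C in the projection by construction, which illustrates how the two-mode structure introduces clusters in the one-mode network regardless of the weighting scheme.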

Furthermore, there is an endless number of possible empirical applications of the methods proposed within this thesis. In fact, a key element within all the methods is generality and the ability for researchers to tune the methods. More specifically, depending on the context in which data is collected and defined, the clustering coefficient enables researchers to define triplet values in multiple ways, any prominence parameter and level can be used in the weighted rich-club effect, and the framework for analysing tie formation allows for multiple growth mechanisms to be included. Just as the topological rich-club effect has been applied to networks varying from the Italian interbank network (De Masi et al., 2006) to protein-protein interaction networks (Colizza et al., 2006), the methods proposed within this thesis can be applied to any network provided that the appropriate data are collected (i.e., tie weights for the weighted clustering coefficient and the weighted rich-club effect, and the exact sequence of ties for the evolution framework). In fact, the weighted rich-club effect has already formed part of a study on the world trade network (Zlatic et al., 2008).