Spatial Intuition, Network Layouts and Communities

Complex network visualization requires using strategy to map each node in a network to the 2D or 3D Euclidean space. This is called a spatial layout, and it is one if the major issues in Network Science. Layout algorithms must preserve the network topology and inner structure yet still show them in a clear and intuitive way. They include criteria such as the minimization of the link lengths while preventing node overlapping. The human perception of different layouts algorithms and their effective contribution is a hot topic in research nowadays[1].

Some layouts are based on physical models, such as the so-called Force Directed Layouts. They set the node positions by solving the dynamics of a force system between all elements in the network. One of the most popular algorithms in this field is the Fruchterman-Reingold algorithm, which substitutes the nodes by positive charged particles and the links by springs. In this way, all nodes repel each other, while linked nodes tend to stay close. It results in a layout with well separated clusters of nodes with an aesthetic disposition (Figure 1).

Figure 1: Network layout using the Fruchterman-Reingold algorithm. The details on the right hand side of the image show the clusterization of the network around certain central nodes.

Apparently unrelated to spatial layouts, another major subject in network analysis is the detection of communities. According to M. Newman, the goal of community detection is to find the natural divisions of a network into groups of nodes such that there are many edges within groups and few edges between them [2]. Community detection is the equivalent to clustering in usual datasets.

In this post we would like to connect the community detection methods with a visual validation obtained with a spatial layout. We take a social network formed by 29040 nodes and 118302 links and partition it with the well-known Louvain Community detection algorithm [3]. It is an agglomerative method that tries to maximize the modularity of the network partition [2], and it is widely used because of its performance and good results. Modularity maximization is a very hard computational task because it is an NP-problem. Thus, there are several approaches including the Louvain method, but all of them present disadvantages such as a degeneracy problem and often only find approximate solutions. The algorithms usually accept a resolution parameter (t) that can be varied to modify the number of communities detected in a network (i.e. the granularity).

In Figure 2 we show the results of the Louvain method applied to our test network (using the standard resolution parameter t=1 ), coloring the nodes according to the community they belong to. In this case, 69 different communities are identified. We observe that, although the partition is definitely acceptable, there is still structure inside each community. Indeed it looks like there are smaller communities contained in each one, defining a kind of hierarchy.

Figure 2: Community detection for t=1. Node color indicates the community label. We indicate parts of the network that can be considered as sub-communities inside larger communities.

If we are interested in another partition with smaller communities, we can run the Louvain algorithm again changing the resolution parameter to (say) t=0.1. In this situation we get 143 smaller communities, which are shown in Figure 3.

Figure 3: Community detection for t=0.1. More communities appear compared with Figure 3. Some of the former big communities have been divided in two or more smaller communities.

The capacity of roughly choosing the number of communities is particularly valuable in business analytics. Just think for a moment that you would like to start a personalized marketing campaign in the social network. Since each community is well differentiated, different communities demands different marketing strategies. But it’s not necessarily financially viable to have 69 marketing strategies! The solution would be to find new partitions of the network increasing the resolution parameter until you get the number of communities you can afford in your campaign.

In a nutshell, we have seen that a spatial layout can provide meaningful information on the structure of our network. It is important to notice that the layout does not really give quantitative information, only qualitative features can be extracted and the feeling and intuition of the analyst are critical. So, thanks to the combination of layouts, visualization techniques and community detection methods, we can achieve the network partition that best fits our project.