Time and Date: 16:15 - 18:00 on 19th Sep 2016

Room: E - Mendes da Costa kamer

Chair: Andrea Nanetti

Abstract: We live in a global village where electronic communication has eliminated the geographical barriers of information exchange. The road is now open to worldwide convergence of information interests, shared values and understanding. Nevertheless, interests still vary between countries around the world. This raises important questions about what today’s world map of information interests actually looks like and what factors cause the barriers of information exchange between countries. To quantitatively construct a world map of information interests, we devise a scalable statistical model that identifies countries with similar information interests and measures the countries’ bilateral similarities. From the similarities we connect countries in a global network and find that countries can be mapped into 18 clusters with similar information interests. Through regression we find that language and religion best explain the strength of the bilateral ties and formation of clusters. Our findings provide a quantitative basis for further studies to better understand the complex interplay between shared interests and conflict on a global scale. The methodology can also be extended to track changes over time and capture important trends in global information exchange.
References:
- Karimi, Bohlin, Samoilenko, Rosvall and Lancichinetti. Mapping bilateral information interests using the activity of Wikipedia editors, Palgrave Communications 1 (2015)
- Samoilenko, Karimi, Edler, Kunegis and Strohmaier. Linguistic neighbourhoods: explaining cultural borders on Wikipedia through multilingual co-editing activity, EPJ Data Science 5 (2016)

Connecting tweeting behavior to voting activity in the European Parliament: A study of cohesion and coalitions.

Abstract: Social media activities often reflect phenomena that naturally occur in other complex systems. By observing the networks and the content propagated through these networks, we can describe or even predict the processes that influence the observed social media activities.
In this study, we explore the connection between the voting and retweeting (endorsing) behavior of the members of the 8th European Parliament (MEPs).
Using roll-call vote data from October 2014 to February 2016, we investigate the processes responsible for the formation of coalitions in the European Parliament (EP). We study the formation of coalitions at different levels of granularity. First, we focus on political groups in the EP, quantify the voting agreement between their members (cohesion), and explore the inter-group voting agreement using a distance-based method.
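To make the notions of cohesion and inter-group agreement concrete, here is a minimal sketch in Python. It assumes a simple stand-in measure (the share of jointly cast ballots on which two MEPs vote the same way, averaged over pairs); the abstract does not state which agreement measure the study uses, and the names `agreement`, `cohesion` and `inter_group_agreement` as well as the toy vote data are hypothetical.

```python
from itertools import combinations, product


def agreement(votes_a, votes_b):
    """Share of ballots on which both MEPs voted and cast the same vote.

    Assumption: a vote is any hashable label; None marks an absence.
    Returns None when the two MEPs share no ballot.
    """
    shared = [(a, b) for a, b in zip(votes_a, votes_b)
              if a is not None and b is not None]
    if not shared:
        return None
    return sum(a == b for a, b in shared) / len(shared)


def _mean(scores):
    """Average of the defined (non-None) scores, or None if there are none."""
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else None


def cohesion(votes, members):
    """Mean pairwise agreement among the members of one group."""
    return _mean(agreement(votes[a], votes[b])
                 for a, b in combinations(members, 2))


def inter_group_agreement(votes, group_a, group_b):
    """Mean pairwise agreement between members of two different groups."""
    return _mean(agreement(votes[a], votes[b])
                 for a, b in product(group_a, group_b))


# Toy roll-call data (invented for illustration only).
votes = {
    "A": ["for", "for", "against"],
    "B": ["for", "against", "against"],
    "C": ["for", "for", None],
}
group_cohesion = cohesion(votes, ["A", "B", "C"])
between = inter_group_agreement(votes, ["A"], ["B", "C"])
```

One minus the inter-group agreement could serve as a distance between groups, which is one possible reading of the distance-based method mentioned above.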
Second, we investigate what role party affiliation, the origin of MEPs, and the topic of the ballot play in the covoting behavior in the European Parliament. Using Exponential Random Graph Models in combination with a meta-analysis technique, we show that the cohesion of political groups and the strength of alliances between these groups depend to a large extent on the topic of the ballot.
We study the retweet network between the MEPs, as observed in the same time period. Again, we define the intra- and inter-group measures of cohesiveness, and investigate the retweeting behavior of MEPs across topics. We compare the results with the results based on the roll-call votes and show that retweeting behavior of MEPs across several topics exhibits explanatory power for the covoting behavior of MEPs.
References:
Cherepnalkoski, Mozetič. A Retweet Network Analysis of the European Parliament. In Proceedings of SITIS 2015. IEEE. p.350-357.
Lubbers. Group composition and network structure in school classes: a multilevel application of the p* model. Social Networks. 2003, 25(4):309-332

Abstract: Citation networks have been widely used to study the evolution of science through the lens of the underlying patterns of knowledge flows among academic papers, authors, and research sub-fields. Here we focus on citation networks to cast light on the salience of homophily, namely the principle that similarity breeds connection, for knowledge transfer between papers. To this end, we assess the degree to which citations tend to occur between papers that are concerned with seemingly related topics or research problems. Drawing on a large data set of articles published in the journals of the American Physical Society (APS), we propose a novel method for measuring the similarity between articles through the statistical validation of the overlap between their bibliographies. We define the probability P_{i\to j}(p*) that a citation exists between any two articles i and j whose similarity is validated at the threshold p* as the ratio between the number of existing citations between pairs of articles validated at that threshold in the APS citation network and the number of those validated pairs. Results suggest that the probability of a citation made by one article to another is an increasing function of the similarity between the two articles. Our study enables us to uncover missing citations between pairs of highly related articles, and may help identify barriers to effective knowledge flows. By quantifying the proportion of missing citations, we conduct a comparative assessment of distinct journals and research sub-fields in terms of their ability to facilitate or impede the dissemination of knowledge. Findings indicate that knowledge transfer is facilitated more by journals of wide visibility, such as PRL, than by lower-impact ones. Our study has important implications for authors, editors and reviewers of scientific journals, as well as public preprint repositories, as it provides a procedure for recommending relevant yet missing references.
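The validation step and the citation probability can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes a hypergeometric null model for the bibliography overlap, applies no multiple-testing correction, and treats a pair as cited if either article cites the other. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import comb


def overlap_p_value(n_universe, k_i, k_j, overlap):
    """P(X >= overlap) under an assumed hypergeometric null.

    n_universe: number of papers that could be cited (assumption).
    k_i, k_j:   bibliography sizes of articles i and j.
    overlap:    number of references the two bibliographies share.
    """
    total = comb(n_universe, k_j)
    tail = sum(comb(k_i, x) * comb(n_universe - k_i, k_j - x)
               for x in range(overlap, min(k_i, k_j) + 1))
    return tail / total


def citation_probability(bibliographies, citations, n_universe, threshold):
    """Cited validated pairs divided by all validated pairs.

    bibliographies: dict mapping article -> set of cited references.
    citations:      set of (citing, cited) article pairs.
    A pair is validated when its overlap p-value is <= threshold.
    Returns None when no pair is validated.
    """
    validated = []
    for i, j in combinations(sorted(bibliographies), 2):
        refs_i, refs_j = bibliographies[i], bibliographies[j]
        p = overlap_p_value(n_universe, len(refs_i), len(refs_j),
                            len(refs_i & refs_j))
        if p <= threshold:
            validated.append((i, j))
    if not validated:
        return None
    cited = sum((i, j) in citations or (j, i) in citations
                for i, j in validated)
    return cited / len(validated)


# Toy data (invented): "a" and "b" share their whole bibliography.
bibs = {"a": {1, 2, 3}, "b": {1, 2, 3}, "c": {7, 8, 9}}
prob = citation_probability(bibs, {("b", "a")}, n_universe=10, threshold=0.01)
```

Validated pairs with no citation in either direction are the candidates for the missing citations discussed above.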

Abstract: The emergence of human language is perhaps the most significant event in the course of our evolution. Unlike primitive forms of animal communication, the vastness of our lexicon and the recursive application of structural rules allow human language almost infinite creative potential in producing meaningful utterances. Despite the complexity and variety of language, the processes of generating and deciphering meaningful utterances are accomplished remarkably rapidly. This paper demonstrates that these efficiencies are not accidental: lexicon and syntax in human language are organized purposefully in networks with maximized navigational performance.
Starting from a word-to-word co-occurrence network of a corpus, we adopt an unsupervised learning algorithm to identify coherent sub-sequences (motifs) of nodes (words) shared by many paths (sentences). We believe that, instead of navigating the language network word by word when generating a sentence, speakers employ motifs as functional shortcuts to accelerate the process; that is, from the network perspective, motifs connect words that were previously far apart. We show that these functional motifs reduce the effective path lengths between words much more efficiently than motifs created from a null model. We also establish the importance of stop words (highly frequent words with low semantic value) in motifs. A large proportion of motifs can be characterized by a relatively small set of stop word templates. Moreover, these same templates are reused recursively to embed the language network to higher levels of abstraction.
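The shortcut idea can be illustrated with a small sketch. It assumes that motifs are already given (the unsupervised algorithm that finds them is not described here) and models each motif as one extra directed edge from its first to its last word, which is only one possible reading of "functional shortcut". All function names and the example sentence are hypothetical.

```python
from collections import deque


def build_graph(sentences):
    """Directed co-occurrence graph: an edge from each word to the next."""
    graph = {}
    for sentence in sentences:
        for word, nxt in zip(sentence, sentence[1:]):
            graph.setdefault(word, set()).add(nxt)
            graph.setdefault(nxt, set())
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the hop count or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def add_motif_shortcuts(graph, motifs):
    """Copy the graph and link the first word of each motif to its last.

    Assumption: a motif acts as a single hop across its word sequence.
    """
    shortcut = {node: set(nbrs) for node, nbrs in graph.items()}
    for motif in motifs:
        shortcut.setdefault(motif[0], set()).add(motif[-1])
        shortcut.setdefault(motif[-1], set())
    return shortcut


sentences = [["she", "put", "it", "on", "top", "of", "the", "table"]]
graph = build_graph(sentences)
with_motifs = add_motif_shortcuts(graph, [("on", "top", "of", "the")])
before = shortest_path_length(graph, "put", "table")
after = shortest_path_length(with_motifs, "put", "table")
```

Comparing `before` and `after` over many word pairs, and against shortcuts drawn from a null model, would mirror the comparison described in the abstract.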
Our findings are surprisingly consistent with the theory of Construction Grammar (CxG). CxG purports that constructions are irreducible components of language carrying both form and meaning within. Words, sequences of words, and even templates of phrase formation are all treated equally as constructions in this paradigm. Similarly, in our language network, we allow words, phrases, and generalizable templates to coexist and interact.

Abstract: In network theory, homophily is the tendency of connections to form between nodes with similar characteristics. Social networks, such as friendship networks, tend to be homophilious, since they connect individuals of similar tastes or opinions. The effects of homophily and of information diffusion in social networks are difficult to distinguish in empirical studies: a homophilious network might display behavior similar to diffusion through word-of-mouth simply because agents with similar characteristics are likely both to adopt similar things and to be connected to each other. The objective of this study is to analyze the effect of homophily on diffusion by word-of-mouth and to compare it with a non-homophilious benchmark.
We introduce homophily in a percolation model of word-of-mouth diffusion as a modification of the small world algorithm. This novel algorithm reorganizes the nodes according to their individual characteristics, so the resulting network is highly homophilious. A comparison between diffusion in the modified network and in the benchmark scenario allows us to isolate the effect of homophily on adoption.
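A rough sketch of such a model is given below. It is an assumption-laden illustration rather than the algorithm of the study: each node gets a random preference, the homophilious variant places nodes around a ring in order of preference before small-world rewiring, and a node adopts when an adopting neighbour informs it and its preference is at least `price`. All names, parameters and the number of seeds are hypothetical.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring in which each node is linked to its k nearest nodes per side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, p, rng):
    """Small-world style rewiring: move each edge with probability p."""
    n = len(adj)
    for i in range(n):
        for j in sorted(adj[i]):          # snapshot of i's neighbours
            if j > i and rng.random() < p:
                candidates = [c for c in range(n)
                              if c != i and c not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def diffuse(adj, preference, price, seeds):
    """Percolation: information spreads only through adopting nodes."""
    adopters = {s for s in seeds if preference[s] >= price}
    queue = deque(adopters)
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in adopters and preference[nbr] >= price:
                adopters.add(nbr)
                queue.append(nbr)
    return adopters


def simulate(n, k, p, price, homophilious, seed):
    """Return the number of adopters in one simulated diffusion."""
    rng = random.Random(seed)
    preference = [rng.random() for _ in range(n)]
    if homophilious:
        preference.sort()                 # ring neighbours get similar values
    adj = rewire(ring_lattice(n, k), p, rng)
    seeds = rng.sample(range(n), 3)       # three initial adopters (assumption)
    return len(diffuse(adj, preference, price, seeds))
```

Running `simulate` over a range of rewiring probabilities `p`, with and without the sorting step, gives the kind of comparison between the homophilious network and the benchmark that the abstract describes.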
The main result is that homophily reduces the effect of the network structure: homophilious networks with different link structures present almost identical adoption sizes. In other words, the diffusion size does not differ substantially for different values of the rewiring probability. This is equivalent to saying that the network structure of a population does not play much of a role in this context.
This effect results from the extreme case of homophily considered. Nonetheless, this novel approach to introducing homophily in a social network allows for an intuitive development of the word-of-mouth diffusion process in homophilious networks. The prevalence of homophily in social networks calls for studies, such as this one, that introduce homophily in simulation models of diffusion.

Abstract: Risk assessment and management of an ICT infrastructure requires a large amount of data on the joint behavior of the system and its users. Usually, this data is collected over time by observing the joint behavior after deployment. This implies that risk cannot be assessed or managed at the design step, because the data is not yet available. As a consequence, risk can be assessed only after deployment.
Monte Carlo Ecology is a methodology to predict the behavior of a system under attack at any step of its life. The methodology introduces an ecosystem that includes an environment, which models the target system, and some organisms, which model the various agents that interact among themselves and with the system. Some organisms attack the environment and others update it to improve its resiliency. The interactions in the ecosystem determine its evolution and events of interest, such as an agent reaching one of its predefined goals. An evolution is stochastic because several events of interest, such as the success of an attack, are governed by probability distributions. For this reason, Monte Carlo Ecology applies a Monte Carlo method that runs multiple, independent evolutions to build a statistical sample to assess the system resiliency in an ecosystem. Each evolution generates some data to assess a system.
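The Monte Carlo step can be sketched in a few lines. This is a deliberately simplified assumption, not the Haruspex implementation: a single attacker works through a fixed chain of attack steps, each succeeding with a given probability per attempt, and there is no defender. The function names and parameters are hypothetical.

```python
import random


def run_evolution(step_success_probs, max_attempts, rng):
    """Simulate one evolution of a single attacker.

    Each attempt tries the current attack step and succeeds with that
    step's probability. Returns the number of attempts needed to reach
    the goal (all steps done), or None if the budget runs out first.
    """
    step = 0
    for attempt in range(1, max_attempts + 1):
        if rng.random() < step_success_probs[step]:
            step += 1
            if step == len(step_success_probs):
                return attempt
    return None


def monte_carlo(step_success_probs, max_attempts, runs, seed=0):
    """Run independent evolutions and collect the statistical sample."""
    rng = random.Random(seed)
    return [run_evolution(step_success_probs, max_attempts, rng)
            for _ in range(runs)]


def success_fraction(sample):
    """Fraction of evolutions in which the attacker reached its goal."""
    return sum(1 for t in sample if t is not None) / len(sample)


sample = monte_carlo([0.3, 0.5, 0.2], max_attempts=20, runs=1000)
estimate = success_fraction(sample)
```

Varying `max_attempts` shows how the estimated chance of a successful attack grows with the time available to the attacker, which is the kind of statistic such a sample supports.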
We have developed the Haruspex suite to support Monte Carlo Ecology. Some of the suite tools build the models of the target system and of the agents in an ecosystem. Other tools implement the Monte Carlo approach and return the sample to assess the system. To fully exploit this sample, we have defined the security stress, a synthetic measure of resiliency in an environment. After describing the models of the system and of the agents, the full paper will detail the proposed methodology and present a case study.