Abstract

The network analysis literature offers many models, drawn from different paradigms, for solving the link prediction problem in complex information networks. However, few studies have exploited social theories related to link strength for this purpose, even in a social context. In this paper, the authors introduce a new approach to solving the link prediction problem in scientific bibliographic networks. The aim is to predict future collaboration relations between scientists by relying on the “strength of strong ties” hypothesis. The proposed model estimates the strength of a relation between two scientists using a set of efficient link strength indicators. The importance of the relation is then validated according to the scientists' expected collaboration strategies. The prediction process is performed in a heterogeneous context in which the types of the nodes and the links are considered. Experiments on the real-world DBLP scientific bibliographic network show that the proposed model achieves higher performance than the baseline link prediction methods.


Introduction

A scientific bibliographic network is a complex network that describes the social structure of science in a scientific discipline through the links that relate scientific entities. These links generally take the form of collaborations to produce new scientific knowledge. As science is a social institution where advances depend crucially on these forms of interaction (Katz & Martin, 1997), many studies have sought to identify the diverse factors that control their dynamics and their evolution in the bibliographic network (Newman, 2001; Velden et al., 2010; Franceschet, 2011). Indeed, studying the evolution of a scientific network requires effective techniques for understanding how a link appears or disappears between nodes. Link prediction (Liben-Nowell & Kleinberg, 2003) is one of the well-known techniques that attempt to answer this question. In the literature, it refers to predicting a probable association between two given nodes in a given future time interval, relying on their behavior in a given past time interval. The relevance of this technique is evident in its widespread applications in various domains. For instance, in online social networks it is used to help actors form new social relationships; in e-commerce it is used to build efficient recommender systems; in security it can help to find suspicious interactions between criminal groups; and in bioinformatics and biomedicine it can help to identify probable interactions between proteins or between diseases and genes. As a result, the extensive study of this problem has produced several models, including supervised and unsupervised methods, probabilistic methods, relational models, and linear algebraic models. For more information, the reader may find a detailed survey of these models in (Hasan & Zaki, 2011).
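
To make the past-interval and future-interval setting concrete, the following minimal sketch scores unlinked pairs of scientists in a past co-authorship graph by their number of shared co-authors (the common-neighbors heuristic, a standard unsupervised baseline) and checks the top-ranked pairs against the collaborations that appear in a later interval. It is not the model proposed in this article. The paper list, the author names, the interval boundaries, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (publication year, list of authors) for each paper.
papers = [
    (2001, ["ana", "bo"]),
    (2001, ["bo", "cy"]),
    (2002, ["ana", "dee"]),
    (2002, ["dee", "cy"]),
    (2002, ["bo", "eli"]),
    (2004, ["ana", "cy"]),
    (2005, ["cy", "eli"]),
]


def coauthor_graph(papers, start, end):
    """Build an undirected co-authorship graph from papers with start <= year <= end."""
    graph = defaultdict(set)
    for year, authors in papers:
        if start <= year <= end:
            for a, b in combinations(sorted(set(authors)), 2):
                graph[a].add(b)
                graph[b].add(a)
    return graph


def common_neighbor_scores(graph):
    """Score every pair that is not yet linked by its number of shared co-authors."""
    scores = {}
    for a, b in combinations(sorted(graph), 2):
        if b not in graph[a]:
            scores[(a, b)] = len(graph[a] & graph[b])
    return scores


# Assumed split: observe 2001-2002, then predict links that form in 2003-2005.
past = coauthor_graph(papers, 2001, 2002)
future = coauthor_graph(papers, 2003, 2005)

scores = common_neighbor_scores(past)
# Rank candidate pairs by descending score; ties are broken alphabetically.
ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))

k = 2
top = ranked[:k]
# A prediction is a hit if the pair actually collaborates in the future interval.
hits = [pair for pair in top if pair[1] in future.get(pair[0], set())]
precision_at_k = len(hits) / k

print("top candidates:", top)
print("confirmed in future interval:", hits)
print("precision@%d: %.2f" % (k, precision_at_k))
```

The sketch treats all links as equal and ignores node and link types. A strength-aware approach such as the one described in the abstract would replace the shared co-author count with link strength indicators.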