Adaptive Probabilistic Topic Models for Social Networks

Abstract:

Online social networks such as Twitter, LinkedIn, and Facebook generate a tremendous amount of text and social interaction data. On one hand, the growing volume of available information has motivated computational research in social network analysis aimed at understanding social structures. On the other hand, annotating, retrieving, and analyzing the text generated within a social network is crucial for many applications, such as content ranking, recommendation systems, spam detection, and viral marketing. In this thesis we propose a composite probabilistic topic model for social networks that automatically learns a distribution over topics of interest for each entity in the network, using a combination of the available content (text) and the structural properties of the network. The proposed model reduces the dimensionality of the data and exploits the underlying social structure and link properties of the network, while generating a more accurate topic model for the end users of the social network. We discuss in detail the results on both the NIPS data set (papers from the Neural Information Processing Systems conference) and the Enron email corpus (emails from a large corporation). We report perplexity scores on test documents to evaluate the generalization performance of our model, and we provide evidence that relevant topics are discovered.
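
For readers unfamiliar with the evaluation measure, perplexity on test documents is commonly defined as the exponential of the negative average per-token log-likelihood, with lower values indicating better generalization. The following is a minimal sketch of that standard definition only, not the thesis's evaluation code; the function name `perplexity` and the toy numbers are assumptions made for illustration.

```python
import math


def perplexity(log_likelihoods, token_counts):
    """Corpus perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    log_likelihoods: natural-log likelihood of each test document under a model
    token_counts:    number of tokens N_d in each test document
    (Both inputs are hypothetical; a real topic model would supply them.)
    """
    total_tokens = sum(token_counts)
    if total_tokens <= 0:
        raise ValueError("test set must contain at least one token")
    return math.exp(-sum(log_likelihoods) / total_tokens)


# Sanity check with a toy model that assigns every token the same
# probability 1/V: its perplexity should equal the vocabulary size V.
vocab_size = 1000
token_counts = [50, 120]
log_likelihoods = [n * math.log(1.0 / vocab_size) for n in token_counts]
uniform_ppl = perplexity(log_likelihoods, token_counts)
```

A model that has learned useful topic structure should score well below this uniform baseline on the same test documents.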