Natural Language Processing for Building and Enhancing Graph Data and Theory

How can we use user-generated content to construct, infer or refine network data?

We have been approaching this question in two ways. First, many of the main theories from the field of social network analysis, such as Structural Balance, Triadic Closure, Transitivity, and strong ties/weak ties, have a wide array of practical applications today, e.g., for recommender systems, community detection, and the assessment of network resilience or vulnerability. However, because these theories were often developed from in-depth observations of small groups over lengthy periods of time, it remains largely unclear how they apply in today’s contexts and to large-scale, socio-technical networks. We have been tackling this problem by leveraging communication content produced and disseminated in social networks to enhance graph data. For example, we have used domain-adjusted sentiment analysis to label graph links with valence values in order to enable triadic balance assessment (Diesner & Evans, 2015). The resulting method enables fast and systematic sign detection, eliminates the need for surveys or manual link labeling, and reduces issues with leveraging user-generated (meta)data.
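To make the idea of triadic balance assessment on sentiment-labeled links concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method of Diesner & Evans (2015): the valence scores are invented, the zero threshold in the hypothetical helper `valence_to_sign` is an arbitrary choice, and the graph is treated as undirected. A closed triad counts as balanced when the product of its three link signs is positive.

```python
from itertools import combinations


def valence_to_sign(score, threshold=0.0):
    """Map a valence score to +1 or -1 (the threshold is an assumption)."""
    return 1 if score >= threshold else -1


def triad_balance(signed_edges):
    """Split the closed triads of an undirected signed graph into
    balanced (sign product > 0) and unbalanced (sign product < 0)."""
    # Store each undirected link once, keyed by its pair of endpoints.
    sign = {frozenset((u, v)): s for u, v, s in signed_edges}
    nodes = sorted({n for edge in sign for n in edge})
    balanced, unbalanced = [], []
    for a, b, c in combinations(nodes, 3):
        keys = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        # Only triads with all three links present are assessed.
        if all(k in sign for k in keys):
            product = sign[keys[0]] * sign[keys[1]] * sign[keys[2]]
            (balanced if product > 0 else unbalanced).append((a, b, c))
    return balanced, unbalanced


# Hypothetical valence scores, standing in for sentiment analysis output.
scores = [
    ("ann", "bob", 0.8),
    ("bob", "cat", -0.4),
    ("ann", "cat", -0.6),
    ("cat", "dan", 0.5),
    ("bob", "dan", 0.3),
]
edges = [(u, v, valence_to_sign(s)) for u, v, s in scores]
balanced, unbalanced = triad_balance(edges)
print("balanced:", balanced)
print("unbalanced:", unbalanced)
```

Enumerating all node triples is cubic in the number of nodes, so this form only suits small examples; a large network would need a neighbor-based triangle enumeration instead.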

Second, we have examined different ways of constructing network data based on the content of communication, e.g., different conceptualizations of the similarity of language use and shared information, and compared the resulting structures to those obtained by linking people through other types of behavior, e.g., who is talking to whom. We found that simple approaches on the lexical level outperformed more complex, topic-based approaches, although the latter revealed additional social structures (Diesner, Aleyasen, Mishra, Schecter, & Contractor, 2014). We have been making several of our methodological advances available in ConText (http://context.lis.illinois.edu/), an open-source software tool that supports common and cutting-edge techniques for separately and jointly analyzing the structure and content of communication and other forms of social interactions.
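As a rough illustration of a lexical-level approach, the following Python sketch links two people when the Jaccard similarity of their vocabularies reaches a threshold, and then measures how much the resulting link set overlaps with a who-talks-to-whom network. Everything here is an assumption made for illustration: the similarity measure, the 0.3 threshold, the whitespace tokenization, the sample texts, and the function names are hypothetical, and none of it reflects the ConText implementation or the measures used in the cited study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lexical_network(texts_by_person, threshold=0.3):
    """Link each pair of people whose vocabularies are similar enough."""
    # Naive tokenization: lowercase and split on whitespace.
    vocab = {p: set(t.lower().split()) for p, t in texts_by_person.items()}
    edges = {}
    for p, q in combinations(sorted(vocab), 2):
        sim = jaccard(vocab[p], vocab[q])
        if sim >= threshold:
            edges[(p, q)] = sim
    return edges


def edge_overlap(edges_a, edges_b):
    """Jaccard overlap of two undirected link sets."""
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    return jaccard(a, b)


# Hypothetical texts and interaction links, for illustration only.
texts = {
    "ann": "graph sentiment balance triads",
    "bob": "graph sentiment triads networks",
    "cat": "budget travel receipts",
}
lexical = lexical_network(texts)
talks_to = [("ann", "bob"), ("bob", "cat")]
overlap = edge_overlap(lexical, talks_to)
print(lexical, overlap)
```

Comparing link sets is only one possible way to relate a content-based network to a behavior-based one; it ignores link weights and any structure beyond individual pairs.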