Toward deciphering the social media genome: big data and computational social science may explain social embryology

Nothing seems more random to the naked eye than the babble of online exchanges. Are the posting patterns on Wikipedia user pages or the answers offered on Q&A sites ruled by any discernible configurations? Do users engage in predictable posting/answering behaviors? More pointedly, do they perform stable roles when they go online? Welser, Gleave, Smith, and several other researchers believe that the answer is yes, which has led to an interesting series of papers, a book, and a new piece of software, NodeXL. Our own edited volumes, Roles, Trust, and Reputation and Transparency in Social Media, explore many of these issues.

Our two latest Purdue University projects, Visible Symbiosis and KredibleNet, offer a testing ground for such hypotheses by making available for research purposes a data service that allows mining the entire corpus of social interactions captured by Wikipedia (post a comment below if you need details). These and other similar projects promise the emergence of a new social and theoretical paradigm whose goal is to decipher the web of social interactions generated by social media. This wave of social research has the potential to generate for sociology the same type of momentum that decoding the human genome created for the biological sciences. Furthermore, the new paradigm aligns itself well with the emerging interest in “computational social science” recently discussed in Science and promoted as a new funding priority by the National Science Foundation.

In a series of groundbreaking research papers, Welser, Gleave, Smith, and Fisher have proposed new methodologies and theoretical perspectives for understanding how relatively stable role repertoires have emerged online. Welser and his colleagues have revived for this project the older tradition of role and status analysis (pace Nadel) inspired by the symbolic interactionist school and by functionalists like Talcott Parsons. According to this paradigm, roles are both intrinsic cultural characteristics and sets of behaviors. Social media allows capturing both, but especially the latter. Social media behaviors are embedded like prehistoric bugs in the amber of news feeds, Wikipedia contributions, or answers on Q&A sites, and they can be holistically described via social network analytic (SNA) procedures that connect user behaviors through social graphs.

“Social roles in online discussion forums can be described by patterned characteristics of communication between network members which we conceive of as ‘structural signatures.’ This paper uses visualization methods to reveal these structural signatures and regression analysis to confirm the relationship between these signatures and their associated roles in Usenet newsgroups. Our analysis focuses on distinguishing the signatures of one role from others, the role of ‘answer people.’ Answer people are individuals whose dominant behavior is to respond to questions posed by other users. We found that answer people predominantly contribute one or a few messages to discussions initiated by others, are disproportionately tied to relative isolates, have few intense ties and have few triangles in their local networks.”
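To make the idea of a structural signature more concrete, here is a minimal Python sketch of how the three traits named in the abstract (ties to relative isolates, few intense ties, few triangles) could be counted for one user in a reply network. It is not the authors' procedure. The function names, the toy reply data, and the two thresholds (`isolate_max_degree`, `intense_min_weight`) are all assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_reply_graph(replies):
    """Build a weighted, undirected graph from (replier, original_poster) pairs.

    The weight of a tie is the number of replies exchanged in either direction.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for replier, poster in replies:
        if replier == poster:
            continue  # ignore self-replies
        graph[replier][poster] += 1
        graph[poster][replier] += 1
    return graph


def structural_signature(graph, user, isolate_max_degree=1, intense_min_weight=3):
    """Summarize a user's local network.

    Both thresholds are hypothetical cut-offs chosen for this sketch only.
    """
    neighbors = graph.get(user, {})
    degree = len(neighbors)
    # A triangle exists when two of the user's neighbors are tied to each other.
    triangles = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    # "Relative isolates": neighbors with very few ties of their own.
    isolates = sum(1 for n in neighbors if len(graph[n]) <= isolate_max_degree)
    # "Intense ties": ties carrying many exchanged messages.
    intense = sum(1 for weight in neighbors.values() if weight >= intense_min_weight)
    return {
        "degree": degree,
        "triangles": triangles,
        "isolate_share": isolates / degree if degree else 0.0,
        "intense_share": intense / degree if degree else 0.0,
    }


# Invented toy data: "ann" answers four otherwise unconnected users once each,
# while "fay", "gus", and "hal" form a small, chatty discussion cluster.
replies = [
    ("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("ann", "eve"),
    ("fay", "gus"), ("fay", "gus"), ("gus", "fay"),
    ("fay", "hal"), ("gus", "hal"),
]
graph = build_reply_graph(replies)
ann_sig = structural_signature(graph, "ann")
fay_sig = structural_signature(graph, "fay")
```

In this toy network "ann" shows the answer-person pattern (all ties go to isolates, no intense ties, no triangles), while "fay" does not. On real data the thresholds would have to be chosen and validated empirically.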

“standardizes the usage of the term ‘social role’ in online community as a combination of social psychological, social structural, and behavioral attributes. Beyond the conceptual definition, we describe measurement and analysis strategies for identifying social roles in online community. We demonstrate this process in two domains, Usenet and Wikipedia, identifying key social roles in each domain. We conclude with directions for future research, with a particular focus on the analysis of communities as role ecologies.”

This second paper is very well articulated theoretically but its empirical examples need to be extended and fully validated. At this point, the examples offered in the paper are only fertile pointers to what we could do. For the Wikipedia example only two roles are discussed (technical and substantive editors). The relationships that embed them are revealed by performing an a posteriori survey of posting behaviors, which is then turned into a social graph. That is, the two roles are derived from existing qualitative research on Wikipedia roles and then network types are articulated from analyzing userpage-to-userpage hyperlinkages for these two types of roles.

Welser, Smith and Gleave have opened a very productive line of research, which can be extended in at least one way. For example, the soon-to-be-released Purdue / TeraGrid Visible Symbiosis database of Wikipedia editorial interactions could be used to mine the Wikipedia editorial history for structural patterns of interactions between editors. Potential social relationships can be derived from mapping co-editorial linkages (author/edit-document-author/edit). Roles can be uncovered dynamically and probably exhaustively. In other words, we can engage in a semi-inductive process of profiling Wikipedia editorial roles and of uncovering the collaborative networks in which they are embedded. The process would start with outlining the global networks of co-editorial interactions and would continue with identifying subnetworks associated with specific roles. The networks would be derived from theory-driven taxonomy/classification algorithms. These algorithms would start by proposing initial models/role sets and accompanying networks, which would then be fit against the data using methodologies similar to those developed for p* (p-star) procedures. Goodness-of-fit measures would indicate how good the initial models are. New models could then be proposed, improvement in log-likelihood or similar measurements ascertained, and so on, until best-fit models are obtained.
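The two steps described above, deriving co-editorial linkages and then checking whether a proposed role set improves fit, can be illustrated with a short Python sketch. It rests on loud assumptions. The edit records and role labels are invented. The fit step is a plain block model with one tie density per pair of roles, a far simpler stand-in for p* procedures and not the project's actual method. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def coedit_network(edits):
    """Project (editor, article) records onto editor-editor ties.

    Returns {frozenset({a, b}): number of articles both editors touched}.
    """
    editors_by_article = defaultdict(set)
    for editor, article in edits:
        editors_by_article[article].add(editor)
    ties = defaultdict(int)
    for editors in editors_by_article.values():
        for a, b in combinations(sorted(editors), 2):
            ties[frozenset((a, b))] += 1
    return dict(ties)


def bernoulli_loglik(present, total):
    """Log-likelihood of `present` ties among `total` dyads at the observed density."""
    if total == 0 or present in (0, total):
        return 0.0  # density of 0 or 1 predicts the block perfectly
    p = present / total
    return present * math.log(p) + (total - present) * math.log(1 - p)


def block_loglik(ties, roles):
    """Log-likelihood of a simple block model: one tie density per pair of roles."""
    present = defaultdict(int)
    total = defaultdict(int)
    for a, b in combinations(sorted(roles), 2):
        block = tuple(sorted((roles[a], roles[b])))
        total[block] += 1
        if frozenset((a, b)) in ties:
            present[block] += 1
    return sum(bernoulli_loglik(present[k], total[k]) for k in total)


# Invented edit history: three "substantive" editors share articles A and B,
# while two "technical" editors each touch one article alongside one of them.
edits = [
    ("s1", "A"), ("s2", "A"), ("s3", "A"),
    ("s1", "B"), ("s2", "B"),
    ("t1", "C"), ("s1", "C"),
    ("t2", "D"), ("s2", "D"),
]
ties = coedit_network(edits)
editors = {editor for editor, _ in edits}

# Baseline model: every editor plays the same role (a single overall density).
baseline = block_loglik(ties, {e: "any" for e in editors})
# Proposed model: a hypothetical two-role assignment.
proposed_roles = {e: "substantive" if e.startswith("s") else "technical" for e in editors}
proposed = block_loglik(ties, proposed_roles)
improvement = proposed - baseline
```

Because the single-role baseline is nested inside any richer role assignment, the log-likelihood can never get worse as roles are added. A real comparison would therefore need a penalty for the extra parameters (for example AIC or BIC) or a proper goodness-of-fit test before declaring one role set the best fit.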

In brief, revealing social roles and their ecology online can go beyond exemplary cases. It is now possible to engage in a process of social and data mining similar to that of deciphering the human genome. This is the gist of the Visible Symbiosis project, which we will soon initiate at Purdue University. The goal of the project is to distinguish the boundaries and configurations of the specific “genes” (subnetworks defining roles) identifiable in Wikipedia’s editorial contributions and interactions. The genes (network signatures for specific roles, as Welser calls them) would then be classified into sets and generic typologies. This part of the analysis will follow a theoretical perspective, one that goes beyond description. It could be inspired by an updated version of dynamic functionalism, non-equilibrium social system theory, and social entropy theory. The ultimate goal is to find out if and how social roles and networks create specific types of quasi-institutional arrangements that act as emerging adhocracies, and how this influences knowledge production. Our “social genomics” exercise would only be complete when the “social genes” (i.e., social networks associated with specific roles) are mapped onto specific social processes and institutions. This might then lead to a new theory of human aggregation and evolution online, and it would be an excellent example of computational social science in action.

The ultimate goal of this approach is to untangle the mystery of how social groups are born, form, and grow. It is an attempt to uncover the phases of growth in the life of naturally occurring online groups, which can further teach us something about the human ability to create organizations. It is also an attempt to treat social groups, again, as more than the sum of their parts, as social organisms.

Would this mean that the mighty fallen angels of Talcott Parsons’ generation will be brought back to life? Will functionalism be resurrected? Or will the idea of homeostasis also be revived? I think that the new social theory that will emerge from deciphering the social media genome will be productive only insofar as it is freed from the fetters of abstraction. “Theory driven” should not be equated with a priori ideas about social institutions and a purported “need” for cohesiveness or equilibrium. Since we know so much more about non-equilibrium systems, social entropy, and dynamic systems, the new theoretical paradigm can probably propose a new social science of adhocracies and ambiguity. Think Rheingold’s Smart Mobs meets Riesman’s Lonely Crowd with a little bit of Critical Mass mixed in for good measure.

One thought on “Toward deciphering the social media genome: big data and computational social science may explain social embryology”

Hmm….. “I think that the new social theory that will emerge from deciphering the social media genome will be productive only in so far as it will be freed from the fetters of abstraction. “Theory driven” should not be equated with a priori ideas about social institutions and a purported “need” for cohesiveness or equilibrium. Since we know so much more about non-equilibrium systems, social entropy and dynamic systems the new theoretic paradigm can probably propose a new social science of adhocracies and ambiguity. Think Rheingold’s Smart Mobs meets Riesman’s Lonely Crowd with a little bit of Critical Mass mixed in for good measure…. ”

…. When social cognitive meets social mass, how would a social wave at the Purdue football stadium be used…?