building computational sociology: from the academic side

Hire computational sociologists. Except for one or two cases, computational sociologists have had a very tough time finding jobs in soc programs, especially the PhD programs. That has to change, or else the field will quickly be absorbed by CS/informatics. We should have an army of junior-level computational faculty, but instead the center of gravity is around senior faculty.

Offer courses: This is a bit easier to do, but sociology lags behind. Every single sociology program at a serious research university, especially those with engineering programs, should offer undergrad and grad courses in computational sociology.

Certificates and minors: Aside from paperwork, this is easy. Hand out credentials for a bundle of soc and CS courses.

Hang out: I have learned so much from hanging out with the CS people. It’s amazing.

Industry: This deserves its own post, but we need to develop a model for interacting with industry. Right now, sociology’s model is: ignore it if we can, lose good people to industry, and repeat. I’ll offer my own ideas next week about how sociology can fruitfully interact with the for-profit sector.

Computational sociology still needs to be good sociology. Too often I hear people talk about the cool things they did with a large dataset and then get offended when asked about the novelty of the finding, or even about basics such as sample selection and its implications for the findings. As for purely focusing on computation, CS scholars are better at data mining anyway, and sometimes study similar datasets (e.g. the networks folks at Stanford CS & EE), so perhaps we should hire more CS people into soc departments.

The points and comments are spot on. CS scholars, and even more so industry, are often at the forefront today because of their skill sets and access to data. Of course they can suffer from a lack of social science background, as you would expect at the emerging intersection of disciplines. This doesn’t seem like much of a surprise, and much of the work has real merit even without the polish that would come from a deep sociology background. An interesting question that might guide CSS studies, especially multidisciplinary collaborations, is how broad but shallow data analysis would differ from or contrast with narrow but deep traditional approaches. I suspect large-N analysis will often spotlight theoretical confirmations, refutations and nuances, and consequently theoretical innovation can be sparked by large-N data analysis and then confirmed by traditional analysis. To me, it seems that the arc of this work will lead the practice of empirical social science to resemble the natural sciences a bit more in some ways.

“I’ll offer my own ideas next week about how sociology can fruitfully interact with the for-profit sector.”

Are you implying there’s something wrong with our current approach of engaging in moral entrepreneurship against the research they make public and lamenting that, sure, they can fully automate translation of Slovak to Portuguese, but they lack a theory of language?

Just bookmarking because I think this is an important discussion. Leaving this whole field to industry is wrong, not because industry people aren’t competent (they are, of course) but because their questions, motivations and incentives are quite different from what academics can (or should) bring to the table, both methodologically and substantively.

Like Zeynep, I’m also bookmarking. And I’m mulling lots of questions. I have spent a good bit of time now at Microsoft Research, Berkman and a firm that has to remain anonymous. All of these places had a lot of data scientists. My experiences suggest that we should ask 1) if data scientists are interested in sociology and 2) what they mean by the “sociology” that they are interested in. Take for instance inequality (or stratification). I think of these as core sociological interests, along with the study of groups and society as units of analysis. My read of much of the comp. soc. in my area of class and education research finds little attention to groups, little articulation of society, and not much interest in inequality beyond the rudimentary functionalist justification for doing a study (e.g., “there’s been a lot of talk about unequal returns to education”). Some of this is a methodological challenge. When individuals are operationalized as tasks, it is difficult to reconstitute them as persons embedded in groups or organizations. And data science mostly measures tasks, albeit in aggregates that seem to dazzle those of us who deal with smaller Ns. So, how do we reconcile the tools of comp. soc. with the core theoretical principles of sociology? I’m open to it. I think it’s an imperative, actually, as we consider the digital plane of social reality. I think it is especially crucial for marginalized groups. But if we need more partnerships, we also need a lot more heavy lifting on what those partnerships will mean exactly.

Sometimes the definition of knowledge, or of disciplines themselves, changes as a result of methodological advances. Data scientists may or may not seek to answer the exact same questions as traditional social scientists would. What it means to be a sociologist, and what the typical research questions are, may shift as sociology grows more comfortable with digital tools. Also, I think of traditional methods and theories (or their derivatives) as interpretive tools that are required to operationalize the computational work. Computational findings can only be as good as they can be theoretically contextualized and verified the “old fashioned” way.

I think there are two absolutely critical contributions to data science that sociology can make beyond the research question issue that Zeynep and Tressie point to (and with which I wholeheartedly agree).

The first is a substantive understanding of research methods. I see many data mining projects in higher education administration that remain in near-total ignorance of basic questions about data: about its reliability, its validity, its provenance, etc. Social scientists are used to asking these kinds of questions at the outset; information technologists (especially the vendors with whom I deal) generally take data at face value. That is, to my mind, a recipe for data-driven disaster.

The second is an understanding of the social practices, structures, and contexts within which data models operate. A neural net exists because some person or group in some organization asked for it to be created so that they can use the information it provides to accomplish some task. Data scientists who see the field as strictly a technology field are likely to build models that are inconsistent with these contexts in ways that those demanding them may not be able to see themselves. Data scientists do better data science when they understand data science as a form of social action.

Both of these are, I would suggest, excellent topics for an information and society course, which should be part of any data science education.