computational ethnography

An important frontier in sociology is computational ethnography – the application of textual analysis, topic modelling, and related techniques to the data generated through ethnographic observation (e.g., field notes and interview transcripts). I got this idea when I saw a really great post-doc present a paper at ASA where historical materials were analyzed using topic modelling techniques, such as LDA.

Let me motivate this with a simple example. Let’s say I am a school ethnographer and I make a claim about how pupils perceive teachers. Typically, the ethnographer would offer an example from his or her field notes that illustrates the perceptions of the teacher. Then, someone would ask, “is this a typical observation?” and then the ethnographer would say, “yes, trust me.”

We no longer have to do that. Since ethnographers produce text, one can use topic models to map out themes or words that tend to appear in field notes and interview transcripts. Then, all block quotes from field notes and transcripts can be compared to the entire corpus produced during field work. This would attest not only to the commonality of a topic, but also to how it is embedded in a larger network of discourse and meaning.
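To make the comparison step concrete, here is a minimal sketch of checking one block quote against the rest of the corpus. It is not a topic model: it swaps in plain TF-IDF cosine similarity, which is much simpler than LDA, and it assumes each field note is already a separate string. The function names, the sample notes, and the 0.2 threshold are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using smoothed TF-IDF."""
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in counts.items()} for counts in counts_per_doc]


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def share_of_similar_notes(quote, notes, threshold=0.2):
    """Return (share of notes at or above the threshold, list of similarities).

    The threshold is an arbitrary illustrative cut-off, not a recommended value.
    """
    vectors = tfidf_vectors(notes + [quote])
    quote_vec = vectors[-1]
    sims = [cosine(quote_vec, v) for v in vectors[:-1]]
    share = sum(s >= threshold for s in sims) / len(notes)
    return share, sims


# Hypothetical field notes and block quote, invented for this example.
notes = [
    "Pupils said the teacher was strict but fair during the maths lesson.",
    "Several pupils complained that the teacher ignored their questions.",
    "The lunch hall was noisy and the queue for food was long.",
    "Pupils described the teacher as fair when marking homework.",
]
quote = "The teacher is strict but fair, one pupil told me."

share, sims = share_of_similar_notes(quote, notes)
print(f"share of notes resembling the quote: {share:.2f}")
```

A real analysis would need far more text than this, and the answer depends heavily on how the notes are split into units and where the threshold is set.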

12 Responses

This assumes that field notes are static and, once written, go unchanged. But this is not the consensus among ethnographers, as I understand the field. Jonathan van Maanen, for example, says that field notes are meant to be written and re-written constantly, well into the writing stage. If this is the case, then an ethnographer can, implicitly or intentionally, stack the deck (or, in this case, the data) in their favor during rewrites. What is “typical” can be manipulated, even under the guise of computational methods.

The formalization of ethnography should be controversial because ethnography was developed explicitly to contrast with formalized methodologies which imposed mathematical models. I can see using statistical information from other studies heuristically to corroborate but an ethnography addresses methodology in the social sciences through an individualistic or atomistic lens. The effect of formalization by mathematization or statistical analysis is to remove interpretation or subjectivity from the analysis which is what ethnography intended to locate and describe.

Designers write websites with an eye towards being ranked highly by Google. Journalists write articles (and headlines) to be chosen by Facebook. Job applicants use key words to be flagged by HR algorithms. Why would ethnographers be different, if their notes will be processed by a computer? I would not say that “the future is here” but rather “the reversal of roles” is here, in which the code creates the finding.

Applying this search for generalizability to ethnographic fieldnotes runs counter to the epistemological orientation of qualitative research in at least two ways. First, it assumes that we want our ethnography to be generalizable. But if one were looking for generalizable results, then one shouldn’t have really done an ethnographic study to begin with. One should have done a survey (see Small’s article, “How Many Cases do I Need?”, from Ethnography 2009). The point of an ethnography isn’t to produce generalizable results, it’s to identify contextualized processes and mechanisms and then theorize the relationship between the observations and their contextual backgrounds. Second, it presumes that connections between words and themes will be identifiable across discrete sets of fieldnotes and transcripts, or that what constitutes a theme (as far as a computer can identify it, anyway) will look the same no matter what setting it came from. But any ethnography that incorporates observations from more than one site, setting, or group of research participants will not lend itself to this automated search for themes, because the way people talk about something varies with the setting. Automation and computation, it seems to me, must necessarily radically decontextualize the data, when ethnography is in fact all about context.

The term generalizability is too strong. Research should provide valid explanations of phenomena even if the phenomena and the ethnographer’s description do not generalize. The issue is whether the investigation can determine the rationality or capacity of the people under study to think and reason and, if so, their understanding of their situation. We cannot begin with universal cultural concepts; we must first determine the concepts in use, then see if they are logically and scientifically valid, and then perhaps how they are connected to other groups and contexts.

This is like saying that you want your driverless cars to work for Uber while you are sleeping. While it sounds possible, as currently configured neither ethnographic practices nor quantitative text analysis are up to the task.

For starters, the example given would require 1) a large amount of hand-editing of the interviews/notes to get them into a machine-readable format; 2) a working definition of what is a single text (quote, anecdote, interview) to be analyzed, with the data appropriately labeled; 3) texts to be interpretable when stripped of context, such as the interview question; 4) a large enough n to construct meaningful topics (think thousands, not dozens); and 5) the unlikely event that the topics matched up to the analysis in a meaningful way.

More practically, one could feed a set of answers to a question through a dictionary-based coder, like LIWC, and count how many of the texts contained significant expressions of anger or sadness. But again, there are a bunch of theoretical decisions implicit in even this, plus the requirement that the texts be easily processed in a CSV-like format.
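For what it's worth, the dictionary-based counting described above might look roughly like the sketch below. It does not use LIWC itself, which is a proprietary product; the word lists are a toy lexicon invented for illustration, and the column name "answer" and the sample rows are also assumptions. Even here the implicit decisions show up: which words count as anger, and whether one matching word is enough to code a text.

```python
import csv
import io
import re

# Hypothetical toy lexicon, invented for this example. NOT the LIWC dictionary.
LEXICON = {
    "anger": {"angry", "furious", "hate", "annoyed", "mad"},
    "sadness": {"sad", "cried", "lonely", "miserable", "upset"},
}


def code_text(text, lexicon=LEXICON):
    """Return the set of categories with at least one lexicon word in the text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return {category for category, words in lexicon.items() if tokens & words}


def count_coded_rows(csv_file, text_column="answer", lexicon=LEXICON):
    """Count how many rows of a CSV file fall into each lexicon category.

    Returns (total number of rows, {category: number of rows coded}).
    """
    counts = {category: 0 for category in lexicon}
    total = 0
    for row in csv.DictReader(csv_file):
        total += 1
        for category in code_text(row[text_column], lexicon):
            counts[category] += 1
    return total, counts


# Made-up answers standing in for a CSV export of interview responses.
SAMPLE_CSV = """respondent,answer
r1,"I was furious when the teacher ignored me."
r2,"Mostly fine, nothing special."
r3,"I felt sad and lonely, and a bit angry too."
"""

total, counts = count_coded_rows(io.StringIO(SAMPLE_CSV))
print(total, counts)
```

Note that a text is coded once per category no matter how many matching words it contains, which is itself one of those theoretical decisions.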

I suspect that topic models or something like that might be helpful for quick and dirty exploratory analysis, but I think we are a long ways off, both methodologically and theoretically, from using them to analyze which field notes are representative of specific interactions.

While I agree that there is a danger of “stacking the deck” in favor of some result, this is really not that different to the danger of “stacking the deck” when you’re doing a statistical study, is it? I mean, there are choices to be made regarding coding there too, not to mention choices regarding modelling. Granted, ethnographers will have more room to make choices that may skew their results than most quantitative researchers would have, as the latter will generally be probed extensively by their peers/reviewers about the robustness of their results. But I think in both cases, it all boils down to whether or not we can trust the ethics of researchers and the scrutiny of the field in which they operate.

Just to compensate for these conciliatory words: research that denies any kind of generalization, i.e. that is fully case-specific and has no bearing whatsoever on contributing to theory that has any application beyond the specific case, is in my humble opinion not science but glorified journalism. That may sound harsh, but then again my experience is that very, very few people really are committed to denying any kind of generalizability of their work, even though they may balk at the word.

Yeah, I totally agree with Alex’s comments. What I was going for was a response to Fabio’s vision of “the future”: “…someone would ask, ‘is this a typical observation?’ and then the ethnographer would say, ‘yes, trust me.’ We no longer have to do that.”

All of the comments here seem to agree that, yes, we still have to trust the ethnographer enormously (just as we have to, at some level, trust quantitative research on the researcher’s coding, etc). Computational methods do not change that.

Levi-Strauss famously said that he wondered, “why did he just tell me that?” after talking to an informant from a wholly different culture. So, ethnographic studies raise the problem of interpretation at the discursive and psychological level. It is hard to say that this is ever replicated in economics departments.

Any study of aggregated data involves abstraction and therefore, by implication, a loss of content. Thus, it is necessary to spell out the effects of any mathematical model or data analysis at the concrete or case level. One obvious problem is that mathematical models harvest data and abstract from it, but they fail to reverse the process and make sense of their findings by identifying the effects of their results!
