This is like saying that you want your driverless cars to work for Uber while you are sleeping. While it sounds possible, as currently configured neither ethnographic practices nor quantitative text analysis are up to the task.

This is puzzling. No one made this claim. If people believe that computers will do qualitative work by collecting data or developing hypotheses and research strategies, then they are mistaken. I never said that, nor did I imply it. Instead, what I did suggest is that computer scientists are making progress on detecting meaning and content, and are doing so in ways that would help researchers map out or measure text. As with any method, the researcher is responsible for providing definitions, defining the unit of analysis and so forth. Just as we don’t expect regression models to work “while you are sleeping,” we don’t expect automated topic models or other techniques to work without a great deal of guidance from people. It’s just a tool, not a magic box.

Another comment was meant as a criticism, but actually supports my point. For example, J wrote:

This assumes that field notes are static and once written, go unchanged. But this is not the consensus among ethnographers, as I understand the field. Jonathan van Maanen, for example, says that field notes are meant to be written and re-written constantly, well into the writing stage. And so if this is the case, then an ethnographer can, implicitly or intentionally, stack the deck (or, in this case, the data) in their favor during rewrites. What is “typical” can be manipulated, even under the guise of computational methods.

Exactly. If we suspect that field notes and memos are changing after each version, we can actually test that hypothesis. What words appear (or co-appear) in each version? Do word combinations with different sentiments or meanings change in each version? I think it would be extremely illuminating to see what each version of an ethnographer’s notes keeps or discards. Normally, this is impossible to observe and, when reported (which is rare), hard to measure. Now, we actually have some tools.
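To make this concrete, here is a minimal sketch in Python of the kind of comparison I have in mind. It is only an illustration under simple assumptions: the two drafts are made-up sentences, the function names (`tokenize`, `cooccurrences`, `compare_versions`) are hypothetical, and "co-appear" is taken to mean "appear in the same sentence." It does not attempt to measure sentiment, which would need additional tools.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cooccurrences(text):
    """Count unordered word pairs that appear in the same sentence."""
    pairs = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = sorted(set(tokenize(sentence)))
        pairs.update(combinations(words, 2))
    return pairs


def compare_versions(old, new):
    """Report which words and word pairs a rewrite dropped or added."""
    old_words, new_words = set(tokenize(old)), set(tokenize(new))
    old_pairs, new_pairs = set(cooccurrences(old)), set(cooccurrences(new))
    return {
        "words_dropped": sorted(old_words - new_words),
        "words_added": sorted(new_words - old_words),
        "pairs_dropped": sorted(old_pairs - new_pairs),
        "pairs_added": sorted(new_pairs - old_pairs),
    }


# Two invented drafts of the same field note (not real data).
draft_1 = "The manager seemed angry. Workers stayed quiet."
draft_2 = "The manager seemed tired. Workers stayed quiet."

diff = compare_versions(draft_1, draft_2)
print("Dropped words:", diff["words_dropped"])
print("Added words:", diff["words_added"])
print("Dropped pairs:", diff["pairs_dropped"])
```

Real field notes would need more careful tokenization and a defensible definition of the unit of analysis, which remains the researcher's job. The point is only that once successive versions are saved, the differences between them become something we can count.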

Will computational ethnography be easy or simple? No. But instead of pretending that qualitative research is buried in a sacred and impenetrable fog of meaning, we can actually apply the tools that are now becoming routine in other areas for studying masses of text. It’s a great frontier to be working in. More sociologists should look into it.

6 Responses

My skepticism about the future of “computational ethnography” is more about the likelihood of the raw input being available than the processing of that input. Ethnographers would need to record every second of every interaction, with context, archive these recordings (no editing!), and make them available as raw data. Field notes, too, and every iteration of them.

It’s hard for me to see this happening, not just because of the sheer volume of data or the “stickiness” of current practice in the field. There’s the Hawthorne effect. And, there are huge issues of confidentiality and of protecting human subjects in ethnographic research, much more so than with quantitative research. If IRBs aren’t going to allow an anthropologist to interview foreign terrorists who are slated for execution because the interview might cause them undue harm, they certainly aren’t going to allow ethnographers to record and release videos of, say, gang members in a housing project in NYC, or undocumented workers, or children in any context, or any of the 100s of other vulnerable populations that ethnographers are so good at studying. And they shouldn’t, either.

(By the way, why call it computational ethnography and not just text analysis? Is there something fundamentally different — other than volume of data — about the types of analysis that you’re proposing compared to what linguists have been working on for years? Or is it a strategic frame, given computational this-and-that is currently “hot” and there’s lots of money floating around for it and computer scientists and other high-status groups on campus are involved? Not that there’s anything wrong with that…)

I do appreciate the effort here, Fabio. And I’m flattered to get called out (seriously). But, as the saying goes, you’re moving the goal posts. The first problem computational ethnography was going to solve was: “…someone would ask, “is this a typical observation?” and then the ethnographer would say, “yes, trust me.” We no longer have to do that.” And now, the problem computational ethnography is going to solve is: “If we suspect that field notes and memos are changing after each version, we can actually test that hypothesis.”

Before computational ethnography becomes some sort of panacea for both detecting generalizability *and* analyzing the craft of fieldnote re-writes, social scientists would probably need to come up with at least discipline-wide standards for writing ethnographic fieldnotes. We most definitely do not have that right now. Compare to text analysis of historical materials, like the link you provided to Laura Nelson’s work in your original post. Laura probably has an understanding of what made it into those documents, and what didn’t–basically, what they are theoretically intended to represent. For ethnographers like Van Maanen, fieldnotes are placeholders for more details and narrative flow to be imposed by the ethnographer at a later date…sort of like signposts to retrieve memories. That type of note writing will be inherently incomplete at any given time and constantly in flux, not due to deception by the ethnographer per se, but because that’s what makes fieldnotes different than, for example, an archive of congressional testimony.

I’m not saying I advocate this approach. I bring it up just to illustrate the lack of standards across the field for how we should write our fieldnotes, which I think affects which problems computational methods will be able to solve.

The idea that ethnographic field notes would serve as good fodder for corpus-analysis methods seems very unlikely to me, since the unstandardized and open-ended character of field notes seems part of the point. Far more plausible is that ethnographers of groups that produce a substantial corpus of digital content as part of their activities could gain access to and use that corpus to supplement the ethnography. The obvious example would be ethnographies of online communities, but one could also imagine having informants who are willing to allow researchers to swallow up e-mail archives or Facebook timelines.

Perhaps our disagreement is over whether the analysis you described in the original post was something folks could do today versus something that might be available in the next three to ten years.

You wrote, “We no longer have to do that,” which I took to mean that we already have the tools to analyze whether or not a specific interaction from field notes is representative. To my knowledge, most topic modeling algorithms, like LDA, require thousands of texts, each hundreds of words long, to achieve any sort of reliable estimates. Those conditions won’t be met in most field notes. Additionally, in practice, most topics extracted don’t map onto sociologically interesting things, like frames or emotions. I suspect that the topic modeling of field notes would mostly recreate the events (both because of the way the texts were produced and because of the nature of topic modeling algorithms) rather than common themes across events, but this is certainly testable. The original post treats LDA as some sort of readily available robustness check for qualitative findings, but we are a long way off from that.

That said, as I mentioned in my original comment, there are plenty of techniques out there that might be of use to folks who collect/produce their own text data.

What is the impetus here? Somehow by rationalizing the processes for producing qualitative work, the work will be better; and externally, the field will benefit from a greater degree of legitimacy. Is that fair?

I have no idea why the introduction of computational ethnography would actually accomplish either of these things. The work would be different, yes. But better? Why? Why keep punching at a set of technologies which carry high, but difficult to describe and defend processes? We already know that this is the case. And does anyone even doubt that the most likely result of ‘computational ethnography’ is to further delegitimate qualitative ethnographic work?

The target should be the uncritical over-valuation of ‘rationalized’ social science. More Freese, less Rojas.

And do we think that there is a time, now or in the future, when sociology will achieve anything even remotely resembling external scientific validation? This is not to say whether or not it *is*, but where is this future when sociologists get respect, love, and legitimacy?

As I read this, the suggestion is trying to solve a third-order problem with a second-rate solution.