In a tour de force on the opportunities and challenges of big data, Butterworth apparently demolishes the idea of small-sample data analysis or (more questionably?) the use of anecdote and thoughtful argument to settle points of controversy. But finding correlations in massive amounts of data doesn’t mean that the difficulty of establishing causality, of finding out what’s really going on, has disappeared. It doesn’t mean we abandon anecdote, argument, and thoughtful explanation. It means only that we can calculate correlations on bigger data sets. Sometimes, Angrist-and-Pischke style, we can do something akin to an experiment. Still, such efforts require much more than mere counting, much more than mere enumeration.

His first example, gender bias in the media:

Pre-Big Crit, you might have had pundits setting the air on fire with a mixture of anecdote and data; or a thoughtful article in The Atlantic or The Economist or Slate, reflecting a mixture of anecdote, academic observation and maybe a survey or two; or, if you were lucky, a content analysis of the media which looked for gender bias in several hundred or even several thousand news stories, and took a lot of time, effort, and money to undertake, and which—providing its methodology is good and its sample representative—might be able to give us a best possible answer within the bounds of human effort and timeliness.

The Bristol-Cardiff team, on the other hand, looked at 2,490,429 stories from 498 English language publications over 10 months in 2010. Not literally looked at—that would have taken them, cumulatively, 9.47 years, assuming they could have read and calculated the gender ratios for each story in just two minutes; instead, after ten months assembling the database, answering this question took about two hours. And yes, the media is testosterone fueled, with men dominating as subjects and sources in practically every topic analyzed from sports to science, politics to even reports about the weather. The closest women get to an equal narrative footing with men is—surprise—fashion. Closest. The malestream media couldn’t even grant women tokenistic majority status in fashion reporting. If HBO were to do a sitcom about the voices of this generation that reflected just who had the power to speak, it would, after aggregation, be called “Boys.”
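Butterworth’s 9.47-year figure is easy to check. A minimal Python sketch, assuming his two minutes per story and nonstop reading with 365.25-day years (that year length is an assumption, chosen because it reproduces his number); the “working years” line adds a hypothetical coder working 8-hour days, 250 days a year.

```python
STORIES = 2_490_429          # stories in the Bristol-Cardiff database
MINUTES_PER_STORY = 2        # Butterworth's assumed reading-and-coding time

total_minutes = STORIES * MINUTES_PER_STORY
# Nonstop, around the clock, using 365.25-day years (assumption).
nonstop_years = total_minutes / (60 * 24 * 365.25)
# Hypothetical single coder: 8-hour days, 250 working days a year.
working_years = total_minutes / (60 * 8 * 250)

print(f"{total_minutes:,} minutes")
print(f"{nonstop_years:.2f} years nonstop")
print(f"{working_years:.1f} working years")
```

This reproduces the quoted 9.47 years; on a normal working schedule, one coder would need about 41.5 years.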

How useful is the finding that news stories are more likely to be about men than about women? And how is it evidence of gender bias in news stories? There is gender bias here only if the stories misrepresent the actual news, that is, if women in fact made as much news as men but got less coverage. Yet we know that, for myriad reasons, they don’t. Women are busy with family. Women don’t face the same opportunity structures as men. Women face bias in the professional and political worlds. A lopsided gender ratio in magazine and news stories does not necessarily point to gender bias in journalism.
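The argument turns on the choice of baseline: a skew in coverage is evidence of media bias only relative to women’s share of the people actually making news in that area. A minimal Python sketch of that comparison; the function name and the numbers are hypothetical and illustrative, not figures from the Bristol-Cardiff study.

```python
def representation_ratio(coverage_share: float, benchmark_share: float) -> float:
    """Women's share of news subjects divided by their share of the relevant
    pool of newsmakers (the benchmark). 1.0 means coverage mirrors the
    benchmark; below 1.0 means under-coverage relative to that benchmark."""
    if not 0 < benchmark_share <= 1:
        raise ValueError("benchmark_share must be in (0, 1]")
    return coverage_share / benchmark_share

# Hypothetical numbers for illustration only.
coverage = 0.25                                # women's share of subjects on some beat
print(representation_ratio(coverage, 0.50))    # measured against parity
print(representation_ratio(coverage, 0.25))    # against a 25% newsmaker pool
```

The same 25% share reads as heavy under-coverage against parity and as no media bias at all against a 25% benchmark. Choosing the benchmark is the hard, non-computational part, and the benchmark may itself reflect the professional and political bias described above.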

It’s just not that easy to tease truth out of numbers. I hate to repeat a cliché that everyone should already know and that is too often stated uncritically, but I will anyway. Correlation is not causation.

Patterns are not truth.

Big data does not, in fact, allow us to answer really big questions, because most really big questions are questions about causality. Do women face unfair bias? That is, is their unequal representation the result of bias, over and above the real-world factors that would tend to reduce their representation anyway (and in what context, what country, what career)? Or, from my current job: does social engagement improve academic outcomes (in what context, what country, what courses, what classroom)? Big data is not so useful in answering such questions. Mere correlation in a specific context doesn’t tell you much; broad-scale big-data correlation tells you even less.

Here’s an example of a study that demonstrated the clear presence of gender bias. Merely changing the name on a resume from male to female led to lower ratings of hireability, competence, and mentoring. No big data required. This is essentially an experiment: everything was held constant except the gender of the applicant’s name. Big data doesn’t make experiments that control for outside factors more likely. It may even reduce their use, if it leads us to think that big data has more to tell us than small-data experimentation.

To elaborate on that example from my current job: since students who post to discussion threads get better grades (let’s say they are more “engaged”), will increasing discussion-thread posts improve grades? Maybe, but only in very limited contexts, where participating in discussion threads actually improves students’ ability to make sense of content, produce better work, spend more time in class, and ultimately do better in class. In most cases, there is only correlation, not causation. Better students post more. They are more conscientious. They are already more engaged. You can make more discussion threads mandatory, but I’m skeptical that will improve outcomes.

We can count, and then we can correlate counts, but making sense of those associations requires more work. It requires understanding, explanation, context, sometimes anecdote, and ideally experimentation too.

4 responses to “Disciples of enumeration, beware”

I think largely what we are seeing is the rise of a large group of incompletely trained armchair researchers/analysts. In the past, large data collection efforts were prohibitively expensive. As a result, they were primarily the domain of academicians and private research firms dedicated to the purpose. Now such data extraction can be done from a laptop, which has encouraged a wide range of people to take a crack at data analysis. I think that’s great, in general. But the downside is that many of the people collecting, analyzing, and commenting on these massive amounts of data aren’t fully trained in the very issues you are describing. While you and I might nod our heads and say “of course” to “correlation is not causation,” many have heard the phrase but have very little practical understanding of its implications. And that can be quite dangerous if those voices are perceived as credible. Frankly, this all reminds me of a parallel problem in education regarding MOOCs: people assume that because a particular class with particular learning objectives can be automated and made massive, all instruction can be automated with equal quality (see discussions of “the imminent collapse of academia”). Well, surprise, surprise: things are a lot more complicated than they seem on the surface to an untrained observer.

This is where I feel the need to point out that the “science” in data science actually means something. There is a science to research design and to statistical analysis and it’s not something you learn in one intro to stats class.

But the example Butterworth cited about gender bias in the news came from an academic group, I think. I haven’t read the study, but I would like to see what they say about confounding factors and what exactly they claim to have demonstrated. I bet there are more caveats and more nuanced conclusions than what I’m taking from Butterworth’s description of it.

Thanks for your comments, everyone. The key thing about the piece is less whether the Bristol study identified robust gender bias in the media than that it devised a valid algorithm to analyze a vast amount of media according to pre-established “hand coding” guidelines for content analysis. Interpretation is critical to the success of these studies. Unfortunately, even with narrative elements to supply the journalistic “why this matters to the media,” it was still not picked up by any of the media news/criticism aggregators, presumably because it was too technical.
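One common way to check whether an automated coder is “valid” against hand-coding guidelines is to have humans code a subsample and measure agreement with Cohen’s kappa, which corrects raw agreement for chance. We don’t know how the Bristol-Cardiff team validated their algorithm; this is a minimal Python sketch with hypothetical labels (“M”/“F” for the gender of a story’s main subject).

```python
from collections import Counter

def cohens_kappa(human: list[str], machine: list[str]) -> float:
    """Chance-corrected agreement between two coders labeling the same items."""
    if len(human) != len(machine) or not human:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(human)
    observed = sum(h == m for h, m in zip(human, machine)) / n
    h_counts, m_counts = Counter(human), Counter(machine)
    # Agreement expected if each coder labeled at random with their own label frequencies.
    expected = sum(h_counts[k] * m_counts[k] for k in h_counts) / n**2
    if expected == 1:  # both coders used one identical label for every item
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical hand-coded subsample vs. algorithm output.
human   = ["M", "M", "F", "M", "F", "M"]
machine = ["M", "M", "F", "F", "F", "M"]
print(f"kappa = {cohens_kappa(human, machine):.2f}")
```

Here the coders agree on five of six stories (83%), but kappa is 0.67, because half of that agreement would be expected by chance given how often each coder used each label.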

Trevor, that is a good point, though not one that would make an impression on me, since I don’t pay much attention to science and research journalism. I’m coming at it from the point of view of a statistician who keeps having to remind people that correlations only tell us so much. Big-data correlations aren’t any more useful than small-data ones, and perhaps even less so, because they skate over so much context.

I am heartily in favor of replacing human coding for content analysis with automated algorithms. I engaged in some human coding last year, and it was quite painful (though it showed interesting results).

Like I said – great article. I savored it. First thing I read in many months that motivated me to say something, not so much because I questioned the gender bias study but because I felt so engaged by your writing and ideas. Thank you.