The recent New York Times series on DH picked up a thread that has been fascinating me for a while:

A history of the humanities in the 20th century could be chronicled in “isms” — formalism, Freudianism, structuralism, postcolonialism — grand intellectual cathedrals from which assorted interpretations of literature, politics and culture spread.

The next big idea in language, history and the arts? Data.

Members of a new generation of digitally savvy humanists argue it is time to stop looking for inspiration in the next political or philosophical “ism” and start exploring how technology is changing our understanding of the liberal arts. This latest frontier is about method, they say, using powerful technologies and vast stores of digitized materials that previous humanities scholars did not have.

Many folks reading this will recognize here a restatement of Tom Scheinfeldt’s “Sunset for Ideology, Sunrise for Methodology” post (which I find myself constantly referencing, even if I can’t bring myself to agree with it). I’m interested in returning to this question, both in theory and in practice (as a Marxist might say), or, to adopt the argot of THATCamp, both in yacking and hacking.

First, some “practice”: we can find a particularly remarkable instance of this sort of “methodological” work in a project also profiled by the Times: Dan Cohen and Fred Gibbs’s fascinating Victorian Books project (here is Dan Cohen’s own extended write-up). On a much smaller scale, with much less expertise and far less success, I have played with similar techniques myself. And just yesterday Aditi Muralidharan posted about her project WordSeer, which leverages natural language processing to open richer avenues of text analysis.

Now, some yacking: Despite a lot of well-meaning “there is no practice without a theory, and no theory not put into practice” talk, the division between theory and practice seems pretty well entrenched. (Matthew Jockers, whose work mining novels at the Stanford Literature Lab is another great example of this kind of project, nicely tries to bring distant reading and close reading together in this recent comment.) In part this is because of the very different skills (e.g. statistics!) required to make sense of, and make claims about, this new type of data. (Random Session Idea: “‘So, you never took a STATS class’, or ‘How Many is Enough?’: Statistics for Readers of Books”.) It is also, I think, difficult to integrate this sort of data into the traditional concerns of humanities scholars. To use my perennial example: what can “distant reading” tell me about the history of sexuality (my metonymy for “things folks, say dissertating grad students, are interested in right now”)?

So I’m interested in putting our yack where our hack is: in trying to imagine how text analysis can contribute to the things scholars, right now, actually care about; and let’s put our hack where our yack is and play with some text and the NLTK or Voyeur or whatever. Let’s try to do something interesting.
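To make the hacking half a little more concrete, here is a minimal sketch of the simplest sort of “distant reading” move: tracking how often a word turns up, year by year, across a pile of titles. Everything in it is an assumption for illustration. The titles in `TOY_TITLES` are invented placeholders, not data from any real project, and `tokenize` and `relative_frequency` are hypothetical helper names. It uses only the Python standard library; a real session would likely swap in a proper tokenizer from the NLTK and an actual corpus.

```python
import re
from collections import Counter

# Invented placeholder titles, for illustration only (not real project data).
TOY_TITLES = [
    (1850, "Sermons on Faith and Duty"),
    (1850, "A History of the Steam Engine"),
    (1880, "The Science of Heat"),
    (1880, "Faith and Science Reconsidered"),
]


def tokenize(text):
    """Lowercase the text and split it into crude word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(docs, term):
    """Return {year: share of all tokens in that year equal to term}."""
    term = term.lower()
    term_counts = Counter()
    totals = Counter()
    for year, text in docs:
        tokens = tokenize(text)
        totals[year] += len(tokens)
        term_counts[year] += tokens.count(term)
    # Skip years with no tokens to avoid dividing by zero.
    return {
        year: term_counts[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }


if __name__ == "__main__":
    for year, share in relative_frequency(TOY_TITLES, "science").items():
        print(year, round(share, 3))
```

The counting is the easy part. Whether a change in those shares means anything, and how many texts you need before it does, is exactly the “How Many is Enough?” question.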