DCW Volume 1 Issue 3 – Distant and Familiar

by Korey Jackson on June 8, 2012

This week’s DCW explores the spaces between scholars and their subject matter (think ‘distant reading’–or perhaps ‘macro reading’ is more accurate), between entrants and experts, and between the skills we espouse and the skills we actually need. Between, in other words, distance and familiarity.

Digital Skill for Humanities Graduate Students

by Amy E. Earhart

I read Roger Whitson’s “Opening Education” blog post with great interest. I’m thrilled to see an uptick in digital pedagogy discussion and have a few ideas to add to the conversation.

I’ve just concluded my first graduate course in digital humanities and have become convinced that the boundaries between teaching, service and research are surprisingly fluid in digital scholarship. Our class read a number of essays from Debates in the Digital Humanities, including Alex Reid’s “Graduate Education and the Ethics of Digital Humanities.” The students were very interested in thinking about ethics and graduate education, in large part, I suspect, because they were coming to terms with the realities of the scholarly job market in the face of an increasing student debt load. In some ways, their discussions represented our profession-wide crisis of faith in the sustainability of graduate education. Even those students who had little interest in defining their area of study as digital humanities were interested in understanding how technology impacted their work. Accordingly, they drafted a list of digital skills that they believe all humanities students should learn in school.

Training to support assessment of the digital projects and assignments that graduate students may see as instructors or professors. If we expect graduate students to teach or grade digital components in classes, we should include a similar digital component in their training courses and first-year review.

Information about online scholarly publications/projects (new media production and the possibility/danger of publishing online).

Instruction on how to make online materials accessible to those with disabilities.

Promote and provide instruction for digital tools that aid students in research and writing, including but not limited to Zotero, Scrivener, EndNote, and BibTeX.

Tools and techniques for personal digital archiving: how to back up your files, where to back them up, and how to name and retain versions.
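The last of these skills, personal digital archiving, can be made concrete with a few lines of code. What follows is only a minimal sketch of one possible convention, not something the students proposed: it assumes that a backup is a copy of the file with the date and time added to its name, so that earlier versions are never overwritten. The function names `versioned_name` and `back_up` are hypothetical, chosen for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def versioned_name(filename: str, when: datetime) -> str:
    """Insert a sortable timestamp between the file's name and its extension."""
    path = Path(filename)
    stamp = when.strftime("%Y-%m-%d_%H%M%S")
    return f"{path.stem}_{stamp}{path.suffix}"


def back_up(source: Path, backup_dir: Path, when: Optional[datetime] = None) -> Path:
    """Copy `source` into `backup_dir` under a timestamped name.

    Assumption for this sketch: an existing backup is never overwritten,
    so every saved version is retained.
    """
    when = when or datetime.now()
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / versioned_name(source.name, when)
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.copy2(source, target)  # copy2 also keeps the file's modification time
    return target
```

Putting the year first in the timestamp means that an ordinary alphabetical file listing also shows the versions in chronological order. Where the backups should live (an external drive, a second machine, a university server) is a separate decision that the sketch leaves open.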

I am getting ready to help lead the NINES NEH Summer Institute: Evaluating Digital Scholarship to be held June 19-22 at the University of Virginia. Department heads are invited to the workshop and part of our time will be spent discussing graduate education in relation to digital scholarship. I hope that my students’ initial set of skills will be discussed and expanded during the workshop, and I encourage you to suggest skills or topics that you think are important to future graduate students. Look for a report on this blog after the institute.

GitHub Fever

by Mark Sample

GitHub fever seems to have struck the digital humanities. If you haven’t heard mention of GitHub, you likely will soon. It’s a source code management website that makes revision control and other tools for tracking changes and contributions to code repositories easily available to everyone. GitHub is free when it comes to sharing open source code (which any other user can “fork” to create his or her own copy of the repository), while developers who wish to collaborate privately on proprietary projects must pay to play.

It’s a good question. The answers so far to Patrick’s question suggest that most humanists—digital or otherwise—are not ready to use GitHub. The learning curve is simply too steep, with no clear payoffs for non-code projects. There are simpler solutions for collaboration and tracking changes. Google Docs comes to mind, for instance.

Another reason for digital humanists to eschew GitHub for non-code projects—although I haven’t seen it articulated as such—may be that GitHub is a third-party service, with a fate beyond our control. History suggests that relying too much on a commercial service with interests that do not necessarily align with our own is no way to sustain the work of the humanities. The cloud is impermanent and owes us nothing. The cloud is not free, even when we pay nothing for it. The cloud is no place to build your digital castles.

And yet we keep hearing about GitHub. I’d like to suggest that the seeming ubiquity of GitHub in DH conversations has less to do with GitHub itself than with what GitHub stands for: a culture of sharing, a generosity of spirit, and the bricoleur’s impulse to turn existing bits and pieces into something new. GitHub is, and will continue to be, symbolically central to the digital humanities, but in practice it will remain on the periphery, a fever and for most humanists nothing more.

THATCamp LAC

by Kathryn Tomasek

This time last week, I was gearing up for THATCamp LAC at St. Edward’s University in Austin, Texas. This was the second year for this particular iteration of THATCamp; the first was organized last year by Ryan Cordell and held at his former institution, St. Norbert College in De Pere, Wisconsin.

Several repeaters attended this year. Ryan Hoover, the lead organizer of this year’s team, attended last year, as did Jacque Wernimont from Scripps College and Rebecca Frost Davis from NITLE. I could well be leaving people out, so if you were at both, I hope you’ll say so in the comments.

Having attended THATCamp Texas when it was held at the University of Texas, Arlington, this year’s main organizer Ryan Hoover noted that THATCamp LAC tends to draw more people who are new to Digital Humanities than do other THATCamps. This is great news for the kind of Big Tent Digital Humanities that organizers Matt Jockers and Glen Worthey promoted for last year’s Digital Humanities conference at Stanford.

Ryan noted that people new to the unconference did a great job of moving around among the sessions. And several “newbies” said they felt welcomed and enthusiastic about taking what they had learned back to their home institutions. The geographical reach of those institutions was particularly broad, with one participant coming all the way from Alaska.

A group who identified themselves as #feralcats established their own session in true THATCamp form and wrote up some guidelines for how to adhere to the “less yack, more hack” credo. Ryan suggested that perhaps next time it would be a good idea to print up t-shirts for THATCamp Ambassadors who could circulate and promote more hacking.

Jesse Stommel has been thinking a lot about audience. He noted that the two different constituencies, experienced DHers and people interested in and curious about DH, are both key to the culture we create at THATCamp. He thinks it is important to address the organization of the unconference to both audiences, not to separate them, but to think about ways to bring them together productively. He also plans to include students as he prepares for hosting THATCamp Hybrid Pedagogy in the fall.

Jacque Wernimont, who is organizing the upcoming THATCamp Feminisms, offered to talk with me about her impressions, but I didn’t have time to follow up. So Jacque, if you’re reading this, I hope you’ll say something in the comments.

We left without identifying a sponsoring institution and organizers for next year. Anybody want to volunteer?

Adventures in 19th Century Data

by Roger Whitson

Two articles that use distant reading techniques to analyze large datasets of 19th-century literature were making the rounds this week. Both directly address the paradox at the heart of quantitative analyses of culture: how do you quantify aspects of literary and cultural study, especially since literary study has traditionally defined itself against quantification?

The first, “A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method,” was written by Ryan Heuser and Long Le-Khac and is the culmination of a two-year study. Heuser and Le-Khac’s major contribution in the paper is the definition of a “semantic field” as the central unit of analysis in distant reading. They define semantic field as “a group of words that share a specific semantic property” and wanted to use distant reading to “see what kind of literary history could be done with semantic fields” (4). The process they used to determine these semantic fields is interesting in itself. For example, they found that many of the critical keywords that are often found together in scholarship did not actually appear together in 19th-century novels. Further, they had to figure out whether the larger macro-trends across decades actually reflected trends within the specific novels in their corpus. Ultimately they find that it is easy to fall into confirmation bias when interpreting data about novels: that is, to see traditional concepts and trends in the data whether or not the trends actually reflect those concepts. As a safeguard, they suggest approaching distant reading by “triangulat[ing] multiple dimensions of data and account for a wide set of related observations,” which they call hypothesis testing: “eliminat[ing] potential theories by testing them against multiple forms of data, resulting in a stronger argument” (49).
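To give a rough sense of what tracking a semantic field over time involves, here is a minimal sketch in Python. It is not Heuser and Le-Khac’s semantic cohort method, which is considerably more involved. It only assumes that a field is a fixed set of words and measures what share of all word tokens in each decade belongs to that set. The word list and the function names are hypothetical, invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical semantic field: these words are invented for illustration
# and are not taken from Heuser and Le-Khac's paper.
EXAMPLE_FIELD = {"virtue", "honour", "modesty"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def field_share_by_decade(corpus, field):
    """Return {decade: fraction of tokens that belong to `field`}.

    `corpus` is an iterable of (publication_year, text) pairs.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in corpus:
        decade = (year // 10) * 10
        tokens = tokenize(text)
        totals[decade] += len(tokens)
        hits[decade] += sum(1 for token in tokens if token in field)
    # Skip decades with no tokens to avoid dividing by zero.
    return {d: hits[d] / totals[d] for d in sorted(totals) if totals[d]}
```

A rising or falling share in such a table is exactly the kind of macro-trend the authors warn about: it says nothing by itself about whether individual novels follow the trend, which is why they check it against other dimensions of the data.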

The second, Fred Gibbs and Dan Cohen’s “A Conversation with Data: Prospecting Victorian Words and Ideas,” is available as a pre-print open access version while the final version is being published by Victorian Studies. Gibbs and Cohen ask what would happen if “we could move seamlessly between traditional and computational methods as demanded by our research interests and the evidence available to us?” As an example, they turn to the oft-cited Victorian “crisis of faith,” a phenomenon that has largely been attributed to Darwin’s theory of evolution, philosophers like Friedrich Nietzsche, and atheist literary figures like Percy Shelley. Gibbs and Cohen propose a different interpretation based upon the use of “God” and “Christian,” and find a spike in 1850 in the use of these words and then a precipitous drop-off. Their work “shows that there was a clear collapse in religious publishing that began around the time of the 1851 Religious Census, a steep drop in divine works as a portion of the entire printed record in Britain that could use further explication.” The large-scale distant analysis performed by Gibbs and Cohen reveals a historical phenomenon that has been largely overlooked by academics, and yet it also demands closer attention than distant techniques can deliver. They note that “[a]lthough detailed exegeses of single works undoubtedly produce breakthroughs in understanding, combining evidence from multiple sources and multiple methodologies has often yielded the most robust analyses.”
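The move between distant and close reading that Gibbs and Cohen describe can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not their actual pipeline: it assumes the data is a list of (year, title) records, computes the share of records per year that mention a target word, and then pulls out the matching records for a given year so that they can be read closely. The function names are hypothetical.

```python
import re
from collections import Counter

# Target words taken from the summary above; matching is on whole words only,
# so "Christianity" would not count as "christian" in this sketch.
TARGETS = ("god", "christian")


def mentions_target(title, targets=TARGETS):
    """True if any target word occurs as a whole word in the title."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return any(target in words for target in targets)


def yearly_share(records, targets=TARGETS):
    """Distant view: {year: fraction of that year's records mentioning a target}."""
    total = Counter()
    matched = Counter()
    for year, title in records:
        total[year] += 1
        if mentions_target(title, targets):
            matched[year] += 1
    return {year: matched[year] / total[year] for year in sorted(total)}


def titles_to_read(records, year, targets=TARGETS):
    """Close view: the matching records from one year, for reading by hand."""
    return [title for y, title in records
            if y == year and mentions_target(title, targets)]
```

Measuring a share of each year’s output, rather than a raw count, matters here because the total number of publications changes from year to year; a raw count could rise simply because more was printed.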

Both articles admit that plugging one or two words into Google Ngrams is hardly sufficient for analyzing cultural phenomena with computational methods, but they have strikingly different answers to the question of how to incorporate distant reading into literary study appropriately. Heuser and Le-Khac suggest that data needs to be formed into multiple tables, semantic fields, and dimensions, and that researchers need to guard against confirmation bias. Gibbs and Cohen argue that only when large-scale analysis is combined with targeted close readings can scholars end up with the most compelling digital studies. Clearly, digitally enhanced analysis of literary works is in its infancy, and many more studies will build upon and nuance the interesting work done here.