dedicated to DATA: digitally assisted text analysis

...the broad circumference
Hung on his shoulders like the Moon, whose Orb
Through Optic Glass the Tuscan Artist views
At Ev’ning from the top of Fesole,
Or in Valdarno, to descry new Lands,
Rivers or Mountains in her spotty Globe.
(Paradise Lost, 1. 286-91)

Thou com’st in such a questionable shape: Data Janitoring the SHC corpus from the perspectives of Hannah, Kate, and Lydia

Below are the reflections of Hannah Bredar, Kate Needham, and Lydia Zoells about their adventures in the mundane world of Lower Criticism, about which I wrote in an earlier blog and of which the digital surrogates of our cultural heritage will need a lot in the decades to come. Racine observes in his preface to Bérénice that toute l’invention consiste à faire quelque chose de rien (all invention consists of making something from nothing). These three “inventors”, after spending much time with commas and stray printers’ marks, came up with excellent insights into the business of criticism and the (un)certainties of making sense of texts, especially old ones.

Kate and Shakespeare’s scepticism

Is this a comma that I see before me,
Its tail hanging down? …Or art thou but
A comma of the mind, false punctuation
Proceeding from the text-oppressed brain?

Correcting transcriptions can sometimes feel like banging one’s head against a massive, impermeable wall. As often as I made a definitive correction, it seemed, I came across something that appeared irresolvable. Is this a period or a badly printed comma? A misaligned end-stop or the remnant of an intended colon? How long can I stare at it before I realize I will never know? We (undergraduates like me, new to the instability of early modern texts) arrive with a conception of textual clarity and authenticity that in many cases is simply not there. Some cases might be answered by looking at more books, more witnesses, visiting more libraries over more hours, but this isn’t conducive to curating an entire literary corpus for digital publication. And even were we to fully collate every text in the database, some of these questions might never be resolved. This means making peace with the unsatisfactory text and setting our aims somewhere less idealistic: closer to “good enough.” We turn towards clarity, functionality, and truthfulness to the text without forcing on it a definitiveness it does not have in every instance.

In a previous post on this blog (“How to fix 60,000 errors” June 22 2013), Prof. Mueller noted that the original 60,000 known errors in the SHC transcriptions constituted just 0.4% of the data in the database. That number is statistically insignificant for computer analysis of the texts, but even a cursory look at the transcriptions themselves confirms that the presence of so many errors is prohibitive to human readers, whatever the statistical significance. Making corrections at this level was our aim this summer, to help propel the transcriptions from masses of computer data to texts readable (and enjoyable) for people. For those who hail (with trepidation) the digital humanities as the end of reading and human response, our work is a reminder that digital texts and projects are ultimately designed with human readers in mind. Our sense of “good enough” is governed not by statistical significance but by the demands of human persnickety-ness, of the desire to immerse oneself in a text that at least appears to be “complete.”

Out damn’d ink blot! Out I say

Why should Macbeth be the play that lends itself most easily to (admittedly quite silly) comparisons with this work? When the instability of the source texts themselves obstructs our own desire for authority, how do we respond? What degree of alteration would be considered “murthering” the text, and how do we square our conscience with these, arguably inescapable, choices about what to transcribe, what to make more legible, and what to leave as crux? This might feel oddly dramatic as written here, but the experience of sitting face to face with a 16th-century book, of making choices about how that text is transmitted and transcribed, feels something akin to tragedy for the conscientious and affectionate reader. And while this must be old hat for those who work with these texts every day, it was entirely new to me. The cruxes I’ve described did not represent a majority of the errors we examined, but they are the ones that stick out in my memory, that elicited a sense of deep frustration strangely at odds with the silent stillness of the reading room. Yet more powerful than this frustration was the feeling of awe at these texts that had, somehow, survived: survived fire and flood and most of all indifference to sit before me, open and ready to survive once more.

Hannah’s Folger Reflections

Washington waxed feverish outside the walls of the Folger Shakespeare Library, but a different atmosphere persisted within. The rooms were chill, verging on icy; the wool-clad scholars were, wittingly or otherwise, alert. I sat with Lydia Zoells and Kate Needham in what Folger regulars call the New Room (ca. 1980), attracted to its abundant natural light. We were there to perform a task: in the course of two weeks, we intended to correct as many as possible of the remaining 20,000 errors in the database of early modern plays transcribed by Annolex. This was Phase Two of the Shakespeare His Contemporaries project, in collaboration with the greater Text Creation Partnership initiative. Previously, Professor Mueller had enlisted a handful of undergraduates, including myself, to check Annolex’s transcriptions of texts against the corresponding EEBO images. Unfortunately, because these images were microfilm photographs of other pictures, the quality was often too poor to ascertain whether a mark on a page was an exclamation point or an erroneous blot of ink.

Lydia, Kate, and I convened at the Folger in order to determine if the original manuscripts housed there could illuminate any of the troubling instances that the digital tools could not. Previously, we had employed this brass-tacks method of cross referencing on an individual basis, adding the Bodleian Library, the University of Chicago Library, the Northwestern University Special Collections, and the Newberry Library to our list of visited sanctums. The Folger, however, seemed to hold the key to our transcription puzzle. We placed orders for over 50 texts, all of which the Library had in its vaults. Our work did not cease: we did not halt our editing when a tourist set off a fire alarm, and we only glanced up when specialists hung Henry Fuseli’s life-sized painting of the Macbeth witches on the facing wall.

As the days progressed we saw that most of the errors that we were correcting were ambiguous punctuation marks. In former phases of the project it was far easier to discern the meaning of a single word from its context clues than it was to determine whether a faint mark was a semicolon or a comma from its context alone, so the punctuation remained uncorrected. Even at the Folger it was often too difficult to identify such a mark with total certainty. Thus, we faced a recurring dilemma: do we leave the error uncorrected and the play incomplete, or correct the error to the best of our thinking and risk changing the text? This conflict inspired a number of conversations about the ethics of guessing at such a correction and the chance of accidentally transforming a text from its original form. Occasionally, Folger staff and scholars would join our conversations. They cited the movement in the 18th century to “improve” manuscripts such as these, when scholar-editors would add apostrophes and commas with prodigal liberalism in the hopes of clarifying an author’s “intended” rhythms and cadence. Inferring authorial intent of sixteenth century punctuation, when standard punctuation did not exist, was not only impossible but also a time sink, which we could not afford. One wise Folger staff member suggested that at some point an editing effort could be “good enough” and a text set aside.

These conversations were tea time discussions. Each afternoon and with charming inconsistency, a bell would ring: scholars would file out of the reading rooms, descend the stairs to the cafeteria, and revive as they nibbled biscuits and sipped steaming mugs. After witnessing a few days of the animation and conversation that arose during these mid-afternoon gatherings, I realized that tea time was crucial to intellectual life at the Folger. It was here that readers shared with one another their findings, their theories, and their academic mirth. Based on a mutual interest in English breakfast tea and early modern books, a community of scholars took shape. The Shakespeare His Contemporaries project strives to broaden this community. With free access to a cleaned-up database of early modern texts, a greater public can in turn discuss “moral editing,” the risk of drawing a text away from its original form, and the concept of work that is “good enough.” By adding more voices to these conversations, the worlds of both early modern literature and digital humanities will have the opportunity to complicate, broaden, and flourish.

Lydia on the Materiality of the Text

As my undergraduate career has progressed, I have become increasingly aware of, and fascinated by, the material nature of books. This has been facilitated by the fact that my studies tend toward literature that was written before 1700. For a long time, like most people, I took textual stability for granted and never thought about where, or rather what, books came from. But slowly, I became acquainted with EEBO, started reading textual introductions, and began to seek out classes that considered the materiality of texts. In my junior year, I joined Professor Joseph Loewenstein’s Spenser Lab, where I took part in his project to produce an edition of the collected works of Edmund Spenser, diving headfirst into that rich area of interaction between the digital humanities and book history. When Professor Loewenstein suggested that I become involved with Professor Mueller’s project, Shakespeare His Contemporaries (SHC), I agreed because I was excited by the opportunity to work with early modern books in person as well as to contribute to early modern scholarship in a meaningful way.

Between April and July 2015, sometimes with Kate and Hannah, sometimes alone, I corrected transcriptions using the first edition playtexts at the University of Chicago Special Collections Resource Center, the Newberry Library, the Folger Shakespeare Library, and the Houghton Library at Harvard University. Becoming comfortable working in these libraries and handling the delicate books was certainly one of the most valuable parts of my experience. The librarians were very accommodating and patient when it came to instructing a novice in the delicacies of handling the texts, and soon I was at ease with the books and with my surroundings. Each library I visited has its own atmosphere, and each one was a pleasure to get to know (though they were all kept at arctic temperatures). The books themselves offered their own special pleasures. I enjoyed finding the classified advertisements pinned inside front covers, engravings of stiff-looking authors, and the odd annotations left by early readers.

While the work of tracking down and entering punctuation marks, letters, and words was in large part tedious, it would sometimes bring me in contact with interesting passages. One of the great pleasures of working with colleagues who have a similar enthusiasm for early modern theater is that we often shared these moments with one another. This kind of work does not lend itself to a depth of understanding in the body of literature with which we were working, but I do believe that splashing in the pool that is the SHC corpus is valuable at this point in our undergraduate careers. We gained a kind of broad familiarity with the early modern dramatic corpus, and often found plays that interested us that we did not know existed before.

It is important to me that our project this summer will contribute to the dissemination of quality transcriptions of early modern plays, especially of little known works. It was exciting when a correction I made felt meaningful: when it made a significant semantic difference in the text, or when it brought up an interesting question. It is my hope that these transcriptions will continue to be questioned and checked, but also that they will make the plays easier to read and more transparent for scholars and students. I have often been frustrated by the difficulty of finding good copies of less canonical plays, and making good transcriptions publicly available is a good start.

1 comment

We should congratulate Hannah, Kate, and Lydia on the energy and care that they have devoted to their task. Martin Mueller’s SHC is going to be an invaluable tool for scholars in many different areas, and I can’t wait until a version is available for non-techy people like myself.

You ask for readers’ comments: well, I noticed an amusing persistence of a long ‘s’ in the quotation of a line from [Kyd] Arden of Faversham: “Why should he thrust his sickle in our corne”, which has been read as “fickle”. One error less! Many thanks to you all.