dedicated to DATA: digitally assisted text analysis

...the broad circumference
Hung on his shoulders like the Moon, whose Orb
Through Optic Glass the Tuscan Artist views
At Ev’ning from the top of Fesole,
Or in Valdarno, to descry new Lands,
Rivers or Mountains in her spotty Globe.
(Paradise Lost, 1. 286-91)

I have put a new version of Shakespeare His Contemporaries on Google Drive, where you may view or download the plays. In this version I have grouped the plays by decades and put them in directories with names like 155, 156 …165. The plays have been encoded in TEI Simple. The texts are in...

While reviewing the work of Hannah, Kate, and Lydia, I enjoyed the precision and concision of their annotations. A sample of them appears below. While a full documentation would require snippets of the image and the transcription as well as the annotation, the annotations themselves clearly show their minds at work, combining clear description with...

Below are the reflections of Hannah Bredar, Kate Needham, and Lydia Zoells about their adventures in the mundane world of Lower Criticism, about which I wrote in an earlier blog and of which the digital surrogates of our cultural heritage will need a lot in the decades to come. Racine observes in his preface...

There are somewhere in the neighbourhood of five million incompletely transcribed words in the roughly two billion words of English books before 1700 transcribed by the Text Creation Partnership. Depending on how you look at it, that is either a lot or not very much at all. Less than half a percent of words are...

In an earlier blog entry I reported on the ways in which undergraduates at Northwestern and Washington University in St. Louis have contributed to the collaborative curation of TCP transcriptions of Early Modern plays. Their work was released on GitHub as the SHC corpus, short for Shakespeare His Contemporaries. Hannah Bredar just graduated from Northwestern...

Hannah Bredar, Madeline Burg, Melina Yeh, and Nayoon Ahn have been at work for four weeks on their clean-up of the Early Modern plays in the TCP archive. Nicole Sheriko helped them in the first week and has since then focused on preparing a Young Scholar Edition of Fair Em. The clean-up operation proceeds...

The following is an abridged and lightly edited version of a blog entry that I first posted in March 2010 on my now defunct Literary Informatics blog. Here is a small but potentially promising experiment with a group of undergraduates in a Shakespeare class that I taught in the winter of 2010. Its s...

In 2009 Emily Anderson and Sasha Puchalla, two undergraduates in a course on Early Modern drama I taught then, collaborated on a course assignment to check and correct the TCP EEBO transcription of Marlowe’s Tamburlaine. They worked from a spreadsheet with a ‘verticalized’ representation of the text in which every word was a data row...