dedicated to DATA: digitally assisted text analysis

...the broad circumference
Hung on his shoulders like the Moon, whose Orb
Through Optic Glass the Tuscan Artist views
At Ev’ning from the top of Fesole,
Or in Valdarno, to descry new Lands,
Rivers or Mountains in her spotty Globe.
(Paradise Lost, 1. 286-91)

While reviewing the work of Hannah, Kate, and Lydia, I enjoyed the precision and concision of their annotations. A sample of them appears below. While a full documentation would require snippets of the image and the transcription as well as the annotation, the annotations themselves clearly show their minds at work, combining clear description with...

Below are the reflections of Hannah Bredar, Kate Needham, and Lydia Zoells about their adventures in the mundane world of Lower Criticism, about which I wrote in an earlier blog and of which the digital surrogates of our cultural heritage will need a lot in the decades to come. Racine observes in his preface...

This is a progress report on the basic clean-up of the 504 plays in my current Shakespeare his Contemporaries corpus (SHC). I hope to release an updated corpus by the end of November. It will replace the current corpus at https://github.com/martinmueller39/shc. The SHC texts are partially curated versions of the TCP texts, which have “known...

I am delighted to post the following guest blog by Darren Freebury Jones at Cardiff University. “He was right after all, and the scholars who for a generation now have ignored or sneered at his evidence, sometimes—when they have condescended to mention it—printing the word evidence itself between inverted commas, have not turned out to...

There are somewhere in the neighbourhood of five million incompletely transcribed words in the roughly two billion words of English books before 1700 transcribed by the Text Creation Partnership. Depending on how you look at it, that is either a lot or not very much at all. Less than half a percent of words are...

In an earlier blog entry I reported about the ways in which undergraduates at Northwestern and Washington University in St. Louis have contributed to the collaborative curation of TCP transcriptions of Early Modern plays. Their work was released on github as the SHC corpus, short for Shakespeare His Contemporaries. Hannah Bredar just graduated from Northwestern...

In my earlier post “From Shakespeare His Contemporaries to the Book of English” I promised to release all SHC plays “later this spring.” I have now done so, and you may download all 504 of them from https://github.com/martinmueller39/shc. Most of the texts come from Phase I of the TCP project and have been in the...

I went to Best Buy to reduce the clutter of remote controls in my living room and simplify my life. Logitech’s Harmony may be the answer. Cheap it isn’t, but then ‘cheap’ and ‘simple’ are hardly synonyms–witness the very simple and very expensive white KPM china of the Königliche Porzellan-Manufaktur Berlin. I paused at the...

Introduction and Summary This is a report about “Shakespeare His Contemporaries” or SHC, my project for creating an interoperable digital corpus of plays that in addition to Shakespeare’s include most of the plays written within a generation before and after his active career as a playwright. Its keywords are “query potential”, “digital surrogate”, “algorithmic amenability”, and...

This is a blog post about the distribution of a special kind of “dislegomena,” tetragrams and longer n-grams whose “collection frequency” is 2 and whose “document frequency” is also 2. My purpose is to figure out how many swallows make a summer. If you are interested in the intertextual relationship between one play and another,...
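For readers who want to see the definition at work, here is a minimal sketch in Python of how one might collect n-grams whose collection frequency is 2 and whose document frequency is also 2, that is, sequences that occur exactly once in each of exactly two texts and nowhere else. The function names (`ngrams`, `shared_dislegomena`) and the toy texts are hypothetical illustrations, not the code behind the post, and the sketch assumes each text has already been tokenized into a list of words.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def shared_dislegomena(docs, n=4):
    """Map each n-gram with collection frequency 2 and document
    frequency 2 to the pair of documents that share it.

    docs: dict of document name -> list of tokens (assumed pre-tokenized).
    """
    collection_freq = Counter()      # total occurrences across all documents
    containing = defaultdict(set)    # names of the documents containing the n-gram
    for name, tokens in docs.items():
        for gram in ngrams(tokens, n):
            collection_freq[gram] += 1
            containing[gram].add(name)
    return {
        gram: tuple(sorted(containing[gram]))
        for gram, freq in collection_freq.items()
        if freq == 2 and len(containing[gram]) == 2
    }


# Toy example with three invented "plays".
plays = {
    "play_a": "to be or not to be".split(),
    "play_b": "whether to be or not".split(),
    "play_c": "a b c d a b c d".split(),
}
shared = shared_dislegomena(plays, n=4)
```

In this toy case the only tetragram returned is "to be or not", shared by `play_a` and `play_b`. The tetragram "a b c d" also occurs twice, but both occurrences fall within `play_c`, so its document frequency is 1 and it is excluded.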

Not quite two years ago I wrote an open letter about the TEI in which I wondered about its successes and failures. I wrote about “a thought experiment where you ask the chairs of history, literature, linguistics, philosophy, and religion departments of the world’s 100 top universities to write a sentence or short paragraph about the...

Hannah Bredar, Madeline Burg, Melina Yeh, and Nayoon Ahn have been at work for four weeks in their clean-up operation of the Early Modern plays in the TCP archive. Nicole Sheriko helped them in the first week and has since then focused on preparing a Young Scholar Edition of Fair Em. The clean-up operation proceeds...