Early Modern OCR Project (eMOP) Receives Mellon Grant

English Professor Laura Mandell, Director of the Initiative for Digital Humanities, Media, and Culture (IDHMC), and co-PIs Professor Ricardo Gutierrez-Osuna and Professor Richard Furuta are very pleased to announce that Texas A&M has received a two-year, $734,000 development grant from the Andrew W. Mellon Foundation for the Early Modern OCR Project (eMOP, http://emop.tamu.edu). The two other project leaders, Anton DuPlessis and Todd Samuelson, are book historians from Cushing Rare Books Library.

Over the next two years, eMOP will work to improve scholarly access to an extensive early modern text corpus. The overarching goal of eMOP is to develop new methods and tools to improve the digitization, transcription, and preservation of early modern texts.

The peculiarities of early printing technology make it difficult for Optical Character Recognition (OCR) software to discern discrete characters and, thus, to render readable digital output. By creating a database of early modern fonts, training OCR software (the software that mechanically types page images) to read those typefaces, and creating crowd-sourced correction tools, eMOP promises to improve the quality of digital surrogates for early modern texts. This grant makes it possible to improve the machine translation of digital page images into text by combining cutting-edge crowd-sourcing and OCR technologies, both guided by book history. Our goal is to further the digital preservation processes currently taking place in institutions, libraries, and museums globally.

The IDHMC, along with our participating institutions and individuals, will aggregate and re-tool many of the recent innovations in OCR in order to provide a stable community and expanded canon for future scholarly pursuits. Thanks to the efforts of the Advanced Research Consortium (ARC) and its digital hubs, NINES, 18thConnect, ModNets, REKn, and MESA, eMOP has received permission to work with over 300,000 documents from Early English Books Online (EEBO) and Eighteenth-Century Collections Online (ECCO), totaling 45 million page images of documents published before 1800.

The IDHMC is committed to the improvement and growth of digital projects and resources, and the Mellon Foundation’s grant to Texas A&M for the support of eMOP will enable us to fulfill our promise to the scholarly community to educate, preserve, and develop the future of humanities scholarship.

For further information, including webcasts describing the problem and the grant application as submitted, please see the eMOP website: http://emop.tamu.edu
