Description

Transcribing documents from the printing press era, a challenge in its own right, is more complicated when documents interleave multiple languages, a common feature of 16th-century texts. Additionally, many of these documents predate consistent orthographic conventions, making the task even harder. We extend the state-of-the-art historical OCR model of Berg-Kirkpatrick et al. (2013) to handle word-level code-switching between multiple languages. Further, we enable our system to handle spelling variability, including now-obsolete shorthand systems used by printers. Our results show an average relative character error reduction of 14% across a variety of historical texts.