Called “In Codice Ratio,” the project aims to turn the 53 miles of shelves that make up the Vatican Secret Archives, dating back 12 centuries, into a searchable digitized database. While the archive includes priceless documents such as the papal bull that excommunicated Martin Luther, currently it is virtually useless to scholars. That needs to change. Scanning this volume of information to make it readable would be hard under ordinary circumstances, but it is particularly difficult in this case: The decorative, flowing, cursive handwriting of some of the documents proves impossible for modern optical character recognition software to comprehend.

This is where researchers from Roma Tre University, La Sapienza University of Rome, and the Vatican Secret Archives enter the frame. They have developed a system that combines convolutional neural networks with image-processing algorithms. The system performs a task called jigsaw segmentation, in which the handwriting in a document is broken down into pieces approximating individual pen strokes, and those pieces are then reassembled into words.
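To give a rough feel for the idea, here is a minimal Python sketch. It is not the researchers' code. It assumes a word image stored as rows of 0/1 pixels, cuts it wherever the amount of ink per column dips to a local minimum, and then uses dynamic programming to find the grouping of neighboring pieces that a character scorer likes best. Every name in it (`ink_profile`, `cut_points`, `split_pieces`, `best_grouping`, `toy_score`) is hypothetical, and `toy_score` is a hard-coded stand-in for the trained neural network a real system would use.

```python
from math import inf
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # rows of pixels, 1 = ink, 0 = blank
Scorer = Callable[[int, int], Tuple[str, float]]  # (start, end) -> (label, log-score)


def ink_profile(image: Image) -> List[int]:
    """Count the ink pixels in each column of the image."""
    return [sum(column) for column in zip(*image)]


def cut_points(profile: Sequence[int]) -> List[int]:
    """Return the columns where the ink count is a local minimum."""
    cuts = []
    for x in range(1, len(profile) - 1):
        if profile[x] < profile[x - 1] and profile[x] <= profile[x + 1]:
            cuts.append(x)
    return cuts


def split_pieces(width: int, cuts: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn cut columns into (start, end) ranges, with end exclusive."""
    bounds = [0, *cuts, width]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def best_grouping(
    pieces: Sequence[Tuple[int, int]], score: Scorer, max_span: int = 3
) -> List[str]:
    """Group consecutive pieces into characters, maximizing the total score."""
    n = len(pieces)
    # best[k] holds the best (total score, labels) covering the first k pieces.
    best: List[Tuple[float, List[str]]] = [(-inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for span in range(1, min(max_span, end) + 1):
            start = end - span
            prev_total, prev_labels = best[start]
            if prev_total == -inf:
                continue
            label, value = score(pieces[start][0], pieces[end - 1][1])
            if prev_total + value > best[end][0]:
                best[end] = (prev_total + value, prev_labels + [label])
    return best[n][1]


# Toy word image, two pixel rows high and eight columns wide.
image = [
    [1, 1, 0, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1, 0, 1],
]


def toy_score(start: int, end: int) -> Tuple[str, float]:
    """Stand-in for a character classifier: it only knows three column ranges."""
    known = {(0, 2): ("i", -0.1), (2, 6): ("n", -0.2), (6, 8): ("i", -0.1)}
    return known.get((start, end), ("?", -5.0))


profile = ink_profile(image)
pieces = split_pieces(len(profile), cut_points(profile))
result = best_grouping(pieces, toy_score)
print(result)  # prints ['i', 'n', 'i']
```

The toy image is cut into four pieces, and the two middle pieces are merged back into a single "n" because the scorer rates that grouping higher than treating them as separate characters. Cutting too finely and then recombining means no single cut has to be right the first time.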

They’ve also enlisted a crowd of volunteers to help check the results: students from 24 schools, who judge the system’s accuracy and help train it. Preliminary findings show an accuracy rate of 65 percent, a figure the team expects to improve as the project progresses.

“Our main goal is to complete the transcription task and to start extracting information from the manuscripts,” Paolo Merialdo, a researcher on the project, told Digital Trends. “Until now, we have been working on a sample of 1,000 digitized pages of the Vatican Registers. Once we have a reliable transcription system, the next step is to ask the Vatican Secret Archives [for] the remaining manuscripts of the Pope Registers. They are digitizing their manuscripts for preservation and to make them available for researchers, under their copyright protection. At the same time, we are currently working in many research directions, including more sophisticated neural networks for the handwritten text recognition step, and advanced natural language processing tools for handling abbreviations.”

Should the project prove successful, its creators believe the system could not only unlock the mysteries of the Vatican Secret Archives, but also help researchers explore other enormous historical archives that are similarly inaccessible.