Paris, 1885: A man is arrested for theft. At the police station, a clerk records his name, photographs his face and profile, measures his body in fourteen places, and notes any scars or tattoos.

Has he been arrested before? It's difficult to say.

By this time, the population of Paris is nearly 2 million, and police can't recognize every criminal personally. National ID cards don't yet exist. Fingerprinting won't be accepted for another twenty years.

The Bertillon method soon spreads beyond Paris to the United States, where completed cards still exist in archives and historical societies.So the prisoner's body is broken apart — decomposed — into data and recorded on a specialized Bertillon card. Filed according to a complex system, Alphonse Bertillon's method effectively creates a pre-digital database of recidivism.

"Building this dataset is like planting a seed in the ground," says Langmead. "The dataset will flower into many different subject areas and tell us more about the human experience."

What computers don't see

The collected cards, which contain a wealth of sociological information about prisons, including a prisoner's race, nationality, crime committed, eye color, and even ear shape, account for the vast majority of Ohio's state prisoners from 1897 to 1911.

Langmead and her students visited the Ohio History Connection in Columbus and photographed both sides of 12,500 cards.

But transforming the resulting 25,000 images from raw historical artifacts into a usable dataset is a challenging task: The cards are old, warped, and the ink is blotchy.

And then there's the nineteenth-century handwriting.

"All these steps that we usually assume are easy, once you add a little bit of a noise, recognition becomes a very difficult problem," says Paul Rodriguez, research analyst at the San Diego Supercomputer Center (SDSC).

Rodriguez, who is using the cards to enhance cursive-offline handwriting recognition capabilities on the Comet supercomputer, explains that many machine-learning image-recognition research works with data that has been pre-cleaned.

While the Bertillon cards contain text of predictable types in a constrained form, humans are virtually unconstrained in how they fill them out.

"In many ways, the negative results are more interesting than the positive results," says Rodriguez. "This project let us explore what these so-called ‘deep' computer machine-learning methods are capable of and how far they can go for us."

"The importance of identifying as many variations in data and as early as possible in the lifespan of a project can't be emphasized enough, especially when the processing algorithms are dependent upon the grouping of data based on such variations," says Puthanveetil Satheesan.

Who needs supercomputing?

Langmead, Rodriguez, and Puthanveetil Satheesan were brought together by Alan Craig, digital humanities specialist for the XSEDE project.

Academics in all fields can make the mistake of believing that humanists don't need technology to complete their research. But that's not so, says Langmead."My role is to engage new and different kinds of projects, particularly from the arts and humanities," says Craig. "I find the technical people to work with them and make sure the computing resources are available."

"There are lots of reasons why humanists would need supercomputing infrastructure. In my case, I had really interesting data and colleagues who wanted to try out these systems on it. It's a match made in the twenty-first-century academy."

"We're all scientists, we're all looking for knowledge," she says. "The computer is a tool — it can be used for whatever we need.

All humanists are already digital humanists in some small measure. Some of us also take it as our charge to become much more highly engaged with the ways that computing can be used to facilitate our research."

Wild card. Bertillon cards contained a record of 14 measurements, notes of body peculiarities, and standardized photographs for easy recognition by law enforcement. Courtesy Head of Service Régional d'Identité Judiciaire de Paris.

Throw the book at 'em. Frontispiece from Bertillon's Identification anthropométrique (1893), demonstrating the measurements one takes for his anthropometric identification system.

Ear today, gone tomorrow. Bertillon saw ears as a means to identify individuals. His system collared 241 repeat offenders in 1884 and caught the notice of police across Europe. Courtesy The Wellcome Trust. (CC Attribution 4.0.)