Populist presidential candidate William Jennings Bryan electrified the 1896 Democratic National Convention with a speech in which he called for a new currency standard based on silver rather than gold.

Over the next few years, his “Cross of Gold” ideas spread across the country, with thousands upon thousands of newspaper mentions.

But it took 120 years and a collaboration between Georgia Tech data scientists and University of Georgia historians to see what the spread of that idea had actually looked like. Starting in Chicago, site of the convention, “Cross of Gold” moved to the populous East Coast, then jumped to the West Coast before filling in the less populated areas.

“Going viral” may have taken longer in the 19th century, but the principle was much the same.

Researchers tracked the spread of “Cross of Gold” using a free service called US News Map, a database of more than 10 million newspaper pages that adds spatial information that hadn’t been available before. By combining digitized newspaper pages with full-text search technology, the project is helping researchers see the nation’s history in new ways.

“Every historical development has a spatial component to it, and often one that is central to explaining the ‘how’ and the ‘why,’” says Claudio Saunt, chair of the history department at the University of Georgia. “With this new search engine, we now have the ability to see where newspapers were writing about a subject, and how interest in that subject changed over time. It’s a powerful tool for historians, and one that can shed new light on the past.”

US News Map is based on data from approximately 10 million pages published in nearly 2,000 US newspapers between 1836 and 1924. The newspapers represent what was happening in nearly 800 US cities. More pages are being added all the time, though some states still have not contributed digital newspaper data and are therefore not represented on the project’s map.

To create the database behind the search engine, newspaper pages were scanned by universities around the country, and each word of the resulting text was indexed, explains Trevor Goodyear, a research scientist in the Georgia Tech Research Institute (GTRI). The application uses Apache Solr, an open-source search platform that allowed GTRI researchers to efficiently store and index the large volumes of text and associated metadata.
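As a rough illustration of what it means for each word to be indexed, the sketch below builds a tiny inverted index in plain Python. It is not the project's Solr setup: the sample pages, their field names, and the `build_index` and `search` helpers are all hypothetical, and the search simply requires every query word to appear somewhere on the page rather than matching an exact phrase.

```python
import re
from collections import defaultdict

# Hypothetical OCR'd pages; the ids, cities, dates, and text are invented.
PAGES = [
    {"id": "p1", "city": "Chicago", "date": "1896-07-10",
     "text": "Bryan and the cross of gold speech"},
    {"id": "p2", "city": "Atlanta", "date": "1896-07-12",
     "text": "The gold standard debated"},
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of page ids on which it appears."""
    index = defaultdict(set)
    for page in pages:
        for word in tokenize(page["text"]):
            index[word].add(page["id"])
    return index

def search(index, query):
    """Return ids of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(PAGES)
print(search(index, "cross of gold"))
```

A production search platform adds much more on top of this idea, such as phrase matching, ranking, and stored metadata, but the word-to-page lookup is the core of it.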

The processed text is spread across eight different servers, some in a data center at Georgia Tech and some hosted in the cloud by Amazon Web Services. When a user types a query into the website, the servers all participate in the search together. The text database is linked to images of the newspaper pages housed at the Library of Congress, so when users find an item of interest, they can see its context on the original newspaper page.
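One common way for several servers to take part in a single search is a scatter-gather pattern: the query is sent to every shard at once and the partial results are merged. The sketch below imitates that with threads and in-memory lists. The three shards, the page records, and the `search_shard` and `search_all` functions are hypothetical stand-ins, and nothing here reflects how the project's servers are actually configured.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each list stands in for one server's slice of the corpus.
SHARDS = [
    [{"id": "p1", "text": "cross of gold"},
     {"id": "p2", "text": "silver coinage"}],
    [{"id": "p3", "text": "the cross of gold speech"}],
    [{"id": "p4", "text": "railroad news"}],
]

def search_shard(shard, phrase):
    """Return ids of pages in one shard whose text contains the phrase."""
    needle = phrase.lower()
    return [page["id"] for page in shard if needle in page["text"].lower()]

def search_all(shards, phrase):
    """Query every shard concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, phrase), shards)
        hits = [page_id for part in partials for page_id in part]
    return sorted(hits)

print(search_all(SHARDS, "Cross of Gold"))
```

Splitting the corpus this way keeps each server's share of the text small, so a query over millions of pages can finish in roughly the time the slowest shard needs.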

Watch terms pop up in cities

The innovations, says Goodyear, were to show when each instance of a term appeared in the newspapers and to animate those appearances. Dots on the map show all mentions of the term in all newspapers across each US city, with lighter dots indicating multiple mentions. Users of the site can move a slider to see how terms pop up in different cities over time.

“We’ve placed the data onto a map of the United States that allows users to view how the term moved across the country over time,” he says. “You can navigate through time to see how each term was used in different locations. You really get a sense for how ideas went viral during that time in history.”
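The time slider can be thought of as a grouping of search hits by city and year. The sketch below is a minimal version of that idea, assuming each hit has already been reduced to a (city, year) pair and that the map shows cumulative mentions up to the selected year. The sample data and the `counts_through` and `frames` helpers are hypothetical, and the real site may aggregate differently.

```python
from collections import Counter

# Hypothetical search hits as (city, year) pairs; the values are invented.
MENTIONS = [
    ("Chicago", 1896), ("Chicago", 1896), ("New York", 1896),
    ("San Francisco", 1897), ("Chicago", 1897), ("Omaha", 1898),
]

def counts_through(mentions, year):
    """Count mentions per city up to and including the given year."""
    return Counter(city for city, y in mentions if y <= year)

def frames(mentions):
    """Build one cumulative per-city count for each year in the data."""
    years = sorted({y for _, y in mentions})
    return {year: counts_through(mentions, year) for year in years}

for year, counts in frames(MENTIONS).items():
    print(year, dict(counts))
```

Each entry returned by `frames` corresponds to one slider position, and the count for a city could drive how its dot is drawn.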

The Library of Congress awards grants to universities across the United States for digitizing historic newspapers. This digitization process involves applying optical character recognition (OCR) techniques to convert the printed words into computerized text.

Because of imperfections in the newspapers’ preservation and errors in the scanning and text-conversion process, the results can look very different from what was originally published in the newspapers. Information lost in the conversion includes the distinctions between headlines, article content, author bylines, and newspaper titles.

Due to these limitations, the system links users to the full newspaper page on which the search term appears instead of to individual scanned articles.

Other newspaper databases exist and the Library of Congress newspaper collection is searchable, but no other source shows the spatial component of history in this way, says Saunt, who is an American history professor. He expects US News Map will be useful to more than historians.

“With US News Map, it is easy to trace the evolution of a term—to see where it originated and how it spread—something that linguists are deeply interested in,” he says. “Historians will be able to see how news stories moved across the continent, and rose and fell over time.”