The Bigger Picture: Visual Archives and the Smithsonian

Google Halts an Archiving Project

On May 20th, a flurry of reports took note of Google’s decision to halt its ambitious effort to digitize the contents of newspaper archives and make them available online at no cost. Walking away from a program it began in 2006 to make two hundred years’ worth of articles public and searchable, and after scanning millions of pages from over two thousand currently operating and defunct newspapers, Google signaled it was switching gears to focus on helping publishers monetize their deep wells of content. The company issued a statement saying that while users could still search previously digitized content, “we don’t plan to introduce any further features or functionality to the Google News Archives and we are no longer accepting new microfilm or digital files for processing.” Instead, Google will harness its corporate and technological heft to support OnePass, a payment platform introduced back in February. According to the Boston Phoenix, an alternative newspaper and News Archive project partner, and as reported by Search Engine Land, Google said it would be concentrating on “newer projects that help the industry” and “enables publishers to sell content and subscriptions directly from their own sites.”

Example of how newspapers are scanned, here on a MediaScan 880C duplex newspaper scanner. Courtesy of the Newspaper Scanning Systems YouTube channel.

Google maintained that it wasn’t abandoning its earlier goal because of copyright issues, although those certainly generated controversy and threw a monkey wrench into, for example, Google’s earlier and equally ambitious plan to scan the contents of books in library collections worldwide. The UK’s The Guardian has also suggested that this new turn of events may have been prompted by Apple’s rival payment plans with a number of newspapers. An article posted online by The Atlantic quotes Carly Carioli, an editor at the Boston alternative weekly The Boston Phoenix, who explained:

“News Archive was generally a good deal for newspapers—especially smaller ones like ours, who couldn't afford the tens or hundreds of thousands of dollars it would have cost to digitally scan and index our archives—and a decent bet for Google. It threaded a loophole for newspapers, who, in putting pre-internet archives online, generally would have had to sort out tricky rights issues with freelancers—but were thought to have escaped those obligations due to the method with which Google posted the archives. (Instead of posting the articles as pure text, Google posted searchable image files of the actual newspaper pages.) Google reportedly used its Maps technology to decipher the scrawl of ancient newsprint and microfilm; but newspapers are infamously more difficult to index than books, thanks to layout complexities such as columns and jumps, which require humans or intense algorithmic juju to decode. Here's two wild guesses: the process may have turned out to be harder than Google anticipated. Or it may have turned out that the resulting pages drew far fewer eyeballs than anyone expected.”

Or the decision might be seen as part of the ongoing issue that publishers, museums, and educational institutions are all now facing: how to balance a mandate or desire to make content available against the costs of actually doing so.

I'm sad to hear that Google is giving up on this effort. Google usually does things that other companies see as economically infeasible, and it succeeds at them. Maybe once archive.org scans all the books in existence, they'll move on to newspapers.

It truly is difficult to balance the desire to make content available against the costs of actually doing it. It is a task that must be done by somebody. But who? I would suggest that we ask the IMF to fund the project, or take money away from other projects like underground CO2 storage or residential-area wind farms.

Even as digital storage options become more plentiful and less expensive, the costs incurred in collecting, processing, scanning, and tagging materials may still create financial hurdles. Does that mean that decisions about what gets saved are destined to be driven by what pays for itself, makes a profit, or strikes a benefactor's specific fancy?