Knowledge Is Contagious

Google Tries Converting Every Book into Digital Format

MOUNTAIN VIEW, Calif. — Ben Zimmer, executive producer of a Web site and software package called the Visual Thesaurus, was seeking the earliest use of the phrase “you’re not the boss of me.” Using a newspaper database, he had found a reference from 1953.

But while using Google’s book search recently, he found the phrase in a short story contained in “The Church,” a periodical published in 1883 and scanned from the Bodleian Library at Oxford.

Ever since Google began scanning printed books four years ago, scholars and others with specialized interests have been able to tap a trove of information that had been locked away on the dusty shelves of libraries and in antiquarian bookstores.

According to Dan Clancy, the engineering director for Google book search, every month users view at least 10 pages of more than half of the one million out-of-copyright books that Google has scanned into its servers.

Google’s book search “allows you to look for things that would be very difficult to search for otherwise,” said Mr. Zimmer, whose site is visualthesaurus.com.

A settlement in October with authors and publishers who had brought two copyright lawsuits against Google will make it possible for users to read a far greater collection of books, including many still under copyright protection.

The agreement, pending approval by a judge this year, would also pave the way for both sides to profit from digital versions of books. Just what kind of commercial opportunity the settlement represents is unknown, but few expect it to generate significant profits for any individual author. Even Google does not necessarily expect the book program to contribute significantly to its bottom line.

“We did not think necessarily we could make money,” said Sergey Brin, a Google founder and its president of technology, in a brief interview at the company’s headquarters. “We just feel this is part of our core mission. There is fantastic information in books. Often when I do a search, what is in a book is miles ahead of what I find on a Web site.”

Revenue will be generated through advertising sales on pages where previews of scanned books appear, through subscriptions by libraries and others to a database of all the scanned books in Google’s collection, and through sales to consumers of digital access to copyrighted books. Google will take 37 percent of this revenue, leaving 63 percent for publishers and authors.

The settlement may give new life to copyrighted out-of-print books in a digital form and allow writers to make money from titles that had been out of commercial circulation for years. Of the seven million books Google has scanned so far, about five million are in this category.

Even if Google had gone to trial and won the suits, said Alexander Macgillivray, associate general counsel for products and intellectual property at the company, it would have won the right to show only previews of these books’ contents. “What people want to do is read the book,” Mr. Macgillivray said.

Users are already taking advantage of out-of-print books that have been scanned and are available for free download. Mr. Clancy was monitoring search queries recently when one for “concrete fountain molds” caught his attention. The search turned up a digital version of an obscure 1910 book, and the user had spent four hours perusing 350 pages of it.

For scholars and others researching topics not satisfied by a Wikipedia entry, the settlement will provide access to millions of books at the click of a mouse. “More students in small towns around America are going to have a lot more stuff at their fingertips,” said Michael A. Keller, the university librarian at Stanford. “That is really important.”

When the agreement was announced in October, all sides hailed it as a landmark settlement that permitted Google to proceed with its scanning project while protecting the rights and financial interests of authors and publishers. Both sides agreed to disagree on whether the book scanning itself violated authors’ and publishers’ copyrights.

In the months since, all parties to the lawsuits (as well as others, like librarians, who will be affected by the settlement) have had the opportunity to examine the 303-page settlement document and try to digest its likely effects.

Some librarians privately expressed fears that Google might charge high prices for subscriptions to the book database as it grows. Although nonprofit groups like the Open Content Alliance are building their own digital collections, no other significant private-sector competitors are in the business. In May, Microsoft ended its book scanning project, effectively leaving Google as a monopoly corporate player.