Google Book Search Adds Big, Brave Partner: The University of California
by Barbara Quint
Posted On August 14, 2006

It's been almost 2 years since Google dramatically expanded its book digitization program (originally known as Google Print but now called Google Book Search) from only having publisher partners to including five major research libraries. Now, another major research institution has joined the Google Book Search library program (http://books.google.com). The 100 libraries on the 10 campuses of the University of California (UC), which together form the largest research and academic library in the world, have opened a combined collection of 34 million books to Google. In making the announcement, UC clearly indicated its intention to go all the way and include in-copyright materials in its contribution, even though the Google Book Search project already faces two major lawsuits from publishers and authors challenging its legality.

Mission, more than money, is driving the move, although revenue issues were thoroughly discussed. In making the announcement, Robert C. Dynes, president of the University of California, made the following statement: "The digitization project furthers UC's mission. It greatly expands our ability to give scholars and the public access to the kinds of information and ideas that drive scholarly innovation and public knowledge and discourse." UC operates under a system of "shared governance," as described by Daniel Greenstein, associate vice provost for scholarly information, university librarian, and head of UC's California Digital Library (http://www.cdlib.org). The nonadministrative side consists of the faculty. John Oakley, chair of UC's Academic Senate and professor of law at UC Davis, also espoused the mission strategy. "The academic enterprise is fundamentally about discovery. We contribute to it immeasurably by unlocking the wealth of information maintained within our libraries and exposing it to the latest that search technologies have to offer," said Oakley.

Greenstein described the move to open the UC library collections to Google as "a sound business decision." Library partners in Google Book Search receive digital copies of any book digitized from their collections. These can become the ultimate backup copies for archiving and preservation. At present, according to UC announcements, tens of thousands of UC's books have become brittle; printed on acid-rich paper, they have begun to crumble into dust.

Paper is not the only brittle substance to alarm UC librarians. California itself has a certain fragility, breeding a nightmare scenario commonly referred to as "waiting for the Big One." While the seismic threat was not mentioned specifically in UC public statements about the new program, the frequent references to Hurricane Katrina as an example of the "potential impact that natural disaster can have" seemed to indicate one of the leverage points used to induce approval of the project.

Plans for what UC will do with its digital copies are still in the works. However, public domain material will be freely accessible in full text throughout the system, including through links from the online Melvyl Catalog. Books still in copyright will only be accessible in keeping with copyright law. UC promises not to use the digital copies to replace copyrighted books. Instead, it hopes that the increased discoverability of copyrighted works will increase use of items in its purchased collections. However, a "dark archive" (i.e., a digital preservation repository) will ensure permanent archiving of both copyrighted and public domain digitized material as a safeguard.

Now What?

In regard to working through the problems involved in deciding to make the move, Greenstein said, "It was easier coming late to the game, not being in the first group. There was more information. It was also really helpful from the library point of view to have the deal with the Open Content Alliance. We knew how hard it is to scan at scale, the logistic challenges, [and] the technical challenges. Just moving the books around is hard, identifying bibliographic records. There are so many challenges at so many different levels."

At this point, development of a "scan plan" is still under way. However, estimates indicate that millions of books may be digitized under the program by the end of the decade. Costs for UC to support the program run around $1 a book for the first year and $0.10 a book per year thereafter. These costs cover the staff and other expenses of moving material from shelves to Google and back again to the shelves. If the UC chose to build its own access services for public domain material, that would involve additional investment.

Though details are still not set, Greenstein expected that the northern of UC's two regional high-density storage facilities would be the first site chosen. He was sure that UC would link to public domain documents from catalog records. He also expected that UC would end up saving money by better managing, systemwide, a collection that has a lot of overlap among its 100 individual libraries. "Our policy," said Greenstein, "is to seek efficiencies in management so we can broaden and deepen our collections. We can leverage our dollars to get more value out of the dollars we spend by seeking efficiencies in the physical handling of public domain materials."

UC libraries already have several major digitization efforts underway, in particular with the Open Content Alliance (http://www.opencontentalliance.org), which is backed by Microsoft, Yahoo!, and the Internet Archive. (See the NewsBreak "Open Content Alliance Expands Rapidly; Reveals Operational Details" at http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=16091 and the links to related NewsBreaks it contains.) These efforts will continue. Charles Faulhaber, director of the Bancroft Library at UC Berkeley, rallied his colleagues: "This will mean a major ramping-up in our digitization projects, both at Bancroft and elsewhere in the UC system." Greenstein hopes that relations with the Open Content Alliance will continue to be good. "At the end of the day, both entities are pursuing complementary initiatives. It's a fruitful alignment. Google is making a huge investment. Our partnership helps us get where we want to go too and we feel the same about the Open Content Alliance."

From Google's Angle

Google was also excited about adding the new partner and, one assumes, about the vote of confidence it represented for the ongoing and ultimate success of the controversial project. Susan Wojcicki, vice president of product management at Google, stated: "We're thrilled to begin working with the University of California libraries to include their incredible collection in Google Book Search." Progress continues at the five initial research libraries participating in the project: the University of Michigan, Harvard University, Stanford University, Oxford University, and the New York Public Library. Google also has a pilot project with the Library of Congress' World Digital Library, a project to digitize rare, important material held in U.S. and Western libraries but focusing on non-Western countries and cultures.

If you're interested in the Google Book Search project, you might want to read its history (http://books.google.com/googlebooks/newsviews/history.html). Oddly enough, the project appears to be older than Google itself. Two Stanford graduate students, working on the Stanford Digital Library Technologies Project back in 1996, built a specialized crawler for book content called BackRub. BackRub's citation analysis technique later evolved into the PageRank algorithms underlying Google. Of course, the names of the two students were Sergey Brin and Larry Page. After a brief detour to solve the problem of indexing the World Wide Web, they finally got back to the world of books in 2002.

Google currently allows users to read the complete text of public domain material online, while providing bibliographic information and keyword-in-context "snippets" for copyrighted material. Google also links to online bookstores and publisher Web sites and expects to link to OCLC Open WorldCat records of library holdings.

Not all public domain material in Google Book Search is readable online, however. If a publisher partner in the Google Book Search project submits a reprint volume for out-of-copyright material (though some of the contributory material—introductions, prefaces, appendices, etc.—might be original and in-copyright), then Google Book Search treats that material as in-copyright, linking to bookstores and supplying ordering information. Adam M. Smith, product manager for Google Book Search, confirmed that situation. However, he indicated that if Google could find another version of a public domain title for an item it already had from a publisher (in particular from a library partner), it would digitize that in order to bring users the full text. Different versions or editions of public domain titles may co-exist in Google Book Search.

If you and your library would be interested in joining the Google Book Library Project, you can start your investigation by checking out the Book Search Help Center—"How can I get my library involved?" (http://books.google.com/support/bin/answer.py?answer=43741&topic=9082). Smith said that Google already has discussions underway with several other research libraries, including international ones. Discussions have come from e-mailed contacts (all of which are read), calls from librarians, conversations at library conferences, etc. At this point, Smith indicated that Google was particularly looking to tap special collections of unique material and to open up new areas of collection. UC's Greenstein recommends the experience of working with the Google Book Search staff. "They are very sophisticated and very smart people. It's a pleasure to work with them and with the Open Content Alliance people too. Working with smart people is one of the nice things about this business."