The University of California’s secret agreement with Google for book digitization promises to improve access to parts of its library collections, but the contractual restrictions UC has accepted may enrich Google’s shareholders at public expense.

Digitizing the world’s books, films, video, sound recordings, maps, and other cultural artifacts could, to quote Internet Archive founder Brewster Kahle, provide “universal access to all human knowledge, within our lifetime.” So it’s troubling to see public institutions transfer cultural assets, accumulated with public funds, into private hands without disclosing the terms of the transaction.

Basic principles to govern mass digitization and safeguard the public interest have been developed by members of the American Library Association (forthcoming; see also http://litablog.org/?p=200) and by the Open Content Alliance. The OCA principles are designed to provide a baseline for digitization projects, and UC even signed on to them (disclosure: I've worked for the OCA) in its scanning agreements with Yahoo and Microsoft. Transparency is a primary value for both the OCA and the ALA.

So problem one is that the terms of the UC / Google agreement are secret, and were arrived at with no public input. As an institution that receives state and federal funding, UC should expect and welcome public comment if its inventory is effectively being privatized. The president’s office says it expects that terms will only come out after it receives the equivalent of a FOIA request. Since when does it take a FOIA request to get information from the library?

But it isn’t just the public that is excluded–it’s the rest of the library community. Mass digitization is very complex (see Paul Courant’s brilliant new article in First Monday). Librarians must grapple with new and unfamiliar issues that can only be resolved through dialog with peers. Google appears to be doing all it can to prevent this from happening, imposing NDAs on libraries at the start of discussions about mass digitization. By isolating librarians from each other, Google dramatically strengthens its negotiating position, and UC negates the goal of academic openness.

The second problem is more complex. Mass digitization is expensive. Public institutions that wish to digitize their holdings usually need to partner with private firms to get the work done. As described in Marketing Culture in the Digital Age, funded by the Mellon Foundation, and written by my colleague Peter Kaufman of Intelligent Television, commercial investment in digitization can be good for all concerned.

But private companies, at least profitable ones like Google, don’t work for free. So the public institutions need to pay for those services. Typically, they can’t pay in cash, so they pay in other ways, with labor, facilities, and some type of rights agreement. In other words, public use of and access to the digitized cultural works is usually limited in some way to benefit the private firm. This has to be done in the open.

The recent Smithsonian/Showtime agreement is a case in point, showing clearly what can go wrong in such a process. To recap, Showtime convinced the Smithsonian to sign a secret 170-page, 30-year agreement that gives Showtime control of the Smithsonian's film and video archive. This particular saga has been widely covered elsewhere, but the roots of the catastrophe lie in (1) secret negotiations, (2) exclusivity, and (3) length of term.

UC’s agreement is probably not explicitly exclusive. But as a practical matter, scanning doesn’t happen twice; libraries learned this when their material was microfilmed (as an aside, the microfilming was sometimes done badly, and to this day microfilm users suffer from those quality problems). This deal will be costly for UC in staff time and other resources, and the chances that another vendor will come through and duplicate the work are slim.

In the absence of the text of the agreement, it’s difficult to know what specific clauses may affect the ability of California citizens to read online the books now in their libraries. But there is a plausible nightmare scenario that UC needs to act now to prevent.

From the University of Michigan agreement (obtained only as a result of public records laws in Michigan, and despite Google's best efforts), it is clear there will be restrictions on what UC can do with the digital scans. This is a critical issue. If this deal follows the pattern at Michigan, there will be limits, determined by Google, on how UC may share its digital holdings with other libraries.

If the scanning process is made efficient at all the universities now in Google’s orbit, a book already scanned at Harvard won’t be rescanned at Berkeley. So Berkeley may not receive a copy, and because of the restrictions on sharing its holdings, won’t have an easy time getting one from Harvard. The student of 2012 will have a choice: go to the complete digital library, owned by Google, or go to the partial digital library of his or her own university.

That extreme scenario may not come to pass, but there are many other questions about the Google / UC deal:

* What more might UC be able to do if its scanning project were funded by the legislature or foundations, rather than by Google?
* UC says the “digitized books will be searchable through Google Book Search.” Can anyone else build services that access this data? Or is it another case of “Google can crawl everyone else’s data, no one can crawl Google’s data?”
* What quality assurances will Google provide? How can we ensure this won’t be a repeat of the microfilm experience?
* Will UC have copies of the full, high quality scans, or will certain information, such as image positioning data needed for searching, be kept by Google alone?
* What restrictions will be placed on UC’s use of those scans?
* What will be the different treatments for material in copyright, or orphaned, or in the public domain?
* Is it reasonable to ask the public to pay a second time (or watch ads) for material already purchased, simply because it’s now necessary to convert the format in which it is stored?
* Why haven’t the Regents appointed a panel of advisors on this matter?

Clearly, UC's high-level goals are laudable. The Google people I've met believe in the company motto, "don't be evil." And it is not really in the public interest to side with the publishers who are the loudest voices now attacking Google, and a primary cause of all the secrecy. Yet by acquiescing to Google's demands for secrecy, UC has compromised the public interest and set a dangerous precedent for the rest of the academic community.

It is clear that the world would be a better place if the wealth of information contained in libraries were aggregated and shared with the rest of the world digitally, and I believe this will happen with or without Google's participation.

There are lots of good things in this country, and I believe libraries are one of them. I am pretty sure there are policies and views that transcend matters of time and technology, and I hope those policies and views will not be overstepped by ambitions and good intentions that are not properly modeled or implemented.

From what I have read, I agree that the issues of secrecy and privatization in this matter aren't being handled properly. On the other hand, there are times when it is important to set aside outdated or incomplete policies and views just to get a good project on the road; some smart, but entirely legal, decisions need to be made in order to accomplish this.

I definitely want to see this done in my lifetime, and idealistically I would like to believe that Google and UC have noble intentions as well. That being said, they have to work a lot harder on a model that keeps our information available to everybody and makes sure that private interests become completely secondary. Google's participation in this matter should be short-lived, with short-term benefits at most. In the long run, the project and the model should be entirely out of their hands.

Some of the questions have obvious answers; for example, "digitize once, share everywhere" should be one of the business requirements of this model. Many more could be answered just as quickly, giving a clear indication of how complete the model is and how ready it is to be implemented.

I’d say it’s not exclusive. I mean, it’s the second agreement of this type that UC has entered into. See our post on the deal UC cut with Microsoft in June. That’s beyond agreeing to be part of OCA. That’s actually doing the scanning with Microsoft.

UC is the only university or organization I know of right now working with two rival scanning projects. So the scanning is indeed likely to happen twice, at least for books in the UC collection.

The points you raise are all well taken. I also see no reason why the details shouldn’t eventually come out under California’s own freedom of information laws. UC’s a public institution.

The canard that the UM agreement was only released because of a FOIA request (“obtained only as a result of public records laws in Michigan”) is widely disseminated but, well, not true. The wheels were put in motion for releasing the agreement months before any request was received, and we posted the agreement because we believe that information in it addresses questions about which there should be no speculation.

[…] In the Television Archiving blog, digital media consultant Jeff Ubois writes, “… as a practical matter, scanning doesn’t happen twice. … This deal will be costly for UC in staff time and other resources, and the chances that another vendor will come through and duplicate the work are slim.” […]