Tuesday, September 24, 2013

Hopes and fears for Google Books case

We're back in the saddle of the now epic lawsuit against Google for its massive scanning of the books held by libraries. I have very mixed feelings about the case and its outcomes, and the news reports from yesterday's hearing (transcript) in Judge Denny Chin's court are not making me feel any better about it.
In brief, the Authors Guild is claiming that Google exceeded the bounds of fair use by scanning in-copyright books. Since that act alone is not sufficient to counter a defense of fair use, they also state (correctly, in my view) that although Google is not providing advertising on the individual book pages, it makes money overall off of the scanned books because that digital corpus reinforces its position against other search engines.
There are some things that the Authors Guild has right (for example, that Google makes money off of search results pages that can include links to Google Books), but they miss the mark in other arguments:

"For all intents and purposes, it paid libraries for the right to digitize and
copy much of our nation’s literary heritage and then used the resulting digital library to gain a
competitive advantage over search engine competitors that respected the rights of authors by
limiting their digitization programs to books that were either licensed or were no longer
protected by copyright. Aided by its infringing conduct, Google’s search engine has proven
remarkably successful—to the point where “google” has become a widely used verb in the
English language."

First, the addition of Google Books to the search took place long after we were all "googling." Google's main value still comes from providing access to open web resources that otherwise would just be a massive digital junk heap. I suspect that those who are interested in using Google to search within the text of "closed" books (ones that are not available as full text online) consciously go to the Google Book Search pages. I don't know this for a fact, but I'd be willing to bet that user intent behind most Google searches is to access the actual content of a web page or document, not to be given a reference to an off-line resource.

Next is the statement that Google "paid libraries for the right to digitize..." This makes it sound like Google gave the libraries money, and that there was no cost to the libraries. The agreement between Google and libraries was an exchange that had costs for both (less for the libraries, more for Google) and benefits for both (less for the libraries, more for Google). In the end, Google got the better part of the deal, but libraries got something, albeit something they have not yet been able to benefit from greatly: copies of the scans at a lower price than if they had done the digitization themselves. Unfortunately, due to both copyright issues and the nature of the agreement between Google and the libraries, there are significant barriers to the kinds of use that would make this a truly transformative corpus for research.

All of the news reports emphasized some comments by Judge Chin to the effect that Google Books appears to be both transformative (in the copyright law sense) and a benefit to society. What worries me a bit is that Judge Chin is not looking beyond the use of the resulting digital texts for search. I consider search to be the tip of the iceberg, and the visible part of Google Books that Google would like everyone to focus on. My assumption is that Google has a research interest in having exclusive access to 20 million non-Web digital texts in a myriad of languages, and that this research is aimed not only at search but at Google's desire to be THE interface between man and machine, which means that machines have to get better at human languages.

If Judge Chin rules that Google's book digitization is fair use, it's a huge win, not only for Google but also for libraries. After all, if it is fair use for Google to digitize works for the purposes of searching, there is no question that it is also fair use for libraries to do the same. If Judge Chin rules that Google's book digitization is NOT fair use because of profit-making, then we still do not know for sure whether library digitization would be considered fair use (although much would depend on exactly how the decision is worded). This of course makes me want to cheer on Chin toward the "is fair use" decision, but at the same time I know that this means that any research that Google is doing on its private cache of digital texts will continue, giving them great advantages over competitors in the arms race of technology advancement.

Once again, I so wish that large-scale digitization for search and research had been undertaken by libraries, not Google. The questions of "not for profit" and social value would be a slam-dunk, and I'd not be harboring this fear that there is a hidden agenda behind the project. Maybe if libraries had done this we'd only have two or three million digitized books, not 20 million (as is claimed for Google), but they'd be untainted, in my mind, and I could still consider them a cultural heritage resource rather than a commercial product.