Tuesday, October 16, 2012

Copyright Victories, Part II

I did a short factual piece for InfoToday on the Authors Guild v. HathiTrust decision that was issued last week. The Authors Guild brought the suit against HathiTrust because HathiTrust is storing copies of books, digitized by Google, that are still under copyright. Fortunately for HathiTrust, its partners, and all of us in libraries, the judge decided:

The digitization of books for the purposes of providing a searchable index is transformative, and therefore is a Fair Use under copyright law.

The provision of these search capabilities “promotes the Progress of Science and useful Arts” and thus supports the goals of U.S. copyright policy and law.

The provision of in-copyright texts for visually impaired students and researchers is in direct support of the Americans With Disabilities Act.

The decision in the case of the Authors Guild v. HathiTrust echoes some of the same thinking as the GSU case, in particular on the educational and research use of intellectual property. This case hinged on the use of the digitized texts for indexing rather than for reading. The judge determined that the books in HathiTrust were not substitutes for the books on the library shelves, since they are not presented to users as texts to be read. The "transformation" of the readable texts to a searchable index that returns only page numbers and the number of times a term appears on the page results in a new product, not an imitation of the hard copy.
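To make that distinction concrete, here is a minimal sketch in Python of a page-level index of the kind described above. It is only an illustration and not HathiTrust's actual system: the function names, the sample pages, and the simple tokenizer are all assumptions of mine.

```python
import re
from collections import Counter, defaultdict


def build_index(pages):
    """Map each term to {page_number: occurrence_count}.

    `pages` is a hypothetical dict of page number -> OCR'd text.
    """
    index = defaultdict(dict)
    for page_number, text in pages.items():
        # Crude tokenizer (an assumption): lowercase runs of letters, digits, apostrophes.
        counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
        for term, count in counts.items():
            index[term][page_number] = count
    return index


def search(index, term):
    """Return sorted (page_number, count) pairs; never any page text."""
    hits = index.get(term.lower(), {})
    return sorted(hits.items())


# Made-up sample pages standing in for OCR output.
pages = {
    1: "Fair use is a defense. Fair use has four factors.",
    2: "The market factor is the fourth.",
}
index = build_index(pages)
print(search(index, "fair"))    # [(1, 2)]
print(search(index, "factor"))  # [(2, 1)]
```

The point of the sketch is that a search result carries only locations and counts. The full text is needed to build the index, but nothing readable comes back out of it.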

The judge decided this question in HathiTrust's favor, but the same question is being asked in the Authors Guild lawsuit against Google. There are some obvious differences between the two situations, however. First, unlike HathiTrust, Google is a for-profit company, so it loses points on the first factor of the fair use test:

(1) the purpose and character of the use, including whether such use is
of a commercial nature or is for nonprofit educational purposes;

Because Google is digitizing works primarily from university libraries, both HathiTrust and Google do well on the second factor:

(2) the nature of the copyrighted work;

Works of a creative nature (defined as "prose fiction, poetry, and drama") are given greater protection than works of fact. HathiTrust reports that only 9% of its digital collection meets the "creative" definition.

The third factor:

(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole

would seem to go against HathiTrust (and Google), but the judge looked at the two primary uses for the digital texts, keyword indexing and providing digital copies to members of the community with sight disabilities, and determined that they could not be done with anything less than a complete copy. If the argument of transformation is made in the case against Google, this factor should be the same.

Factor four is about the effect on the market:

(4) the effect of the use upon the potential market for or value of the copyrighted work

This is a bit tricky because presumably HathiTrust will point its users to the library-owned hard copies of the book, especially since many of the digitized books will be out of print and not available from publishers. Therefore there isn't much interaction with the market at all. The judge added that, if anything, the greater amount of discovery might lead to sales, but I wouldn't hold out much hope for that. The other use is to provide access to the blind; this is a non-market for print materials if there ever was one. Google, on the other hand, has partnered with publishers to sell digitized books as ebooks, and therefore the positive market force should be stronger in that case if Google can show that previously out-of-print books can be sold through its service.

Not mentioned anywhere that I can find is the question of digital "photographs" of pages vs. OCR'd text. The suit and the decision blend these together as "a digital copy." Having seen some of the results of Google's digitization, I can say that the text resulting from the OCR can be quite lossy, depending on the page layout (tables of contents in particular come out quite badly) and the quality of the original book. The OCR is also the "transformation" part of the copying, since the photographs of the pages are simply copies of the page and are by their nature human-readable substitutes for the page itself. The judge seems to consider these "transitory," but in fact they are quite solidly real, and are stored in the HathiTrust repository. I suspect it is these pictures of the pages that the Authors Guild fears will be pirated should HathiTrust be hacked, less so the OCR'd pages, which are unattractive plain text. However, HathiTrust was able to show the judge that it takes security quite seriously, and the Authors Guild was unable to demonstrate any quantifiable risk.

What is heartening in this decision is the judge's enthusiasm for the role of libraries in furthering science and knowledge, and his great admiration of HathiTrust's service to scholars and to the blind. His decision is both factual and moral: he refers to the "invaluable contribution to the progress of science and the cultivation of the arts that at the same time effectuates the ideas espoused by the ADA." We could not have hoped for a better advocate for digital libraries than Judge Harold Baer Jr.