Open access win: UMass Amherst and Google reach licensing agreement

Through the diligent and intelligent efforts of the UMass Libraries staff and the UMass General Counsel’s Office, the University of Massachusetts Amherst (along with UMass Lowell) has signed a licensing agreement with Google, Inc. that allows for the use of its scanned books held within the HathiTrust. I previously wrote about the impasse created by the indemnity clause in the standard licensing agreement and made a forceful argument that the clause conflicted with the spirit of the Google Books project as well as Google’s own arguments of legitimacy. Now that the parties have come to an amicable resolution, it seems appropriate to share (with permission) some of the details so that the next university, library, museum, or individual scholar can better understand what resolution looks like. Here’s the agreement (adjust the width with the fit width tool <[]> if the PDF is too large):

There’s much one might say about the document, but I’ll focus on Section 1 and Section 4.

In Section 1, the language carves out a space for academic, non-commercial, non-competing projects like the Pompeii Bibliography and Mapping Project. Specifically, UMass (Institution) will:

“(1) use the Institution Digital Copy only for research, scholarly, or academic purposes;” – Academic research and public outreach are the goals of the PBMP.

“(2) not share, provide, license, or sell the Institution Digital Copy to any third party…” – the PBMP plans only to link to the actual human-readable documents and will use the scanned books in our Natural Language Processing efforts to better connect Pompeii’s map and bibliography.

“(3) not use the Institution Digital Copy to provide commercial search or hosting services substantially similar to those provided by Google, including but not limited to those services substantially similar to Google Book Search;” – considering Google’s great combination of Map services, Google Book Search, and Image collections, this is a bit closer to the PBMP’s mission to make the discovery of information about Pompeii as seamless and intuitive as possible. The PBMP, however, is not a commercial enterprise, and the totality of its content (c. 15,000 citations and dozens of mapping files) is surely so insignificant compared to Google’s that it cannot reasonably be considered “substantially similar”. Right? Fortunately, the rest of the clause speaks directly to the exclusion of academic projects like the PBMP.

“(4) (A) use reasonable efforts to prevent third parties from bulk downloading any portion of the Institution Digital Copy, and (B) implement technological measures (…) to restrict automated access to any part of Institution’s website where substantial portions of the Institution Digital Copy are available.” – Again, because the PBMP will not host these documents, instead mining them for OCR transcripts and producing indexes, these digital copies will not be stored in web-accessible storage.

Section 4, and its further definition in Section 5, is a major change from the former agreement language (is that why it’s shouted in all CAPS?). In essence, the change is from UMass accepting liability for any and all legal challenges arising from third-party litigation to UMass accepting liability when it redistributes Google’s scanned books (the most common means of breach, though not the only one). This language seems to put the burden back on each party for its own actions: Google for any breach of copyright; UMass for compounding that error.

Our next steps are getting the books we need out of the HathiTrust: linked to our bibliographic database and captured for our Natural Language Processing efforts. The HathiTrust datasets page offers information on extracting works not digitized by Google via its Data API and on making a formal request for Google’s digital copy.

I have a meeting next week with Laura Quilter (UMass Copyright and Information Policy Librarian) to hammer out a strategy for getting Google and non-Google books. That process will be the subject of a subsequent post.