University of Michigan Libraries Target HathiTrust’s Orphan Works
by
Barbara Quint
Posted On June 6, 2011

Orphan works have been a thorn in the side of librarians, particularly academic librarians, forever. Orphans are in-copyright works for which current copyright holders are difficult or impossible to locate or perhaps even identify. With the multi-front move to digital books now underway, the problem of getting permission to use orphans has gone from an occasional one-campus, one-copy task to opening up millions of books to broad web access. A journey of a thousand miles begins with just one step. And the Copyright Office at the University of Michigan libraries (MLibrary) has just tied the laces on its hiking boots by initiating a project to identify the orphans within the 8.7 million digital books in the HathiTrust Digital Library. An estimated 73% (6,351,000) of the items in the collection are in-copyright and, from that number, an initial 100,000-item sample identified 45% as orphans. So, if my calculations are correct, MLibrary staff could be looking for more than 2.85 million Little Orphan Annies.
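For readers who want to check that arithmetic, a quick sketch in Python reproduces the figures above. The function name and the whole-number percentages are illustrative assumptions, not anything MLibrary published. The 45% rate comes from a sample, so the result is an extrapolation rather than a count.

```python
def estimate_orphans(total_volumes, in_copyright_pct, orphan_pct):
    """Extrapolate an orphan count from collection-level percentages.

    Percentages are whole numbers (73 means 73%) so the math stays in
    integers and avoids floating-point rounding surprises.
    """
    in_copyright = total_volumes * in_copyright_pct // 100
    orphans = in_copyright * orphan_pct // 100
    return in_copyright, orphans


# Figures quoted in the article: 8.7 million volumes, 73% in-copyright,
# 45% of a 100,000-item in-copyright sample identified as orphans.
in_copyright, orphans = estimate_orphans(8_700_000, 73, 45)
print(f"In-copyright volumes: {in_copyright:,}")
print(f"Estimated orphans:    {orphans:,}")
```

The second figure is what "more than 2.85 million" rounds down from.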

HathiTrust is funding the project. It has a close, almost incestuous, relationship with the University of Michigan. One of the original library partners of Google Books (née Google Print), the University of Michigan was also the most generous, allowing complete access to the entire library collection, both in- and out-of-copyright. John Wilkin, associate university librarian for library information technology and technical and access services, led the Michigan team in creating the Google partnership. Wilkin is also one of the founders and currently executive director of HathiTrust. The core of HathiTrust’s Digital Library comes from merged copies of digital books returned to early library partners from Google Books. Wilkin authored a study, published by the Council on Library and Information Resources (CLIR) and based on HathiTrust collection statistics, that estimated the extent of the orphan problem (“Bibliographic Indeterminacy and the Scale of Problems and Opportunities of ‘Rights’ in Digital Collection Building”).

Information on orphan works, and any extended permissions for them, stemming from the new initiative should enrich HathiTrust’s collection beyond what Google Books offers for the same items. At present, HathiTrust opens full-text access to public domain books and authorized items in its collection. The effort could also provide overall statistics for use in designing a legal or policy-based framework that would allow scholars and researchers to access these works.

Melissa Levine, U-M Library’s lead copyright officer, says that the project will initially focus on 1923-1963 U.S. works, specifically those determined to be in-copyright by U-M’s Copyright Review Management System (CRMS). The 100,000-item sample, of which 45% were orphans, stemmed from work by the CRMS under a grant from the Institute of Museum and Library Services.

The first phase of the orphan works identification project will develop procedures that can eventually be used by other HathiTrust partner institutions to expedite a task that will ultimately require checking millions of volumes. “We’re also going to create a mechanism to publicize bibliographic information about the orphans, to give their ‘parents’ the opportunity to claim them,” says Levine. Though forms are in place or being created for copyright holders to instruct HathiTrust in how to handle their material (preferably through Creative Commons licenses, according to Levine), it is more probable that the majority of orphans have no surviving person or entity to claim ownership.

The project is just getting started. According to Levine, “The first cut establishes what material is in print. So we assume there are active copyright holders and no orphans. In the second cut, we look for a publisher or listed copyright holder and their location though they may not be the current rightsholder and may or may not be orphans. There are many variables, but we will take a broad swath first. On the third cut, we look at those where we can’t find a publisher and those are definitely orphans. We will also put out a public call for people to tell us about their books.”
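One way to picture the three cuts Levine describes is as a simple triage routine. The sketch below is only a reading of her quote, not MLibrary's actual procedure. The `Volume` record, its field names, and the status labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical record for one in-copyright book under review."""
    title: str
    in_print: bool            # first cut: is the book still in print?
    publisher_located: bool   # second cut: was a publisher or listed holder found?


def triage(volume: Volume) -> str:
    """Assign a volume to one of the three cuts described in the quote."""
    if volume.in_print:
        # First cut: assume an active copyright holder, so not an orphan.
        return "not orphan"
    if volume.publisher_located:
        # Second cut: a lead exists, but the listed party may not be the
        # current rightsholder, so orphan status is still undetermined.
        return "needs review"
    # Third cut: no publisher could be found.
    return "orphan"


# Illustrative examples only; these titles are made up.
for book in [
    Volume("Still Selling", in_print=True, publisher_located=True),
    Volume("Out of Print, Publisher Known", in_print=False, publisher_located=True),
    Volume("No Trace", in_print=False, publisher_located=False),
]:
    print(f"{book.title}: {triage(book)}")
```

In practice the middle category is where the real work lies, since, as Levine notes, "there are many variables."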

The process can be daunting, with endless tortuous details. For example, if a book is in-copyright but out-of-print, the publisher’s contract may revert the copyright to the author. But, as Levine points out, “Sometimes reversion is automatic, sometimes it requires authors to make an assertion. We will also provide help to authors and encourage them to exercise their rights.”

HathiTrust hopes that the techniques and procedures developed by this project may serve as a model for other libraries. The first libraries Levine expects to apply the lessons learned will be other HathiTrust institutions. HathiTrust’s 52 members include the Committee on Institutional Cooperation (CIC) and university libraries at Indiana University, Columbia, Princeton, Yale, Harvard, Duke, Johns Hopkins, Purdue, Stanford, and the University of California.

The Copyright Office is part of MPublishing, the primary academic publishing enterprise of the University of Michigan. It offers copyright information and assistance to the U-M community and takes an active role in global discussions about copyright and libraries.

This project is not the only front upon which the battle over custody of orphans is being fought. Judge Denny Chin’s rejection of the Google Books Amended Settlement Agreement has not ended the lawsuit brought by publisher and author associations, which still has a litigious future. The European Union recently issued a proposed directive for handling orphans in its member countries. This proposal would include a registry database for orphans. As the Europeana mass digitization project moves to compete with Google Books, resolving such issues becomes more critical.

Nor is this the only related project at the University of Michigan libraries. According to Levine, the CRMS has “another project on identifying the copyright status underway for four Berne Convention countries—the United Kingdom, Canada, Australia, and Spain. If we can identify authors who have passed away, we can find their public domain status now in those countries as well as on the authors’ releases.”

Barbara Quint was senior editor of Online Searcher, co-editor of The Information Advisor’s Guide to Internet Research, and a columnist for Information Today.