Abstract

Detecting US Federal Documents to Expand Access

This paper reports on HathiTrust’s Federal Documents Program, which facilitates collective action to create a comprehensive digital collection of United States government publications issued by the Government Printing Office and other agencies. Most government information is now produced and distributed digitally, but US research libraries, especially those that participate in the Federal Depository Library Program, hold large numbers of historical print publications that are difficult to discover and use. In June 2016, HathiTrust held over 700,000 items identified as federal documents, but we know this to be only a fraction of what exists. Because cataloging practices vary, we have limited understanding of the number of federal documents at the title level, the corresponding numbers of volumes and pages, and their distribution across libraries in North America. All of these details are needed to plan comprehensive mass digitization of federal documents. A major component of HathiTrust’s program has been the development of the US Federal Documents Registry, envisioned as a reliable inventory of items published at the expense of the US government. The Registry’s development methodology includes extensive comparative bibliographic analysis based on more than 20 million records submitted by 40 libraries in response to a request from HathiTrust. This paper describes the methods used for de-duplication, relationship detection, and record consolidation. While many potential use cases exist for such a registry, its primary role is as a tool for identifying materials to be digitized among HathiTrust member libraries and in partnership with other agencies and groups.