11 Million Volumes

HathiTrust reached a new milestone, surpassing 11 million volumes in the digital repository. A history of HathiTrust’s road to the first 10 million volumes is available on the HathiTrust blog.

Updated HathiTrust Volume Identifiers

HathiTrust has made a one-time, batch change to a set of approximately 320,000 volume identifiers. These volumes were ingested with an incorrect identifier due to a vendor issue. The change involves adding a $ symbol to affected identifiers. A full list of the updated identifiers is available at http://www.hathitrust.org/hathifiles. Any institutions or individuals that save links to HathiTrust volumes locally should update these identifiers to ensure working links. Please contact feedback@issues.hathitrust.org with any issues or questions.
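For institutions that keep saved links in local files, the update could be scripted roughly as follows. This is a minimal sketch, not a HathiTrust tool: the identifiers, the position of the $ within them, and the example URL are all made up for illustration, and the real old/new pairs would need to be taken from the published list.

```python
import re

# Hypothetical old -> new identifier pairs. The identifiers and the position
# of the "$" are invented; real pairs come from the published list.
ID_UPDATES = {
    "xyz.b000001": "xyz.$b000001",
    "xyz.b000002": "xyz.$b000002",
}


def update_links(text, updates):
    """Replace each old identifier in `text` with its updated form.

    An identifier is only replaced when it appears as a whole token, so a
    longer identifier that merely starts with an old one is left alone.
    """
    if not updates:
        return text
    # Longest alternatives first, so that a shorter identifier cannot win
    # over a longer one sharing the same prefix.
    alternatives = "|".join(
        re.escape(old) for old in sorted(updates, key=len, reverse=True)
    )
    pattern = re.compile(r"(?<![\w.$])(?:" + alternatives + r")(?!\w)")
    # A function is used as the replacement so the new identifier is inserted
    # literally, with no replacement-template processing.
    return pattern.sub(lambda match: updates[match.group(0)], text)


saved = "http://example.org/view?id=xyz.b000001"
print(update_links(saved, ID_UPDATES))
```

Running the function a second time on already-updated text leaves it unchanged, because the updated identifiers no longer match any old one.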

Ingest

Locally-Digitized

HathiTrust ingested new content from the Universidad Complutense de Madrid, received content from the University of Delaware, and communicated with Emory University, University of Chicago, and University of Washington about submission of locally-digitized content.

Internet Archive-Digitized

HathiTrust ingested new content from the University of Massachusetts, Amherst, and continued conversations about ingest with the University of Alberta.

Program Steering Committee

Projects

Copyright Review

A summary of the determinations from HathiTrust copyright review activities in February is given below. See CRMS-US and CRMS-World, projects funded by IMLS, for further information.

              February                          Overall
              Public Domain    All              Public Domain    All
              Determinations   Determinations   Determinations   Determinations
CRMS-US       2,561            2,727            161,510          309,548
CRMS-World    2,670            5,320            49,832           96,402
Total         5,231            8,047            211,342          405,950

Government Documents Registry

Project staff continued to draft functional requirements for the registry and are obtaining initial feedback on them from selected members of HathiTrust partner and non-partner institutions. Staff also continued to develop methods for identifying duplicate and related records and to explore ways the US government documents community could contribute to the development of the registry.

HathiTrust Research Center

The HTRC invited the eight finalists in an RFP for WCSA (Workset Creation for Scholarly Analysis), a Mellon Foundation-funded project to support the prototyping of workset creation tools, to Chicago to present their proposals. Four of the finalists will be awarded grants of $40,000 over 9 months to develop their prototypes.

mPach

University of Michigan staff began to migrate the Prepper module of mPach to a new Ruby/Rails development environment. (A full list of mPach modules is available at http://www.lib.umich.edu/mpach.) Staff also added an mPach article to the HathiTrust test repository and began to evaluate additional tools for converting articles into JATS XML that might be incorporated into the Norm component of Prepper.

Development Updates

HathiTrust institutions performed the following work related to applications and infrastructure:

Full-text Search

Staff continued to test and refine the index synchronization and release process on new high-performance storage for full-text search. After stability problems were encountered during attempts to roll out the new storage in production, staff began working with the storage and network equipment suppliers to troubleshoot and optimize performance. (See Availability, below.)

Staff finished developing and testing a new version of SLIP (Solr Large-scale Indexing Processor), which is used to index the full text of works in HathiTrust. Production deployment will occur in March. Staff added features to support the indexing of JATS XML content and the indexing of volumes in a configurable number of “chunks”. Staff have been exploring chunking volumes at indexing time in order to improve the relevance ranking of search results.

Staff also added indexing support for words that are hyphenated across line breaks on pages of text. This is effective immediately for searches conducted within volumes and will take effect in cross-repository searches as volumes are indexed going forward. Approximately 4.5 million HathiTrust volumes will be re-indexed in mid-March during a regular monthly update of HathiTrust partner print holdings information; a complete re-indexing is planned for late April.

Staff additionally integrated a spelling suggester feature into a Solr request handler in development and began testing the suggester with several data sets.
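To make the hyphenation and chunking ideas above concrete, here is a minimal sketch of both. It is not SLIP's code: the function names are hypothetical, the hyphen rule is a simple regular expression, and the chunking assumes a volume is a list of page texts split into contiguous groups of near-equal size.

```python
import re

# A word fragment ending in a hyphen at the end of a line, followed by the
# fragment that starts the next line.
_LINE_BREAK_HYPHEN = re.compile(r"(\w+)-\s*\n\s*(\w+)")


def join_hyphenated(page_text):
    """Join words that were split by a hyphen across a line break."""
    return _LINE_BREAK_HYPHEN.sub(r"\1\2", page_text)


def chunk_pages(pages, num_chunks):
    """Split a list of page texts into contiguous, near-equal chunks.

    Returns at most `num_chunks` chunks; a volume with fewer pages than
    `num_chunks` yields one chunk per page.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    count = min(num_chunks, len(pages))
    if count == 0:
        return []
    base, extra = divmod(len(pages), count)
    chunks = []
    start = 0
    for i in range(count):
        # The first `extra` chunks take one additional page each.
        size = base + (1 if i < extra else 0)
        chunks.append(pages[start:start + size])
        start += size
    return chunks


print(join_hyphenated("index-\ning of vol-\n  umes"))
print(chunk_pages(["p1", "p2", "p3", "p4", "p5", "p6", "p7"], 3))
```

One limitation of the simple rule: a genuinely hyphenated compound that happens to break at its hyphen (for example "well-" followed by "known") is also joined, so a real indexer might index both the joined and the hyphenated forms.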

PageTurner

Staff at California Digital Library developed an “Embed this Book” feature that is now available in the “Share” section of the PageTurner sidebar. Users can copy the HTML for embedding either 1up or 2up views into websites and blogs.

Storage Replacement Cycle

Staff completed installation of new and replacement storage for the 2014 cycle. Retired storage will undergo security wiping in March and be returned to fulfill trade-in credit obligations.

Availability

Repository

Cumulative 12-month availability of repository access: 99.827%*

HathiTrust was unavailable for some or all users on Monday, February 3 from 12:05-12:10pm and Tuesday, February 4 from 1:45-1:55am and 6:45-7:00am due to stability problems encountered during attempted production rollouts of new high-performance storage for full-text search.

HathiTrust was unavailable for some or all users on Thursday, February 20 from 2:53-3:07pm due to a temporary network issue at the Michigan instance that occurred while the Indiana instance was out of service for routine maintenance.

* Repository access refers to page viewing and full-text search functionality, i.e., user-facing applications. It does not refer to preservation or storage infrastructure, which is under continual operation.
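As a rough illustration of what these figures mean in minutes, the sketch below converts the 12-month availability percentage into total downtime and adds up the February outage windows listed above. It assumes a 365-day window and treats each outage as affecting all users for its full duration, which the report does not state; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumed 12-month window: 525,600 minutes


def downtime_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime implied by an availability percentage."""
    return (1 - availability_percent / 100) * period_minutes


# February outage windows from the report, in minutes:
# Feb 3 12:05-12:10pm, Feb 4 1:45-1:55am, Feb 4 6:45-7:00am, Feb 20 2:53-3:07pm
february_outages = [5, 10, 15, 14]

implied = downtime_minutes(99.827)
print(f"Downtime implied by 99.827% over 12 months: {implied:.0f} minutes")
print(f"February outages listed: {sum(february_outages)} minutes")
```

Under these assumptions, 99.827% corresponds to roughly 909 minutes (about 15 hours) of unavailability over the 12 months, of which the February incidents account for 44 minutes.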

Zephir

A maintenance outage occurred on the Zephir FTPS server on March 6, 2014 from 6:00-6:30am PST. During the brief maintenance outage, contributors were not able to submit bibliographic records. Zephir systems other than the FTPS server were not affected, and maintenance was conducted successfully.

New Growth

As of February 1:

Institution                                 February      Overall
Boston College                                   110        2,796
Columbia University                                1       65,037
Cornell University                             3,120      444,331
Duke University                                1,394        7,258
Harvard University                                 0      237,435
Indiana University                                 0      195,580
Keio University                                8,829       88,954
Library of Congress                           18,205      107,929
New York Public Library                            2      288,372
North Carolina State University                    0        3,196
Northwestern University                           21       37,601
Ohio State University                         19,439       19,445
Penn State University                          1,906       71,329
Princeton University                               0      251,710
Purdue University                                  0       44,698
Texas A&M University                               0        1,201
Universidad Complutense                          133      112,147
University of California                       7,725    3,461,923
The University of Chicago                         85       39,077
University of Florida                              2        9,765
University of Illinois                        10,988      126,603
University of Massachusetts, Amherst           8,731        8,731
University of Michigan                         1,043    4,668,481
University of Minnesota                        1,148      119,768
University of North Carolina, Chapel Hill          0       17,025
University of Virginia                             0       50,821
University of Wisconsin                           21      555,947
Utah State                                         0          117
Yale University                                    0       23,678
Total                                         82,903   11,060,955

Public Domain (~33%)
Total*                                        59,381    3,675,204

* Includes volumes opened through copyright review and rights holder permissions