The National Science Foundation has announced a program in Advancing Digitization of Biological Collections (ADBC), which provides an opportunity to realize the strategic plan articulated by the Network Integrated Biocollections Alliance. According to the program announcement: “This program seeks to create a national resource of digital data documenting existing biological collections and to advance scientific knowledge by improving access to digitized information (including images) residing in vouchered scientific collections across the United States. The information associated with various collections of organisms, such as geographic distribution, environmental habitat data, phenology, information about associated organisms, collector field notes, tissues and molecular data extracted from the specimens, etc. is a rich resource for providing the baseline from which to further biodiversity research and provide critical information about existing gaps in our knowledge of life on earth. The national resource will be structured at three levels: a national hub, thematic networks based on collaborative groups of collections, and the physical collections. This resource will build upon a sizable existing national investment in curation of the physical objects in scientific collections and contribute vitally to scientific research and technology interests in the United States. It will be an invaluable tool in understanding the biodiversity and societal consequences of climate change, species invasions, natural disasters, the spread of disease vectors and agricultural pests, and other biological issues.” The deadline for full proposals is December 10, 2010.

In February of this year, we kicked off a strategic planning process for a national digital biological collections infrastructure in the U.S. with a workshop at NESCent. An outline emerged from that workshop and was posted on this blog, with announcements broadcast widely to the biological collections community.

We were pleased to receive an outpouring of useful feedback from the community through the mailing list and through this blog, both from individuals and from representatives of biodiversity organizations, institutions and other stakeholder groups. All of it was thought-provoking and helpful in refining the strategic plan.

A second, larger, workshop was held in early May (also at NESCent) that updated the strategic planning document in light of all the community input. We are happy to announce the availability of the updated strategic planning document that emerged from this process [1], and to report that we have conveyed this document to the National Science Foundation.

It has been very gratifying for those of us shepherding this process to watch the community coalesce around the initiative and generously contribute to the effort of imagining the future of biological collections infrastructure. We look forward to watching this unfold in the coming months and years, and will strive to keep this site updated as things develop.

Developing a digital U.S. biological collections national resource: First steps towards a strategic plan

Summary

A strategic plan for a 10-year national effort to digitize and mobilize images and data associated with biological research collections is being developed. The key objective of the plan is to create a publicly available, comprehensive national collections resource that can be used to address a wide range of research questions and serve stakeholders in government agencies, academic institutions, and international biodiversity organizations. A workshop, held at the National Evolutionary Synthesis Center on February 5-7, 2010, drafted the present outline for the digitization and web mobilization of data and images associated with U.S. biological collections. Input from the community is requested as this plan develops to ensure that it builds appropriately on existing projects and reflects the missions and needs of the nation’s diverse biological collections.

Significance of Collections Digitization

Biological collections, gathered over more than two centuries of research and exploration, represent a significant national resource for research and applied biology that has been underutilized in the digital realm. Knowledge of the history of life is accessible only through biological collections of specimens, fossils, tissues, images and other data that are held in perpetuity by museums, universities and various state, federal, and non-governmental agencies. Knowledge of biodiversity, obtained through the use of collections, is critically important for studies of invasive species, biological conservation programs, land management strategies, biotic responses to climate change, the spread of pathogenic organisms, and research and management activities of many kinds. A coordinated effort to digitize existing biological collections and to mobilize the data and images in a freely available online resource is needed. Recent technological advances in the digitization of collections, combined with decades of experience and emerging efforts to standardize and integrate across collections, put the collections community in a position to address the problem in a concerted way. This effort would have major, positive impacts on U.S. scientific achievement and global scientific collaboration.

The Scope of Collections Digitization

Collections digitization is defined broadly to include transcription into electronic format of various types of data associated with specimens, the capture of digital images of specimens, and the georeferencing of specimen collection localities. In order to assess the scope of the undertaking required to digitize the nation’s collections, the collections community has conducted a survey to provide an overview of the number and diversity of specimens contained in U.S. collections. Additionally, the community has held three workshops on “Future Directions in Biodiversity and Systematics Research”. These, in addition to two recent reports (1,2), highlight the scale of the challenge, the need to address the integration of digitized biological data, the need to coordinate the capture of specimen data and images, and the necessity of providing broad accessibility to specimen data by scientists worldwide. Estimates of collection size range as high as three billion specimens globally, with as many as one billion or more specimens preserved and cared for by U.S. institutions, most of which (as high as 90%) are not accessible online.

No comprehensive strategic plan currently exists for the digitization of the nation’s biological research collections. To be effective, such a plan should be conceived as a grand challenge and undertaken as a unified mission involving a coordinated funding program and a well-designed strategy for execution. In addition to addressing needs for physical care and housing of collections and support of collections-based research broadly (3,4), it is vital that the U.S. increase the online accessibility of its biological collections through an integrative and focused digitization effort, so that the full value of our national biological collections resources can be realized. The plan also calls for the development of cyberinfrastructure to promote efficient and standard capture and mobilization of these data to make the national biological collections resource publicly available for analysis. The present focus of this strategy is on the digitization and mobilization of existing collection data. This initiative would not directly support the development of new collections or collection improvement through enhanced infrastructure, curation or management.

Objectives, Vision and an Outline for Organizing the Effort

The key objective of the plan is to create a publicly available, sustainable and comprehensive national collections resource by digitizing and mobilizing data from the nation’s biological research collections. Some of the desirable features of this new digital collections resource are:

• Images and data from all U.S. biological collections, large and small, integrated in a web accessible interface using shared standards and formats.

• New web interfaces, visualization and analysis tools, data mining, image analysis, and georeferencing processes developed and made available for using and improving the collections resource.

• The existing massive backlog of non-digitized collections digitized and web mobilized, with tools, training, and infrastructure created to prevent the recurrence of such a backlog.

A suggested framework for the digitization effort is presented here, for the purposes of obtaining community feedback on models for developing a biological collections digitization initiative.

Three tiers of effort that will accomplish this objective have been identified:

1. Develop a coordinated effort to provide technological support for the nationwide collections digitization effort, to organize new efforts with existing collections-based projects and international efforts, and to disseminate standards, techniques and best practices. This effort might take the form of a new center based at a single institution, a collaborative administrative group across institutions, or some other model that will achieve the same function.

2. Develop a network of regional collaborations for collection digitization across the U.S. These regional efforts might consist of institutions housing both large and small collections from the same region that unite to focus on digitization and web mobilization of collections in order to contribute to the national collections resource.

3. Develop investigator-driven and cross-regional collaborations driven by the specific needs of collections of a particular clade or preservation type, or motivated by a particular scientific question to be addressed by the use of collections images and data.

Strategy for Community Involvement

Creating a national digitized biological collections resource requires a strategic plan with broad support and input from the collections community and a diversity of stakeholders. Such a strategic plan, incorporating community suggestions, will be the product of this effort. The mechanism for community participation in this planning includes wide distribution of the present outline to institutions, agencies, and professional societies. Responses to the plan, collected through email and blog commentary, will be used in future meetings to complete a strategic plan. Community feedback on the initiative outlined here is critical. Feedback can be provided by adding a comment on the blog page (https://digbiocol.wordpress.com/), sending an email to wg-digitization@nescent.org, or contacting individual participants in the recent meeting (www.nescent.org/wg_digitization/Main_Page). Group feedback based on institutional priorities or taxon-based needs is welcomed. Specific feedback is needed in areas such as support for the proposed model, suggestions for revision, ideas regarding the three-tiered approach suggested here, priorities for collection digitization, and ways to maximize collaboration across institutions and federal agencies, and at the international level. This feedback will be aggregated and provided to participants in future planning sessions that will develop a final strategic plan.

3. Stevenson, J. W. and D. W. Stevenson. 2003. Development of a national systematics infrastructure: a virtual instrument for the 21st century. Report to the National Science Foundation, Biodiversity Surveys and Inventories Program. New York, December, 2003.