The purpose of the toolkit is to enable research library directors to raise awareness of data management issues among campus administrators and researchers.

Data are valuable assets that in some cases have an unlimited potential for reuse. The awareness toolkit underscores the need to ensure that research data are managed throughout the data lifecycle so that they are understandable and usable.

"This is a very timely document" says Marnie Swanson (University of Victoria), Chair of the CARL Data Management Sub-Committee. "More than ever, data are a critical component of the research endeavor and this toolkit will help libraries raise awareness in the scholarly community of the importance of data stewardship."

Research Data: Unseen Opportunities provides readers with a general understanding of the current state of research data in Canada and internationally. It is organized into eight sections: The Big Picture; Major Benefits of Data Management; Current Context; Case Studies; Gaps in Data Stewardship in Canada; Data Management Policies in Canada; Responses to Faculty/Administrative Concerns; What Can Be Done on Campus?

This report . . . describes the results of the surveys conducted by PARSE.Insight to gain insight into research in Europe. Major surveys were held within three stakeholder domains: research, publishing and data management. In total, almost 2,000 people responded, providing interesting insights into the current state of affairs in the digital preservation of research data (including publications), the outlook for data preservation, data sharing, the roles and responsibilities of stakeholders in research, and the funding of research.

Media Vault Program partners offer a number of specialized tools to help campus researchers manage their materials. These include:

WebGenDL (UCB Library Systems) — the library's internal system for managing, creating, preserving and discovering digital library content. These tools are aimed primarily at mature, publishable sets of materials, rather than the broader context of research data

UC3 Curation Micro-services — a set of low barrier tools for full lifecycle enrichment of objects (e.g., identity, fixity, replication, annotation). The first few will be rolled out publicly in January 2010. These are presented not as a user interface, but rather as behind-the-scenes services

Sakai 3 — the next-generation version of the platform that powers the Berkeley campus's bSpace application. Due in 2011, Sakai 3 will include a range of social tools to help users extend and disseminate their materials

To augment these services, and to handle use cases beyond their scope, the MVP team examined a number of potential platforms. . . .

Of these candidates, Alfresco stands out as the most functional, out-of-the-box solution. With a little customization, it can be readied for user testing. Therefore, the MVP team has selected it as the basis of its next round of discussions with stakeholders, partners and prospective users.

File formats are the principal means of encoding information content in any computing environment. Preserving intellectual content requires a firm grasp of the file formats used to create, store and disseminate it, and ensuring that they remain fit for purpose. There are several significant pronouncements on preservation file formats in the literature. These have generally emanated from either preservation institutions or research projects and usually take one of three approaches:

recommendations for submitting material to digital repositories;

recommendations or policies for long-term preservation; or

proposals, plans for, and technical documentation of existing registries that store attributes of formats.

More recently, attention has broadened to pay specific attention to the significant properties of the intellectual objects that are the subject of preservation. This Technology Watch Report has been written to provide an overview of these developments in context, through comparative review and analysis, to assist repository managers and the wider preservation community. It aims to provide a guide to and critique of the current literature, and to place it in the context of a wider professional knowledge and research base.

Data from high-energy physics (HEP) experiments are collected with significant financial and human effort and are mostly unique. At the same time, HEP has no coherent strategy for data preservation and re-use. An inter-experimental Study Group on HEP data preservation and long-term analysis was convened at the end of 2008 and held two workshops, at DESY (January 2009) and SLAC (May 2009). This document is an intermediate report to the International Committee for Future Accelerators (ICFA) of the reflections of this Study Group.

The Institute of Museum and Library Services has awarded $249,623 to the University of North Carolina Chapel Hill School of Information and Library Science for the Closing the Digital Curation Gap project.

Scientists, researchers, and scholars across the world generate vast amounts of digital data, but the scientific record and the documentary heritage created in digital form are at risk—from technology obsolescence, from the fragility of digital media, and from the lack of baseline practices for managing and preserving digital data. The University of North Carolina Chapel Hill (UNC-CH) School of Information and Library Science, working with the Institute of Museum and Library Services (IMLS) and partners in the United Kingdom (U.K.), is collaborating on the Closing the Digital Curation Gap (CDCG) project to establish baseline practices for the storage, maintenance, and preservation of digital data to help ensure their enhancement and continuing long-term use. Because digital curation, or the management and preservation of digital data over the full life cycle, is of strategic importance to the library and archives fields, IMLS is funding the project through a cooperative agreement with UNC-CH. U.K. partners include the Joint Information Systems Committee (JISC), which supports innovation in digital technologies in U.K. colleges and universities, and its funded entities, the Strategic Content Alliance (SCA) and the Digital Curation Centre (DCC).

Well-curated data can be made accessible to a variety of audiences. For example, the data gathered by the Sloan Digital Sky Survey (www.sdss.org) at the Apache Point Observatory in New Mexico are available to professional astronomers worldwide as well as to schoolchildren, teachers, and citizen scientists through its Galaxy Zoo project. Galaxy Zoo, now in its second version, invites citizen scientists to assist in classifying over a million galaxies (www.galaxyzoo.org). With good preservation techniques, these data will remain available into the future to provide documentation of the sky as it currently appears.

Data and information science researchers have already developed many viable applications, models, strategies, and standards for the long term care of digital objects. This project will help bridge a significant gap between the progress of digital curation research and development and the professional practices of archivists, librarians, and museum curators. Project partners will develop guidelines for digital curation practices, especially for staff in small to medium-sized cultural heritage institutions where digital assets are most at risk. Larger institutions will also benefit. To develop baseline practices, a working group will establish and support a network of digital curation practitioners, researchers, and educators through face-to-face meetings, web-based communication, and other communication tools. Project staff will also use surveys, interviews, and case studies to develop a plan for ongoing development of digital curation frameworks, guidance, and best practices. The team will also promote roles that various organizations can play and identify future opportunities for collaboration.

As part of this project, the Digital Curation Manual, which is maintained by the DCC, will be updated and expanded (www.dcc.ac.uk/resource/curation-manual/chapters), and the Digital Curation Exchange web portal will receive support (http://digitalcurationexchange.org). Through these efforts, the CDCG project will lay the foundation that will inform future training, education, and practice. The project's research, publications, practical tool integration, and outreach and training efforts will be of value to organizations charged with maintaining digital assets over the long term.

The Web is ephemeral. Many resources have representations that change over time, and many of those representations are lost forever. A lucky few manage to reappear as archived resources that carry their own URIs. For example, some content management systems maintain version pages that reflect a frozen prior state of their changing resources. Archives recurrently crawl the web to obtain the actual representation of resources, and subsequently make those available via special-purpose archived resources. In both cases, the archival copies have URIs that are protocol-wise disconnected from the URI of the resource of which they represent a prior state. Indeed, the lack of temporal capabilities in the most common Web protocol, HTTP, prevents getting to an archived resource on the basis of the URI of its original. This turns accessing archived resources into a significant discovery challenge for both human and software agents, which typically involves following a multitude of links from the original to the archival resource, or searching archives for the original URI. This paper proposes the protocol-based Memento solution to address this problem, and describes a proof-of-concept experiment that includes major servers of archival content, including Wikipedia and the Internet Archive. The Memento solution is based on existing HTTP capabilities applied in a novel way to add the temporal dimension. The result is a framework in which archived resources can seamlessly be reached via the URI of their original: protocol-based time travel for the Web.
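The temporal dimension Memento adds to HTTP rests on datetime negotiation: a client sends an Accept-Datetime header to a "TimeGate" for the original URI, which redirects to the archived copy nearest that time. The sketch below illustrates only the client side of that exchange; the TimeGate URI is hypothetical, and real archives expose their own gateways.

```python
# Minimal sketch of Memento-style datetime negotiation (client side).
# The TimeGate base URI below is a placeholder, not a real service.
from datetime import datetime, timezone
from email.utils import format_datetime

def accept_datetime_header(dt: datetime) -> str:
    """Format a datetime as the RFC 1123 string used in Accept-Datetime."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)

def timegate_request(timegate_base: str, original_uri: str, dt: datetime) -> dict:
    """Describe the GET request a client would send to a TimeGate to
    reach the archived state of original_uri nearest to time dt."""
    return {
        "method": "GET",
        "uri": timegate_base + original_uri,
        "headers": {"Accept-Datetime": accept_datetime_header(dt)},
    }

req = timegate_request(
    "https://timegate.example.org/",        # hypothetical TimeGate
    "http://www.example.com/page",
    datetime(2009, 5, 1, 12, 0, tzinfo=timezone.utc),
)
print(req["headers"]["Accept-Datetime"])    # Fri, 01 May 2009 12:00:00 GMT
```

In the full protocol the TimeGate answers with a redirect to a "Memento" resource and a Memento-Datetime header stating when that copy was captured, so the archived state is reached purely through standard HTTP mechanics.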

In order to dig deeper into possible reasons behind archivists’ and librarians’ reluctance to archive Web sites, the study described here asks professionals to reveal their Web archiving experiences as well as the information sources they consult regarding archiving Web sites. Specifically, the following two research questions are addressed: Are librarians and archivists at institutions of higher education currently engaged in or considering archiving Web sites? What sources do these professionals consult for information about Web archiving?

Preserv 2 investigated the preservation of data in digital institutional repositories, focusing in particular on managing storage, data and file formats. Preserv 2 developed the first repository storage controller, which will be a feature of EPrints version 3.2 software (due 2009). Plugin applications that use the controller have been written for Amazon S3 and Sun cloud services, among others, as well as for local disk storage. In a breakthrough application, Preserv 2 used OAI-ORE to show how data can be moved between two repository software platforms with quite distinct data models, from an EPrints repository to a Fedora repository.

The largest area of work in Preserv 2 was on file format management and an 'active' preservation approach. This involves identifying file formats, assessing the risks posed by those formats, and taking action to obviate the risks where that can be justified. These processes were implemented with reference to a technical registry, PRONOM, from The National Archives (TNA), and DROID (digital record object identification service), also produced by TNA. Preserv 2 showed that a current registry can be invoked to classify the digital objects in a repository and present a hierarchy of risk scores. Classification was performed using the Preserv 2 EPrints preservation toolkit, which 'wraps' DROID in an EPrints repository environment. This toolkit will be another feature available in EPrints v3.2 software.

The result of file format identification can indicate that a file is at risk of becoming inaccessible or corrupted. Preserv 2 developed a repository interface to present formats by risk category, and providing risk scores through the live PRONOM service was shown to be feasible. Spin-off work is ongoing to develop format risk scores by compiling data from multiple sources in a new linked data registry.
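The identify-then-assess workflow described above can be illustrated with a toy signature matcher. DROID actually works from rich, versioned signature files in the PRONOM registry; the sketch below uses only leading magic bytes, and the risk categories are invented for illustration, not taken from PRONOM or Preserv 2.

```python
# Toy sketch of signature-based format identification, loosely modelled
# on the DROID/PRONOM approach described above. PUIDs shown are real
# PRONOM identifiers; the risk labels are illustrative assumptions.
SIGNATURES = [
    # (leading magic bytes, PRONOM PUID, format name, assumed risk)
    (b"%PDF-1.4",          "fmt/18", "PDF 1.4",  "low"),
    (b"\x89PNG\r\n\x1a\n", "fmt/13", "PNG 1.2",  "low"),
    (b"GIF89a",            "fmt/4",  "GIF 89a",  "medium"),
]

def identify(data: bytes):
    """Return (puid, name, risk) for the first matching signature,
    or None when the format is unrecognized — itself a risk signal
    that an 'active' preservation workflow would surface."""
    for magic, puid, name, risk in SIGNATURES:
        if data.startswith(magic):
            return puid, name, risk
    return None

print(identify(b"%PDF-1.4 sample content"))  # ('fmt/18', 'PDF 1.4', 'low')
```

A repository interface like the one Preserv 2 built would aggregate such per-object results into counts per risk category, flagging unidentified or high-risk formats for curatorial action.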

The Johns Hopkins University Sheridan Libraries have been awarded $20 million from the National Science Foundation (NSF) to build a data research infrastructure for the management of the ever-increasing amounts of digital information created for teaching and research. The five-year award, announced this week, was one of two for what is being called "data curation."

The project, known as the Data Conservancy, involves individuals from several institutions, with Johns Hopkins University serving as the lead and Sayeed Choudhury, Hodson Director of the Digital Research and Curation Center and associate dean of university libraries, as the principal investigator. In addition, seven Johns Hopkins faculty members are associated with the Data Conservancy, including School of Arts and Sciences professors Alexander Szalay, Bruce Marsh, and Katalin Szlavecz; School of Engineering professors Randal Burns, Charles Meneveau, and Andreas Terzis; and School of Medicine professor Jef Boeke. The Hopkins-led project is part of a larger $100 million NSF effort to ensure preservation and curation of engineering and science data.

Beginning with the life, earth, and social sciences, project members will develop a framework to more fully understand data practices currently in use and arrive at a model for curation that allows ease of access both within and across disciplines.

"Data curation is not an end but a means," said Choudhury. "Science and engineering research and education are increasingly digital and data-intensive, which means that new management structures and technologies will be critical to accommodate the diversity, size, and complexity of current and future data sets and streams. Our ultimate goal is to support new ways of inquiry and learning. The potential for the sharing and application of data across disciplines is incredible. But it’s not enough to simply discover data; you need to be able to access it and be assured it will remain available."

The Data Conservancy grant represents one of the first awards related to the Institute of Data Intensive Engineering and Science (IDIES), a collaboration between the Krieger School of Arts and Sciences, the Whiting School of Engineering, and the Sheridan Libraries. . . .

In addition to the $20 million grant announced today, the Libraries received a $300,000 grant from NSF to study the feasibility of developing, operating and sustaining an open access repository of articles from NSF-sponsored research. Libraries staff will work with colleagues from the Council on Library and Information Resources (CLIR) and the University of Michigan Libraries to explore the potential for the development of a repository (or set of repositories) similar to PubMed Central, the open-access repository that features articles from NIH-sponsored research. This grant for the feasibility study will allow Choudhury's group to evaluate how to integrate activities under the framework of the Data Conservancy and will result in a set of recommendations for NSF regarding an open access repository.

The survey task force recommends a number of actions to facilitate the time-critical process of rescuing IUB’s audio, video, and film media.

Appoint a campus-wide task force to advise on

the development of priorities for preservation action

the development of a campus-wide preservation plan

how units can leverage resources for the future

Create a centralized media preservation and digitization center that will serve the entire campus, using international standards for preservation transfer. As part of the planning for this center, hire a

media preservation specialist

film archivist

Develop special funding for the massive and rapid digitization of the treasures of IU over the next 10 years.

DigitalKoans Overview

DigitalKoans provides news and commentary on digital copyright, digital curation, digital repository, open access, research data management, scholarly communication, and other digital information issues. From April 2005 through March 2016, DigitalKoans had over 13.4 million visitors, over 60.5 million file requests, and over 45.3 million page views. Excluding spiders, there were over 8 million visitors and over 19.8 million page views. It is available via e-mail, RSS feed, and Twitter.