
Henty, Margaret (The Australian National University), Weaver, Belinda (The University of Queensland), Bradbury, Stephanie (Queensland University of Technology), and Porter, Simon (The University of Melbourne). 2008. "Investigating data management practices in Australian universities." Australian Partnership for Sustainable Repositories (APSR), http://www.apsr.edu.au/orca/investigating_data_management.pdf; http://www.apsr.edu.au/investigating_data_management


Carlson, Scott. 2008. How to channel the data deluge in academic research. Chronicle of Higher Education. April 4, 2008, http://chronicle.com/weekly/v54/i30/30b02401.htm.


Latest revision as of 10:10, 28 December 2010

ACRL Science & Technology Section (STS) 2009 Program:

Big Science, Little Science, E-Science: the science librarian’s role in the conversation.

Relevance and Usefulness: Offers advice and identifies skills needed for librarians to participate successfully in the e-Science process.

Abstract: For liaison or subject librarians, entering into the emerging area of providing researchers with data services or partnering with them on cyberinfrastructure projects can be a daunting task. This article will provide some advice as to what to expect and how providing data services can be folded into other liaison duties. New skills for librarians and traditional skills that can be adapted to data curation work will also be discussed. A case study on the authors' experiences collaborating with two chemistry faculty on an e-science project serves as the framework for the majority of this article.

Relevance and Usefulness: Cautionary viewpoint, includes: “The truth is that the idea of becoming involved with the management, preservation, curation, and annotation of primary research data at the local level is daunting and particularly so in our current environment.” (2nd paragraph)

Relevance and Usefulness: Slides from a workshop intended to get librarians started.

Abstract: The course materials from a full-day, continuing education workshop given for the Chicago Metropolitan Library System that covered an introduction to institutional repositories, a rationale for sharing and archiving research data, an introduction to data curation, and related roles, resources, and approaches for librarians. Participants engaged in lab activities where they submitted sample datasets to two popular data-sharing websites, investigated and evaluated research data collections, and completed a prospectus to relate concepts from the workshop to their local institutions for future discussion and planning. Included are the presentation slides, a data repository prospectus, a list of data repositories, and a selected bibliography of recommended reading that were created in support of the workshop.

Relevance and Usefulness: Provides a checklist for interviewing faculty about their datasets.

Abstract: Librarians at Purdue University are beginning to identify the scientific datasets that are being generated by our faculty and researchers as information assets to be collected, preserved, and made accessible as a function of the library’s collection development. These librarians are subject-area specialists, and many have advanced degrees in their respective disciplines in addition to a degree in library science. They have all been trained in collection management; however, much of this training was related to traditional formats such as monographs and serials and not datasets. In our experience, one of the most effective tactics for eliciting datasets for the collection is a simple librarian-researcher interview. In this poster, we share a set of ten questions that a librarian can use as a starting point for such a “data interview”. It is not a comprehensive strategy but instead a practical tool to draw out information that needs to be considered in order to evaluate the suitability of a dataset for the collection and the requirements for the infrastructure and services that will be needed for data curation. This poster was presented at the 3rd International Digital Curation Conference on December 12-13, 2007, in Washington, D.C.

Relevance and Usefulness: Offers a model for establishing best practices at your library. Has many links and extensive bibliography.

Abstract: Advances in computational capacity and tools, coupled with the accelerating collection and accumulation of data in many disciplines, are giving rise to new modes of conducting research. Infrastructure to promote and support the curation of digital research data is not yet fully-developed in all research disciplines, scales, and contexts. Organizations of all kinds are examining and staking out their potential roles in the areas of cyberinfrastructure development, data-driven scholarship, and data curation. The purpose of the Cornell University Library's (CUL) Data Working Group (DaWG) is to exchange information about CUL activities related to data curation, to review and exchange information about developments and activities in data curation in general, and to consider and recommend strategic opportunities for CUL to engage in the area of data curation. This white paper aims to fulfill this last element of the DaWG's charge.
Notes: This report of the Data Working Group offers five broad recommendations for ways in which the Cornell University Library might engage in data curation and related activities.
1. Seek out and cultivate partnerships with other organizations.
2. Provide services to Cornell researchers in several areas.
3. Assess local needs and develop local infrastructure and related policies.
4. Cultivate a workforce capable of addressing the new challenges posed by data curation and cyberinfrastructure development.
5. Form a Data Curation Executive Group and reorganize the Data Working Group.

Relevance and Usefulness: See survey questions and responses from researchers at a large university. How does your institution compare?

Abstract: This report presents all the findings of the Data Management Survey conducted at The University of Queensland, The University of Melbourne and Queensland University of Technology in late 2007. The report, based on the responses of 879 researchers, sheds light on all aspects of data management practice including the use of data management plans, storage and backup, data sharing, data ownership and much more. The report also shows how the three participating Universities have used the results of the surveys to progress their own provision of support for eResearch.
Summary: Findings include: The 879 responses from three institutions show similar patterns. Over 90% of respondents reported that their research generates digital data. Most store data in spreadsheets or databases, documents and reports, and data generated from computer programs. About one-third of respondents have less than 1GB of data and a similar proportion have between 1GB and 1TB. The most popular software packages used for data storage and manipulation are SPSS and Excel. Most lack a formal data management plan. Most are willing to share their data in appropriate circumstances.

Relevance and Usefulness: Intended to help science librarians discuss issues with their deans and directors, this document provides definitions and links to further reading.

Association of Research Libraries, and Coalition for Networked Information. “Reinventing science librarianship: Models for the future,” October 2008. Proceedings from the ARL/CNI Fall Forum. http://www.arl.org/resources/pubs/fallforumproceedings/forum08proceedings.shtml
Description: Audio clips, slides, and some text from Forum sessions. Presenters include many of the “big” names in data curation.

Relevance and Usefulness: Shows direction and goals for NSF; focused on 2006-2010.
Abstract: NSF’s Cyberinfrastructure Vision for 21st Century Discovery is presented in a set of interrelated chapters that describe the various challenges and opportunities in the complementary areas that make up cyberinfrastructure: computing systems, data, information resources, networking, digitally enabled-sensors, instruments, virtual organizations, and observatories, along with an interoperable suite of software services and tools. This technology is complemented by the interdisciplinary teams of professionals that are responsible for its development, deployment and its use in transformative approaches to scientific and engineering discovery and learning. The vision also includes attention to the educational and workforce initiatives necessary for both the creation and effective use of cyberinfrastructure.

Abstract: A distributed infrastructure that would enable those who wish to do so to contribute their scientific or technical data to a universal digital commons could allow such data to be more readily preserved and accessible among disciplinary domains. Five critical issues that must be addressed in developing an efficient and effective data commons infrastructure are described. We conclude that creation of a distributed infrastructure meeting the critical criteria and deployable throughout the networked university library community is practically achievable.

Abstract: The digital revolution has transformed the accumulation of properly curated public research data into an essential upstream resource whose value increases with use. The potential contributions of such data to the creation of new knowledge and downstream economic and social goods can in many cases be multiplied exponentially when the data are made openly available on digital networks. Most developed countries spend large amounts of public resources on research and related scientific facilities and instruments that generate massive amounts of data. Yet precious little of that investment is devoted to promoting the value of the resulting data by preserving and making them broadly available. The largely ad hoc approach to managing such data, however, is now beginning to be understood as inadequate to meet the exigencies of the national and international research enterprise. The time has thus come for the research community to establish explicit responsibilities for these digital resources. This article reviews the opportunities and challenges to the global science system associated with establishing an open data policy.

Projects and ideas

Relevance and Usefulness: See examples and get ideas of do-able projects.

Abstract: SIMILE is a joint project conducted by the MIT Libraries and MIT CSAIL. It focuses on well-defined, real-world use cases in the libraries' domain. The Simile Data Collection project aims to collect a set of example RDF data sets that are generally useful for the metadata research and tools community.

Abstract: New Career Opportunities: People with a data management skillset combined with experience and education in a scientific or technology-based discipline are in demand. Examples of organizations hiring such people are: corporate research centers and production lines, government research libraries and laboratories, universities.

Not for librarians only

Abstract: Data are becoming an essential product of scholarship, complementing the roles of journal articles, papers, and books. Research data can be reused to ask new questions, to replicate studies, and to verify research findings. Data become even more valuable when linked to publications and other related resources to form a value chain. Types and uses of data vary widely between disciplines, as do the online availability of publications and the incentives of scholars to publish their data. Publishers, scholars, and librarians each have roles to play in constructing a new scholarly information infrastructure for e-research. Technical, policy, and institutional components are maturing; the next steps are to integrate them into a coherent whole. Achieving a critical mass of datasets in public repositories, with links to and from publisher databases, is the most promising solution to maintaining and sustaining the scholarly record in digital form.

Relevance and Usefulness: Background, intended for general Chronicle audience.

Abstract: What are the best ways to organize the mass quantities of data that researchers generate, and to share those data to engender new research? Scott Carlson, a senior reporter at The Chronicle, asked Michael C. Witt, an assistant professor of library science and an interdisciplinary research librarian at Purdue University Libraries and its Distributed Data Curation Center, and Sayeed Choudhury, associate dean of university libraries and director of the Digital Knowledge Center at the Sheridan Libraries of the Johns Hopkins University, for their views.

Relevance and Usefulness: Interview and background for a general audience.

Abstract: Barry Canton, a 28-year-old biological engineer at the Massachusetts Institute of Technology, has posted raw scientific data, his thesis proposal, and original research ideas on an online website for all to see.

Relevance and Usefulness: Describes the “challenges of data abundance” and the vital role of computing in the advancement of scientific understanding.

Abstract: All sciences, including astronomy, are now entering the era of information abundance. The exponentially increasing volume and complexity of modern data sets promises to transform the scientific practice, but also poses a number of common technological challenges. The Virtual Observatory concept is the astronomical community's response to these challenges: it aims to harness the progress in information technology in the service of astronomy, and at the same time provide a valuable testbed for information technology and applied computer science. Challenges broadly fall into two categories: data handling (or "data farming"), including issues such as archives, intelligent storage, databases, interoperability, fast networks, etc., and data mining, data understanding, and knowledge discovery, which include issues such as automated clustering and classification, multivariate correlation searches, pattern recognition, visualization in highly hyperdimensional parameter spaces, etc., as well as various applications of machine learning in these contexts. Such techniques are forming a methodological foundation for science with massive and complex data sets in general, and are likely to have a much broader impact on the modern society, commerce, information economy, security, etc. There is a powerful emerging synergy between the computationally enabled science and the science-driven computing, which will drive the progress in science, scholarship, and many other venues in the 21st century.