OpenRefine – OpenRefine (formerly Google Refine) is a powerful tool for working with messy data, cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.

ORCID – An open community-based effort to create and maintain a registry of unique researcher identifiers and a transparent method of linking research activities and outputs to these identifiers.

Data Curation Publications

Data Curation, SPEC Kit 354 (2017) – The Association of Research Libraries' SPEC Kit "explores the infrastructure that ARL member institutions are using for data curation, which data curation services are offered, who may use them, which disciplines demand services most, library staffing levels, policies and workflows, and the challenges of supporting these activities. It includes examples of data repository web pages, descriptions of services, infrastructure, workflows, metadata schemas, and policies, and job descriptions."

Research Data Curation Bibliography: Version 2 (2013) – This selective bibliography includes over 200 English-language articles and technical reports that are useful in understanding the curation of digital research data in academic and other research institutions.

Curating for Quality: Ensuring Data Quality to Enable New Science (2012) – Final report from an NSF-sponsored workshop focusing on defining data quality issues and possible solutions. Includes an outline of key points raised in the workshop and position papers submitted by participants. Key challenges outlined in the report include data selection strategies, understanding what context to include in data curation, tools and techniques that support painless data curation across disciplines, and cost models.

Managing Research Data (2012) – Written for librarians, this book covers a wide variety of topics related to managing research data, such as why research data should be managed, the research data lifecycle, data management planning, and the roles librarians can play in serving their faculty. The book is a compilation by many authors, all authorities in managing research data, who represent US, UK, and Australian academic and research institutions.

Communicating Scientific Data from the Present to the Future – Position paper from Princeton’s 2011 Research Data Lifecycle Management workshop advocating the use of HDF5 (Hierarchical Data Format version 5), a generic scientific data format with supporting software, for long-term preservation of heterogeneous research data.

Data Curation: An ecological perspective by Sayeed Choudhury (2010) – Writing in College & Research Libraries News, Sayeed Choudhury draws inspiration from the natural world to illustrate the need for different library communities to contribute to an overall data curation network.

Learning by Doing: Cases of Librarians Working with Faculty Research Data for the First Time (2010) – Purdue librarians conducted an exercise to learn about data curation in practical terms by identifying and engaging potential data contributors on campus. Subject specialist librarians engaged with six data creators from different disciplines to obtain data set contributions. The librarians reported descriptions of the data, the rationale for its selection, narratives of how they engaged with the data creators, and the questions and insights that emerged from these interactions.