metadata – California Digital Library
https://www.cdlib.org/cdlinfo
The Official CDL Blog
Thu, 08 Feb 2018 20:43:45 +0000

EZID introduces a batch download feature
https://www.cdlib.org/cdlinfo/2015/02/17/ezid-introduces-a-batch-download-feature/
Tue, 17 Feb 2015 15:08:38 +0000

EZID has just implemented a feature allowing clients to download a copy of all or some of their identifiers and descriptive information (i.e. metadata). We call it “batch” download, because EZID returns all requested identifiers in one large file. Clients can actually download everything they own or just a subset, such as only their DOIs or only their test identifiers.

Unlike the OAI-PMH interface (described here) that only returns public identifiers, batch download is aimed at clients retrieving all identifiers and metadata that they own. To use batch download, you must have a valid EZID login account. With the account, you can request anything that you own:

real and test identifiers

public and private identifiers

identifiers with or without metadata

identifiers with or without an external-facing target URL

DOIs and ARKs

As a reminder, the term “target URL” means the digital location of the resource being registered with an identifier. DOIs are digital object identifiers, commonly used in scholarly communication. ARKs are archival research keys, in use by a wide range of researchers, museums and libraries. EZID clients have registered roughly twice as many ARKs as DOIs, for a total of just over 3 million identifiers.

For any client, the number of identifiers such an inquiry may retrieve will depend on the number of identifiers they own, as well as the specific request submitted. Some clients have registered very few identifiers, and others have created many thousands. The download will retrieve a file in one of several formats suitable for analysis in common tools, such as CSV (comma separated value), XML (extensible markup language), and ANVL (a name-value language). We believe batch download will enable new opportunities for clients to manage their identifiers and metadata.
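
As a rough illustration of the kind of analysis a CSV download makes possible, the sketch below counts identifiers by scheme (ARK versus DOI). The sample data and the column names `_id` and `_target` are assumptions made for this example, not a description of the actual file layout EZID returns.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of a downloaded CSV file. The column names
# "_id" and "_target" are assumptions made for this sketch.
SAMPLE_CSV = """_id,_target
ark:/99999/fk4abc,http://example.org/a
doi:10.5072/FK2XYZ,http://example.org/b
ark:/99999/fk4def,http://example.org/c
"""


def count_by_scheme(csv_text, id_column="_id"):
    """Count identifiers by scheme, i.e. the text before the first colon."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        scheme = row[id_column].split(":", 1)[0]
        counts[scheme] += 1
    return counts


counts = count_by_scheme(SAMPLE_CSV)
print(dict(counts))
```

With a real download, the same function could be fed the contents of the file instead of the inline sample, provided the identifier column is named accordingly.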

only “public” identifiers–the owners have given permission for harvesting/indexing,

only identifiers with target URLs pointing to locations outside of EZID,

and only identifiers with at least a minimal set of descriptive information, or “metadata.”

As a reminder, the term “target URL” means the digital location of the resource being registered with an identifier.

At the present time, the number of eligible identifiers exceeds 1.5 million, split about evenly between DOIs and ARKs. DOIs are digital object identifiers, commonly used in scholarly communication. EZID’s DOIs are also available for indexing via the DataCite OAI-PMH service. ARKs are archival research keys, in use by a wide range of researchers, museums and libraries; the new EZID OAI-PMH interface represents the first public interface to these identifiers.

For EZID clients, the new interface means greater opportunities for their content to be seen. Every time someone clicks on an EZID ARK or DOI, the target URL gets a “hit.” For information aggregators, the new interface opens up a treasure trove of new information to explore.

Announcing DataCite Metadata Version 3.0
https://www.cdlib.org/cdlinfo/2013/07/25/announcing-datacite-metadata-version-3-0/
Thu, 25 Jul 2013 15:39:03 +0000

Scholarly research is producing more digital research data, and scholarly communication depends on data to verify findings, create new research, and share outcomes. Until recently, a persistent approach to access, identification, sharing, and re-use of datasets has been missing. DataCite, an international consortium of research libraries, national data centers and national libraries, was founded to meet this need. California Digital Library (CDL) is a founding member of DataCite, and CDL’s EZID service provides DataCite DOIs, as well as other identifiers.

DataCite has announced the release of Version 3.0 of the Metadata Schema. Documentation for the new schema is available at http://schema.datacite.org/. The DataCite Metadata Store (MDS) will accept Metadata Version 3.0 immediately. The MDS will continue to accept submissions using the prior versions of the Schema for the foreseeable future.

Key new features of Version 3.0 include:

Better support for depiction of dates by implementing the RKMS-ISO8601 standard for date ranges.

New support for recording data collection location, with box and point coordinates, as well as by using a free-text description.

Provision of a mechanism to associate additional metadata so that discipline-specific descriptions can be added to DataCite’s more generic schema.

Indicators of which optional properties are most important for helping the metadata to be found, cited and linked to original research.
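
To make the date range feature in the list above a little more concrete, here is a minimal sketch of reading a range value. It assumes the range is written as two full `YYYY-MM-DD` dates separated by a slash, which is our understanding of the RKMS-ISO8601 convention; the function name is hypothetical, and open-ended or partial dates are not handled.

```python
from datetime import date


def parse_date_range(value):
    """Split a 'start/end' range into two dates.

    Assumes both halves are full YYYY-MM-DD dates separated by a slash.
    Open-ended ranges and partial dates are outside this sketch.
    """
    start_text, end_text = value.split("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if end < start:
        raise ValueError("range ends before it starts")
    return start, end


start, end = parse_date_range("2004-03-02/2005-06-02")
print(start, end)
```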

A full list of the changes can be found starting on page 4 of the documentation.

EZID clients who use the application programming interface (API) can take advantage of these changes immediately. For information about how to submit a DataCite XML file, see http://n2t.net/ezid/doc/apidoc.html#profile-datacite. For EZID clients who use the user interface (UI), we will be introducing a new DataCite XML entry form in the very near future conforming to the new V 3.0 schema.

For any questions or comments, please feel free to contact the EZID Team.

Several early adopters and other careful readers generously provided us with feedback regarding the details of the specification. As a result, we were able to make a number of improvements. The most significant change to the schema is that it now includes a namespace, which provides OAI-PMH compatibility.

The documentation changes may be less significant, but we hope they add clarity. A new column in the properties tables provides guidance as to whether the property being described is an attribute or a child of the corresponding property that has preceded it. In addition, in response to a request, we gave one of the allowed values lists (the relationType pairs) a thorough overhaul.

I’d like to add my personal thanks to the Metadata Working Group members who helped review these changes, to the technical experts from our member institutions who provided advice, and to our Metadata Coordinator, Frauke Ziedorn, for everything she does, including keeping track of the feedback we get from community members.

On another note, a small team from the Metadata Working Group will begin working in April on a second version of the schema that is interoperable with the Dublin Core. Please stay tuned for more information on this development as it unfolds.

Are you a developer who is passionate about digital curation, linked open data, and open-source projects?

Would you like to work on a project which contributes to the international digital curation, preservation and repository communities?

Do you want to work in an innovative, collaborative, academic-based environment?

The CDL is seeking a developer for the Unified Digital Format Registry (UDFR, http://www.udfr.org).

The UDFR project is developing a reliable, sustainable, and publicly available semantic knowledge base of file format representation information.

Stakeholders for this project are drawn from academic and national libraries and archives around the world, including the University of California, Harvard University, the Florida Center for Library Automation, the Library of Congress, Library and Archives Canada, the British Library, the UK and US National Archives, the Koninklijke Bibliotheek and Nationaal Archief of the Netherlands, and many others. The project is funded by the Library of Congress as part of its National Digital Information Infrastructure and Preservation Program.

If you or someone you know is interested in applying, please apply through this link:

DataCite Metadata Scheme is published
https://www.cdlib.org/cdlinfo/2011/01/24/datacite-metadata-scheme-is-published/
Mon, 24 Jan 2011 14:44:30 +0000

The DataCite Metadata Scheme has been finalized and is now available here.

After many months and a lot of very early morning conference calls with my European colleagues, I am delighted to make this announcement. The core group that worked on this second iteration of the scheme came from:

British Library

California Digital Library

CISTI (Canada Institute for Scientific and Technical Information)

DTU Library (Technical Information Center of Denmark)

ETH Zurich (Swiss Federal Institute of Technology Zurich)

GESIS (Leibniz Institute for the Social Sciences, Germany)

TIB (German National Library of Science and Technology)

TU-Delft (Delft University of Technology)

Other members were involved in an advisory capacity as well. This iteration of the scheme also benefited greatly from the many helpful comments offered during the community review period we conducted in the late summer and early fall of 2010.

There are several key features to the metadata scheme, and my colleague Angela Gastl and I discuss these thoroughly in an article in the recent D-Lib Magazine issue on research data. Briefly, these include a small mandatory set limited to those properties required for a data citation, as well as a carefully selected optional set that allows for the description of data and other resource relationships as desired.

The mandatory set is:

Identifier

Creator

Title

Publisher

PublicationYear
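
A check against the mandatory set above could be sketched as follows. This is only an illustration: the record is represented as a plain dictionary keyed by property name, which is an assumption for this example and not how the scheme itself serializes metadata, and the function name is hypothetical.

```python
# The five mandatory properties listed above.
MANDATORY = ("Identifier", "Creator", "Title", "Publisher", "PublicationYear")


def missing_mandatory(record):
    """Return the mandatory property names that are absent or empty."""
    missing = []
    for name in MANDATORY:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical record with one property left out and one left blank.
record = {
    "Identifier": "doi:10.5072/FK2XYZ",
    "Creator": "Example, Author",
    "Title": "An example dataset",
    "PublicationYear": "  ",
}
print(missing_mandatory(record))
```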

It is also notable that the DataCite organization is committed to supporting the scheme in a way that makes it both very useful to DataCite’s own members and also available to the broader community.

On the “useful to DataCite’s own members” thread, I’ll say from California Digital Library’s perspective that we are very glad that the scheme is now finalized. As some readers know, our DataCite application is EZID. Now, we will be able to update our local application to the DataCite standard. Look for increasing functionality and services over time.

EZID (with DataCite inside) is one of the key tools you need to take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation. Read more about EZID here, or contact us.

Persistence
https://www.cdlib.org/cdlinfo/2010/08/30/persistence/
Mon, 30 Aug 2010 15:39:38 +0000

Since March, I’ve had the opportunity of working with the UC3 team at CDL. I joined part time, you might say, as the project manager for the EZID/DataCite project. (I wear a lot of hats here.) In this role, I’m thinking in a new way about persistence. To begin with, the concept of EZID is to provide researchers (and others) a way to obtain persistent identifiers for their digital objects. As I prepare the website for the launch of our user interface, I’ve been composing an explanation of just what persistent means.

Image courtesy of University of the Pacific Library Holt-Atherton Special Collections

For an identifier to persist, it has to continue to identify the object, to be linked to the object, in a way that will not change if the object is moved or renamed. Persistent IDs mean never having to show a nasty 404 error message again.

The way that EZID approaches this problem is twofold: a) by offering long-term identifiers, and b) by providing a mechanism for the owner of the identifier to update the metadata associated with it. In other words, if you move your stuff to a new storage location, and if you update its address, then the linkage you established persists.
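
The idea can be pictured with a toy lookup table. This is a conceptual sketch only, not a description of how EZID is built; the `Resolver` class and its method names are hypothetical.

```python
class Resolver:
    """Toy model: a stable identifier mapped to a changeable target URL."""

    def __init__(self):
        self._targets = {}

    def register(self, identifier, target_url):
        """Create a new identifier pointing at target_url."""
        if identifier in self._targets:
            raise ValueError("identifier already registered")
        self._targets[identifier] = target_url

    def update_target(self, identifier, new_url):
        """Point an existing identifier at a new location."""
        if identifier not in self._targets:
            raise KeyError(identifier)
        self._targets[identifier] = new_url

    def resolve(self, identifier):
        """Return the current target URL for the identifier."""
        return self._targets[identifier]


resolver = Resolver()
resolver.register("ark:/99999/fk4abc", "http://old.example.org/data")
# The object moves; the owner updates the address, the identifier stays.
resolver.update_target("ark:/99999/fk4abc", "http://new.example.org/data")
print(resolver.resolve("ark:/99999/fk4abc"))
```

Anyone holding the identifier keeps reaching the object after the move, because only the table entry changed.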

The way I think about persistence, though, extends further than identifiers. This may be a little unorthodox, but it seems to me that the notion of persistence, of continuing steadily in some state or direction, extends to organizational work practices and to institutions.

We certainly ask and respond to questions these days about the health of organizations and institutions, given the poor state of this “recovery” in general and all the bad news about California’s economy, in particular. So how can you tell if a prospective collaboration partner or service provider is likely to be around this time next year, never mind five years down the road? How do you avoid getting a nasty “vendor not found” message?

The answer, of course, is that there are no absolute guarantees. But I think there are indicators. Organizations that follow good practices, such as business continuity planning and the establishment of service level agreements, inspire some level of confidence. An additional level of protection can come from what the Data Portability Project folks call a “data portability policy.” This is the idea (in part) of making transparent what data users bring in to a site (or application) and what they can get out.

Somehow, the admission that failure may occur, followed by the laying of groundwork for graceful exit, inspires the ability to continue steadily in some state or direction, to persist. Do you agree?

Strength in diversity: notes from the DataCite Conference
https://www.cdlib.org/cdlinfo/2010/06/21/strength-in-diversity/
Mon, 21 Jun 2010 16:24:32 +0000

In his groundbreaking book Guns, Germs and Steel, Jared Diamond argued that Europe’s key advantage over China during the Age of Exploration was the sheer number of European political entities. Christopher Columbus heard “No” from one sovereign and still had another from whom to seek patronage and sponsorship. According to Diamond’s logic, China’s successful unification was a disadvantage when it came to invention, adventure, and spreading horizons.

I saw this for myself when I sat down at a table in Hannover, Germany, for the first summer meeting of DataCite. I was leading a discussion of the Working Group on Metadata. Included in the discussions with me were representatives from these organizations:

ETH Zürich (Swiss Federal Institute of Technology)

GESIS (German Social Science Infrastructure Services)

DTU (Technical University of Denmark)

British Library

Purdue University Libraries

TIB (German National Library of Science and Technology)/DataCite

ANDS (Australian National Data Service)

CISTI (Canada Institute for Scientific and Technical Information)

SNDS (Swedish National Data Service)

Before I left home, I had received some advice from a veteran of metadata standards work, John Kunze. He wrote, “In my extensive experience with metadata standardization, the biggest threat to that process in our community (not the private sector) is non-convergent discussion. One approach to use…is the ‘desert island’ question: if you know you’d be stranded by yourself on an island for five years and you could only bring 7 books with you, which would you bring?”

Up to this point, the metadata group had met virtually a few times, which is a greater than usual challenge with members spread from Europe all the way to Australia. So our face-to-face time was especially valuable. I took John’s advice and proposed that we focus our attention and efforts on achieving consensus on a core set of required elements. My colleagues readily agreed to this strategy.

We worked for 3 hours and, in the end, settled on 6 required elements. We also achieved a greater understanding of the differences between our various organizations, and that became apparent as we made the case for one metadata element or another. This makes for a better end product, because a standard that can accommodate a wide range of use cases and users is more successful than one that is more narrowly defined. When our discussions bumped into the edges of disagreement, we were able to uncover assumptions, clearing up misconceptions.

The Working Group on Metadata has more work to do, of course. We still have the optional elements to discuss. We must coordinate our work with other standards groups. And, now, we are back to functioning on a virtual basis. But, I think that we head into these remaining tasks with new strength both from our modest successes, and also from the experience of overcoming differences to achieve them.