from OCLC Research

The Core of Bibliographic Description

In trying to make our metadata work harder, as we have with WorldCat Identities, WorldCat Genres, and other such projects, OCLC Research spends a lot of time looking at what our collective metadata holds. And frankly, in some ways I think it holds less than you might think.

For example, a while back my colleague Karen Smith-Yoshimura produced a scatterplot of the number of times various MARC elements appear in WorldCat records, as part of the work she was leading to “gather evidence to inform changes in MARC metadata practices”. The vast majority of record elements fell to the bottom of the chart at a very low occurrence rate. But as is the case with any scatterplot, the point is to identify the outliers. The outliers in this case are those elements that appear in a large number of records, that is, what might be considered “core” elements that are used to describe the vast majority of library-owned material.

Those “outliers” can be categorized according to three general purposes:

That’s it. In a nutshell you have the very core of bibliographic description as defined by librarians over the last century or so. Are all other MARC elements useless? That’s not necessarily what I’m suggesting, although I do believe it calls into question the utility of a number of MARC elements. What I’m really trying to say is that if you want to know what librarians feel is useful or important in bibliographic description for the vast majority of library-owned content, you only have to look at the evidence. It’s no more and no less than what is described above.

6 Comments

Very interesting, although not very surprising. (But I always like it when hard evidence echoes my experience!)

An additional thought on this. Cataloguers never work without a carrier for their bibliographic descriptions: first there was the space of the card and any filing system that determined what should and shouldn’t be recorded. Then came computers, electronic formats and databases, which offered more flexibility but also required more and/or more detailed input. But unless those details were useful to end users via OPACs, I suspect that many were simply ignored. So the above core elements might rather reflect what was practical with the systems (for either input or output) available until recently. Therefore, decisions about future cataloguing rules and/or formats should by all means look at practicalities but not necessarily be restricted by those from the past… if that makes sense.

Esther – you make an excellent point that could get easily lost in what I described here. Past practice should not necessarily limit our future vision, and the systems that we had at hand could have certainly hog-tied us in some ways.

I think this is particularly true when we inspect _how_ these data elements were recorded. I think a particularly egregious example (admittedly, in hindsight, and with all the best computer technologies now at our fingertips) is something like the collation statement. Although there are a number of discrete metadata elements there, they are delineated only by ISBD punctuation, which is darned difficult to parse reliably via software.

One of our most difficult transitions, and one we have to make, must surely be the move from ISBD punctuation to actual machine-readable cataloging. I would say if there was one mistake made long, long ago, it was to fail to grasp the importance of granularity in markup and the ability of software to produce punctuation on-the-fly. If those two things had been fully grasped when MARC was born, we would be in a very different place.
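To make the point about granularity concrete, here is a minimal sketch in Python of what producing punctuation on-the-fly could look like for a physical description. The class name, field names, and function are hypothetical and are not taken from MARC or from any existing library. The only thing assumed from ISBD is the conventional punctuation of this area: " : " before other physical details, " ; " before dimensions, and " + " before accompanying material.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhysicalDescription:
    """Hypothetical granular record: each element is stored on its own,
    with no embedded ISBD punctuation."""
    extent: str
    other_details: Optional[str] = None
    dimensions: Optional[str] = None
    accompanying: Optional[str] = None


def render_isbd(desc: PhysicalDescription) -> str:
    """Build a display string, adding the prescribed punctuation only
    in front of the elements that are actually present."""
    parts = [desc.extent]
    if desc.other_details:
        parts.append(" : " + desc.other_details)
    if desc.dimensions:
        parts.append(" ; " + desc.dimensions)
    if desc.accompanying:
        parts.append(" + " + desc.accompanying)
    return "".join(parts)


if __name__ == "__main__":
    # Illustrative values only, not drawn from a real record.
    book = PhysicalDescription("xii, 345 p.", "ill.", "24 cm")
    print(render_isbd(book))
```

Going in this direction is trivial. Going the other way, recovering the separate elements from a display string, depends on the punctuation having been entered consistently, which is the difficulty described above.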

But that is all hindsight, which is both cheap and easy. I don’t wish to lay blame; instead I want to look to the way forward, and I will pledge myself to that task, along with many others of the same mind. From your insights, it appears that you are a fellow traveler, and welcome.

Interesting that ‘elements essential for access’, such as hyperlinks and data on rights and restrictions, don’t feature within the three general purposes. Could this be because the descriptions are predominantly of printed materials, rather than of content that may require technology to render it usable?