Continuing the Conversation: New Models of Metadata

Submitted by Daniel A. Freeman on October 27, 2010 - 7:52am

Earlier today, we held the first session of our three-part workshop, Using RDA: Moving into the Metadata Future. The session, New Models of Metadata with Karen Coyle, was a huge success: there were so many questions that we didn't have time to answer all of them or finish every conversation. The following are some questions for continued discussion. Karen Coyle will be chiming in via the comments; please join her.

Do you see online catalogs and/or metasearch "discovery" systems morphing along with the Semantic Web, or will there be a big migration from online catalogs to something entirely different?

The transition to RDA and the Semantic Web is coming at a time when institutions are looking to cut costs, and cataloging is a great expense. This all sounds nice, but it looks like it will put a great strain on already overburdened catalogers: knowing all these relationships and adding in that data, while trying to do it with less staff. How will this work? It may be considered necessary for libraries to compete, but how will it be affordable?

Is cataloging a creative profession? If so, how?

Why is RDA being created? If MARC wasn't broken, why are we fixing it? Did users ask for it?

Since we don't yet have catalogs that make use of linked data, and we don't yet know how linked data will operate within general web interfaces, this is an open question. What I can say, though, is that complex data does not have to result in complex user views, and that the applications that get people to what they need/seek/want with the fewest clicks will win.

Users can already set their location when searching WorldCat so that their local libraries are the ones that appear in the results. I see no reason why that concept shouldn't continue to be used in a linked data environment. I can stand on a street corner in any US city with a smartphone and find the nearest Chinese restaurant, a bus stop, or the local library. Getting people quickly to physical or virtual places is already a reality.

I enjoyed your presentation very much. I have a quick question: when a patron does a search, reaches the RDA-enhanced model of the record, and is faced with all the hyperlinks, how far will they need to dig to find which branch library has the book?

The data that has been created in the MARC format is not broken, but we have reached some actual limits of the MARC record's structure. That structure was revolutionary in 1965 when it was developed, but computing in those days was so different from today that it would be hard to even make a comparison.

On a basic level, the standards committee overseeing MARC has been struggling for years with the limitation of 26 possible content subfields per field (a-z). Many fields have reached or nearly reached that limit and therefore cannot be expanded as new needs arise. For example, many fields could not hold both the display forms of names or terms and the identifiers that represent them.
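The 26-letter ceiling is simple arithmetic. Here is a minimal Python sketch, assuming a field's definition can be reduced to the set of content subfield letters already assigned to it; the function names and the example field are hypothetical and do not reflect the real definition of any MARC field.

```python
import string

# Content subfield codes, as described above: the 26 lowercase letters a-z.
CONTENT_CODES = set(string.ascii_lowercase)

def free_subfield_codes(defined: set[str]) -> list[str]:
    """Return the letters a-z not yet assigned in a field's definition."""
    unknown = defined - CONTENT_CODES
    if unknown:
        raise ValueError(f"not content subfield codes: {sorted(unknown)}")
    return sorted(CONTENT_CODES - defined)

def can_add_subfields(defined: set[str], needed: int) -> bool:
    """True if `needed` new content subfields still fit in the field."""
    return len(free_subfield_codes(defined)) >= needed

# Hypothetical field definition that already uses 24 of the 26 letters.
example_field = set(string.ascii_lowercase) - {"w", "y"}
print(free_subfield_codes(example_field))   # ['w', 'y']
print(can_add_subfields(example_field, 2))  # True
print(can_add_subfields(example_field, 3))  # False
```

Once a field is full, adding both a display form and an identifier for the same element means finding two free letters, and in a crowded field those may simply not exist.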

The MARC record is limited to 99,999 characters and each field to 9,999 characters. Libraries adding content like tables of contents, summaries, and reviews run up against these limits.
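Both ceilings follow from how a MARC record stores its own lengths. The Python sketch below is a simplified model, not a MARC library. It assumes the MARC 21 layout: a 24-character leader whose first five positions hold the record length, and 12-character directory entries holding a tag, a four-digit field length and a five-digit starting position. Counting lengths in UTF-8 bytes is an assumption of the sketch, and the function names are hypothetical.

```python
# Lengths are stored in fixed-width numeric slots: 5 digits for the
# record length in the leader, 4 digits for each field's length in the
# directory. Those slot widths set the ceilings.
MAX_RECORD_LENGTH = 10**5 - 1   # 99,999
MAX_FIELD_LENGTH = 10**4 - 1    # 9,999

FIELD_TERMINATOR = "\x1e"       # ends each variable field
LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12     # 3-char tag + 4-digit length + 5-digit start

def field_length(data: str) -> int:
    """Length of a field including its terminator.

    Assumption: length is counted in UTF-8 bytes."""
    return len((data + FIELD_TERMINATOR).encode("utf-8"))

def fits_in_field(data: str) -> bool:
    return field_length(data) <= MAX_FIELD_LENGTH

def directory_entry(tag: str, data: str, start: int) -> str:
    """Build a directory entry: tag, zero-padded length, zero-padded start."""
    length = field_length(data)
    if length > MAX_FIELD_LENGTH:
        raise ValueError(f"field {tag} is {length} long; limit is {MAX_FIELD_LENGTH}")
    return f"{tag}{length:04d}{start:05d}"

def record_length(fields: list[tuple[str, str]]) -> int:
    """Leader + directory (entries plus its terminator) + field data
    + the one-character record terminator."""
    directory = DIRECTORY_ENTRY_LENGTH * len(fields) + 1
    data = sum(field_length(d) for _, d in fields)
    return LEADER_LENGTH + directory + data + 1

def fits_in_record(fields: list[tuple[str, str]]) -> bool:
    return record_length(fields) <= MAX_RECORD_LENGTH

summary = "A short summary."
print(directory_entry("520", summary, 0))   # 520001700000
print(record_length([("520", summary)]))    # 55
long_contents = "Chapter title; " * 700     # 10,500 characters
print(fits_in_field(long_contents))         # False
```

A record can therefore fail in two ways: one oversized field, such as a long table of contents in a single field, or many ordinary fields whose combined length passes 99,999.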

There are other problems that are less easy to explain in a blog comment, but let's just say that as MARC has evolved over its 40-year lifetime, some inconsistencies have been introduced into the data representation that would best be cleared up.

As I mentioned during the webinar, it is unusual for data creators in a community to work directly with data fields rather than with an interface that hides the data structure from them. The data creators in the library world think in terms of MARC fields rather than, for example, AACR2 rules. The MARC field names are handy codes for complex concepts. Any future interface will need to have a similar facility that allows data creators to both visualize and communicate about the data. I would love to see some experimentation along these lines, perhaps with RDA.

One chat participant suggested a webinar session on FRBR, which I think is an excellent idea. Meanwhile, there are some readings that you might find helpful. A short pamphlet on FRBR was written by Barbara Tillett of the Library of Congress:

This is a question for which there isn't an answer at this time. The linked data community often refers to itself and its data as "LOD" (linked open data). That is in part philosophical and in part practical, because there is currently no way to control the use of data that has been placed online. There is a W3C working group looking into a way to include "provenance" with linked data. Knowing who created the data (at the data element level) is mainly of interest for judging how authoritative the data is, but it could also inform rights issues.
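To show what element-level provenance could mean in practice, here is a minimal Python sketch that attaches a source to every statement, similar in spirit to grouping RDF triples into named graphs. It is an assumption for illustration, not the W3C group's design; the example.org URIs, the data, and the function names are hypothetical, while the title predicate is the Dublin Core terms URI.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    """A subject-predicate-object statement plus who asserted it."""
    subject: str
    predicate: str
    obj: str
    source: str  # hypothetical: the agent that created this data element

TITLE = "http://purl.org/dc/terms/title"

def trusted_only(statements: list[Statement], trusted: set[str]) -> list[Statement]:
    """Keep only statements whose source is on a trusted list."""
    return [s for s in statements if s.source in trusted]

def sources_of(statements: list[Statement], subject: str, predicate: str) -> list[str]:
    """Who has said something about this subject and predicate?"""
    return sorted({s.source for s in statements
                   if s.subject == subject and s.predicate == predicate})

book = "http://example.org/work/1"
data = [
    Statement(book, TITLE, "An Example Work", "http://example.org/library-a"),
    Statement(book, TITLE, "Example Work, An", "http://example.org/aggregator-b"),
]
print(sources_of(data, book, TITLE))
# ['http://example.org/aggregator-b', 'http://example.org/library-a']
print([s.obj for s in trusted_only(data, {"http://example.org/library-a"})])
# ['An Example Work']
```

The same source field that lets an application prefer a trusted cataloging agency could, in principle, carry the information needed to reason about rights.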

When the data is "external" or "out there", instead of in our "dark web" databases, who owns it? Who owns the records? Can OCLC still say that they own the recombinant data, rather than static, monolithic records?

Those are three different "things," rather like apples, oranges, and avocados. But they do get mentioned in about the same breath, so I can see why they get confounded, as it were.

FRBR is a mental model of the bibliographic universe. It is very abstract, but it helps us think about how we define our area of operation. It defines bibliographic data as being made up of three basic categories of things: bibliographic things, agents that act on those things, and subjects of those things.
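FRBR is a conceptual model, not a data format, but a few Python classes can show its three categories and how they link. This is a minimal sketch under our own assumptions: the class and attribute names are hypothetical and only a handful of attributes are shown. The four levels of "bibliographic thing" (work, expression, manifestation, item) come from FRBR itself, and the example data is invented.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:                 # category 2: who acts on the things
    name: str

@dataclass
class Subject:               # category 3: what the things are about
    term: str

@dataclass
class Work:                  # category 1: bibliographic things, most abstract level
    title: str
    creators: list[Agent] = field(default_factory=list)
    subjects: list[Subject] = field(default_factory=list)

@dataclass
class Expression:            # a particular realization of a work, e.g. a language
    work: Work
    language: str

@dataclass
class Manifestation:         # a published embodiment, e.g. an edition
    expression: Expression
    publisher: str
    year: int

@dataclass
class Item:                  # a single copy
    manifestation: Manifestation
    location: str            # hypothetical attribute: the branch holding this copy

def locations_of(work: Work, items: list[Item]) -> list[str]:
    """Follow each item's links up to its work and report where copies are."""
    return [i.location for i in items if i.manifestation.expression.work is work]

author = Agent("A. Author")
work = Work("An Example Work", [author], [Subject("Libraries")])
edition = Manifestation(Expression(work, "eng"), "Example Press", 2010)
items = [Item(edition, "Main Branch"), Item(edition, "North Branch")]
print(locations_of(work, items))  # ['Main Branch', 'North Branch']
```

Because location belongs to the item, the branch question raised earlier is answered by following links from the work down to its items, however many links the underlying data contains; an interface can hide that path from the patron.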

Dublin Core is a set of metadata that was developed as a simple way to describe web documents. It has a small number of fields for description, thus the "core." It is used heavily on the web, and used in some library applications outside of the MARC environment, such as for digital repositories where full cataloging is not possible. See http://dublincore.org/documents/dcmi-terms/
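As an illustration of how little is needed, this Python sketch describes this blog post with a few Dublin Core elements and renders them as HTML meta tags. It assumes the fifteen-element Dublin Core Metadata Element Set and the common "DC." naming convention for HTML meta elements; the function name is hypothetical, and DCMI's own documents remain the authority on current encoding guidelines.

```python
from html import escape

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def dc_meta_tags(record: dict[str, list[str]]) -> str:
    """Render a Dublin Core description as HTML <link>/<meta> elements.

    Every element is optional and repeatable, so each value is a list."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, values in record.items():
        for value in values:
            lines.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(lines)

page = {
    "title": ["Continuing the Conversation: New Models of Metadata"],
    "creator": ["Daniel A. Freeman"],
    "date": ["2010-10-27"],
    "subject": ["RDA", "FRBR", "Linked data"],
}
print(dc_meta_tags(page))
```

Four elements are enough to make the page findable by title, author, date and topic, which is the kind of lightweight description Dublin Core was designed for; full cataloging asks a great deal more.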

RDA is "Resource Description and Access" and is a set of cataloging rules intended to be the successor to AACR2. RDA follows the FRBR structure, so those two standards are logically entwined. See http://www.rda-jsc.org/rda.html

I think that the next two sessions of this webinar series will make all of this clearer.