Coverage of the annual conference of the Massachusetts Library Association

Thursday, May 7, 2009

"Next Generation" Cataloging and Metadata Creation Pilot

Maureen Huss and Renee Register are here to discuss the need to create new models for metadata creation. The traditional model has been a collaborative one in which professional catalogers create records that trickle up to the OCLC database. But the reduction in professional catalogers, the increase in system-generated records, and the prevalence of web-based data have challenged this model.

The metadata chain begins with the data produced by publishers. This prepublication information then influences the purchasing decisions of libraries, retailers, and consumers. Publishers have their own systems for creating this data, which are not always compatible with those of the library world. This metadata evolves over time as each item comes closer to publication.

This information is then shared with various wholesalers (Ingram, B&T), retailers (Amazon), and aggregators (Bowker), who add their own data and their own systems of managing that information. These vendors then build their services around this data, often trying to find ways to create added value. By the time an item reaches the post-publication phase, information for that item can come from a multitude of sources. Amazon uses an algorithm that cherry-picks from among these sources to produce its own records. Vendors have a very strong need for proper metadata: if errors occur, their customers will be unable to find items.
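Amazon's actual algorithm was not described in the session. Purely as an illustration of what cherry-picking from several sources might look like, here is a minimal sketch that assumes each field has a preferred order of sources and takes the first non-empty value. The source names, field names, priorities, and the `merge_records` function are all hypothetical.

```python
# Illustrative only: build one record by taking each field from the
# most-preferred source that actually supplies a value for it.
# Source names, field names, and priorities are made-up assumptions.

def merge_records(records_by_source, field_priority, default_priority):
    """Merge per-source records into one, field by field."""
    merged = {}
    # Collect every field name that any source supplies.
    all_fields = {f for rec in records_by_source.values() for f in rec}
    for field in sorted(all_fields):
        # Use the field-specific source order if one exists.
        for source in field_priority.get(field, default_priority):
            value = records_by_source.get(source, {}).get(field)
            if value:  # skip missing or empty values
                merged[field] = value
                break
    return merged


records = {
    "publisher": {"title": "Example Title", "description": "", "awards": "Example Award"},
    "wholesaler": {"title": "EXAMPLE TITLE", "description": "A short blurb."},
    "aggregator": {"title": "Example title", "subject": "Example subject"},
}
priority = {"description": ["publisher", "wholesaler", "aggregator"]}
default = ["publisher", "aggregator", "wholesaler"]

print(merge_records(records, priority, default))
```

A per-field priority list is only one possible design; a real system might also weigh how recent or how complete each source's record is.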

ONIX is the international standard for metadata management within the book industry. Of course, any standard is only as good as the way it is used (yes, this applies to MARC too). The book industry has created a series of best-practices documents in an effort to keep ONIX usage consistent.

Libraries need data in MARC format, and thus many library wholesalers maintain a separate database in order to cater to their customers. Because these services are usually proprietary, OCLC has attempted to open up WorldCat to vendor records, but it is unlikely that they will ever capture everything.

MARC exists outside of the metadata stream of publishers. MARC records also tend to be far more static than publisher information, which can change with each printing and can be added to more easily (e.g., when a book wins an award).

Renee and Maureen's project is an attempt to streamline the data creation process and allow ONIX data to be interoperable with MARC data. Each format can benefit from the other. Similarly, the project seeks to establish links between BISAC and DDC, but not necessarily through a straight mapping.

Publisher data often includes useful elements such as awards won, publisher links, and biographical information that have typically been considered extraneous to MARC records. However, MARC can benefit greatly from pulling that data into its environment.

A straight ONIX to MARC mapping can create a basic record. Many books are simply republications of prior works, and OCLC hopes to use FRBR work information to identify information from older editions that can then be reused.
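As a rough illustration of what a straight mapping involves, the sketch below takes a handful of publisher-supplied values and places them in the MARC fields where they would usually land (020 for the ISBN, 100 for the main author, 245 for the title, 260 for publisher and date). The flat dictionary keys, the `onix_to_basic_marc` function, and the sample values are assumptions made for illustration. Real ONIX is an XML format with a much richer structure, and this is not OCLC's actual crosswalk.

```python
# Hypothetical, simplified crosswalk from ONIX-like values to MARC-like fields.
# The input keys are illustrative stand-ins, not real ONIX element names.

def onix_to_basic_marc(onix):
    """Return a list of (tag, {subfield_code: value}) pairs for a basic record."""
    fields = []
    if onix.get("isbn"):
        fields.append(("020", {"a": onix["isbn"]}))    # ISBN
    if onix.get("author"):
        fields.append(("100", {"a": onix["author"]}))  # main entry, personal name
    if onix.get("title"):
        fields.append(("245", {"a": onix["title"]}))   # title statement
    imprint = {}
    if onix.get("publisher"):
        imprint["b"] = onix["publisher"]               # name of publisher
    if onix.get("pub_date"):
        imprint["c"] = onix["pub_date"]                # date of publication
    if imprint:
        fields.append(("260", imprint))
    return fields


# Placeholder values, not a real book.
sample = {
    "isbn": "0000000000000",
    "author": "Doe, Jane",
    "title": "Example Title",
    "publisher": "Example Press",
    "pub_date": "2009",
}

for tag, subfields in onix_to_basic_marc(sample):
    print(tag, " ".join(f"${code} {value}" for code, value in subfields.items()))
```

A record built this way is only a starting point: it has no subject headings or classification, and it does nothing with the awards, links, and biographical data mentioned above.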

OCLC's DDC/BISAC terminology service can double as an authority service for those standards.

OCLC has partnered with:

Phoenix Public Library

Ohio State University Libraries

Chicago Public Library

MIT Libraries

Ingram Book Group

Princeton University Press

Hachette Book Group

Taylor & Francis

The pilot project began in January 2008 and will wrap up this spring. In addition to the pilot, OCLC has processed a test file for a major publisher, processed over 3 million records for a major retailer, processed a sample file for LC CIP, and commissioned a study on the metadata life cycle. The results of the study will be available by the end of May.

In March, OCLC hosted a symposium for publishers and librarians to discuss their joint metadata needs. They hope to continue hosting this symposium on an annual basis.