Legacy technology of library catalogs affects us today. What is the term “enter under” in AACR2 if not a throw-back to the book catalog where indeed titles were literally entered under the author’s name? In 2009, we are creating bibliographic metadata using a code of rules developed quite separately from any computerization of data, a set of rules that was implemented at a time when the size limits of the catalog card and the searching limitations of the card catalog were still the norm. Think back to 1978, those of you who are old enough to have been in libraries then.

As the linked data grassroots folks clarified at their ALA session (see the PowerPoint slides at http://wikis.ala.org/annual2009/index.php/Grassroots_Programs), we are in a different world in terms of data structures and possibilities. The change is not simply to our bibliographic records, as it was throughout the 20th century, but from bibliographic records to something else entirely. The MARC format, yes, was a communications format for bibliographic data, but it was also a handy-dandy card set maker, the most obvious instances of that being the Festschrift byte and the second indicator of the 245. (Try explaining the utility of the Festschrift thingie without going into divided catalogs and card printing profiles!)

We have been developing our data content standards and our data structure definitions in isolation from each other. AACR2 and MARC were developed and maintained by two different groups of folks under different auspices, often at cross purposes (though not at odds with each other, just separated and sometimes unaware). Are we headed in that direction again?

Just as AACR2 wasn’t, RDA is not an open standard. It is an owned publication. As such, its maintenance and distribution will be limited by concerns related to publication ownership. Many will be priced out of having access to it while still trying to apply it in data they share with the rest of us. This is wholly antithetical to the development of data standards that are trying to “leave the silo.” It’s not that there is some evil corporate empire behind this control over our cataloging code. We did this to ourselves, starting with the first publication of ALA Rules in 1883.

The changes we must embrace are fundamental. We need to embrace openness in our standards development, joining the rest of the world in what makes the Web work. If we want to encourage use of a standard we must make it readily available to anyone creating bibliographic metadata.

We need to acknowledge that the stand-alone bibliographic record as we know it is not a valid structure in the linked data world. We need to understand that catalogs may work best if instead of policing access to the data we encourage more open wiki-style data development.

We need to figure out how we deal with the change in the economics of bibliographic metadata that must occur for libraries, vendors, system developers, and data providers. LC doesn’t want to be the world’s chief cataloger anymore. Nobody but the OPAC folks who have no choice will pay ALA for access to RDA when other, more open, metadata practices are available.

If we strive now to adopt RDA while shoe-horning the results into MARC-based bibliographic records and our current models of who does what, are we wasting our time on development of workarounds when we could be developing a wholly new environment? Are we going to spend hundreds and thousands of man-hours training catalogers and other library workers how to do this awkward adaptation, thereby delaying our ability to pay for the eventual greater change?

We have to solve the issues of data structure and infrastructure before we go worrying about how we train catalogers to adapt RDA to MARC and the rule of three, don’t we? I think back to the hours and hours of labor spent on adopting heading changes in card catalogs for AACR2 in the early 80s, changes that really didn’t affect access and would have been much easier a few years later in online catalogs.

So, what exactly does the current testing of RDA by the chosen 20 comprise?

(Unfortunately, I was not able to attend the grassroots meeting and am basing my reference to that group on their slides and other writings by the individuals in the group.)

In an October 11, 2007 AUTOCAT discussion, James Weinheimer said, “Part of fitting into the larger world of metadata will mean that we will have less control over many things, and our terminology will probably be one of the first casualties. Ultimately, I think it will turn out to be one of the easier things we will have to give up.”

Hal Cain responded, “I find it utterly perplexing that RDA is being prepared for cataloguers, its primary audience, yet with no attempt to produce any consensus about the terms in which it is to be expressed; indeed, with no reasonable attempt to explain how the new vocabulary agrees with or differs from the old. It seems to me that what we’re being led into is a different discipline from what we practice now.”

It has taken me the last year and a half to start to understand the new vocabularies and make “mental-crosswalks” to what I know about cataloging, metadata, and other aspects of information retrieval. My understanding today is far ahead of where it was last fall when I tried to explain what I thought was happening to the joint OLAC/MOUG conference, and I have the advantage of the kind of job that allows me lots of time for study and cogitation. Were I in a library technical services operation now, I would just be shaking my head and hoping someone would tell me about it later.

Why is this? Well, I must go back to the blind men and the elephant metaphor. The vocabulary-disjunct described above is like putting boxing gloves on the sightless examiners. “Here, figure this out, why don’t you? Oh, and you will have to interpret it according to what you have knowledge of through this barrier of foam padding.”

Sometimes the simplest ideas seem like gibberish because the terminology is so foreign. Okay, I’ll admit it, I wasn’t getting the idea of FRAD (Functional Requirements for Authority Data) because I’d come upon statements like, “Like FRBR, FRAD describes an entity-relational model, with the focus of FRAD on the entities related to ‘authority data’ rather than to the ‘bibliographic record’ itself,” and realize I couldn’t translate this into something I could understand sufficiently to explain it to students. I admitted to Glenn Patton at the OCLC/MOUG do that I was FRAD-clueless and he told me to think of the ability to fill in blanks with data from elsewhere. Bingo, I started to get it.

Sometimes we spend too much time trying to explain things in detail, losing people along the way as their eyes glaze over. We need clear explanations that use terminology we can understand. A very clear explanation of flat files and relational databases can be found at “What are relational databases?” (HowStuffWorks.com, 23 March 2001, http://computer.howstuffworks.com/question599.htm). MARC records are flat file records. Our OPACs break up the content into tables to operate. So far, so good.

MARC records, created by catalogers, using standardized content standards (e.g. AACR2, LCSH, etc.) have been the primary source of information to populate the tables of our OPACs. Information from 245 $a has been moved to the table of titles. Information from a 650 has been placed in subject tables.
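For those who like to see it spelled out, here is a minimal sketch of that flat-record-to-tables step. Everything in it is a simplifying assumption: the record is a plain Python structure rather than real MARC (no leader, no indicators), the title and subjects are made up, and `load_into_tables` is a hypothetical name, not a function from any actual OPAC.

```python
# Hypothetical, heavily simplified stand-in for a flat MARC record.
flat_record = {
    "id": "rec1",
    "fields": [
        ("245", {"a": "Cataloging the Web"}),   # invented title
        ("650", {"a": "Cataloging"}),           # invented subjects
        ("650", {"a": "Metadata"}),
    ],
}


def load_into_tables(record, titles, subjects):
    """Copy 245 $a into the title table and each 650 $a into the subject table."""
    for tag, subfields in record["fields"]:
        if tag == "245" and "a" in subfields:
            titles.append((record["id"], subfields["a"]))
        elif tag == "650" and "a" in subfields:
            subjects.append((record["id"], subfields["a"]))


# Two "tables", each a list of (record id, value) rows.
titles, subjects = [], []
load_into_tables(flat_record, titles, subjects)
```

The point is only that once the data sits in tables, the tables themselves neither know nor care that the rows came from a MARC record.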

The new way of looking at things says, why shouldn’t we be able to populate the tables with data from a variety of sources? If the title has already been entered in a title field in some other data labelling scheme, why could our OPACs not use that data in the absence of data from a MARC record?

Consider, for example, the representation of books on order in our OPACs. Right now, we create a MARC record to represent the ordered item — we basically do preliminary cataloging. What if our OPACs could use the ONIX data a publisher creates in lieu of a MARC record?

The cataloger in me rises up and says, “Yeah, but the publisher data might not use the same capitalization and punctuation information!” But now I have to ask, “So, what?” Does this matter? Will it affect retrieval or relevance assessment?
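A rough sketch of the on-order scenario, under stated assumptions: the dicts stand in for already-parsed records, the keys `245a` and `title_text` are hypothetical labels rather than real MARC or ONIX element names, and `title_for_display` and `matches` are invented for illustration.

```python
def title_for_display(marc_record, onix_record):
    """Prefer a MARC title; fall back to publisher-supplied data when there is none."""
    if marc_record and marc_record.get("245a"):
        return marc_record["245a"], "MARC"
    if onix_record and onix_record.get("title_text"):
        return onix_record["title_text"], "ONIX"
    return None, None


def matches(query, title):
    """Toy keyword match that ignores capitalization."""
    return query.casefold() in title.casefold()


# An on-order item with publisher data only (invented, shouty capitalization).
publisher_data = {"title_text": "CATALOGING THE WEB"}
on_order = title_for_display(None, publisher_data)

# The same item once a cataloger has supplied a MARC record.
cataloged = title_for_display({"245a": "Cataloging the Web"}, publisher_data)

found = matches("cataloging the web", on_order[0])
```

In this toy matcher, at least, the publisher’s capitalization makes no difference to whether the title is found. Whether that holds for a real system’s indexing is exactly the question worth asking.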

We are undergoing a revolution, my friends, but the revolution is not RDA or FRBR. The revolution is one that must reconsider the roles catalogs and cataloging have served. And we need to do this without fear of change but rather excitement for what this might make possible.

The article is exactly what it claims. A list of suggested research directions, wholly appropriate for the ASIS Bulletin. Don’t look for it to break new ground. Those of us who have been following the issues and discussions will appreciate the articulation of research directions but this is not really an article that some may be hoping for.

My favorite sentence, however, is in reference to the ongoing tests of the RDA draft. “These tests should generate a considerable amount of data for analysis and study. At the very least, the testing may simply reveal that the rules don’t work and thus show us how not to develop cataloging guidelines, which is always a valuable lesson.”

Paradigm shift is just as overused a phrase as deck chairs on the Titanic but what we have is a cluster of paradigmatic shifts going on. We are shifting from the concept of bibliographic and authority records to mashable metadata. RDA looked in this direction but when work was undertaken five years ago, we weren’t where we are now. We are also coming to realize that we can no longer treat our standards as the writing of a copyrighted book with its ownership in the hands of a publisher. Standards need to be open and the ownership/copyright model is antithetical to this.

I just re-read Roy Tennant’s 2004 Library Hi Tech article on metadata. It makes a great deal more sense to me now than it did five years ago. Among the statements he makes that got my attention were,

“No single organization should own the essential pieces of a new bibliographic infrastructure”

and

“We do not need a bibliographic record format. We need a bibliographic metadata infrastructure that has a number of components, each of which may have multiple variations. Our systems must be able to accommodate a great diversity of record formats to provide us with the flexibility and power that only such diversity can provide.”

First, apologies to those of you who might have been wondering whether I left the planet. Nope. Just getting ready for teaching this summer in the short terms — something I haven’t done in a few years.

Meanwhile, a great discussion has been going on at AUTOCAT and I’ve been participating over there. I will always be loyal to AUTOCAT as it’s kept me connected for so many, many years.

When our current catalogs were set in motion in 1876, there were few other bibliographic tools. Poole’s index appeared occasionally and then Wilson’s Reader’s Guide made its appearance at the beginning of the 20th century but, for most libraries, the catalog played a prime role in providing access to bibliographic information. Catalogs were published in book form and shared among libraries with larger libraries developing extensive collections of the catalogs of other libraries. The twentieth century saw the publication of many subject-specific bibliographies but it wasn’t until the middle part of the century that indexing of the periodical literature really took off.

The second half of the twentieth century saw the library catalog playing a proportionally smaller role in bibliographic searching. By bibliographic searching, I do not mean the search for bibliographic records as we do in technical services operations but rather the search for citations to documents that underlies our use of catalogs and bibliographic databases. The library catalog’s chief utility was in identifying individual monographs owned by individual libraries or groups of libraries. Each catalog did this for a limited number of libraries, although the largest did it for thousands of libraries and came to be known as “WorldCat.”

Over the course of the 20th century, libraries found ways to do proportionately less and less creation of bibliographic records. By the end of the century, for most libraries, most records that entered the catalog were created by some other source, often the US Library of Congress. Because the catalog continued to limit its contents to items “held” by a library, a complex system developed for choosing and downloading individual cataloging records. Since the information explosion was under way and, as Ranganathan specified, “The library is a growing organism,” the cataloging operation continued as a major function in many libraries.

From the standpoint of the folks who do it (catalogers) and library administrators, the catalog differs greatly from other bibliographic databases. Its chief differences are two: 1) it limits its scope by library holdings, and 2) it’s done in-house. There are advantages and disadvantages associated with these two characteristics.

As the 20th century came to an end and the 21st century began, the role of the catalog as a bibliographic retrieval device has continued to diminish in proportion to other retrieval devices. A revolution approaches that requires a wholly new way of looking at bibliographic retrieval by those who are now involved with cataloging.

[A lousy place to break this off, but bear with me. It’s time to go congratulate our local graduates and welcome them to our profession!]

Christine Schwartz over at Cataloging Futures commented about the undervaluing of subject analysis and classification in the new OCLC report, Online Catalogs: What Users and Librarians Want. After discussing catalog users’ desire for catalogs that retrieve more relevant results, the report says, on page 15 (p. 23 of the .pdf):

“Improving the relevance of search results is an interesting data quality problem whose solution goes well beyond the boundaries of the types of metadata that catalogers have been responsible for supplying, obtaining, managing or mining.”

No, NO, NO!!! What the heck are subject headings & classification but clear indications of relevance to the topics specified?!? Weight the subjects in relevance-based retrieval! Ack! Instead, for myriad reasons, we waste the subjects and think we’d be just as well off without them. Grack!
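To make “weight the subjects” concrete, here is a minimal sketch. It assumes a crude term-counting scorer, and the weights (3.0 for subjects, 1.0 for titles) are arbitrary illustrations, as are the two records. It is not how any particular catalog ranks results.

```python
def score(query, record, subject_weight=3.0, title_weight=1.0):
    """Count query-term hits, counting a hit in a subject heading for more."""
    terms = query.casefold().split()
    title_words = record["title"].casefold().split()
    subject_words = " ".join(record["subjects"]).casefold().split()
    total = 0.0
    for term in terms:
        total += title_weight * title_words.count(term)
        total += subject_weight * subject_words.count(term)
    return total


# Two invented records: one mentions birds only in the title,
# the other has been assigned a subject heading about birds.
records = [
    {"id": "a", "title": "Birds of the garden", "subjects": ["Gardening"]},
    {"id": "b", "title": "A field guide", "subjects": ["Birds -- Identification"]},
]

# Highest score first.
ranked = sorted(records, key=lambda r: score("birds", r), reverse=True)
ranked_ids = [r["id"] for r in ranked]
```

With these made-up weights, the record a cataloger judged to be about birds outranks the one that merely has the word in its title, which is the whole argument for treating subject headings as relevance signals.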

Oh, I wish I had time to go into this more coherently today but it’s the end of the semester so this week is kaput. But soon, I will be back with more on this. Tonight is my last online subject analysis/classification class of the semester wherein the talk is about the future. How timely.