December 4, 2009

Understanding concept-oriented catalogs

I’ve recently been thinking a fair bit about the future of library catalogs, particularly after reading Diane Hillmann’s review of the recent Library of Congress-commissioned Study of the North American MARC Records Marketplace. Hillmann laments that the study tries to propose economic tweaks to the current system of MARC bibliographic record distribution, but doesn’t consider the more basic questions of whether this sort of distribution is what cataloging should be focusing on in the first place, or how the nature of cataloging (broadly conceived) in the Internet age is already changing. She concludes “The change we need is not really about records, or catalogers; it’s a new way to think about information and added value.”

I agree with a lot of what she has to say in the review, but I also worry that, by itself, what she recommends might be too easily dismissed by library planners. They might imagine this sort of thinking as utopian, airy speculation with little relevance to the work they have to do now, getting books and other resources into their catalogs more efficiently. But in fact, new ways of cataloging and adding value to knowledge are already being practiced in many places online, and can be practically built on much of what libraries are already doing. So I’d like to spend a few blog posts thinking out loud about the potential I see developing for catalogs, and practical ways for realizing that potential.

What does a catalog do?

Let’s start with the basics: What are people doing when they’re using a library catalog, or engaging in other kinds of search? Typically, they have in mind some concept [1], or combination of concepts, that they want to find information about. Their concepts might not be fully formed or understood at the start of a search, and people may not be able to ideally express the concepts in ways their interface expects, but the concepts are there.

As they search, users often will refine their concepts, change the way they express them, explore related concepts, or altogether shift the concepts they’re interested in. They do this as they see what happens when they search, what resources come up, and what other relevant concepts (such as particular people, works, or focuses of study) become apparent. But the user’s ultimate goal is to obtain useful knowledge. A catalog, then, is a way of helping people get from concepts to useful knowledge resources.

There are many kinds of concepts that searchers might be thinking of. Familiar kinds of library concepts include books, people, places, and topics of study, but there are many additional important kinds that we’ll discuss later on. Sometimes, users already have a concept of a specific knowledge resource. They might have a citation they’re trying to follow, or a particular book they want to read. Library catalogs tend to be particularly good at this kind of scenario, known in the trade as a “known item search”. [2]

This is not surprising, because today’s library catalogs tend to be strongly oriented around resources, and much less around other concepts. They primarily use MARC bibliographic records (which describe particular manifestations, to use FRBR terminology), and holdings records (which describe particular items that can be accessed or borrowed). Library science has meticulously defined and curated a number of other concepts: subjects with complex taxonomies; authors with wide-ranging authority control; uniform titles and series that link many bibliographic items together, and other bibliographic features that have carefully defined controlled vocabularies.

These concepts are represented in our MARC records, but as distinctly second-class entities. They’re typically attributes of the records that are the focus of the catalog, rather than focused records in their own right. The closest things to first-class non-resource concepts in most catalogs are authority records. But those records often contain minimal information about the concepts they describe, they’re typically invisible to most catalog users, and most online catalog interfaces don’t effectively exploit the information those records do contain. [3]

Concepts as first-class entities in catalogs

One of the long-recognized strengths of research libraries is the knowledge and organization they have of their resources, and of the concepts that these resources represent. Since a catalog maps concepts to resources, it should be possible for catalogs to represent important concepts more explicitly and expressively in the catalog, and use them as guideposts to help people find useful resources. We’re starting to see some prototype catalogs of this sort develop in the library world: consider, for instance, OCLC’s WorldCat Identities, oriented around authors; or their Fiction Finder, oriented around works with multiple manifestations; or the Online Books Page’s subject map views (such as this one on alphabets). In these catalogs, the concepts aren’t simply metadata attributes, or headings in a list of choices, but information-rich reference points for finding knowledge resources. These systems are all what I would call concept-oriented catalogs: catalogs that use various concepts (and not just concepts of the resources themselves) as first-class loci of information to help readers find useful knowledge resources.
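To make the idea of a first-class concept a little more concrete, here is a minimal sketch in Python. It is not how any of the catalogs mentioned above are actually built; the class names, fields, and sample titles are all hypothetical, chosen only for illustration. The point is that a concept is a record in its own right, with its own description, its own links to related concepts, and its own links to resources, rather than a string attached to a bibliographic record.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Resource:
    """A knowledge resource, e.g. one edition of a book (hypothetical model)."""
    title: str


@dataclass
class Concept:
    """A first-class concept record with its own description and links."""
    name: str
    kind: str  # e.g. "author", "subject", "work"
    description: str = ""
    related: list[Concept] = field(default_factory=list)
    resources: list[Resource] = field(default_factory=list)


class ConceptCatalog:
    """A toy catalog indexed by concept rather than by resource."""

    def __init__(self) -> None:
        self._concepts: dict[str, Concept] = {}

    def add(self, concept: Concept) -> Concept:
        self._concepts[concept.name] = concept
        return concept

    def lookup(self, name: str) -> Concept | None:
        return self._concepts.get(name)

    def resources_near(self, name: str) -> list[str]:
        """Titles attached to a concept, then those of its related concepts."""
        concept = self.lookup(name)
        if concept is None:
            return []
        titles = [r.title for r in concept.resources]
        for neighbor in concept.related:
            for r in neighbor.resources:
                if r.title not in titles:
                    titles.append(r.title)
        return titles


# Illustrative data only; these are placeholder titles, not real records.
catalog = ConceptCatalog()
alphabets = catalog.add(Concept("Alphabets", "subject", "Letter-based writing systems"))
writing = catalog.add(Concept("Writing", "subject"))
alphabets.related.append(writing)
alphabets.resources.append(Resource("A placeholder book on alphabets"))
writing.resources.append(Resource("A placeholder history of writing"))

print(catalog.resources_near("Alphabets"))
```

A reader who starts at the “Alphabets” concept sees its own resources first, and then resources reached through the related “Writing” concept. In a resource-oriented design, by contrast, the subject would only be a field on each book record, with nowhere to hang the description or the related-concept links.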

The promise of concept-oriented catalogs is still largely unrealized in the library world. Not only do current library catalogs often do little with conceptual knowledge, but proposals for future catalog architectures also often stick to keeping a tight focus on resources and make other concepts secondary. For instance, Coyle and Hillmann note in their 2007 review of the proposed RDA cataloging standard that “the focus of RDA is called ‘the resource’ and the resource is a FRBR manifestation/item described using the same concept of a pre-coordinated ‘record’ as we find in AACR2” (the older cataloging standard first published in 1978). Similarly, the final project report for OLE, which aims to build a next-generation library architecture, treats resources as first-class entities, but not the descriptions of those resources. The metadata describing knowledge resources are simply attributes of the resources, and not entities that can be managed and shared in their own right. [4]

Concept-oriented catalogs at Internet scale

As more and more knowledge resources become available to users, via the expansion of the Internet, the streamlining of interlibrary loan services, and the mass digitization of print library materials, well-defined, well-documented, and well-connected concepts will become increasingly important for readers that want to find what is most useful to them in a sea of information. While we will never have well-defined concepts for everything readers might be interested in, the concepts that have been defined by someone, somewhere, can serve as valuable guideposts for subsequent information seekers, if we’re smart about managing and using them.

This view might strike some longtime users of the Internet as hopelessly naive and ignorant of history. After all, the Web started out in the 1990s with various conceptually organized catalogues like the WWW Virtual Library and the Yahoo Directory. But most people soon forsook these in favor of search engines like Google that are much more comprehensive, and that work with arbitrary keywords instead of predefined concepts. Why should we think that concept-oriented catalogs will work at the Internet scale if they’ve already been tried and rejected on the Web?

In fact, though, concept-oriented cataloging is still very widely used in online information seeking today. It just isn’t the same kind of concept-oriented cataloging. The most popular concept-oriented catalog online today doesn’t force readers to go through a particular concept hierarchy before they can get at the resources they want. Instead, it typically shows up prominently whenever you do a Google search for one of the concepts it includes. Each of its millions of concepts gets reviewed for naming, redundancy, and relevance, and can be easily linked to from elsewhere on the Web. Most of its concepts have links to various external online knowledge resources, as well as to related concepts. The catalog also has ever-increasing amounts of harvestable, structured, semantic metadata about its concepts. And it’s very often characterized, even by users who acknowledge its many flaws, as a useful starting point for finding information resources online. Oh, and anyone can edit it. You’ve probably figured out by now that I’m talking about Wikipedia. While Wikipedia is not usually billed as a catalog, my description should make it clear that it does in fact serve that function, and in a concept-oriented way.

Coming attractions

I hope what I’ve written so far gives you a good idea of what concept-oriented catalogs are, and why they’re worth thinking about as we plan for the future of libraries. In posts to come, I hope to discuss various examples of concept-oriented catalogs inside and outside the library world, and talk about how they work, what concepts (and related information) they focus on, and how they are built up and maintained. And I hope to show how we can construct useful, practical concept-oriented catalogs for the future, building both on the knowledge and expertise we have in libraries, and on the contributions of others.

[1] I’m using “concept” here in the broad sense of any kind of thing that might be the object of someone’s search, rather than the narrower sense used in FRBR and elsewhere of a particular kind of abstract subject. It’s surprisingly difficult to find a straightforward word for this general idea that hasn’t already been claimed for some other purpose.

[2] Though even here, with the explosion of resources available online, we’ve found it useful to augment the traditional catalog with link resolvers to help people find resources like journal articles that we don’t specifically include in our own catalogs.

[3] I’ve discussed this at length in the past with respect to LCSH subjects; see my work on subject maps for some attempts to address this problem. I’ll talk more about subject maps as one type of concept-oriented catalog in later posts.

[4] This may well change as OLE develops, and the data architecture is fleshed out. I’m not directly involved in OLE at present, but I do work for one of the lead development partners.

5 Comments

The New Zealand Electronic Text Centre (where I used to work) has a “concept-oriented catalog” of the digitized works it offers. The IT infrastructure is a “Topic Map” (as defined by the ISO standard 13250). There are topics representing not only works and editions/manifestations, but also people (whether authors or not), places, organisations, ships, etc.

Interesting post, I see what you’re getting at, and look forward to your follow up posts. I do have a little trouble with the term ‘knowledge resource’ though. To me knowledge is something you have, something you acquire through research and experience. It’s something you can share with others, but what you’re sharing is information. Knowledge is internal; information external. When the information you share is internalized by someone else it becomes part of their knowledge.

Conal: Thanks for the pointer to the NZETC catalog! (Click on Conal Tuohy’s name to go there.) I particularly like what it does with authors.

Freemoth: I understand that in discussions like these, it can be hard to come up with entirely satisfactory terminology. (See my note on “concept”, and the difficulty of using that as well.) I do like the distinction you’ve made here, though for the moment at least I’m trying to focus on discussing the ideas rather than the exact terms to use for the ideas. (Though if there are already settled terms for them that I’m not using, I can consider shifting.)

I’m working on the second post in the series now, and I hope you’ll continue to read, and comment as you see fit. Thanks!

To the #2 OP, I disagree a little bit, though I see your point. Knowledge is shared, which then is information. More information makes one more knowledgeable. Depending on the way you spin it, they can be one and the same IMO. Thanks for sharing.

In our library we started with the idea of “Subject portals” using the existing library resources from the catalog, databases and the web. We currently have one up and running as “Pasteur: Resources and search engine for health and medicine” under our Library site.

In short, some ideas as to how it would work:
* self-organization using profiles; we would do some hard work defining our “broad portal subject” in terms of HILCC segments (call number ranges). The portal then ties those resources together as a browsable mechanism. (nothing new)
* not only tying the resources together (e.g. a “catalog” search would also show some related online articles and resources), but also letting librarians quickly write short subject guides, with no tech knowledge needed, that we could place on top of search results (e.g. “best bets”).

The goal is to have 5 such portals (Health, Engineering, Business, etc.) tailored to our end audience, with the appropriate librarian subject specialists (and some technology) behind them, so that they will bring relevant information, services, and perhaps a selection of news, local events and other external content relevant to the users to have a well-rounded discovery environment.