The Library Catalogue in the New Discovery Environment: Some Thoughts

The catalogue [Note 1] has always been an important focus of library discussion; its construction and production are a central part of historical library practice and identity. In recent months, the future of the catalogue has become a major topic of debate, prompted by several new initiatives and by a growing sense that it has to evolve to meet user needs [1][2].

Much of the discussion is about improving the catalogue user's experience, not an unreasonable aspiration. However, we really need to put this in the context of a more far-reaching set of issues about discovery and about the continued evolution of library systems, including the catalogue, in a changing network environment. In this environment, users increasingly discover resources in places other than the catalogue.

This article takes a medium-term perspective and covers some issues that the further development of the catalogue, or the library discovery experience, poses. In the longer term, I think, we will see major changes in how libraries organise themselves to provide services, but coverage of that is outside my scope, and probably my competence, here. My purpose is to touch on some questions, not to provide any answers, as libraries continue to co-evolve with network behaviours and expectations.

It might be useful to begin with a somewhat schematic account of this change. The main outlines of the catalogue were formed in the pre-network world. In that world, materials were distributed to many physical locations. The closer to a user a resource was, the more likely it was to be accessed. Each of those locations developed a catalogue, which described parts of its collection. In this way, the catalogued collection (what was described in the collection, acknowledging that not everything was described) broadly corresponded to the available collection (where availability was determined by being local). In that world, information resources were relatively scarce, and consumed a considerable amount of attention: people would spend time looking in libraries, or in library catalogues, or in moving from bibliographies, abstracting and indexing services, and other finding tools back to library catalogues. This was a necessary behaviour if you were to find things.

Today, we live in a different world. Now, information resources are relatively abundant, and user attention is relatively scarce [3]. Users have many resources available to them, and may not spend a very long time on any one. Many finding tools are available side by side on the network, and large consolidated resources have appeared in the form of search engines. Even within the library, there are now several finding tools available on the network (for local repositories, A&I databases, ..). The user is crowded with opportunity. No single resource is the sole focus of a user's attention. In fact, the network is now the focus of a user's attention, and the available 'collection' is a very much larger resource than the local catalogued collection. The user wishes to 'discover' and use much more than is in the local catalogued collection. Of course, this was always the case. However, the user may be less willing to work hard to make links and connections between resources when they are on the network, and there is more incentive for the library to make the necessary linkages (to resource-sharing systems, or to search engines, for example).

I think that this shift poses major questions for the future of the catalogue, and it is bound up with the difference between discovery (identifying resources of interest) and location (identifying where those resources of interest are actually available). There may be many discovery environments, which then need to locate resources in particular collections. While the catalogue may be a part of the latter process, its role in the former needs to be worked through.

The Catalogue, Discovery and the Network Environment

In this section, I will consider several general issues arising from being in a network environment, before turning in the next section to some more specific ways in which things might change.

Matching Supply and Demand: The Long Tail

One of the interesting aspects of the last couple of years is the emergence of several large consolidated information resources (Amazon, iTunes, Google, etc) which have strongly influenced behaviour and expectation. Unlike these resources, the library resource is very fragmented: it is presented as a range of databases, places, and services. In other words, libraries do not aggregate supply very well. There are at least two factors here. Firstly, there is no unified discovery experience, and, secondly, the transaction costs of using the system are often high (transaction costs are the cost in time or effort of performing the steps required to meet a goal). There is a range of potential transaction costs, incurred wherever you have to move between systems, re-key data, or pass authentication challenges: you may have to search several resources, check for locations, make ILL requests, and so on. Compare this to popular Web resources like those mentioned a moment ago. These provide a unified discovery experience and work hard to reduce transaction costs: they aggregate supply. Now think of demand, and again of the large Web presences: they aggregate demand by mobilising large network audiences for resources. The fragmentation of library resources reduces the gravitational pull of any one resource on the network. Nor do these resources tend to be projected into user environments such as the course management system or RSS aggregator. There is limited aggregation of demand. Better matching of supply and demand is closely related to what has become known, following Chris Anderson, as the long tail argument. (I explore this in more detail elsewhere [4].) The long tail argument is about how a wider range of resources may be found, used or bought in network environments which better match supply and demand through aggregation at the network level.

Libraries face many interesting questions as they think about how to provide services across multiple physical service points and shared network spaces. Think about some issues which arise from this discussion: I talked about unified discovery and low transaction costs as part of the aggregation of supply. So, it is likely we will see the catalogue integrated with other resources in consolidated discovery environments at various levels (metasearch, regional systems, Google, etc). It is also likely that we will see more streamlined integration of the catalogue as part of supply chains so as to reduce the transaction costs involved in discovery, location, request and delivery of materials (resolution or resource sharing, for example). On the demand side, I talked about gravitational pull and projection into user environments. We will see a variety of ways of connecting the catalogue to large-scale discovery environments. And we will see greater use of Web services, RSS and other approaches to reach out into user environments. All of these approaches are discussed in more detail below.

Discovery and Location Are Different Functions

Elsewhere, I have suggested [5] that we can think about some distinct processes - discover, locate, request, deliver - in the chain of use of library materials. (This chain does not include the various ways in which resources might be used.) Increasingly we will see these sourced as part of separate systems which may be articulated in various combinations, and across material types. A major part of the library challenge is to integrate these processes across different environments (resource sharing, metasearch, resolution, purchase options, ...), or, in the terms established above, to aggregate supply.

Historically, the discovery and location processes were tied to each other in the catalogue, and each location required a separate inspection. Where somebody discovered something elsewhere (in a citation or a bibliography, for example), they would then inspect the catalogue. In this way the discovery process was tied to the location process, and indeed the catalogue is still closely tied to local inventory management. It is typically a part of the system which manages a part of that collection. This makes less and less sense from a 'discovery' point of view. Of course, we want to be able to find out what is in the local catalogued collection, but to what extent should that be the front door to what the library makes available? Does this give us the best available exposure for library collections? Is it tying the discovery process to a location engine?

In some ways we have end-to-end integrated library systems where the ends are in the wrong places. At one end, we have a catalogue interface which is unconnected to popular user discovery environments or workflows. It is often a somewhat flat experience with low gravitational pull in the crowded network information space. We expect people to discover the catalogue before they can discover what is in (part of) the collection. And this points to the issue at the other end: the 'fulfilment' options open out onto only a part of the universe of materials which is available to the user: that local catalogued collection.

These factors mean that the catalogue currently sits awkwardly in the array of available resources. And in fact, library vendors appear to recognise this in their offerings. A couple of things are indicative here. First, we see the emergence of new products, like Primo from Ex-Libris, which provide a discovery experience across a broader part of the library collection. In effect they appear to be trying to make discovery of the catalogued collection a part of a broader discovery experience encompassing those parts of the collection which are in library control: local digital collections, institutional repository, catalogue. Of course, this then needs to be articulated with the journal literature, or other resources, probably through metasearch.

And, second, in coming years we will see a new accommodation between the ILS, metasearch, resolution, electronic resource management, repositories, and other components from the ILS vendors as they try to better map this array of systems onto library requirements and user behaviours. Resolution, for example, is now used to locate instances of discovered items, usually articles. In the future, resolution seems likely to develop into more of a service router: given some metadata, what services are available to me on the resource referred to by the metadata (borrow it, buy it, send it to a colleague, ..), or which relate to the metadata itself (export in a particular citation format, for example)? It is a way of connecting potentially multiple discovery experiences to multiple fulfilment (request/deliver) services, or to a multiplicity of other services. So, one scenario might see the catalogued collection act as a target for a resolver, which in turn would be a resource used by various discovery services. Of course, some of these discovery environments may be outside the library altogether.
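To make the 'service router' idea a little more concrete, here is a minimal Python sketch, assuming metadata arrives as a dictionary of OpenURL-like keys (`genre`, `isbn`, `issn`). The service functions, their labels and the holdings sets are hypothetical stand-ins for what a real resolver would draw from its knowledge base and the ILS; the point is only that one piece of metadata fans out to several services, some acting on the resource and some on the metadata itself.

```python
from typing import Callable, List, Optional

# Hypothetical data the router consults; a real resolver would draw on a
# knowledge base and the ILS rather than on hard-coded sets.
LOCAL_HOLDINGS = {"9780262033848"}   # ISBNs held in the local collection
LICENSED_JOURNALS = {"0001-0782"}     # ISSNs with licensed full text


def borrow(md: dict) -> Optional[str]:
    return "borrow from local collection" if md.get("isbn") in LOCAL_HOLDINGS else None


def full_text(md: dict) -> Optional[str]:
    return "read full text online" if md.get("issn") in LICENSED_JOURNALS else None


def request_ill(md: dict) -> Optional[str]:
    # Offer resource sharing only when nothing is available locally.
    held = md.get("isbn") in LOCAL_HOLDINGS or md.get("issn") in LICENSED_JOURNALS
    return None if held else "request through resource sharing"


def buy(md: dict) -> Optional[str]:
    return "buy a copy" if md.get("genre") == "book" else None


def export_citation(md: dict) -> Optional[str]:
    # A service on the metadata itself rather than on the resource.
    return "export citation"


SERVICES: List[Callable[[dict], Optional[str]]] = [
    borrow, full_text, request_ill, buy, export_citation,
]


def route(md: dict) -> List[str]:
    """Return the labels of every service that applies to this metadata."""
    offered = []
    for service in SERVICES:
        label = service(md)
        if label is not None:
            offered.append(label)
    return offered
```

Because each service is an independent rule, further options (a digitisation request, say) could be added to the list without touching the others.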

So, what we will see is multiple discovery environments. At the institutional level, we are seeing attempts to unify discovery in front of the catalogue, resolver and other services, although this is not straightforward. At the same time, given the pressures discussed above, there is a trend to raise catalogue discovery to the network level (regional, national, ..). Libraries Australia, OhioLINK, WorldCat.org and DEFF provide examples here. And for similar reasons, we are seeing growing consideration of exporting discovery to other environments, search engines included. In one scenario, which may become more common, discovery options may connect to materials available for purchase (either by the user, or by the library on an on-demand basis).

It is interesting at this stage to compare access to the catalogue with access to the journal literature. Historically, access to the journal literature was a two-stage process. A user looked in one set of tools (abstracting and indexing services) to discover what was potentially of interest at the article level. Then they would consult the catalogue, which described holdings at the journal level, to check whether the library held the relevant issue. To caricature Adorno, these two steps represented the torn halves of an integral whole to which they did not add up. Resolution services aim to make that integral whole, connecting the discovery and location experiences seamlessly. Of course, this comes at some expense, as we construct the knowledge bases that support it. The catalogue, as discussed above, allows you to locate materials in the local collection. We are now seeing scenarios emerge which make the catalogue experience resemble the historical situation with journals: we need to connect a discovery layer (which may represent much more than is in the local collection) with the ILS to locate instances of discovered items in the local collection. This again points to the likely realignment of services within the library systems environment.

The current catalogue will need to be blended in some way with the discovery apparatus for local digital collections, for materials available through resource-sharing systems, for materials available for purchase (either by the user, or by the library on an on-demand basis), for the journal literature, and so on.

The Network Is the Focus of Attention

In a pre-network world, where information resources were relatively scarce and attention relatively abundant, users built their workflow around the library. In a networked world, where information resources are relatively abundant, and attention is relatively scarce, we cannot expect this to happen. Indeed, the library needs to think about ways of building its resources around the user workflow. We cannot expect the user to come to the library any more; in fact, we cannot expect the user even to come to the library Web site any more.

A corollary of this is that there is no single destination: the world has become 'incorrigibly plural'. Search engines, RSS feeds, metasearch engines: these are all places where one might discover library materials. I have described how one might experience a catalogue at institutional, regional or international levels and be guided back to an appropriate collection. Increasingly, we need to think of the catalogue, or catalogue services and data, making connections between users and relevant resources, and think of all the places where those connections should happen.

Finally, we know that today's network users may have different expectations of such services. As well as expecting prompt delivery, they may expect to be able to rate and review, to persistently link, to receive feeds of new materials, and so on. Services need to enter the fabric of their working and learning lives through those tools they use to construct their digital workflows and identities. The emergence of social networking has also caused us to think a little differently about 'discovery'. The network conversations that are facilitated by these services, either directly where folks talk about things, or indirectly where one can trace affiliations through tagging, social bookmarking, and other approaches, have become important orientations for many people.

So, the catalogue emerged when patterns of distribution and use of resources, and corresponding behaviours, were very different from what they are now. The catalogue was a response to a particular configuration of resources and circumstances. The question now is not how we improve the catalogue as such; it is how we provide effective discovery and delivery of library materials in a network environment where attention is scarce and information resources are abundant, and where discovery opportunities are being centralised into major search engines and distributed to other environments.

Multiple Discovery Experiences

As we work to aggregate supply (either through consolidation of data or of services), so we must work to place these resources where they will best meet user needs. In this process, discovery of the catalogued collection will be increasingly disembedded, or lifted out, from the ILS, and re-embedded in a variety of other contexts, potentially being changed in the process. And, of course, those contexts themselves are evolving in a network environment.

What are some of those other discovery contexts? I have referred to some throughout; here is a non-exhaustive list of current examples:

Local Catalogue Discovery Environments

There has been a recent emphasis on the creation of an external catalogue discovery system, which takes ILS data and makes it work harder in a richer user interface. The NCSU catalogue [6] has been much discussed and admired in this context. Ex-Libris has announced its Primo product [7], which will import data from locally managed collections and re-present it. Furthermore, we have just seen announcements about the eXtensible Catalog project [8] at the University of Rochester. One of the ironies of the current situation is that, just as we begin to extract more value from the historic investment in structured data in our catalogues (these initiatives are examples of this trend), we are also looking at blending the catalogue more with other data and environments where it may be difficult to build services on top of that structured data. Think of what happens, for example, if you combine article level data and catalogue data.

Shared Catalogue Discovery Environments

We also observe a growing trend towards shared catalogues, often associated with resource-sharing arrangements. It has not been unusual to see a tiered offering, with resources at progressively broader levels (for example: local catalogue, regional/consortial, WorldCat). The level of integration between these has been limited. However, in recent times we have seen growing interest in moving more strongly to the shared level. This may be to strengthen resource-sharing arrangements, the better to match supply and demand of materials (the 'long tail' discussion [4]), and to reduce costs. And once one moves in this direction, the question of scoping the collective resource in different ways emerges: moving from local to some larger grouping or back. The value of OhioLINK as a state-wide catalogue is an example here. OCLC has just made WorldCat.org available, which aims to connect users to library services, brokering the many-to-many relations involved. A critical driver here is the benefit of consolidation, and discussion of what level of consolidation is useful. Increasingly, a library will have to consider where and how to disclose its resources.

Syndicated Catalogue Discovery Environments

Increasingly, the library wants to project a discovery experience into other contexts. I use 'syndication' to cover several ways of doing this. Typically, one might syndicate services or data. In the former case, a machine interface is made available which can be consumed by other applications. We are used to this model in the context of Z39.50, but additional approaches may become more common (OpenSearch, RSS feeds, ..). Interest here has been heightened by the question of how to project library resources into campus portals or course management systems. A service might provide a search of the collection, but other services may also be useful: a list of new items, for example. The syndication of data is also of growing interest, as libraries discuss making catalogue data available to search engines and others, with links back to the library environment. Several libraries and library organisations are exposing data in this way, and OCLC has been very active in this area with Open WorldCat, where member data is exposed to several search engines.
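As an illustration of syndicating a 'new items' service, the sketch below builds a minimal RSS 2.0 feed with Python's standard `xml.etree.ElementTree`. The record fields (`id`, `title`, `added`), the catalogue URL and the `/record/<id>` permalink pattern are assumptions for illustration; a real OPAC would use its own identifiers and link syntax.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def new_items_feed(items, catalogue_url):
    """Build an RSS 2.0 document listing recently added catalogue records.

    `items` is a list of dicts with hypothetical keys 'id', 'title' and
    'added' (a timezone-aware datetime). The '/record/<id>' link pattern
    is an assumption; real OPACs use their own permalink schemes.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link and description on the channel.
    ET.SubElement(channel, "title").text = "New items in the library catalogue"
    ET.SubElement(channel, "link").text = catalogue_url
    ET.SubElement(channel, "description").text = "Titles recently added to the collection"
    for item in items:
        entry = ET.SubElement(channel, "item")
        ET.SubElement(entry, "title").text = item["title"]
        ET.SubElement(entry, "link").text = f"{catalogue_url}/record/{item['id']}"
        # RSS 2.0 uses RFC 822-style dates, which format_datetime produces.
        ET.SubElement(entry, "pubDate").text = format_datetime(item["added"])
    return ET.tostring(rss, encoding="unicode")


feed = new_items_feed(
    [{"id": "b1234567", "title": "An example title",
      "added": datetime(2006, 4, 1, tzinfo=timezone.utc)}],
    "https://catalogue.example.edu",
)
```

A campus portal or aggregator could then poll this feed, which is one way of projecting the catalogue into user environments without asking users to visit it.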

The Leveraged Discovery Environment

This is a clumsy expression for an increasingly important phenomenon, where one leverages a discovery environment outside one's control to bring people back into one's own catalogue environment. Think of Amazon or Google Scholar. This may be done using fragile scraping or scripting approaches, as for example with LibraryLookup or our FRBR (Functional Requirements for Bibliographic Records) bookmarklets. Here, a browser tool may, for example, recognise an ISBN in a Web page and use it to search a library resource. The work that Dave Pattern has done with the University of Huddersfield catalogue is an example here. A broader ability to deploy, capture and act on structured data may make this approach more common: the potential use of COinS (ContextObjects in Spans) is a specific example here.
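The browser-tool pattern described above can be sketched in Python: scan page text for strings that look like ISBNs, keep only those whose check digits are valid, read any COinS spans (by convention, `span` elements with class `Z3988` whose `title` carries a URL-encoded OpenURL ContextObject), and build a link back into the catalogue. The catalogue search URL is hypothetical, and the regular expression is a deliberately loose heuristic rather than a complete ISBN grammar.

```python
import re
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlencode

# Hypothetical catalogue search endpoint; a real OPAC's URL syntax will differ.
CATALOGUE_SEARCH = "https://catalogue.example.edu/search"

# Loose heuristic: a digit, then 8-15 digits or hyphens, then a digit or X.
ISBN_CANDIDATE = re.compile(r"\b\d[\d-]{8,15}[\dXx]\b")


def is_valid_isbn10(s: str) -> bool:
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    # Weights 10..2 on the first nine digits; the check character has weight 1.
    total = sum((10 - i) * int(c) for i, c in enumerate(s[:9]))
    total += 10 if s[9] == "X" else int(s[9])
    return total % 11 == 0


def is_valid_isbn13(s: str) -> bool:
    if len(s) != 13 or not s.isdigit():
        return False
    # Alternating weights 1 and 3; the weighted sum must be a multiple of 10.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s))
    return total % 10 == 0


def find_isbns(text: str) -> list:
    """Return checksum-valid ISBN-10/13 strings in text, with hyphens removed."""
    found = []
    for candidate in ISBN_CANDIDATE.findall(text):
        isbn = candidate.replace("-", "").upper()
        if is_valid_isbn10(isbn) or is_valid_isbn13(isbn):
            found.append(isbn)
    return found


class CoinsParser(HTMLParser):
    """Collect the ContextObject carried in each COinS span as a dict of lists."""

    def __init__(self):
        super().__init__()
        self.context_objects = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "Z3988" in classes:
            # HTMLParser has already unescaped &amp;; parse_qs decodes %XX.
            self.context_objects.append(parse_qs(attrs.get("title") or ""))


def coins_from_html(html: str) -> list:
    parser = CoinsParser()
    parser.feed(html)
    parser.close()
    return parser.context_objects


def catalogue_link(isbn: str) -> str:
    return f"{CATALOGUE_SEARCH}?{urlencode({'isbn': isbn})}"
```

Validating check digits matters here: without it, any long run of digits on a page (a phone number, an order reference) would trigger a pointless catalogue search.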

There Are a Lot of Questions!

We are, then, looking at complex shifts in behaviour and network systems. Many issues need to be worked through; here are some examples:

Scale, Niches and Value

Much of what I have said supports consolidation into general network level services. The unified discovery experience of the search engines, Amazon, iTunes, and so on, has been a very powerful example. And we will certainly see greater consolidation of discovery opportunities in the library space. This will be in institutional, regional/national, and vendor contexts. At the same time, we are seeing growing interest in specialisation for niche requirements. How do we scoop out the resources which are of particular relevance to a specific course, for example, and do we want to build services specialised towards those working in archaeology, biodiversity, or other disciplines? And the library will want to add more value in terms of higher-level services: what are the 'best' resources in a particular area, feeds for new materials, interaction with the developing apparatus for reading list and citation management, and so on. Being able to do some of these things effectively will require further architectural, service and organisational development. From an architectural point of view, for example, it means being much more readily able to filter, recombine and manipulate data and Web services. From an organisational point of view it means finding ways to share or outsource routine work, and focus on where the library can make a distinctive impact.

The User Experience: Ranking, Relating and Recommending

There is a general recognition that discovery environments need to do more to help the user. Developers are looking at ranking (using well-known retrieval techniques with the bibliographic data, or, probably more importantly, using holdings, usage or other data which gives an indication of popularity), relating (bringing together materials which are in the same work, about the same thing, or related in other ways), and recommending (making suggestions based on various inputs - reviews or circulation data for example). Users of Amazon and other consumer sites are becoming used to a 'rich texture of suggestion', and we have data to do a better job here than we have had hitherto. This leads naturally into the mobilisation of user participation (tagging, reviews) to enhance the discovery experience. PennTags is a widely noted institutional experiment in this area. This raises interesting questions. One is the issue of critical mass, and it may be that mechanisms emerge to share this data, or to invite it at some shared level. It is appropriate to think here about the success of social networking sites, and about the attraction there is to converse and connect around shared interests: these are becoming important 'discovery' venues. LibraryThing is an intriguing example of how such interest can be mobilised to create an increasingly rich resource. This raises a second issue, about levels: are there particular local interests and contexts which would benefit from being captured, and how do they relate to data gathered at a more general level? And, third, there are architectural issues around identity and citation.
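A toy version of 'relating' and 'ranking' might cluster records into works with a crude key (normalised author plus title proper) and then order the clusters by total holdings, using holdings as the popularity signal mentioned above. This is a hypothetical sketch, not a FRBR work-set algorithm: the key ignores translations, uniform titles and much else, and the sample records and holdings figures are invented.

```python
import re
from collections import defaultdict

# Invented sample records; 'holdings' stands in for any popularity signal.
RECORDS = [
    {"id": 1, "author": "Jones, Ann", "title": "Discovery and delivery: a handbook", "holdings": 120},
    {"id": 2, "author": "Jones, Ann.", "title": "Discovery and Delivery.", "holdings": 30},
    {"id": 3, "author": "Smith, J.", "title": "The discovery of retail", "holdings": 140},
]


def _normalise(text: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def work_key(record: dict) -> tuple:
    """Crude work key: normalised author plus title proper (text before any colon)."""
    title = _normalise(record["title"].split(":")[0])
    title = re.sub(r"^(the|a|an) ", "", title)  # drop a leading English article
    return (_normalise(record["author"]), title)


def rank_works(records: list, query: str) -> list:
    """Group matching records into works and rank works by total holdings."""
    clusters = defaultdict(list)
    for record in records:
        if query.lower() in record["title"].lower():
            clusters[work_key(record)].append(record)
    return sorted(
        clusters.values(),
        key=lambda recs: sum(r["holdings"] for r in recs),
        reverse=True,
    )
```

Ranked record by record, the 140-holding title would come first; grouped into works, the two manifestations of the other work together (150) outrank it, which is one reason relating and ranking are worth doing together.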

Talking to the Backend Library System

In the context of an ILS service layer [9], if the discovery environment is separated from the ILS, there needs to be a way for the two to communicate. Again, this is currently done through a variety of proprietary scripting and linking approaches. It would be useful to agree on a common set of functions and on standard ways of implementing them.

The Discovery Deficit: The Catalogued Collection Is Only a Part of the Available Collection

I am thinking of two related things here. The first - which has been discussed throughout this article - is that there will be a growing desire to hide boundaries between databases (A&I, catalogue, repositories, etc), especially where those boundaries are seen more to reflect the historical contingencies of library organisation or the business decisions of suppliers than the actual discovery needs of users. We will see greater integration of the catalogue with these other resources, whether this happens at the applications level (where the catalogue sits behind the resolver, or is a metasearch target), or at the data level (where catalogue data, article level data, repository data, and so on, are consolidated in merged resources). We will also see greater articulation of the catalogue with external resources. This then poses a second issue, about the data itself. Our catalogues are created in a MARC/AACR world, with established practices for controlling names, subjects and so on. However, as the catalogue plays in a wider resource space, issues arise in meshing this data with data created under different regimes, and accordingly in leveraging the investment in controlled data. Think about personal names, for example, where authority control practices apply only to the 'catalogued collection'. What does it mean when that data is mixed with other data? Does it become more difficult to build higher-level services which exploit the consistency of the data, faceted browse for example? Libraries have made a major historical investment in structured data. We need to find good ways of putting that investment to productive use in these new services.
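The faceting question can be illustrated with a few invented records: when catalogue data carrying an authorised heading is merged with article and repository data carrying names as supplied, one person's name splits across several facet values. A naive normalisation (surname plus first initial, a hypothetical rule chosen only for illustration) merges those variants but also conflates different people, which is the kind of consistency authority control provides and which is hard to recover after the fact.

```python
from collections import Counter

# Invented merged result set: catalogue records carry an authorised heading,
# article and repository records carry names as supplied by their sources.
MERGED = [
    {"source": "catalogue", "author": "Smith, Jane"},
    {"source": "catalogue", "author": "Smith, Jane"},
    {"source": "articles", "author": "Smith, J."},
    {"source": "repository", "author": "Jane Smith"},
    {"source": "repository", "author": "Smith, John"},  # a different person
]


def author_facet(records: list) -> Counter:
    """Facet on the name string exactly as recorded."""
    return Counter(r["author"] for r in records)


def naive_name_key(name: str) -> str:
    """Hypothetical normalisation: lower-cased surname plus first initial."""
    if "," in name:
        surname, forenames = (part.strip() for part in name.split(",", 1))
    else:
        parts = name.split()
        surname, forenames = parts[-1], " ".join(parts[:-1])
    return f"{surname.lower()} {forenames[:1].lower()}"


def naive_author_facet(records: list) -> Counter:
    return Counter(naive_name_key(r["author"]) for r in records)
```

Here `author_facet(MERGED)` yields four values for what are two people, while `naive_author_facet(MERGED)` yields a single value covering both; neither reproduces what a controlled heading would give.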

Routing

As we separate functions (discovery from location and fulfilment), we need effective ways of tying them back together. Resolution was discussed as important in this context above. In the longer term, it is also an example of a broader convergence of interest on directories and registries. In the type of environment I have sketched here, we need registries which manage the 'intelligence' that applications need in order to tie things together: registries of services (resolvers, deep OPAC links, Z39.50/SRW/U targets, ..), of institutions (complex things!), and so on. One wants to be able to connect users to services they are authorised to use, or to tie institutional service points to geographic co-ordinates (so as to be able to place locations on a map), or to tie a user application to the appropriate institutional resolver (so as to be able to bring somebody from a discovered item to one that is available to them), and so on. In each case, system-wide registries will remove local development burdens.
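As a small illustration of how a registry removes local development burden, the sketch below looks up an institution's resolver in a shared registry and builds an OpenURL 1.0 link (key/encoded-value form) for a discovered book. The registry contents, the institution identifier, the resolver URL and the coordinates are invented; only the OpenURL keys (`url_ver`, `ctx_ver`, `rft_val_fmt`, `rft.*`) follow the Z39.88-2004 conventions.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class Institution:
    name: str
    resolver_base: str  # base URL of the institution's OpenURL resolver
    latitude: float     # service point location, e.g. for map display
    longitude: float


# Hypothetical registry entry; identifier, URL and coordinates are made up.
REGISTRY = {
    "example-u": Institution(
        "Example University Library", "https://resolver.example.edu/openurl", 53.38, -1.47
    ),
}


def resolver_link(institution_id: str, book: dict) -> str:
    """Build an OpenURL 1.0 (KEV) link for a book, aimed at the user's own resolver."""
    inst = REGISTRY[institution_id]  # KeyError if the institution is not registered
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",
    }
    # Referent metadata uses 'rft.' keys such as rft.isbn and rft.btitle.
    params.update({f"rft.{key}": value for key, value in book.items()})
    return f"{inst.resolver_base}?{urlencode(params)}"
```

The discovery application only needs to know which institution the user belongs to; everything institution-specific lives in the registry rather than in each application.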

Indexing

One of the interesting recent developments in the 'book' space has been the emergence of mass digitisation initiatives alongside existing aggregations of e-books. This opens up the prospect of access to the book literature at the full-text level, and also of building higher-level services on this new corpus of material. In effect, if they can be used appropriately, we are acquiring indexes to books scattered through many collections. We need to work through how these index resources can be leveraged to provide deeper access to local collections. For example, one can imagine a local application leveraging a 'book search engine' to find appropriate titles and then trying to locate those titles locally or against other fulfilment options.
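The 'discover externally, locate locally' pattern might look something like this: take the ISBNs returned by a book search engine (here simply passed in as a list, since no particular search API is assumed) and try each fulfilment option in order, stopping at the first that can supply the title. The three options and their data are hypothetical placeholders for an ILS lookup, a resource-sharing system and a purchase route.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical fulfilment data; in practice these would be calls to the ILS,
# a resource-sharing system and a purchase or acquisitions service.
SHELF_LOCATIONS = {"9780262033848": "Main Library"}
CONSORTIUM_HOLDINGS = {"080442957X"}


def local_collection(isbn: str) -> Optional[str]:
    location = SHELF_LOCATIONS.get(isbn)
    return f"on shelf: {location}" if location else None


def resource_sharing(isbn: str) -> Optional[str]:
    return "request from consortium partner" if isbn in CONSORTIUM_HOLDINGS else None


def purchase(isbn: str) -> Optional[str]:
    return "suggest for purchase"  # last resort, always offered


FULFILMENT_OPTIONS: List[Callable[[str], Optional[str]]] = [
    local_collection, resource_sharing, purchase,
]


def locate(discovered_isbns: List[str]) -> Dict[str, str]:
    """Map each discovered ISBN to the first fulfilment option that can supply it."""
    results = {}
    for isbn in discovered_isbns:
        for option in FULFILMENT_OPTIONS:
            outcome = option(isbn)
            if outcome is not None:
                results[isbn] = outcome
                break
    return results
```

Keeping the options in an ordered list puts the local collection first while making the other fulfilment routes explicit; a library could reorder or extend the list without changing the loop.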

Sourcing

This is an interesting topic which is not yet widely explored in the ILS area. The typical current model is a licensed software model where an instance of a vendor application is run locally. The examples above show some other models: local development, collaborative sourcing, and an on-demand model where the catalogue or other functionality is provided as a network service. Here, as in other areas of library systems work, we are likely to see a much more plural approach to sourcing system requirements in coming years.

Conclusion

The catalogue discussion is often presented as just that, the catalogue discussion. However, I have argued here that it belongs in a wider context. We may be lifting out the catalogue discovery experience, but we are then re-embedding it in potentially multiple discovery contexts, and those discovery contexts are being changed as we re-architect systems in the network environment. These systems include discovery systems for other collection types (the institutional repository, or digital asset repository, etc); the emergence of a general search/resolution layer within the library; external environments as different as Google and Amazon, the RSS aggregator, or the course management system. The discovery experiences will also increasingly be part of various supply chains: resource sharing or e-commerce, for example, or local resolution services.

In summary, the catalogue question is a part of the complex set of questions we will address as we re-architect the discovery-to-delivery apparatus in ways appropriate to changing network behaviours.

Notes

I have used 'catalogue' throughout this article with some unease, but alternative approaches were too clumsy. The problem I faced is that while the word 'catalogue' currently evokes a recognisable bundle of functionality, I sometimes use the word with a different bundle of functionality in mind, as I am talking about how functionality may be reconfigured across a variety of systems.

There is a growing literature on attention. The Bubble Generation blog is an interesting venue for discussion of production and use of information and other media resources in a network environment: http://www.bubblegeneration.com/