Sample & Hold: Rick Lugg's Blog

Thursday, May 28, 2015

As most perusers of this blog will know, our small company, Sustainable Collection Services (SCS), was acquired by OCLC on January 8, 2015. After several months of intense integration work, I can now resume writing on deselection, shared print, and long-term management of monographs. (I know, you've all been waiting with bated breath!)

Being acquired by a larger organization is a curious experience. We received many emails in January when the change was announced, most congratulatory, some concerned. One humorous subject line in particular stood out: "You have been assimilated by the Borg. Resistance is futile." We got a kick out of that one.

Nearly six months later, my colleagues Ruth, Andy, Eric, and I can report few Borg-like symptoms. Maybe it's just the assimilation talking, but it's actually been a pretty good experience to become part of OCLC. It has certainly been an adjustment to morph from a 4-person start-up to a 1,300-person pillar of the profession. Our planning cycles have lengthened. We have a few more rules to follow. We have many brilliant new colleagues.

But the most striking changes have been operational. We have in fact enjoyed some of those much-ballyhooed synergies that are claimed or implied in every press release related to an acquisition. They actually exist! And some of them have been immediate and powerful.

The Turbo Button: Since the initial SCS-OCLC partnership was formed in 2012, SCS accessed WorldCat holdings data via an API. This API was never designed to support the large batch lookups SCS was performing, but for three years we made it work, with excellent support from OCLC. But even when running at peak capacity, we could manage holdings lookups for only 200,000 titles per day. Now we have direct access to WorldCat, and it's as if we've gone from a go-cart to NASCAR. Working with colleagues in Data Services, we have created a batch process that allows us to do in hours what used to take days. This has allowed us to get data into GreenGlass more quickly, and to handle more projects. Even better, we can support larger group projects, such as the Eastern Academic Scholars' Trust (EAST), with 40 libraries and 22 million bib records.

Sample G3 Screen

GreenGlass for Groups (G3): SCS enjoyed our life as a bootstrap start-up, funding the development of GreenGlass and other services from our own pockets and sales. As we better understood the analytics and decision-support tools that were needed, though, our ideas exceeded our ability to finance and build them. In particular, we needed a version of GreenGlass to support consortia and shared print projects. GreenGlass for Groups (try saying that three times fast and you'll see why we call it G3!) will enable participating libraries to develop, iterate, and visualize models for shared retention and stewardship. Each library can also view its collection in the context of the whole group or designated sub-groups. With OCLC's support, we will have G3 up and running this fall.

E-Book/Print Book Overlap: We've also gained access to OCLC Work IDs. Work IDs cluster related editions in a manner that enables much more sophisticated matching. It becomes possible to consider related print editions, as well as overlap between print and e-book editions owned by the library. We are supporting this matching on a custom basis for now, but plan to integrate these capabilities into GreenGlass as soon as the G3 work is completed. Already, it has significantly improved our ability to present multiple editions simultaneously for retention consideration.
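To make the idea of matching on Work IDs a little more concrete, here is a minimal Python sketch. It is an illustration only, not how GreenGlass or WorldCat actually does it: the record layout, the field names (`oclc_number`, `work_id`, `format`), and the sample values are all assumptions invented for the example. It groups a library's holdings by work, then flags works held in both print and e-book form.

```python
from collections import defaultdict

# Hypothetical holdings records. Field names and values are illustrative
# assumptions, not an OCLC or GreenGlass schema.
holdings = [
    {"oclc_number": "1001", "work_id": "W1", "format": "print", "year": 1995},
    {"oclc_number": "1002", "work_id": "W1", "format": "print", "year": 2003},
    {"oclc_number": "1003", "work_id": "W1", "format": "ebook", "year": 2003},
    {"oclc_number": "2001", "work_id": "W2", "format": "print", "year": 1988},
    {"oclc_number": "3001", "work_id": "W3", "format": "ebook", "year": 2011},
]


def cluster_by_work(records):
    """Group edition-level records under their shared work identifier."""
    clusters = defaultdict(list)
    for record in records:
        clusters[record["work_id"]].append(record)
    return dict(clusters)


def print_ebook_overlap(clusters):
    """Return the work IDs held in both print and e-book form."""
    overlapping = []
    for work_id, editions in clusters.items():
        formats = {edition["format"] for edition in editions}
        if {"print", "ebook"} <= formats:
            overlapping.append(work_id)
    return sorted(overlapping)


clusters = cluster_by_work(holdings)
overlap = print_ebook_overlap(clusters)

for work_id in overlap:
    editions = clusters[work_id]
    print(work_id, [(e["oclc_number"], e["format"], e["year"]) for e in editions])
```

Matching on individual edition records would treat the three editions of the first work as unrelated titles. Clustering first is what lets them be presented side by side for a retention decision.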

Shared Print Project LifeCycle

Alignment with OCLC Shared Print Initiatives: As I've noted in other posts, retention and withdrawal decisions ultimately create additional work downstream. Materials may need to be transferred or withdrawn, and remaining collections shifted. There are records to be maintained in the local catalog, union catalog, and in WorldCat. In collaboration with our colleagues in OCLC Shared Print, we have developed a lifecycle model for print collections, and plan to develop services to support each segment of that workflow. An early example is batch registration of retention commitments. Once retention commitments are registered, it will be possible to distinguish committed holdings from ordinary holdings. With strong community participation, this will enhance the effectiveness and precision of subsequent collection analysis and shared print projects. It will also ultimately improve the efficiency of resource sharing.

The Borg, it turns out, is powerful for good reason. The scale and dispersion of WorldCat data and the depth of talent within the organization make it fertile ground for innovation and development. Just bring the ideas.

Some of these projects are complete, some in progress, some just beginning. But we have enough experience to begin to identify patterns of success as well as some of the challenges that arise as projects increase in size. Complexity and data variability grow with the number of records (often from different systems and subject to different data management practices) under analysis. Communication and decision-making require more time and effort as the number of participants increases. These factors raise a pressing question: Are there practical limits to the size of shared print monograph projects?

In some respects, SCS is working the shared print problem from the bottom up, seeking to address immediate needs of specific institutions. SCS data sets have ranged from 2 million to 7 million records, shared across 5-34 libraries. In contrast, the OCLC Research folks are looking at the issue systemically, exploring an over-arching strategy that posits sharing on a much larger scale. A mega-regions picture of California, for instance, potentially involves 22 million records and more than 1,500 libraries--a very different order of project. There is a great deal to be learned, both practically and conceptually, from work at both ends of this spectrum.

Some key topics to be considered in future posts include:

Relationship of a mega-region to existing 'trust networks'

Discovery and delivery infrastructure: can we serve users effectively at the mega-regions level?

Challenges of scaling decision-making and communication

Shared access and/or ownership across state lines

Data work may scale up well, but complexity and errors can grow proportionately

Limits of data: is a holding actually on the shelf? Differences between local holdings data and WorldCat?

Harmonizing data from various systems, practices, date ranges

But for today, I'm interested in advancing this proposition for discussion: responsibility for archiving of print monographs is best shared at the mega-regional or national level, while responsibility for servicing low-use print monographs is best shared at the micro-regional or local level. This hypothesis rests to some degree on a distinction I have previously made between "archive copies" and "service copies" (see "Collection Security and Surplus Copies" and "Library Logistics: Archiving and Servicing Shared Print Monographs").

I remain convinced that we need to think about the archiving and servicing functions differently. They have very different operational requirements, and it's unclear whether a copy can serve both purposes, or whether we need to designate some archive copies and some service copies. Somehow we need both to assure the integrity of the print archival record, and simultaneously free ourselves to experiment with new approaches to access and delivery.

The management of the "service" function may change dramatically as our profession integrates the "active service vision" that Emily Stambaugh of the California Digital Library outlined so compellingly last fall, in a presentation on "Curating Collective Collections: Reinventing Shared Print." Over time, the need for service copies may decline, as users become more comfortable with book-length content in electronic form, as chapter-level delivery and print-on-demand become more viable, and as expectations change. But for the foreseeable future, proximity and rapid delivery remain important, even for little-used materials.

Archive copies, on the other hand, should rarely, if ever, need to be moved. The point is to have a sufficient number of print copies to back up Hathi Trust and other digital repositories, and to assure that the titles remain available for the long term. Responsibility for the archival function could be shared across much larger regions and much larger groups of institutions.

Our current shared print projects tend to emphasize access and service. In SCS shared print projects to date, nearly every group has committed to retain a minimum of two holdings of every title currently held within the group (assuming there are two to begin with, of course). This has occurred even in the smallest groups, as librarians seek to assure that a print copy, if no longer in their own stacks, resides nearby, preferably within an existing resource-sharing network with proven partners, and deliverable within 24-48 hours. And while collection managers appreciate the context provided by total US or Canadian holdings or knowledge that a title has been securely archived by Hathi Trust, so far these factors have exerted minimal influence on the group's retention thresholds. The shared print preferences we've seen to date look like this, reading from the bottom up:

But these groups are also hedging their bets on the archiving front. The preference for two holdings, even among the smallest groups, reflects the recognition that one holding may be missing or damaged--and the commitment to their users that shared print will not result in the loss of a single title from the group's existing collection. They are in effect archiving the local group's shared collection, and in many cases making explicit retention commitments for those titles. As those MARC 583 fields are updated in WorldCat, we see the genesis of the "intentionally retained" collective collection; i.e., a local service-oriented solution morphing into a contribution to a broader archival solution.
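The two-holdings preference can be expressed as a simple rule, sketched below in Python under stated assumptions. The titles, library names, and the `apply_retention` helper are hypothetical. In particular, choosing the first libraries in each list as retainers is only a placeholder; in real projects, who retains what is worked out by the group.

```python
MIN_RETAINED = 2  # the threshold nearly every group has chosen so far

# Hypothetical group holdings: title -> libraries currently holding a copy.
group_holdings = {
    "Title A": ["Lib1", "Lib2", "Lib3", "Lib4"],
    "Title B": ["Lib2"],
    "Title C": ["Lib1", "Lib3"],
}


def apply_retention(holdings, min_retained=MIN_RETAINED):
    """Split each title's holdings into retained copies and withdrawal candidates.

    A title with fewer holdings than the threshold keeps everything it has,
    so no title is lost from the group's existing collection.
    """
    plan = {}
    for title, libraries in holdings.items():
        keep = min(len(libraries), min_retained)
        plan[title] = {
            # Placeholder allocation: the first `keep` libraries retain.
            "retain": libraries[:keep],
            "withdrawal_candidates": libraries[keep:],
        }
    return plan


plan = apply_retention(group_holdings)
surplus = sum(len(entry["withdrawal_candidates"]) for entry in plan.values())

for title, entry in plan.items():
    print(title, entry)
print("Surplus copies eligible for withdrawal review:", surplus)
```

In this toy data set every title survives, a title held only once stays where it is, and the only withdrawal candidates are the copies beyond the second holding.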

These are early days in rethinking the role of local print, and at present there are many surplus copies of many titles in the collective collection. It's possible to retain two holdings of every title within a small group of libraries and still find significant opportunity for withdrawal of unused material. It's possible to confine shared print to existing trust networks and still make progress. And this micro-regional approach may continue to make good operational sense as long as print books need to be put in the hands of readers. The two smallest circles in this diagram may remain the realm of service copies.

But as explicit retention commitments are made in these smaller groups, it's also possible to glimpse how the collective print collection might evolve. As the need to service low-use print declines in frequency (because of new ways of delivering that content), the eleven mega-regions may be exactly the construct we need to organize and distribute the archival function across the country. The emerging Hathi Trust print monographs archive could utilize this framework, and retention commitments expressed in WorldCat could help collection managers recognize where archive copies have been secured and where they are still needed. As the overall number of copies declines, stewardship would gravitate toward larger areas and groups.

For now, progress can be made safely at both ends of the spectrum without fully resolving the long-term questions. But the question of viable project size--in terms of participation, geographic reach, and data analysis--remains interesting, and in my next post I will offer some ideas about scaling of that kind.