Visualizing Network Flows: Library Inter-lending

As part of our joint research project with the CIC Center for Library Initiatives and the OSU Library, we’re examining inter-lending flows within and outside of the 13-member CIC consortium. We are using a subset of the OCLC WorldCat Resource Sharing (WCRS) transaction data archive for this analysis. Our current dataset comprises 1.33 million request transactions, representing nearly 900,000 individual titles loaned by CIC libraries over the past several years.

As Max Klein has noted elsewhere in this blog, the OCLC Research group is starting to experiment with new approaches to data visualization using the R statistical modeling environment and D3 JavaScript code library. (The WorldCat Live prototype is a nice example of how these are being put to use.) We are eager to integrate some of this experimental work into the ongoing CIC analysis.

Bruce Washburn, Brian Lavoie and I met recently to look through Mike Bostock’s D3 gallery, which shows how a wide range of visualization techniques can be implemented. We were looking for examples that were fit for purpose for this particular project. Since inter-lending is a library-centric approach to balancing supply and demand, Brian suggested that we focus on examples that are particularly expressive for modeling import/export flows. We settled on force-directed graphs and Sankey flow diagrams as good candidates for further exploration. And because we are especially interested in understanding the flow of library resources across geographies, we decided it was worth some additional work to enhance our WCRS transaction dataset with geo-codes, so that we can experiment with mapping flows across regions.

From his experiments with TopicWeb, Bruce has developed some facility with D3, and he is now doing some work with R. But before we run headlong into any new development work, I wanted to do some low-level experimentation to see if the data we have in hand, and the questions we are trying to answer, lend themselves to visualization in Sankey diagrams.

The three-letter symbols correspond to OCLC institution symbols for the 29 CIC collections we are examining in this project.

Another, showing the breakdown of CIC borrowing of CIC returnables:

And a third, this time with some detail for both Non-CIC and CIC borrowers — NB the number of non-CIC borrowers makes it difficult to represent them all in this format, hence the block of ‘others.’
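For readers curious about the data preparation behind diagrams like these, here is a minimal sketch in Python. It is not the process we actually used. It assumes each transaction has been reduced to a (lender, borrower) pair, and it produces the nodes/links structure that D3 Sankey examples commonly take as input. The symbols AAA, BBB, ZZ1 and ZZ2 are made-up placeholders, and the function name and the top_n cutoff are illustrative choices. Borrowers outside the top_n are collapsed into a single block of ‘Others.’

```python
import json
from collections import Counter

# Hypothetical sample data: (lender symbol, borrower symbol) per request.
# These symbols are placeholders, not real OCLC institution symbols.
SAMPLE_TRANSACTIONS = [
    ("AAA", "BBB"), ("AAA", "BBB"), ("AAA", "BBB"),
    ("BBB", "AAA"), ("BBB", "AAA"),
    ("AAA", "ZZ1"),
    ("BBB", "ZZ2"),
]


def build_sankey_input(transactions, top_n=2, other_label="Others"):
    """Aggregate (lender, borrower) pairs into a nodes/links structure.

    Borrowers outside the top_n by request volume are merged into one
    node named other_label, so the diagram stays readable.
    """
    borrower_totals = Counter(borrower for _, borrower in transactions)
    keep = {symbol for symbol, _ in borrower_totals.most_common(top_n)}

    # Count requests per (lender, displayed borrower) pair.
    flows = Counter()
    for lender, borrower in transactions:
        target = borrower if borrower in keep else other_label
        flows[(lender, target)] += 1

    # A library can be both lender and borrower. Giving each role its own
    # node keeps the flow one-directional (left to right) with no cycles.
    lenders = sorted({lender for lender, _ in flows})
    borrowers = sorted({borrower for _, borrower in flows})
    names = [f"{s} (lender)" for s in lenders]
    names += [f"{s} (borrower)" for s in borrowers]
    index = {name: i for i, name in enumerate(names)}

    links = [
        {
            "source": index[f"{lender} (lender)"],
            "target": index[f"{borrower} (borrower)"],
            "value": count,
        }
        for (lender, borrower), count in sorted(flows.items())
    ]
    return {"nodes": [{"name": name} for name in names], "links": links}


sankey = build_sankey_input(SAMPLE_TRANSACTIONS)
print(json.dumps(sankey, indent=2))
```

The resulting JSON could be saved to a file and handed to a Sankey layout. Raising top_n shows more individual borrowers at the cost of a more crowded picture.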

Now, these are admittedly primitive pictures of how resources flow out of CIC libraries and into other places, but they do capture some important attributes that we are interested in exploring further. For instance, it is immediately apparent that there are some major ‘sources’ and ‘sinks’ for CIC returnables. And it’s clear that while the demand generated outside the CIC is significant (greater than the demand generated within the consortium), it is extremely diffuse, spread across a population of thousands of libraries. Both observations are important for understanding how existing library flows can be optimized. As we refine our analysis, we’ll be examining what factors are driving demand to particular libraries: proximity of lender, scarcity of alternative supply options, price incentives, efficiency of service (as measured by turn-around), etc. And we’ll be looking at new ways to use data visualization to explore, and share, interesting and important patterns in the organization of the library system.
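The ‘sources’ and ‘sinks’ observation, and the point about diffuse demand, can also be checked numerically. The following is a rough Python sketch under stated assumptions, not our actual analysis: transactions are taken to be simple (lender, borrower) pairs, the symbols are made-up placeholders, and the two function names are hypothetical. Net flow is requests filled minus requests placed, and concentration is the share of all requests that comes from the busiest few borrowers.

```python
from collections import Counter

# Hypothetical sample data: (lender symbol, borrower symbol) per request.
# These symbols are placeholders, not real OCLC institution symbols.
SAMPLE_FLOWS = [
    ("AAA", "BBB"), ("AAA", "BBB"), ("AAA", "BBB"),
    ("BBB", "AAA"),
    ("AAA", "ZZ1"), ("AAA", "ZZ2"),
    ("BBB", "ZZ3"),
]


def net_flows(transactions):
    """Return {symbol: lent - borrowed}.

    Positive values mark net lenders ('sources');
    negative values mark net borrowers ('sinks').
    """
    lent = Counter(lender for lender, _ in transactions)
    borrowed = Counter(borrower for _, borrower in transactions)
    symbols = set(lent) | set(borrowed)
    # Counter returns 0 for symbols that never lent or never borrowed.
    return {symbol: lent[symbol] - borrowed[symbol] for symbol in symbols}


def top_borrower_share(transactions, top_n):
    """Share of all requests placed by the top_n borrowers.

    A low share for a sizeable top_n suggests diffuse demand.
    """
    borrowed = Counter(borrower for _, borrower in transactions)
    top_total = sum(count for _, count in borrowed.most_common(top_n))
    return top_total / len(transactions)


net = net_flows(SAMPLE_FLOWS)
for symbol, balance in sorted(net.items(), key=lambda item: -item[1]):
    role = "source" if balance > 0 else "sink" if balance < 0 else "balanced"
    print(f"{symbol}: {balance:+d} ({role})")

print(f"Top borrower share: {top_borrower_share(SAMPLE_FLOWS, 1):.0%}")
```

In this toy sample AAA is the only net lender, and the single busiest borrower accounts for well under half of all requests. On real data the same two measures would be computed separately for CIC and non-CIC borrowers.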

Constance Malpas is a Research Scientist at OCLC. Her work focuses on data-driven analysis of library collections and services, with a special emphasis on strategic planning and managing institutional change. She has a particular interest in the organization of knowledge and research practices in the sciences.

6 Comments on “Visualizing Network Flows: Library Inter-lending”

I am looking to create a data visualization similar to the one you have at the top of this page (intra-CIC flows). Can you help me? I am not a technical person, but I can learn these things. I have my input in Excel; it has two columns and about 280 rows. I am trying to map applications to functions. Any help is appreciated.

I was browsing the web looking for some way to create a visualization similar to (in fact almost exactly the same kind as) the one you have at the top of this page (intra-CIC flows). I am not a technical person. Can you please help me work out how to make one like that? My data is basically two columns: source (functions) and target (applications). Any help or advice will be greatly appreciated.

Tony,
I should have thought that the strong uptake for e.g. LibGuides would make that sort of link assessment generally interesting — I mean as a programmatic approach to integrating analytics into library support activities.
I scanned your blog (rather quickly I’ll admit) for a related post but didn’t see one. Could you point me in the right direction?
Any chance you have looked at half-life of OU course modules (if that is the term)? I have been wondering recently about what the OU has learned about re-usability of course materials based on the rate at which they have been recycled…presumably it varies a bit based on discipline (maths units will have greater longevity?). I guessed that OU would have enough historical data to take up an examination…or perhaps it was done long ago.
Constance

Constance
Some time ago I dabbled with a tool that started to explore traffic from OU course pages to library pages they linked to, my thinking being:
1) folk who wrote courses might be interested in which library links were followed
2) library folk may be interested to see what courses/link strategies sent traffic to Library pages.
No-one grokked why it might be interesting though, so I gave up on pursuing it further.

Interesting stuff – I don’t think I’ve seen the diagrams used for this before?

Re: the pragmatics of generating Sankey diagrams from R, there is a new R library called rCharts (http://rcharts.io/) that makes it easy to use a variety of d3js-powered JavaScript libraries to generate interactive charts from R. I’m not sure whether anyone has wrapped/demoed the creation of a Sankey diagram from R yet using this approach, but if it would be useful I could maybe have a go at working out how to do such a thing?

Tony,
Thanks for the pointer to rCharts and especially for mobilizing interested colleagues to help experiment. I’ve updated the post with a link to @timelyportfolio’s new tutorial.
More generally, we are interested in exploring new approaches for modeling flows across the library system. The CIC project provides an opportunity to do some of that, albeit on a relatively limited scale. I know a variety of people are interested in applying network analysis to the study of library collection and space usage — if you come across any examples, I’d love to see them. If I recall correctly UIUC — for ex. — did a study of flows of users between campus (departmental) libraries and is doing some study of circ data. Would be nice to see the usage analysis rolled up to a higher level. For our purposes, ILL data is a way into understanding the larger system dynamics.
Constance


OCLC Research

Hanging Together is the blog of OCLC Research. Learn more about OCLC Research on our website.