Monthly Archives: July 2011

Following internal in-house testing, the recommender was opened to users. In the last week of July, 18 humanities postgraduates passed through the SALT testing labs (11 PhD students, 3 taught Masters students and 4 research students). Lisa and I held three focus groups and grilled our potential users about the SALT recommender. The research methods were designed to answer our objectives, beginning with an informal discussion to find out how postgraduate students approach library research and to gauge the potential support for the book recommender. Following the discussion we began testing the actual recommender to answer our other research objectives, which were:

Does SALT give you recommendations which are logical and useful?

Does it make you borrow more library books?

Does it suggest to you books and materials you may not have known about but are useful and interesting?

As a team we agreed to set the threshold of the SALT recommender deliberately low, with a view to increasing this and testing again if results were not good. As our hypothesis is based on discovering the hidden long tail of library research we wanted the recommender to return results that were unexpected – research gems that were treasured and worthy items but had somehow been lost and only borrowed a few times.

42 searches in total were done on the SALT recommender and, of those 42, 77.5% returned at least one recommendation (usually many more) that participants said would be useful. (As an aside, one of the focus group participants found something so relevant she immediately went to borrow it after the group had finished!)

However, the deliberately low threshold may have caused some illogical returns. The groups were asked to comment on the relevance of the first 5 recommendations, but quite often it was the books further down the list that were of more relevance and interest. One respondent referred to it as a ‘curate’s egg’; he assured me, however, that he meant it was good in parts and bad in others. His first five were of little relevance, ‘only tangentially linked’, while his 6th, 7th, 8th, 9th, 11th and even 17th recommendations were all ‘very relevant’. Unfortunately this gave disappointing results when the first 5 suggested texts were rated for relevance, as demonstrated in the pie chart below.

However, the likelihood of borrowing these items gave slightly more encouraging results:

Clearly we’ve been keen on the threshold count. Lessons need to be learnt about the threshold number and this perhaps is a reflection of our initial hypothesis. We think that there would be much merit in increasing the threshold number and retesting.

On a positive note, initial discussions with the researchers (and just a reminder: these are seasoned researchers, experts in their chosen fields, and familiar, long-term users of the John Rylands University Library) told us that the recommender would be a welcome addition to Copac and the library catalogue. 99% of the researchers in the groups had used and were familiar with Amazon’s recommender function and 100% would welcome a similar function on the catalogues based on circulation records.

Another very pertinent point, and I cannot stress this strongly enough, was the reaction expressed with regard to privacy and the collection and subsequent use of this data. The groups were slightly bemused by questions regarding privacy. No one expressed any concern about the collection of activity data and its use in the recommender. In fact most assumed this data was collected anyway and encouraged us to use it in this way, as ultimately it is being used to develop a tool which helps them to research more effectively and efficiently.

Overwhelmingly, the groups found the recommender useful. They were keen that their comments be fed back to developers and that work should continue on the recommender to get the results right as they were keen to use it and hoped it would be available soon.

While early tests with a sample set of data from JRUL were encouraging (see SALT – a demo), an overhaul of the methodology behind the recommender API was required once the full set of loan transactions was obtained.

It was feared that processing the data into the nborrowers table – containing, for each combination of two items, a count of the unique number of library users to have borrowed both items – might become too onerous with the anticipated 3 million records. That fear turned to blind panic when 8 million loan records actually arrived!
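To see why a precomputed nborrowers table becomes so heavy, here is a minimal Python sketch of how such a table could be built. It is an illustration only, not the project's code: the function name and the `(user_id, item_id)` input shape are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_nborrowers(loans):
    """Count, for each unordered pair of items, the users who borrowed both.

    `loans` is an iterable of (user_id, item_id) pairs (a hypothetical shape).
    """
    # Collapse repeat loans so each user counts once per item.
    items_by_user = defaultdict(set)
    for user, item in loans:
        items_by_user[user].add(item)

    nborrowers = Counter()
    for items in items_by_user.values():
        # A user with k distinct items contributes k*(k-1)/2 pairs,
        # which is why the table grows so quickly with more data.
        for pair in combinations(sorted(items), 2):
            nborrowers[pair] += 1
    return nborrowers
```

Because every user contributes a pair for each combination of two items they have borrowed, the number of rows grows much faster than the number of loan records.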

The approach for processing the data for the API was thus re-jigged. As before the data was loaded into two MySQL tables, items and loans, and then some simple processing pushed the total number of loans for each item into a further, nloans, table. The remainder of the logic for the recommender was moved to run, on demand, in the API.

Given the ISBN of a certain item, let’s say ITEM A, and a threshold value, the PHP script for the API was coded to do the following:

1. Find the list of all users in the loans table who have borrowed ITEM A

2. For each user found in 1. find the list of all items in the loans table that have been borrowed by that user

3. Sum across the lists of items found in 2. to compile a single list of all possible suggested items which includes, for each of these items, the number of unique users to have borrowed both that item and ITEM A

4. From the list in 3. remove ITEM A and any items for which the number of unique users falls below the given threshold

5. For each item in the list derived in 4. divide the number of unique users of that item by the total number of times that item has been borrowed, from the nloans table

6. Rank the items in the list in 5. by the ratio of unique users to total loans

7. Find the details of each item in the list in 6. from the items table and return the list of suggestions
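The steps above can be sketched in a few lines of Python. This is not the PHP script itself, just an in-memory illustration: the function name and the `(user_id, item_id)` input are assumptions, the ranking is assumed to be highest ratio first, and the final lookup of item details is left out.

```python
from collections import Counter, defaultdict


def recommend(loans, item_a, threshold):
    """Return (item, shared_borrowers, ratio) tuples, best first.

    `loans` is an iterable of (user_id, item_id) pairs; all names here are
    hypothetical stand-ins for the real loans and nloans tables.
    """
    loans = list(loans)

    # Equivalent of the nloans table: total number of loans per item.
    total_loans = Counter(item for _, item in loans)

    # Distinct items per user, so repeat loans count a user only once.
    items_by_user = defaultdict(set)
    for user, item in loans:
        items_by_user[user].add(item)

    # Step 1: users who have borrowed ITEM A.
    borrowers = [u for u, items in items_by_user.items() if item_a in items]

    # Steps 2-3: for every item, count the unique users shared with ITEM A.
    shared = Counter()
    for user in borrowers:
        shared.update(items_by_user[user])

    # Step 4: drop ITEM A itself and anything below the threshold.
    shared.pop(item_a, None)
    candidates = {i: n for i, n in shared.items() if n >= threshold}

    # Steps 5-6: ratio of shared users to total loans, ranked highest first
    # (ties are broken by item identifier to keep the order deterministic).
    ranked = [(i, n, n / total_loans[i]) for i, n in candidates.items()]
    ranked.sort(key=lambda row: (-row[2], row[0]))
    return ranked
```

Dividing by total loans is what favours the long tail: a heavily borrowed item needs many shared borrowers to outrank a rarely borrowed one.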

Testing showed that certain queries of the MySQL database involved in the above process were time consuming and affected the responsiveness of the API. The following extra pre-processing was thus performed:

The items table was split into 10 smaller tables

The loans table was split into 5 smaller tables

With queries rewritten so that searches access each of these smaller tables in turn rather than just looking at the original, large tables there was a significant boost in API performance. The number of divisions for the above splits was somewhat arbitrary but was sufficient to render the API usable for testing.
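As a rough illustration of querying the smaller tables in turn, here is a sketch using Python's built-in sqlite3 module in place of MySQL, purely so that it is self-contained. The table names (`loans_0` to `loans_4`), the column names and the rule for assigning rows to tables are all assumptions; the post does not say how the split was made.

```python
import sqlite3

N_LOAN_TABLES = 5  # matches the five-way split of the loans table


def create_split_tables(conn):
    """Create the hypothetical loans_0 .. loans_4 tables."""
    for n in range(N_LOAN_TABLES):
        conn.execute(f"CREATE TABLE loans_{n} (user_id INTEGER, isbn TEXT)")


def insert_loan(conn, user_id, isbn):
    # Assumption: rows are assigned to a table by user ID modulo the count.
    conn.execute(
        f"INSERT INTO loans_{user_id % N_LOAN_TABLES} VALUES (?, ?)",
        (user_id, isbn),
    )


def borrowers_of(conn, isbn):
    """Query each small table in turn and merge the results."""
    users = set()
    for n in range(N_LOAN_TABLES):
        rows = conn.execute(
            f"SELECT DISTINCT user_id FROM loans_{n} WHERE isbn = ?", (isbn,)
        )
        users.update(row[0] for row in rows)
    return users
```

Each individual query touches a much smaller table, at the cost of issuing several queries and merging the results in application code.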

Further analysis would more than likely bring additional performance benefits, especially relevant as the amount of data is only going to grow (*). Also on the to-do list is expanding the range of output formats for the API; at present only xml and json are offered though both of the developers implementing the API in Copac and in JRUL respectively suggested that jsonp would be easier to work with.

(*) For reference, just over 8 million loan transactions are used for the current SALT recommender covering all available records up to July 2011, and these loans feature around 628,000 individual library items.
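For readers unfamiliar with JSONP, the requested format is simply the JSON response wrapped in a call to a function named by the client. A minimal Python sketch follows; the function name, the callback check and the example payload are illustrative assumptions, not part of the SALT API.

```python
import json
import re

# Conservative pattern for a JavaScript identifier used as the callback name.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def to_jsonp(payload, callback):
    """Wrap a JSON-serialisable payload in a JSONP callback invocation."""
    # Reject anything that is not a plain identifier, so a caller cannot
    # inject arbitrary script through the callback parameter.
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError(f"unsafe callback name: {callback!r}")
    return f"{callback}({json.dumps(payload)});"
```

For example, `to_jsonp({"isbn": "123"}, "cb")` produces a string that a browser can load through a script tag, which is what makes the format convenient for cross-site use.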

We’re gearing up to meet with some of our collaborators tomorrow from the M25 consortium and Cambridge University Library; they’re helping us explore whether the model we’re developing for SALT (i.e. a centralised aggregation service & a shared API) is something we should pursue further. In preparation for that I am working furiously to gather all the learning that has happened (and is happening) so far. As I write, Lisa and Janine are running user testing and focus groups with postgraduate humanities students from the University of Manchester using the live prototype we have up from Copac. We’ve asked this group to run their own searches and assess the results. Below you can see a sampling of what they’ve been looking for, and what recommendations they’re getting.

The overall reaction is very positive, although in general people find the recommendations further down the rankings to be more relevant. Up top, some of the recommendations are at times off-base. My own favourite book to search, Judith Butler’s Gender Trouble, is no exception: some apparently random items appear at the top of the list when you look at that one. (I do wonder if this represents a particular reading list for a theatre studies course at JRUL, and this is why it’s happening. Worth exploring.) Overall, though, we think this is because we’ve set the threshold (only 3 borrowers in common) deliberately low to test out our long tail hypothesis (Dave Pattern kindly explains here how the algorithm works). What we’re also hearing is that students are finding items they’ve not discovered before now, which they deem relevant and say they are likely to borrow. A few have commented that the recommendations get them thinking a bit more laterally, in that the concepts they are exploring are picked up in different disciplinary contexts. I find this part particularly interesting when considered in light of the search and research behaviour of humanities researchers. A fuller report on the user testing will be published shortly.

It will be interesting to see if some of the lower ranked recommendations might still be considered ‘long tail’ (and we need to consider how we’re defining ‘long tail’ in this context, of course). This will be an interesting topic for discussion for tomorrow, and we’d definitely welcome the views of any readers on this score (as well as the question of relevancy).

Another interesting note: I ran a search for items in the University of Huddersfield OPAC to compare the recommendations. Interestingly, none of the items (admittedly a minuscule sample here!) are in the Huddersfield OPAC. Obviously we’re dealing with two very different libraries and use cases here, but it drives home to me how differentiated the benefits will be at the local level, and raises questions over the utility of sharing activity data for use at the local level. That said, I think what we have with the JRUL data (10 years’ worth) is something that could likely be of real value to a lot of libraries. How much of a ‘critical mass’ do we need for this to be critical? Do libraries need to have a similar core user base and mission to JRUL to derive benefit from this set, I wonder?

Anyway, enough musing. Here are the recommendations. (btw: For now the below links go to the Copac prototype, but this is only going to be public for a while – we’re planning to launch formally in the autumn, but we’ve got work to do yet!)

“In Investment, Profit, and Tenancy: The Jurists and the Roman Agrarian Economy Dennis P. Kehoe defines the economic mentality of upper-class Romans by analyzing the assumptions that Roman jurists in the Digest of Justinian made about investment and profit in agriculture as they addressed legal issues involving private property. In particular the author analyzes the duties of guardians in managing the property of their wards and the bequeathing of agricultural property. He bases his analysis on Roman legal sources, which offer a comprehensive picture of the economic interests of upper-class Romans. Farm tenancy was crucial to these interests and Kehoe carefully examines how Roman landowners contended with the legal, social, and economic institutions surrounding farm tenancy as they pursued security from their agricultural investments.” “Investment, Profit, and Tenancy will be of interest to students of Roman history, particularly the legal, social, and economic history of the Roman empire.”–BOOK JACKET.

Title from e-book title screen (viewed June 17, 2008).
Includes bibliographical references (p. 332-356) and index.
Also available online.
Electronic reproduction. UK : MyiLibrary, 2008 Available via World Wide Web. Access may be limited to MIL affiliated libraries.
Dates of available copies: 2005, 2007.

Contents

Part 1: Luxury, Quality, and Delight — 1. The Delights of Luxury — 2. Goods from the East — 3. Invention, Imitation, and Design — Part 2: How it was Made — 4. Glass and Chinaware: The Grammar of the Polite Table — 5. Metal Things: Useful Devices and Agreeable Trinkets — Part 3: A Nation of Shoppers — 6. The Middling Classes: Acquisitiveness and Self-Respect — 7. ‘Shopping is a Place to Go': Fashion, Shopping, and Advertising — 8. Mercantile Theatres: British Commodities and American Consumers.

Summary

Luxury and Pleasure in Eighteenth-Century Britain explores the invention, making, and buying of new, semi-luxury, and fashionable consumer goods during the eighteenth century. It follows these goods, from china tea ware to all sorts of metal ornaments such as candlesticks, cutlery, buckles, and buttons, as they were made and shopped for, then displayed in the private domestic settings of Britain’s urban middling classes. It tells the stories and analyses the developments that led from a global trade in Eastern luxuries beginning in the sixteenth century to the new global trade in British-made consumer goods by the end of the eighteenth century. These new products, regarded as luxuries by the rapidly growing urban and middling-class people of the eighteenth century, played an important part in helping to proclaim personal identities, and guide social interaction. Customers enjoyed shopping for them; they took pleasure in their beauty, ingenuity or convenience. All manner of new products appeared in shop windows; sophisticated mixed-media advertising seduced customers and created new wants. This unparalleled ‘product revolution’ provoked philosophers and pundits to proclaim a ‘new luxury’, one that reached out to the middling and trading classes, unlike the elite and corrupt luxury of old. Luxury and Pleasure in Eighteenth-Century Britain is cultural history at its best, built on a fresh empirical base drawn directly from customs accounts, advertising material, company papers, and contemporary correspondence. Maxine Berg traces how this new consumer society of the eighteenth century and the products first traded, then invented to satisfy it, stimulated industrialization itself. Global markets for the consumer goods of private and domestic life inspired the industrial revolution and British products ‘won the world’.

Review

“…deserves to be the final word on the luxury debate in Britain” (Martyn Powell, Annual Bulletin of Historical Literature)

“Luxury and Pleasure is an interesting, accessible and well-illustrated synthesis of new research and recent writing, and helpfully concludes by pointing to further areas of research” (Hannah Smith, History Journal)

“Readers will find this book valuable” (Joyce Burnette, English Historical Review)

Published in association with the York Archaeological Trust.
Includes bibliographical references (p. 1863-1867) and index.
In English.

Summary

A great deal of material was recovered from Coppergate during archaeological excavations. Of 1147 artefacts found there, 1006 are from the 9th to 13th centuries and nearly 700 are from the Anglo-Scandinavian period. Many relate to the textile industry ‘

The John Rylands University Library New Directions Library strategy includes a commitment to “investigate innovative ways to extract, reuse and expose data across our systems to enhance the searching and usage of our resources.” The release of JRUL loan data (details of about 8 million transactions going back 10 years) is viewed by the JRUL as part of this commitment.

The main issue to address in releasing this data is how to do so in a way that protects personal information whilst ensuring that the data can be used in a meaningful way. Within JRUL the following approach was agreed.

The first step was to anonymise the data. This was partly done by removing all details about an individual for each loan transaction apart from a single user ID field which provides a system generated ID unique to that system. Following discussion with colleagues on the project it was then agreed that student course details would also be removed to eliminate the small risk that an individual could be identified in this way.
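One way to picture this step is as a whitelist: only named fields survive, so personal details and course information are dropped by default. The Python sketch below is an illustration under that assumption; the field names are hypothetical and are not JRUL's actual export schema.

```python
# Hypothetical field names; only these survive anonymisation.
KEPT_FIELDS = ("user_id", "item_id", "loan_date")


def anonymise(transactions):
    """Yield copies of loan records containing only the whitelisted fields."""
    for record in transactions:
        # Anything not listed (name, email, course details, ...) is dropped.
        yield {field: record[field] for field in KEPT_FIELDS}
```

A whitelist is a cautious design choice here: any field added to the source system later is excluded unless someone deliberately opts it in.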

In joining the SALT project, the JRUL agreed to make its loan data available for use by the SALT recommender service and to make the data available to others. The second step was to agree the terms on which this data would be released. JISC and the Resource Discovery Taskforce argue, in the context of bibliographic data, that the most open licence possible should be used and that restrictions, such as restricting use to non-commercial activities, should only be applied if the impact of this is fully understood. They also strongly recommend that institutions use the standard licences now widely available rather than developing their own (http://obd.jisc.ac.uk/rights-and-licensing). Whilst there are common principles between the sharing of activity data and bibliographic data, there are also some differences. In particular, activity data is unique to that particular institution and is generated from the behaviour of individuals within the institution. Rather than waiving all rights, therefore, a recommendation was made to the University Librarian that JRUL activity data be licensed for use outside of the University and that this be done using the most open licence available.

The University Librarian has now agreed that JRUL anonymised loan data will be made available under a Creative Commons attribution only licence (CC BY).

A working prototype based on just 5 weeks of data from John Rylands UL

July is proving to be a fast and furious month in terms of presenting on SALT, crunching the remainder of the gargantuan amount of data we’re receiving from our partners at JRUL, finessing the API, implementing the developments in the Copac prototype (and showing it off to people at the CILIP Umbrella conference), and running various workshops and user testing sessions to test our hypothesis (as much as we can in this time frame) and see whether the ‘shared service’ side of all this might actually scale.

We’re working intensely, but reflecting too. The workshop hosted by the RISE project earlier this month was an excellent opportunity to step back and see the bigger picture, and reflect on the benefits we’re aiming to realise for libraries and their users. A reminder of what we’re attempting to find out:

Hypothesis…

Library circulation activity data can be used to support humanities research by surfacing underused ‘long tail’ library materials through search

Also… how sustainable would an API-based national shared service be?

And can such a service support users and also library workflows such as collections management?

What we already know
We know that arts and humanities students and academics borrow books.

Research conducted in-house by Mimas, and also by others (for example Carole Palmer), also highlights the differences in search methodologies between this demographic and their STEM counterparts. In short, humanities researchers tend to search centrifugally, ‘berry picking’ from various trails. Mimas’ recent research with Mindset and Curtis and Cartwright indicates that newer postgraduates tend to work in quite an isolated way, asking few if any for advice on where to search (supervisors feature heavily in this regard, whereas subject librarians do not at all) and sticking with a few ‘known’ resources. While these users are typically suspicious of the idea that allowing other users to annotate, tag, or rate items would be of benefit to them, there is generally a positive response when asked about the usefulness of a recommender function; in fact Amazon is used significantly in this regard to help users find related materials that are not surfacing through a ‘traditional’ library search.

The benefits
So what additional benefits might we realise through this work, especially if we move on to aggregating data from additional libraries?

Key to our hypothesis is the belief that such systems can help surface and hopefully increase the usage of hidden collections. Obviously circulation data is only going to offer a partial solution to this problem of discoverability (i.e. many ‘hidden gems’ are of course non-circulating) but nonetheless, we believe that the long tail argument borne out by Chris Anderson can also hold true for libraries – that the collective ‘share’ or recommendation of items can turn the Pareto Principle on its head. For libraries this means being able to demonstrate the value of the collections by pointing to increased usage. It might also give libraries a better sense of what is of value to users, and what perhaps is not.

For users, particularly those in the humanities, a recommender function can help provide new routes to discovery based on use and disciplinary contexts (not traditional classification). In other words, what you are viewing through ‘recommenders’ are patterns of real usage: how other users with similar academic interests are aggregating texts. This is particularly useful for finding conceptually related groupings of texts that cut across different disciplines, and which will not ordinarily sit together in a standard results set.

It also means we can support humanities users in their preferred mode of discovery, powering ‘centrifugal searching’ and discovery through serendipity. The downstream benefits of this concern the emergence of new, original research, new knowledge and ideas.

Last week we sat down with a group of collections managers from JRUL as well as Leeds University and talked with them about other possible benefits related to library workflows which we weren’t yet seeing. Here are the potential benefits we came up with:

Aggregated activity data could support activities such as stock weeding by revealing collection strengths and allowing librarians to cross check against other collections.

By combining aggregated collection data and aggregated activity data, librarians will see a fuller picture. This means they can identify collection strengths and recognise items that should be retained because of association with valued collections. We thought about this as a form of “stock management by association.” Librarians might treat some long-tail items (e.g. items with limited borrowing) with caution if they were aware of links/associations to other collections (although there is also the caveat that this wouldn’t be possible with local activity data reports in isolation)

Aggregated activity data could have benefits for collection development. Seeing the national picture might allow librarians to identify related items – “if your collection has this, it should also have…”

This could also inform the number of copies a library should buy, and which books from reading lists are required in multiple copies.

Thinking more outside the box, we thought it might also inform digitization decision-making – i.e. if you digitized this, you might also consider digitizing…

This could also have benefits when discussing reading lists and stock purchases with academic staff, and thus enhance academic engagement

Over the next couple of weeks we’ll have quite a bit more to report as we analyse sustainability issues with other libraries, and perhaps most importantly, put the recommender itself in front of postgraduate humanities researchers to see if our hypothesis is likely to be proven true — at least based on what they tell us (the true test will happen over the next year as we monitor impact via JRUL and Copac).