Sunday, December 9, 2012

I have been studying, thinking and posting about web scale discovery since 2011 and my institution is currently days away from pushing it out as a default search.

In many ways, this has been one of the most technically challenging library projects I have been involved in so far, because of its far-reaching effects on everything from IT and cataloguing to e-resource management and information literacy. However, there are times when I wonder whether all the time and effort we have spent on implementing web scale discovery is really worth it.

Or have I spent so much time and effort on it that, to avoid cognitive dissonance, I am totally blind to the problems? So I am going to play devil's advocate in this post and put up what I think are the strongest reasons for NOT implementing a web scale discovery service.

For balance, I am going to try to follow each one with a rebuttal.

1. Most discovery happens offsite, users are just doing known item searches on your library site, so you don't need a discovery service.

The often quoted OCLC report found that 0% of users started their search from the library website. Of course, many do eventually come to the library site to search but it isn't a stretch to think by then they are looking not to discover new items, but rather to figure out a way to obtain the item they already discovered offsite.

Whether it be Google, Google Scholar, PubMed, PubGet, Google Books, reading lists, Amazon or Mendeley, the battle is already lost: our users look to these sources to find suitable, relevant items before coming back to our library sites to look for a copy.

At ILI2012, the University of Illinois at Urbana-Champaign stated that almost half (49.4%) of all searches in their Ezsearch (a very impressive advanced federated search system that is for all intents and purposes on par with Summon and services in its class) were known item searches.

I haven't managed to find any other analysis of the percentage of known item searches for discovery systems, though I remember another talk I attended at ALA 2011 throwing around 30-40% (ours seems to be around 40%, but we haven't fully launched).

An interesting question I didn't have time to research is whether, within systems, there is an increasing trend of known item searches as discovery shifts offsite.

BTW, this line of thought isn't original. In a recent presentation entitled "Thinking the Unthinkable: A Library without a Catalogue", the speaker argued that "university libraries are losing their roles in the discovery of scientific information, instead they should focus on delivery".

It's a fascinating, thought-provoking talk, in which the academic library at Utrecht University, which clearly has abundant resources (they developed their own federated search, Omega, in 2001!), decided against implementing a new discovery tool and, not only that, planned to support discovery offsite.

I must admit, I am not quite sure I correctly understand the plans for their webopac and whether they intend to retire it or not. I can see how one could rely on Google Scholar as a discovery tool given the existence of the library links programme, but what about books? (Or are they going to rely on WorldCat for books?)

In any case, they seem to have recognised the harsh reality that people no longer go to libraries for discovery, but rather use library search just to check if an item is available.

If that's the case, why do we need expensive web scale discovery systems? She systematically runs through the objections to relying on external systems like Google Scholar which we don't control.

My view is that if users are just doing known item searches, all we need are webopacs. After all, if there's one thing webopacs are good at, it is known item searches (or at least those with enough citation data!).

Arguably, web scale discovery systems, by blending in newspaper articles, journal articles etc., make it harder to find a known item. One of the things that surprised me most when I started studying and trying web scale discovery systems was reading librarians moaning about how known item searches were surprisingly difficult in Summon and its cousins.

I was surprised. Wasn't this a solved problem?

Trying it myself, I noticed, for example, that searches for database names and journal names did not always surface the catalogue record as the first result in discovery systems (if, say, the query matched only part of the title). Books were also problematic if users decided to "help" the system by entering both title and author: often Summon would surface book reviews, or book reviews classed as journal articles, instead. (Book reviews mention the author name several times, hence the higher ranking? Just speculating.)

So you ended up with equal or worse results for 40%+ of searches.

Of course there are various ways around this, from the new database recommender in Summon and excluding book reviews by default, to bento style boxes and perhaps tweaking the ranking algorithm to give more weight to title matches and to books, journals and databases. Though I am not expert enough to know whether a system designed to support both topic searches and known item searches can match one designed solely or mostly for known item searching.
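To make the "tweak the ranking" idea a little more concrete, here is a toy sketch of boosting title matches and catalogue-type records on top of whatever base relevance score a system produces. Everything in it is an assumption for illustration: the boost values, the record fields (title, type, base_score) and the function names are made up, and this is not how Summon or any other vendor actually ranks results.

```python
import re

# Hypothetical boost values, chosen only for illustration.
EXACT_TITLE_BOOST = 10.0
PARTIAL_TITLE_BOOST = 3.0
CATALOGUE_TYPE_BOOST = {"book": 2.0, "journal": 2.0, "database": 2.0}


def normalise(text):
    """Lowercase, replace punctuation with spaces and split into tokens."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()


def known_item_score(query, record):
    """Return the record's base score plus title and content-type boosts."""
    q = normalise(query)
    title = normalise(record["title"])
    score = record["base_score"]
    if q == title:
        # The query is the whole title: very likely a known item search.
        score += EXACT_TITLE_BOOST
    elif q and all(tok in title for tok in q):
        # Every query word appears somewhere in the title.
        score += PARTIAL_TITLE_BOOST
    # Favour books, journals and databases over reviews and articles.
    score += CATALOGUE_TYPE_BOOST.get(record["type"], 0.0)
    return score


def rerank(query, records):
    """Sort records so the highest boosted score comes first."""
    return sorted(records, key=lambda r: known_item_score(query, r), reverse=True)
```

With these made-up weights, a journal record titled "Nature" would jump ahead of a higher-scoring article called "Review of Nature" for the query "nature". The catch, as noted above, is that the same boosts could easily hurt genuine topic searches.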

But even if this is licked, you have just maintained parity for the 40-50% of searches that would have worked perfectly in the webopac, so what's the point?

Rebuttal 1

First off, while it's true that web scale discovery currently has some issues with known item searches for books, databases and journals, to say that the results are equal or worse for 40-50% of searches misses the point that a large percentage of that 40-50% are known article title searches. These searches would never have worked in webopacs if you directly entered the article title, and they are great time savers! Granted, such article title searches are not guaranteed to work and may fail if the article isn't indexed, but they work sufficiently often (roughly at least 90% of the time for most academic library collections) that it isn't worth worrying about.

Conveniently left out is the other 50% or so of user searches. Before web scale discovery, many of those would have been complicated searches that found nothing in the webopac.

Many of these searches now work infinitely better, because Summon and similar systems cover journal articles and often the full text of books.

With all due respect to the speaker from Utrecht University, saying that libraries should focus on delivery only is a very defeatist attitude to take. Summon and other web scale discovery systems may not be winning back all our users, but they are winning back enough of them to make it a fight.

Let's face it. Web scale discovery relevancy ranking isn't really up to snuff, at least when compared to Google.

In a fascinating study, a head-to-head test between Summon and Google Scholar was done for the first 25 results for some of the most common searches in one college's instance of Summon.

The results were stripped of identifying data, so one couldn't tell where they came from, and then given to a panel of librarians to judge. In this independent blind test, Google Scholar trounced Summon in everything from relevancy to currency AND reliability (how scholarly each result was)!

Ratings were given from 1-5 for each of the 3 factors.

You have to read the study yourself (unfortunately it is not free), but Summon outscored Google Scholar on relevancy for just two searches, and Google Scholar's overall mean was higher by 0.64. But I guess they are Google, the masters of relevancy ranking, so that isn't surprising, right?

A greater surprise is that Summon also scored lower than Google Scholar on reliability, by a mean of 0.85 points. In case you are wondering, Summon had the exclude newspapers option turned on. So much for fearing that Google Scholar may not have strict standards on what counts as scholarly.

How about currency? Google Scholar wins out again by 0.52.

For whatever reason, web scale discovery relevancy ranking is as yet still unable to properly rank the hundreds of millions of items in the index, at least not with the skill Google Scholar shows.

Librarians often moan about how web scale discovery systems tend to lack the more powerful search features found in traditional library databases.

Arguably, Google Scholar can get away without precision tools to slice and dice the results because of its powerful relevancy ranking, but library web scale discovery systems are hardly in the same league for relevancy ranking.

Compared to a more specialised database, web scale discovery suffers from a double whammy when it comes to getting precise, controlled result sets.

First off, they have to cater to the lowest common denominator so they lack the more powerful precision search features in specialised databases.

And to make matters worse, the lack of such tools hurts web scale discovery even more, because it has to work harder to differentiate between the same term being used across disciplines. E.g. data migration could be either a computer science term or a social science term.

That's the reason why even many (but not all) defenders of web scale discovery will admit that it isn't for advanced users, who should search subject specific databases instead.

So maybe it's true: web scale discovery is for less skilled users who don't need to do thorough and precise searches and just need an article or two. But if that's the case, why shouldn't they just use something like Academic Search Premier or JSTOR to get an article or two?

They don't need to do thorough, comprehensive searches anyway, so why are we asking them to search through a super large index of results that may confuse them?

Either way, the conclusion is web scale discovery doesn't seem to suit the needs of either unskilled or skilled searchers!

New development

A very recent study possibly gives some support to the idea that beyond a certain point, having more results doesn't help as much.

Through the magic of APIs, they pipe the results into a webpage showing 2 sets of results side by side, and users are asked to express a preference or to say that they can't choose between the two.

It's a well written study that considers various factors (what is displayed seemed to be critical), but the upshot is that, with the exception of Scopus, which seems to be much less preferred (statistical significance was reached), none of the services was preferred over the others.

While Summon did the best overall in terms of raw wins, the result was not statistically significant. A somewhat bigger surprise is that the EBSCOhost set of databases, or EBSCOhost 'Traditional' API, came in 2nd and even bested the newer EBSCO Discovery Service head to head.

Of course none of these differences are statistically significant, but it does suggest that while EDS definitely has more material (in the test EDS, Primo and Summon were set to include the whole index regardless of the library's holdings), beyond a certain point the extra material hardly makes a difference. Of course 40 EBSCOhost databases is still a lot of information, but one wonders if even a smaller set, say 10-20, would be sufficient to get a similar result.
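For readers wondering what "not statistically significant" means for a head-to-head preference test like this, here is a minimal sketch of one common way to check it, an exact two-sided sign test. This is my own illustration and not necessarily the test the study authors used; the function name is made up, and any counts you feed it are your own, not figures from the study.

```python
from math import comb


def sign_test_p_value(wins_a, wins_b):
    """Exact two-sided sign test for a head-to-head preference count.

    Votes of "can't choose" are assumed to have been dropped already.
    """
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = max(wins_a, wins_b)
    # Chance of a split at least this lopsided if users really had
    # no preference (each vote is then a fair coin flip).
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the one-sided tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)
```

For example, a hypothetical 8 wins against 2 gives a p-value of about 0.11, which still falls short of the usual 0.05 threshold. That is why raw wins on a small number of comparisons can look decisive without being so.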

Also, I think the argument above about beginners not needing a huge index fails because it rests on a fallacy: that an unskilled user searching for a few articles can definitely find them in Academic Search Premier or an equivalent.

This misses the possibility that an unskilled user may well be searching for an obscure topic, where the best chance of success is to search the broadest database. Sure, if he is searching for, say, "racial relations", pretty much every database will do, but try a more focused search like racial relations in country X and you will quickly see the value of searching the broadest database.

In fact, an unskilled searcher who is just searching for 1 or 2 relevant articles will benefit a lot from searching a super broad web scale discovery system because he probably won't use the right terms to search, so the broadest index maximizes the chances of hitting on at least 1 or 2 results.

Similarly, while it is true that a researcher needs a way to do a controlled, precise search to be sure he has covered the literature comprehensively, this assumes the search he is doing returns so many results that he needs these tools.

But wouldn't you say that many researchers are doing very specialised searches where covering the broadest index is important as the problem they are getting is not too many results but no results?

I have seen dedicated postgraduate students who, after years of digging across various sources, do a quick search in Summon and are stunned to find 1 or 2 articles surfacing that are utterly relevant to what they are doing, but which they missed because they just happened not to search a source that covered them.

The Johns Hopkins Libraries study is interesting, but the authors of the study say it best when they muse about the finding that practically all the services they tested are equal and point out several possible explanations for this finding.

"One obvious question is whether we have a finding of no user preference, of our users, collectively, thinking all the products are about equal -- or if our findings are simply inconclusive."

They suspect larger sample sizes might not necessarily help with statistical significance, but they also suspect users just didn't use the services enough (or didn't use real enough examples) to really tell if there was a difference.

"If used over time in production, some products may very well satisfy users better than others -- but when asked to express a preference for a small handful of searches in the artificial context of the experiment, users may not have the capability to adequately judge which products may be more helpful in actual use. I think this is quite possible, especially if users were not using their own current real research questions to test."

More intriguingly, they speculate (with some support from the existing literature):

"Some users, especially beginner/undergraduate users, may simply be unconcerned with relevance of results, being satisfied by nearly any list of results. "

Or to put it another way, undergraduates satisfice, and an article that is "just kinda what they want" is considered good enough, so beyond a certain point it doesn't matter.

This is followed by an interesting musing about whether, if they don't care, we librarians should care!

Conclusion

Unlike some I am not certain that every academic library should rush out to implement a web scale discovery system regardless of finances.

But I find it hard to think it can be a bad idea. After all, library after library that has implemented Summon or other web scale discovery services has reported substantial increases in usage of electronic resources, and that is definitely what we are trying to do: reduce friction in accessing our resources, isn't it?
