Saturday, June 15, 2013

Web scale discovery systems as a class of product have existed for over 4 years, and there has been rapid adoption by academic libraries around the world. We are currently well past the early adopter phase, and probably deep into if not past the early majority phase.

Some of the early leaders in this space like Summon are even announcing a "2.0" version, which may or may not be marketing hype but is symbolic I guess in signalling that products in this class have reached a certain amount of maturity.

Today in 2013, Summon alone has over 500 libraries using it, and many more are using WorldCat Local, Primo Central, EBSCO Discovery Service etc. As usual, this has led to the rise of professional literature written on the topic (see list curated by me here as well as Flipboard custom magazine), covering a host of areas.

With all this literature out there, what do we really know about web scale discovery services in 2013 that we didn't know in 2009 and what are some issues where the jury is still out?

Some qualifiers.

First, I don't profess to know all the answers or have read or even remembered every study done on discovery services, nor am I an "expert", though I have kept my eye on this interesting area.

Second, I have far greater familiarity with Summon (which we tested and implemented in our institution in 2011-2012) and to some extent EDS, so what I write might apply only to Summon. (For example, I wonder whether the EDS interface, which offers more advanced features at the cost of a crowded user interface, leaves advanced users more satisfied.) Still, I suspect that at a high level the web scale discovery services on the market are similar enough that most statements apply to all of them.

Third, I am going to speculate based not just on the literature but also on my own knowledge and feel of what the general consensus is (which might be wrong).

I hope this post can lead to some fruitful discussion, even if you disagree with what I have written.

1. Discovery services increase usage of eresources

This seems to be the result that is most robust and uncontroversial. Every library that has implemented discovery services has reported that, on the whole, usage of eresources has gone up.

Detractors might say that users might be downloading more, but do they actually find what they need? And even if they found something that is good enough, is it the best? That's a (possibly) fair but different point.

2. Undergraduates generally love discovery services

Again, this is a point that I believe is mostly accepted. Survey after survey has shown that undergraduates are generally happy with discovery services because they mostly fit their mental models by functioning somewhat like Google. Are they perfect, and do all undergraduates like them? Of course not, but on the whole, libraries that have surveyed users have mostly obtained positive feedback compared to existing catalogue or search tools. This is of course unlike results for federated search in the past.

3. Librarians' reactions towards discovery services are mixed at best.

The earliest study I am aware of that surveyed librarians' reactions to Summon reported "culture shock". This seems to be the default reaction of librarians who encounter discovery tools for the first time. Of course, this was at one early adopter library in Australia, back when the concept of discovery tools was still novel to the profession, and the study itself suggests, based on a follow-up survey 6 months later, that as librarians get used to the concept they become more positive towards the tool.

However, more recent studies on librarians' attitudes towards information literacy, such as this one and this one, suggest that librarians' attitudes towards discovery tools are still polarized or ambivalent, whether when using them and recommending them to users at the reference desk or when teaching in classes. Attitudes range from enthusiastic support (see the series of free recorded webinars on Summon and information literacy adopted by librarians) to acceptance (sometimes grudging) to extreme opposition, for instance the claim that teaching Summon is "a dereliction of duty reference librarians have towards their users", one of the more extreme statements found in the literature.

Based on discussions with librarians both within and outside my institution, I can verify as well that there are many highly qualified reference librarians who dislike discovery services intensely and not out of mere ignorance or resistance to change.

There have been a couple of blog posts and papers trying to explain this resistance. Here's my take combining reasons I have seen given in various papers with my own thoughts.

Relevancy ranking results can be inconsistent if not awful (opinions vary on how bad this issue is, possibly depending on expectations, implementation and discipline).

Lack of advanced search features

Worry that some important material is missing from the index, or that coverage in some disciplines is totally inadequate. Related is the view that a subject-specific database is almost always better, e.g. PubMed.

Worry that users are unaware that they are missing out material not found in the index, and they may settle for good enough instead of the best available

Worry that discovery services are damaging information literacy skills by misleading users into thinking research is easy

Technical issues relating to instability of linking to full-text, clarity of labels in the interface etc.

Uncertainty on how to position discovery systems next to databases and how to teach them

Worry that libraries are handing over too much power to discovery services due to lock-in by discovery service providers who are simultaneously content providers (example of recent dispute).

Each point can of course be expanded further. For instance, relevancy itself is a big area, with some librarians unhappy about the weighting of content types (newspaper articles appearing too often instead of books) while others are unhappy with the overall relevancy ranking for known item searches.

4. Advanced searchers generally mirror the attitudes of librarians and are not as satisfied

As expected, experienced researchers and faculty staff generally mirror the opinions of librarians and they are a lot less enthusiastic than undergraduates in general because they are familiar with what databases offer and are more demanding on what they should get.

5. Relevancy ranking can still be improved

I doubt most librarians would say that Summon or any other discovery service is as good as it can be; most would yearn for better relevancy.

I am personally more sympathetic towards discovery systems in this area, though having spent countless hours studying and duplicating thousands of user searches since June 2012, I am well aware of how poor the relevancy ranking of Summon can be on some searches (I have also done limited testing on other systems).

Lest I be accused of not giving examples, here's one: Singapore "national service", where currently the first 9 results are totally irrelevant. Though one example hardly proves a pattern, I am sure any librarian familiar with discovery services can give dozens of examples similar to this one. But of course, relevancy isn't an easy problem to solve, and to be fair, in this case doing the same search without quotes actually gives you better, though still poor, results.

Also, as mentioned before, there were doubts in the early days about how good such systems are for known item searching, particularly for catalogue items, and these doubts continue to this day despite improvements.

6. Adding Federated search does not add much to web scale discovery (currently)

This is somewhat more controversial. But I believe the current consensus is moving towards the idea that tacking federated search on to web scale discovery is not that useful, at least with current implementations. An early debate in 2009 was sparked on the Federated Search Blog with the post Beyond Federated Search and its follow-ups, which critiqued Summon for lacking federated search and claimed that a hybrid solution of indexing what you can, and doing a broadcast search (federated search) over what you can't, should be the way to go.

I could be wrong, but my impression is that many libraries that implemented EBSCO Discovery Service, which does have federated search, have chosen to turn off the federated search portion, basically because it wasn't used and/or was counterproductive.

Federated Search is Dead -- and Good Riddance!, a piece explaining why James Madison University (JMU) turned off the EBSCO Integrated Search federated search add-on included in EBSCO Discovery Service, is perhaps a typical reaction.

Essentially, the sheer size of the index of EBSCO Discovery Service and other discovery services means that students have no incentive to wait 30 seconds for more results. The problem they typically face is too many results, not too few. Scholars will already be using traditional databases (e.g. Scopus) as their primary search tool and may just use web scale discovery tools as a final round-up of what they have missed, so they don't have a pressing need to see results from such traditional databases either.

I would say even EBSCO is downplaying the significance of the federated search option in EDS. Their pages on EDS do not mention federated search at all (though to be fair it's a separate product, EHIS), and there is even a page on platform blending (I frankly don't quite understand what is going on there, despite a vendor explaining it to me) where they go out of their way to state it is "not federation".

Of course, an argument could be made (correctly, I think) that the idea of a hybrid system is sound but the implementation needs a lot of work to make it worthwhile. Currently, though, it seems none of the 4 major players in the market has cracked this issue, and they may not do so in the foreseeable future as it is perhaps not a priority.

7. Content providers are generally eager to cooperate with discovery vendors to have their content indexed.

One of the reasons why the need for federated search seems to have diminished is that more and more content is getting indexed. In 2009, there was still uncertainty about how content providers would react. Would they want to be included? Discovery vendors had to work hard to get content included. If most providers agreed, then federated search would be of limited value except for reasons related to currency of results. If most content couldn't be indexed, then federation would be crucially important to get at those resources.

As of 2013, the situation has become clearer. Over the years, as more libraries released data showing that usage tends to fall for anything not in discovery services and, conversely, to rise for anything indexed in them, content providers have become more and more eager to be indexed or risk being cut out of the game.

The earlier mentioned James Madison University paper is perhaps instructive. Of the sources the author described as accessible via federated search back in 2010, many (JSTOR, Sage, ScienceDirect etc.) are now indexed in Summon and probably other discovery services.

More interestingly, even A&I services like Scopus, Web of Science, MLA and ERIC are now often included in discovery services, though with appropriate safeguards to ensure their records are shown only to authenticated users.

That said, there are still hold-outs. Well-known A&I databases such as PsycINFO and EconLit, which work with EBSCO Discovery Service only, are perhaps the most gaping hole currently existing.

And of course the above refers only to publishers. In general, aggregator databases have been less willing (Gale seems to be an exception here, being included in Summon since 2009 and recently added to EBSCO Discovery Service as well as others). In particular, those owned by ProQuest and EBSCO are typically out of bounds to competing discovery services, barring some special agreement.

8. Problems of broken links are still an issue though the problem is less serious and likely to remain so in future

One of the greatest issues with discovery services is that they typically rely heavily on OpenURL to get to the full text. As is well known, OpenURL linking is not 100% reliable, so discovery services have put in place alternate routes to full text.

For example, Summon implemented "Index-Enhanced Direct Linking" and EDS has its smart links (if content is in the EBSCOhost databases) or custom links (I believe equivalent to Summon's Index-Enhanced Direct Linking in most cases).

That said, linking to newspaper articles, non-journal items and free content can still be iffy.

I confess it took quite a bit of effort and courage to get this piece written and posted. Sometimes I wondered if I was getting the general consensus totally wrong, and yet other times I thought what I wrote was so trite and obvious that people knew it even at the start of 2009.

I suspect the latter is more likely to be correct, because I decided to err on the side of caution and list the statements I thought were definitely agreed upon and bump the ones I was unsure about to a follow-up blog post "X things we still are unsure about web scale discovery systems in 2013".

But what do you think? What else do we know about discovery services that was in doubt in 2009?
