A bunch of stuff I would have emailed you about.

The High Cost Of Metasearch For Libraries

I’ve been looking seriously at metasearch/federated search products for libraries recently. After a lot of reading and a few demos I’ve got some complaints.

I’m surprised that vendors, even now, devote so much demo time to patron features that are neither used nor appreciated by patrons without an MLS. Recent lessons (one, two, three) should have made it clear that libraries need to conform to patron expectations of how online resources should work. Our own search statistics show that only 0.0067% (YES, less than a hundredth of a percent!) of the searches on our OPAC get “limited” to specific languages, locations, dates, or material types. What our patrons expect is that a natural language search will yield relevant results in the first page of hits. “Googlization” isn’t about dumbing things down, it’s about making the technology smarter.

And that’s the problem with these vendors’ metasearch products. They don’t do much to improve the quality of the results retrieved from any database. Shovelware — products that pile up junk in an attempt to generate value based on quantity — is a poor solution for libraries or researchers. Still, that’s how these products work, and it’s how they’ll continue to work until libraries and their database providers adopt some of the advances in search technology now used on the web (it’s not just Google, but Yahoo, Teoma, Clusty, and others).

At the same time that these metasearch products do little to improve the results we get, they also make the search process slower. Why do they all make us wait while slowly updating a table that shows only the number of hits retrieved from each database? A9 can teach all these vendors quite a few lessons on that point. A9 reports results in resizable columns, and fills in the details from various databases as they become available. The biggest lesson A9 can teach these vendors, however, is that metasearch should be free. They’re pushing OpenSearch as a public standard based on RSS/XML, and already they’ve got access to 236 databases. That’s not bad compared to Z39.50 (which we all still respect as the elder parent of current search standards), but remember that the standard was only announced in March 2005.
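A9’s “fill in results as they arrive” behavior is easy to sketch. Here is a rough illustration (the backend names, delays, and result format are invented for the example, and real connectors would do HTTP queries rather than sleep) using Python threads:

```python
import queue
import threading
import time

def search_db(name, delay, out_queue):
    """Stand-in for one federated-search backend with a given response time."""
    time.sleep(delay)  # pretend we're waiting on the remote database
    out_queue.put((name, [f"{name} result {i}" for i in range(1, 4)]))

def federated_search(backends):
    """Query all backends at once; yield each result set as soon as it
    arrives, instead of blocking the page until the slowest database answers."""
    out_queue = queue.Queue()
    for name, delay in backends:
        threading.Thread(
            target=search_db, args=(name, delay, out_queue), daemon=True
        ).start()
    for _ in backends:
        yield out_queue.get()  # fastest database comes back first

# The page can render each block of hits the moment its database responds.
for name, hits in federated_search([("catalog", 0.05), ("proquest", 0.2)]):
    print(name, len(hits))
```

The point of the sketch is the ordering: the user sees the fast database’s hits immediately instead of staring at a hit-count table until every backend has answered.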

We need to pressure database vendors to improve their search engines and give better results. Maybe database providers need to rank journal articles by the number of citations they receive? Maybe libraries need to buy Google Search Appliances and do their own indexing of database content. That way, links from university faculty would increase the rank of articles they link to, making search results especially relevant.
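The citation-ranking idea above is the same intuition behind link-based web ranking, just with citations standing in for links. A naive version (the article IDs and citation graph here are hypothetical) might look like:

```python
from collections import Counter

# Hypothetical citation graph: article -> list of articles it cites.
citations = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C", "B"],
}

def citation_rank(graph):
    """Rank articles by how many other articles cite them --
    a crude, one-pass cousin of PageRank."""
    counts = Counter(cited for refs in graph.values() for cited in refs)
    return sorted(graph, key=lambda article: counts[article], reverse=True)

print(citation_rank(citations))  # "C" is cited most, so it ranks first
```

A real system would iterate so that citations from highly cited articles count for more, but even this single pass surfaces the heavily cited article ahead of the rest.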

Then, we need to ask where our money is going when we buy software like this. We need to demand standards-based products with outstanding ease of use. Go try out A9 and compare it to anything in your library. Yeah, don’t you wish you could offer that to your patrons?

Unfortunately, vendors don’t sell to patrons, but to librarians. They have to add the full Boolean bells and whistles to get past the all-librarian screening committee, most of whose members are senior staff raised on Dialog.

Our library system is using a mix of tools: MetaFind from Innovative Interfaces as the “connector,” but running our own front-end (in PHP) that queries other databases through MetaFind. We tweaked MetaFind’s “relevance ranking” (which was: 10 results from whichever database answered first, 10 results from the second, etc.) to ignore the response time of each database while preserving each database’s original “relevance” rank. However, it turns out that most users didn’t like the relevance algorithm of each individual database… it didn’t seem Google-like enough. Britannica ignores phrase matching as a signal of higher relevance; ProQuest seems to rank articles higher based solely on term occurrence… so, taking whatever (limited) results come back from each metasearch, we implemented an algorithm with high scent (pushing results up based on term occurrence, phrase occurrence, and newer publication date), which seems to be working better (I guess we need more studies to find the right scoring weights). I know the vendors won’t get to this soon enough, so why not tweak our metasearch engines like this in the meantime? BTW, if you want to look at this algorithm tweak, look at our results (sorry, it’s in Spanish!)…. Link to bad automatic translation. =)
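A re-ranking pass like the one the commenter describes can be sketched as a simple scoring function. The field names and weights below are hypothetical stand-ins (the comment itself says the right weights would need more study), and the original is in PHP, not Python:

```python
# Hypothetical weights for the three signals the commenter names:
# term occurrence, phrase occurrence, and newer publication date.
W_TERM, W_PHRASE, W_RECENCY = 1.0, 5.0, 2.0

def score(record, query):
    """Score one merged result: count query-term hits, add a bonus for
    exact-phrase matches, and a small bonus for recency."""
    text = (record["title"] + " " + record["abstract"]).lower()
    terms = query.lower().split()
    term_hits = sum(text.count(term) for term in terms)
    phrase_hits = text.count(query.lower())
    # Scale publication year into a 0..1 bonus (assumes a 1990-2005 range).
    recency = max(0.0, min(1.0, (record["year"] - 1990) / 15))
    return W_TERM * term_hits + W_PHRASE * phrase_hits + W_RECENCY * recency

def rerank(records, query):
    """Merge results from all databases and sort by our own score,
    ignoring each vendor's opaque per-database ranking."""
    return sorted(records, key=lambda r: score(r, query), reverse=True)
```

The key design point matches the comment: the per-database rankings are thrown away, and one uniform, tunable score is applied to the merged pool, so results feel consistent no matter which backend they came from.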