Tuesday, March 16, 2010

Designing search for re-finding

A WSDM 2010 paper out of Microsoft Research, "Large Scale Query Log Analysis of Re-finding" (PDF), is notable not so much for the statistics on how much people search again for what they have searched for before, but for its fascinating list of suggestions (in Section 6, "Design Implications") of what search engines should do to support re-finding.

An extended excerpt from that section:

The most obvious way that a search tool can improve the user experience given the prevalence of re-finding is for the tool to explicitly remember and expose that user's search history.

Certain aspects of a person's history may be more useful to expose ... For example, results that are re-found often may be worth highlighting ... The result that is clicked first ... is more likely to be useful later, and thus should be emphasized, while results that are clicked in the middle may be worth forgetting ... to reduce clutter. Results found at the end of a query session are more likely to be re-found.

The query used to re-find a URL is often better than the query used initially to find it ... [because of] how the person has come to understand this result. [Emphasize] re-finding queries ... in the history ... The previous query may even be worth forgetting to reduce clutter.

When exposing previously found results, it is sometimes useful to label or name those results, particularly when those results are exposed as a set. Re-finding queries may make useful labels. A Web browser could even take these bookmark queries and make them into real bookmarks.

A previously found result ... may be what the person is looking for ... even when the result [normally] is not going to be returned ... For example, [if] the user's current query is a substring of a previous query, the search engine may want to suggest the results from the history that were clicked from the longer query. In contrast, queries that overlap with but are longer than previous queries may be intended to find new results.

[An] identical search [is] highly predictive of a repeat click ... [We] can treat the result specially and, for example, [take] additional screen real estate to try to meet the user's information need with that result ... [with] deep functionality like common paths and uses in [an expanded] snippet. For results that are re-found across sessions, it may make sense instead to provide the user with deep links to [some] new avenues within the result to explore.

At the beginning of a session, when people are more likely to be picking up a previous task, a search engine should provide access into history. In the middle of the session ... focus on providing access to new information or new ways to explore previously viewed results. At the end of a session ... suggest storing any valuable information that has been found for future use.
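The substring heuristic described in the excerpt (surface previously clicked results when the current query is a substring of an earlier, longer query) can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper; the `History` class and its method names are hypothetical.

```python
# Sketch of the substring re-finding heuristic from the excerpt above.
# All names here are illustrative, not from the paper.

class History:
    """In-memory log of past queries and the URLs clicked for them."""

    def __init__(self):
        self.clicks = {}  # query -> list of clicked URLs

    def record(self, query, url):
        """Record that `url` was clicked for `query`."""
        self.clicks.setdefault(query, []).append(url)

    def suggest(self, current_query):
        """If the current query is a proper substring of a previous,
        longer query, suggest the results clicked from that longer
        query. Queries that merely extend a previous query are not
        matched, since (per the paper) they are more likely aimed at
        finding new results."""
        suggestions = []
        for past_query, urls in self.clicks.items():
            if current_query != past_query and current_query in past_query:
                suggestions.extend(urls)
        return suggestions

h = History()
h.record("siam royal palo alto", "http://example.com/siam-royal")
print(h.suggest("siam royal"))  # clicked results from the longer past query
```

A real search engine would of course rank these history suggestions against fresh results rather than returning them unconditionally, but the substring test itself is this simple.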

Great advice from Jaime Teevan at Microsoft Research. For more on this, please see my earlier post, "People often repeat web searches", which summarizes a 2007 paper by Teevan and others on the prevalence of re-finding.

5 comments:

Amazon has been doing something related for years: tracking the items you have browsed. It is one of the great features that set Amazon apart.

A big problem I face with Google Scholar is that I often find some exciting paper, only to "lose it" weeks later. And I'm left thinking... "hmmm, there is a paper about this idea somewhere, I found it once... who wrote it?... I can't remember..."

In fact, I thought about writing my own application that would track the papers I read specifically so that it could "remind me" that they are related to what I'm doing right now.

Infoaxe is not a new idea, AM. See, for an early example, "Stuff I've Seen", also out of Microsoft Research.

But searching over a cached copy of your browsing history also was the seed of a few startups in the dot-com boom a decade ago, as well as a common feature in the many desktop search engines back when those were a hot idea several years ago. And, of course, there are the gazillion web bookmark services out there, all of which do things fairly similar to Infoaxe.

The hard part with this idea, beyond that the space is extremely crowded with also-rans, is getting people to download something. That's quite a hurdle. It needs to offer a lot of obvious value to be worth it beyond a small initial crowd of early adopters (who will try anything once).

Great post, Greg! Thanks for sharing. (Long time reader, first time commenter.)

Was about to suggest that second paper you mentioned in your earlier post from '07 for additional reading :) (It's possible I discovered that paper on your blog ;P). I found the stat that 40% of all queries are re-finding queries pretty surprising when I first came across it.

As an aside, I'm also a co-founder of Infoaxe (thanks for the shoutout, AM! :)).

One thing we found fascinating as an application of Web history search was that it allows people to be lazy while searching. We have a widget that displays results from your Web memory/history alongside results from Google. For example, if I have been to your blog once before, I can just type "Greg" into Google and Infoaxe's top result will be your blog. A lot of our users use Infoaxe for this sort of implicit personalization and regionalization (e.g., typing just "siam royal" instead of "siam royal palo alto"), being less specific than they would otherwise need to be. We were surprised, since we had imagined most of the utility of a Web history search engine would be finding that arcane page you knew you should have bookmarked. :)

In a paper I wrote, I proposed something I called query caching: cache queries, or better their results, for some time so you do not have to run them again. This helps keep the load low when you fan a query out across multiple systems.
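The query-caching idea in that last comment amounts to a small time-to-live cache in front of the expensive fan-out. A minimal sketch, assuming hypothetical names (`QueryCache`, the `fetch` callback standing in for the multi-system query):

```python
import time

class QueryCache:
    """Cache query results for a fixed time-to-live (TTL) so that
    repeated queries within that window don't hit the backends again."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.entries = {}  # query -> (timestamp, results)

    def get(self, query, fetch):
        """Return cached results if still fresh; otherwise call
        fetch(query) -- the expensive fan-out -- and cache the result."""
        entry = self.entries.get(query)
        if entry is not None:
            stored_at, results = entry
            if time.time() - stored_at < self.ttl:
                return results
        results = fetch(query)
        self.entries[query] = (time.time(), results)
        return results

# Usage: the second identical query is served from the cache.
calls = []
def fetch(q):
    calls.append(q)  # count how often the backends are actually queried
    return ["result for " + q]

cache = QueryCache(ttl_seconds=60)
cache.get("re-finding", fetch)
cache.get("re-finding", fetch)  # cache hit; fetch is not called again
print(len(calls))  # 1
```

Caching the results (rather than just the query string) is the variant the comment calls "better": it saves not only the routing decision but the whole round trip to each backend, at the cost of serving results up to one TTL stale.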