Thoughts on the intersection of user experience, search, text analytics & more

Designing Search (part 3): Keeping on track

In the previous post we looked at techniques to help us create and articulate more effective queries. From auto-complete for lookup tasks to auto-suggest for exploratory search, these simple techniques can often make the difference between success and failure.

But occasionally things do go wrong. Sometimes our information journey is more complex than we’d anticipated, and we find ourselves straying off the ideal course. Worse still, in our determination to pursue our original goal, we may overlook other, more productive directions, leaving us endlessly finessing a flawed strategy. Sometimes we are in too deep to turn around and start again.

Conversely, there are times when we may consciously decide to take a detour and explore the path less trodden. As we saw earlier, what we find along the way can change what we seek. Sometimes we find the most valuable discoveries in the most unlikely places.

However, there’s a fine line between these two outcomes: one person’s journey of serendipitous discovery can be another’s descent into confusion and disorientation. And there’s the challenge: how can we support the former, while unobtrusively repairing the latter? In this post, we’ll look at four techniques that help us keep to the right path on our information journey.

1 Did you mean

As we saw in the previous post, auto-complete and auto-suggest are two of the most effective ways to prevent spelling mistakes and typographic errors (i.e. instances where we know how to spell something, but enter it incorrectly). By completing partial queries and suggesting meaningful alternatives, they avoid the problem at source. But, inevitably, some mistakes will slip through.

Fortunately, there are a variety of coping strategies. One of the simplest is to use spell checking algorithms to compare queries against common spellings of each word. The figure below shows the results on Google for the query “expolsion”. This isn’t necessarily a ‘failed’ search as such (as it does return results), but the more common spelling “explosion” would return more a productive result set. Of course, without knowing our intent, Google can never know for sure whether this spelling was intentional, so it offers the alternative as a “Did you mean” suggestion at the top of the search results page. Interestingly, Google repeats the suggestion at the bottom of the page, but with a slightly longer wording: “Did you mean to search for”. This is a subtle clarification, but one that may reflect the user’s shift in attention at this point (from query to results).

Potential spelling mistakes are addressed by a “Did you mean” suggestion at Google

Likewise, most major online retailers apply a similar strategy for dealing with potential spelling mistakes and typographic errors. Amazon and eBay both conservatively apply Did You Mean to queries such as “guitr”, faithfully passing on the results for this query but offering the alternative as a highlighted suggestion immediately above the search results. And in Amazon’s case, the results for the corrected spelling are appended immediately below those of the original query:

Did You Mean at Amazon

Did You Mean at eBay

2. Auto-correct

Search engines may be capable of many things, but one thing they cannot do is read minds: they can never know the user’s intent. For that reason, when faced with queries like those above, it is wise to keep some distance. Offer a gentle nudge, but leave the choice with the user.

However, there are times when it seems much more apparent that a spelling mistake has occurred. In these cases, we may not know for sure what the user’s intent was, but we can be fairly certain what it wasn’t. In these instances, auto-correction may be the most appropriate response. For example, consider a query for “expolson” on Google: this time, instead of applying a Did You Mean, it is auto-corrected to “explosion”. As before, a message appears above the results (“Showing results for”), but this time, the choice has been made for us:

Auto-correct at Google

It seems that this time Google is more confident that our query was unintended. Without knowing our intent, how can it determine this? (In case you’re wondering, it’s not simply by looking for relatively low numbers of results: “expolsion” returns ~135,000 results, and “exploson” returns ~222,000, yet the latter auto-corrected while the former is not.) The answer lies in what Google researchers refer to as the “Unreasonable Effectiveness of Data”: in this instance, the collective behaviour of millions of users. By mining user data for patterns of query reformulation, Google can determine that “exploson” is more likely to be corrected than “expolsion”. Knowing this, it applies the correction for us.

In fact, Google applies the same insight to the auto-suggest function we saw in our previous post: in addition to completions based on the prefix, it also returns potential spelling. This is particularly important in a mobile context, when accurate typing on small, handheld keyboards is so much more difficult:

Query suggestions include spelling corrections on Google

These strategies make a significant difference to the experience of searching the web. However, for site search, such vast quantities of user data may not be so readily available. In this case, perhaps a simple numeric test could suffice: for zero results, apply an auto-correction; for greater than zero but less than some threshold (say 20 results), offer a Did You Mean.

3. Partial matches

The techniques of auto-correct and Did You Mean are ideal for detecting and repairing simple errors such as spelling mistakes in short queries. But the reality of keyword search is that many users over-constrain their search by entering too many keywords, rather than too few. This is particularly apparent when confronted with a zero results page: for many users, the natural reaction is to add further keywords to their query, thus compounding the problem.

In these cases, it no longer makes sense to replace the entire query in the manner of an auto-correct or Did You Mean, particularly if certain sections of it might have actually returned productive results on their own. Instead, we need a more sophisticated strategy that considers the individual keywords and can determine which particular permutations are likely to produce useful results.

Amazon provides a particularly effective implementation of this strategy. For example, a keyword search for “fender strat maple 1976 USA” finds no matching results. However, rather than returning a zero results page, Amazon returns a number of partial matches based on various keyword permutations. Moreover, by communicating the non-matching elements of the query (using strikethrough text), it gently guides us along the path to more informed query reformulation:

Partial matches at Amazon

Although conceptually simple, solving the partial match problem is non-trivial: a query with N keywords actually has N factorial permutations, of which only a fraction will return useful results. So for just the single query above, there are in principle 120 variations to consider. In addition, out of all those variations, there is only space to present results for a handful, so they need to be chosen to reflect the diversity of the matching products while avoiding duplicate results.

A similar strategy can be seen at eBay, which also finds no results for the same query we tried on Amazon. Instead of a zero results page, we see a list of the partial matches with an invitation to select one of them (or to “try the search again with fewer keywords”). These are ordered using what’s known as quorum-level ranking (Salton, 1989), which sorts results according to the number of matching keywords. In other words, products matching four keywords (such as “fender strat maple USA”) are ranked above those containing three or fewer (such as “fender strat USA”).

Partial matches using quorum-level ranking at eBay

Partial matches are a very effective way to facilitate the process of query reformulation, providing us with a clear direction to take along our information journey. Together with auto-correct and Did You Mean, they act as signposts that help us decide which of the many paths to take. But sometimes we may see something that motivates us to take a deliberate detour. Like the auto-suggest function we saw in the previous post, related searches provides us with the inspiration to embrace new ideas that we might not otherwise have considered.

4. Related searches

All the major web search engines offer support for related searches. Bing, for example, shows them in a panel to the left of the main results:

Related searches at Bing

Google, by contrast, shows them on demand (via a link in the sidebar) as a panel above the main search results. Both designs differentiate between extensions to the query and reformulations: any keywords that are not part of the original query are rendered in bold:

Related searches at Google

Apart from providing inspiration, related searches can be used to help clarify an ambiguous query. For example, query on Bing for “apple” returns results associated mainly with the computer manufacturer, but the related searches clearly indicate a number of other interpretations:

Query disambiguation via related searches at Bing

Related searches can also be used to articulate associated concepts in a taxonomy. At eBay, for example, a query for “acoustic guitar” returns a number of related searches at varying levels of specificity. These include subordinate (child) concepts, such as “yamaha acoustic guitar” and “fender acoustic guitar”, along with sibling concepts such as “electric guitar”, and superordinate (parent) concepts such as “guitar”. These taxonomic signposts offer a subtle form of guidance, helping us understand better the conceptual space to which our query belongs.

Taxonomic signposting via related searches at eBay

While related searches offer us a way to open our minds to new directions; they are not the only source of inspiration. Sometimes it is the results themselves that provide the stimulus. When we find a particularly good match for our information need, we try to find more of the same: a process that Peter Morville refers to as “pearl growing” (Morville, 2010). Sometimes the action to find more of the same is one we can directly initiate: Google’s image search, for example, offers us the opportunity to find images similar to a particular result:

Find similar images at Google

For image search, the results certainly appear impressive, with a single click returning a remarkably homogenous set of results. But that is perhaps also its biggest shortcoming: by hiding the details of the similarity calculation, the user has no control over what it returns, and cannot see why certain items are deemed similar when others are not. For this type of information need, a faceted approach may be preferable, in which the user has control over exactly which dimensions are considered as part of the similarity calculation.

While Google shows how we can actively seek similar results, sometimes we may prefer to have related content pushed to us. Recommender systems like Last.fm and Netflix rely heavily on attributes, ratings and collaborative filtering data to suggest content we’re likely to enjoy. And from just a single item in our music collection, iTunes Genius can recommend many more for us to listen to as part of a playlist:

Genius playlist creates “more like this” from a single item

Summary

Query reformulation is a key component of information seeking behaviour, and one where we benefit most from automated support. Did You Mean and auto-correct apply spell checking strategies to keep us on track. Partial matching strategies provide signposts toward more productive keyword combinations. And related searches can inspire us to consider new directions and grow our own pearls. Together, these four techniques keep us on track along our information journey.

3 Responses

Great post, Tony. Very nicely written, as usual. Seeing and reading this post brought a *huge* smile to my face because here at Sematext we’ve been thinking along the lines for a while now and have developed solutions that implement all 4 of these techniques!

Doesn’t Quorum ranking basically destroy faceted search quality, though? I’ve seen this approach used by a few search vendors and it’s not one I ever employ or recommend except as a last resort.

Why?

1. Quorum ranking is still inferior to IDF or BM25-style ranking in boolean or mode. So I’d employ IDF of some sort regardless, preferably BM25F.
2. Even with BM25, you’re still going to count all documents that match _any_ of the words in the facets. This stinks. Why? Consider this query:

“red baskets”

Say you don’t sell baskets. Then you’re just going to see “red.” In that case you’re not doing anything good, you’re just showing totally irrelevant results. 0 results would be better to some, because that user will be very confused.

I suppose for a 3 word query you could require at least 2 keywords, but I think it’s still not going to go well.

It’s worth noting that some correction problems become simpler when you can apply boolean operators and expand blindly. Oftentimes exactly one correction is correct. Suppose you have a query “a b”

a is not in your lexicon. So you expand to 3 candidates

+(a’ a” a”’) +b

… and usually only one of the candidates intersects with keyword b. This is the approach I recommend for spell-corrections. It can work for spelling corrections when the term is a word in the lexicon as well. Those are the hardest.

The same technique, then, can obviously be applied to things that are intersection fails too, and I’d prefer such a technique over jumping to boolean-or whenever 1 word queers the boolean-and query.

It’s a battle, combinatorically, but you have a much better set of facets to work with if you nail it. The function would have to be deterministic, otherwise you’ll give users random results. Most likely you’d want to cache it anyway.

[Disclaimer: I am not a professor of IR, but I’m pretty sure there is nothing really wrong up there.]

I just realized I misunderstood something, and also misattributed it [bad day! oops!]. Tony’s quorum referred to the keywords in the queries, not the keywords in the documents.

Still, I would think mixing in some factor of TF makes sense — or at least not counting stop words. The rarer words are perhaps more important, and this is the basis of IDF-based ranking systems. You also don’t need to try every combination. You could do something like partition the required words +a +b +c +d into 2 sets, and try something like this:

+(a b) +(c d)

Obvious approaches would be to take all the terms with low frequencies and put them all together in one disjunctive quantity. You could also start by removing 2 rare words, and iteratively adding that to the disjunctive quantity until you get results. This could result in (a b c d), which is what some search vendors fall to immediately, but you have a good chance of doing better.