Thursday, March 08, 2007

Personalized search at MSR TechFest

Microsoft says it still believes that it will eventually turn the tables [on Google] by improving the quality of its search results and by changing the way computer users search.

Search in the future will look nothing like today's simple search engine interfaces, [Susan Dumais] said, adding, "If in 10 years we are still using a rectangular box and a list of results, I should be fired."

John briefly mentioned efforts by Susan Dumais and others on personalized search in his article.

Mary Jo Foley also covered the event. An excerpt on personalized search:

I also enjoyed reading Don Dodge's thoughts on TechFest. An excerpt on personalization:

By building an index of documents, emails, and previous searches it is possible to create a personal profile that will help filter and rank search results for better relevance.

This is an artificial intelligence system that learns your interests and preferences, and constantly updates its algorithm based on your choices.

In this way it is not necessary for the user to change their behavior or search style in order to improve results.

It was not entirely clear to me from the coverage, but the personalized search based on desktop files sounds a lot like the 2005 work by Jaime Teevan, Susan Dumais, and Eric Horvitz (discussed in this old August 2005 post). I wonder what was different about the work that was demoed this year.

See also my April 2006 post, "Using the desktop to improve search", which discusses how several projects at Microsoft Research may be able to be "combined, refined, finished, and moved into the Windows desktop" to create "an experience impossible to reproduce in a web browser, a jump beyond the 1994, one-box search interface we still live with today."
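As a rough illustration of the profile-based filtering described in the Don Dodge excerpt above, here is a minimal sketch of re-ranking results against a personal term profile. It is not the method demoed at TechFest or used in the 2005 work. The function names (`build_profile`, `rerank`), the `weight` parameter, and the sample data are all hypothetical, and the scoring (base score plus a term-overlap bonus) is just one simple assumption about how such a profile might be applied.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_profile(personal_texts):
    """Count term frequencies across a user's documents, emails, and past queries."""
    profile = Counter()
    for text in personal_texts:
        profile.update(tokenize(text))
    return profile


def rerank(results, profile, weight=0.5):
    """Re-order (title, base_score) pairs by base score plus a profile-overlap bonus.

    The bonus is the share of profile term occurrences that also appear
    in the result title (an illustrative choice, not a published formula).
    """
    total = sum(profile.values()) or 1

    def score(item):
        title, base = item
        overlap = sum(profile[t] for t in set(tokenize(title))) / total
        return base + weight * overlap

    return sorted(results, key=score, reverse=True)


# Hypothetical personal data and search results.
personal = ["Dolphins season tickets, game schedule", "Miami football roster"]
results = [
    ("Dolphin biology and ocean habitats", 0.9),
    ("Miami Dolphins football schedule", 0.8),
]
profile = build_profile(personal)
reranked = rerank(results, profile, weight=1.0)
unpersonalized = rerank(results, profile, weight=0.0)
```

With `weight=0.0` the original ordering is kept, which makes the personalization easy to switch off in this toy version.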

Search in the future will look nothing like today's simple search engine interfaces, [Susan Dumais] said, adding, "If in 10 years we are still using a rectangular box and a list of results, I should be fired."

To me, this does not sound like personalized search. This sounds like "tools-based" search, i.e. the type of search where you have much more interactive capability to specify and accept or reject what it is you are looking for. Vivisimo and Quintura are good early examples of going beyond the rectangular box and the list of results.

Or do you see this comment more as a personalization-driven comment? If so, how? Because isn't personalization, at least as Google is envisioning it, still all about the rectangular box and the list of results? It's just that the list gets filled in a slightly different manner. But the form itself, the list, does not change.

I have noted your skepticism of personalization as a technology that can bring about significant improvements in search quality. As a believer in the power of personalization I would urge you to not give up on it merely because of what it is now, but consider what it can become as search engines better understand how to use it and other technological/social/legal constraints around it ease up. A historic parallel is the advent of the first rifles that were single-load, and if you missed with your first bullet, your enemy might take you down even with a knife/sword in the time that you reloaded. That didn't prevent people from persisting in improving rifle/gun technology to a point where its supremacy over swords/knives/daggers was beyond question.

A couple points about your comments on personalization:

- Many people think of personalization as an add-on layer to search ranking. In other words, personalized rankings are a re-ordering of the "non-personalized" rankings based on user behavior history. And that's it. I believe search ranking is only one of the dimensions along which you can personalize. In its full-blown form (and I concede we are not even close to it), personalization should also personalize the user interface: some people might prefer the rectangular search box, others might prefer other tools, and this can be learnt by doing A/B tests; you could personalize the font, the number of results per page, and other components of the UI. So the way I see it, a personalized user experience is a very multi-dimensional problem, not at all restricted to ordering of search results.

- I believe you'd said in a previous post how Google's personalization for you was not only non-productive, it was counter-productive. This might be a bit extreme, but personalization at its best should also be able to tell which user would like personalized results (and when), and turn it off in the case when it is counter-productive.

To start off, a simple UI change that Google could make is to have a couple radio buttons right below the search box saying "Personalized results" and "Non-personalized results" just the way they have "Results from the entire web" and "Results from UK". Eventually of course this can be learnt from the behavior of millions of users so that the button would go away and be replaced by an implicit understanding of when to turn it off.

First, let me say thank you for your thoughtful, long response. I've been trying to engage more people in interesting discussions on this topic, and so far not many takers. It's not fair of me to constantly be directing my ire ;-) toward Greg, so I very much appreciate hearing your ideas on the matter.

I have a few continued thoughts.

(1) Yes, you are absolutely right about being in the early stages of personalization. But, then again, couldn't you say the same thing about the "tools" based approach (e.g. Vivisimo, Quintura, etc.)? Aren't we in the early stages of those ideas, too? And yet I routinely hear folks from the major search engines talk about how none of these ideas will work, because of how lazy users are. Why the outright rejection of a whole rich area of information retrieval development? Where is the faith that people place in users to adapt and grow? I think about my "lazy" 80 year old grandmother, and how ten years ago she didn't even know how to turn on a computer. Now she surfs and chats with the best of them. Users can and do learn tools that are available to them. Chalking it up to user laziness is a cop-out. I don't think it is laziness, so much as it is inexperience, and correct contextual integration. I have heard Google's Marissa Mayer talk about how nobody used their spelling correction link.. until they replicated it at the bottom of the ten results. Because it was at that point, once the user scanned through a set of ten non-relevant links, that they realized "oh, I spelled it wrong", and then realized they need to correct their spelling. When the Google correction tool used to only be at the top, users wouldn't see it. But at the bottom, users saw it. It was contextually the right tool in the right place at the right time. So why is there no faith in the ability of Vivisimo-like and Quintura-like assistance to grow and become better, helping users understand how to correctly use the tools that are available to them? Why is there no faith in this, but instead lots of faith in personalization?

(2) Regarding the characterization of personalization as search result reordering: Yes, that is pretty much what I understand it to mean. You do have some interesting ideas around automatically learning user behavior, and then "customizing"/personalizing other things like number of results to show, additional tools to offer, etc. But for the moment, I would like to leave this extension of personalization aside. Why? Because it starts to muddy the issue of what we're actually discussing. For example, when you say that personalization might actually decide to show some users tools, you all of a sudden get this chicken-egg problem. How did you know that person wanted tools.. unless you had first shown them some tools, and they used the tools. But if you showed them tools first, before you made the personalization decision to show them tools, then that in itself is a tool that the user can use to decide whether or not they want to see tools. This is getting confusing. I'd rather just stick to the basic issue of personalization vs tools as the question of "search engine knows best" (automatic reordering, based on your personal profile) vs "user knows best" (you making the decision about what you're actually interested in, that moment, with all the best options made clear to you... even the options that you might not pick).

(3) So to me the fundamental issue is "search engine knows best" (personalization) vs. "user knows best" (tools). What I really need to hear from personalization advocates is two main things: (a) Give me an example of when personalization (ranked list reordering) is going to make a huge difference in someone's search experience, beyond the simple situation of word sense disambiguation. Miami dolphins vs. ocean dolphins, football with a round vs oblong ball, etc. Those are toy examples, and you can already solve them without resorting to personalization. Beyond those toy examples, what is personalization really going to give you? Tell me a good story. Motivate the problem. (b) How are you going to solve the problem of personalization mistakes? Yes, I was a bit harsh in my Google example, during my time in London. It kept dumping me to these UK-centric pages. I could not get back to the US results I was expecting! But that is a serious problem, and if personalization gets something wrong, and the user has no way of correcting it, it is going to be off-puttingly frustrating for the user. But wait.. how are you going to turn it off? A radio button below the search results, letting users toggle back and forth? But what is this you are offering to the user? A tool! Bingo! We are back in the tools camp. First of all, in order for that solution to succeed, you have to assume that the user is educated enough to know and understand what is happening, and what it actually means to turn personalization on or off...and how clicking the "turn off personalization" radio button is going to solve their problem.

But if they understand enough to actually use that radio button, why not give them a different button, one that is a little more explanatory, e.g. when I search for football, instead of it saying "turn personalization on/off", there are three radio buttons, saying "(.) See U.S.-centric football results, (.) See UK-centric football results, and (.) see World-centric football results". Then, since the user knows best, he/she can pick exactly which one he/she wants at that moment. Which brings me to my last point:

(4) You say: "Eventually of course this can be learnt from the behavior of millions of users so that the button would go away and be replaced by an implicit understanding of when to turn it off." This is what I have the hardest time believing. Because, you know what? I don't think there are millions of users similar enough to me, that you could use for good training data. For example, even though I was in the UK for a year, and most of my searches were for US information, I actually preferred UK versions of football. I even developed a certain fondness for my local southeast London team, Millwall. (Even learned some of the chants: "na na - na na na - na na na na - Millwall!" and "Miiiiiiihw-woooooowhl!") How many people in the world are there, simultaneously searching for income tax information for the U.S. and off-tier football information in the UK? I mean, maybe there are a lot of people searching for U.S. tax info that are also searching for big-name football teams, like Manchester United or Arsenal. But Millwall? I'd be surprised if my profile matched dozens of people, much less millions. And the thing is, I think every single one of us has "personalized" uniquenesses like this, for which it is impossible to find enough "personalization" training data. So the best solution, in my mind, remains the one in which trust is placed directly on the user, and the user is given the choice, in the very moment of need, what type of information he/she is actually searching for. I search for taxes? Let me choose, that moment, U.S. info. I'm searching for football? Let me choose that moment UK football. I'm searching for Pirates of the Caribbean? Let me choose one moment the movie, and maybe three months later the Disneyland ride.

I will not rule personalization out. I will, as you ask me to do, not give up on it merely because of what it is now. I will continue watching and seeing what happens. It might indeed evolve from knives to guns to nukes (continuing your metaphor). But I will also be watching the tools, "user knows best" camp, too. Raul Valdes-Perez, of Vivisimo, has, in my mind, the most compelling arguments around this latter approach. "We're trying to move search away from this idea that ranking [Web pages] is the solution to everything," he said once in an interview. "Instead, our basic philosophy is, don't just try to show the best 10 or the best 5 pages, but instead dredge up a larger amount of stuff, the top 200 or 500, organize that quickly in half a second or so, and show the major themes to the user."

Check out this article: http://www.post-gazette.com/pg/06177/701252-96.stm

"As search engines and other digital technology have become more powerful, they have made it easier for people to find exactly what they want and dispense with all the rest. But Dr. Valdes-Perez said that by clustering Web pages into themes, Vivisimo can sometimes reveal connections that people wouldn't have seen otherwise. To demonstrate that, he recently used the search terms "Osama bin Laden" and "Madonna" for a group in Washington D.C. One of the themes that was generated was "niece," he said, and when he opened that folder, it revealed Web sites about a niece of the terrorist "who actually hates him but has aspirations to be a pop singer like Madonna," Dr. Valdes-Perez said."
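To make the "show the major themes" idea a little more concrete, here is a minimal sketch of grouping result snippets under a shared term. This is emphatically not Vivisimo's algorithm, which is not described here. The function `cluster_by_theme`, the `min_size` parameter, the stopword list, and the sample snippets are all hypothetical, and labeling each result by its most widely shared non-query term is only a stand-in for real clustering.

```python
import re
from collections import Counter, defaultdict

# A tiny illustrative stopword list, not a complete one.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "at"}


def terms(text, query_terms):
    """Return the set of content words in text, excluding stopwords and query terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS and w not in query_terms}


def cluster_by_theme(query, snippets, min_size=2):
    """Group snippets under the term they share with the most other snippets.

    A term counts as a theme only if it appears in at least min_size
    snippets; snippets with no such term go under "other".
    """
    query_terms = set(re.findall(r"[a-z]+", query.lower()))
    doc_terms = [terms(s, query_terms) for s in snippets]
    doc_freq = Counter(t for ts in doc_terms for t in ts)

    themes = defaultdict(list)
    for snippet, ts in zip(snippets, doc_terms):
        shared = [t for t in ts if doc_freq[t] >= min_size]
        # Prefer the most widely shared term; break ties alphabetically.
        label = min(shared, key=lambda t: (-doc_freq[t], t)) if shared else "other"
        themes[label].append(snippet)
    return dict(themes)


# Hypothetical result snippets for an ambiguous query.
snippets = [
    "Jaguar car dealers",
    "Jaguar car reviews",
    "Jaguar habitat in the rainforest",
    "Jaguar rainforest conservation",
    "Jaguar guitar",
]
themes = cluster_by_theme("jaguar", snippets)
```

In this toy version the user would see three folders ("car", "rainforest", "other") and pick one, rather than having the engine guess which sense was meant.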