Friday, October 31, 2008

Bruce Croft at CIKM 2008

The talk started with a bit of a pat on the back for those working on search, with Bruce saying that "search is everywhere" -- not just the web, but enterprise, desktop, product catalogs, and many other places -- and, despite hard problems of noisy data and vocabulary mismatch between what searchers actually want and how they express that to a search engine, search seems to do a pretty good job.

But then, Bruce took everyone to task, saying that current search is nothing like the "vision of the future" that was anticipated decades back. We are still nowhere near a software agent that can understand and fulfill complex information needs like a human expert would. Bruce said the "hard problems remain very hard" and that search really only works well when searchers are easily able to translate their goals into "those little keywords".

To get to that vision, Bruce argued that we need to be "evolutionary, not revolutionary". Keep chipping away at the problem. Bruce suggested long queries as a particularly promising "next step toward the vision of the future", saying that long queries work well "for people [not] for search engines", and discussed a few approaches using techniques similar to statistical machine translation.

It may have been that he ran short on time, but I was disappointed that Bruce did not spend more time talking about how to make progress toward the grand vision of search. A bit of this was addressed in the questions at the end. One person asked about whether search should be more of a dialogue between the searcher and the search engine. Another asked about user interface innovations that might be necessary. But, in general, it would have been interesting to hear more about what new paths Bruce considers promising and which of the techniques currently used he considers to be dead ends.

On a side note, there was a curious contrast between Bruce's approach of "evolutionary, not revolutionary" and the "Mountains or the Street Lamp" talk during industry day. In that talk, Chris Burges argued that we should primarily focus on very hard problems we have no idea how to solve -- climbing the mountains -- rather than on tweaks to existing techniques -- looking around nearby in the light under the street lamp.

Hi, Daniel. Bruce did give examples of "typical web search issues" -- including scale, spam, ads, coverage, freshness, evaluation, query processing, the social aspects of relevance, user intent, document structure and heterogeneity in document structure, important features in documents and user behavior, long queries, and tasks and the context of tasks -- when talking about making incremental progress.

But, other than implying that we need to learn more about how to make progress in order to make progress, no, I don't remember that he offered much justification. It would have been interesting to have more of a discussion on that point.

The talk should be on videolectures.net fairly soon. I am fairly sure the Q&A was recorded as well as the main talk, so you should be able to catch his answers to those questions if you want the details.

"One person asked about whether search should be more of a dialogue between the searcher and the search engine."

I was the guy who asked the "dialogue" question. And basically, Bruce's answer was, "No, we should elicit, or get the user to express, the full long query at the very beginning, rather than do it iteratively, because the longer query ultimately gives better results."

And it's true: longer queries do give better results. And I think more search engine interfaces should be designed to elicit longer queries.

But what I think his answer ignored was the fact that often the user doesn't actually know what they are looking for until they start to find it, or find similar things. In that case, the user cannot enter a longer query up front, and must iterate. But it was a keynote talk, and I couldn't really get into a long back-and-forth with him in the moment to get that clarification. I suspect that he might have agreed that more dialogue options are necessary if the question had been posed in a different manner -- especially since earlier in the talk he also mentioned Nick Belkin's model for information seeking, which talks about not knowing what you don't know when posing a query.