the enterprise search and findability blog by Findwise

Tag Archives: Dublin

How to achieve diversity in information retrieval, including techniques for handling ambiguous queries, was a topic of interest at the SIGIR 2013 conference in Dublin, Ireland, which I attended recently.

Diversity in information retrieval was covered in a number of presentations at the conference. Applied to the world of search, it basically means aiming to produce a search result that covers as many of the relevant topics as possible. The technique described here is search engine independent, since it uses only the set of result documents as input.

This is done by retrieving, say, 100-500 documents instead of the normal 10.
These documents are then clustered based on their contents to create a number
of topic clusters. The search result is then constructed by selecting
(the normal 10) documents from the clusters in a round-robin fashion. This will
hopefully create a diverse search result, with as broad coverage as possible.
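
The round-robin selection step can be sketched in a few lines of Python. This is only an illustration, not code from any of the presentations: the function name `round_robin_diversify` and the `cluster_of` mapping are hypothetical, and the sketch assumes the documents have already been retrieved in rank order and assigned to clusters.

```python
from collections import defaultdict
from itertools import zip_longest

_MISSING = object()  # filler for clusters that run out of documents


def round_robin_diversify(ranked_docs, cluster_of, k=10):
    """Pick k documents by cycling through the clusters.

    ranked_docs: document ids in the search engine's original rank order.
    cluster_of:  hypothetical mapping from document id to cluster id.
    """
    # Group documents by cluster, keeping the original rank order inside
    # each cluster. Clusters are visited in order of their best-ranked hit.
    clusters = defaultdict(list)
    for doc in ranked_docs:
        clusters[cluster_of[doc]].append(doc)

    # Each "round" takes the next-best document from every cluster.
    picked = []
    for round_picks in zip_longest(*clusters.values(), fillvalue=_MISSING):
        picked.extend(doc for doc in round_picks if doc is not _MISSING)
    return picked[:k]


ranked = ["d1", "d2", "d3", "d4", "d5", "d6"]
clusters = {"d1": 0, "d2": 0, "d3": 0, "d4": 1, "d5": 2, "d6": 1}
print(round_robin_diversify(ranked, clusters, k=4))  # ['d1', 'd4', 'd5', 'd2']
```

Without diversification the top four would all come from cluster 0 and 1 only; with it, every cluster is represented before any cluster gets a second slot.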

The technique can be used not only to handle ambiguous queries, but also
queries that have several sub-topics associated with them. By iteratively
running a clustering algorithm on the result documents with 2 to 5 (or so)
clusters, measuring the separation between the clusters, and choosing the
outcome with the greatest separation, a diverse result set of documents can be
created. The clusters can also be used to ask follow-up questions to the user,
who can click on one of several tag clouds containing the most central terms
of each cluster.
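
As a rough sketch of the "try 2 to 5 clusters and keep the best-separated outcome" loop: the post does not say which clustering algorithm or separation measure was used, so the code below assumes a tiny k-means and the mean silhouette coefficient as the separation score. All function names are hypothetical, and documents are assumed to be already represented as numeric vectors.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20):
    """Tiny k-means (assumed algorithm) with deterministic farthest-first seeds."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (assumed separation measure), in [-1, 1]."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster in total
            continue
        a = sum(own) / len(own)                                 # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def pick_cluster_count(points, k_values=range(2, 6)):
    """Return (score, k, labels) for the k with the greatest separation."""
    best = None
    for k in k_values:
        if k >= len(points):
            break
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best


# Two obvious groups of "documents" in a toy 2-D vector space.
docs = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
score, k, labels = pick_cluster_count(docs)
print(k, labels)
```

In a real setting the vectors would typically be term-weight vectors of the 100-500 retrieved documents, and a library implementation of the clustering and scoring would replace these toy functions.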

A cluster set of size 2 with a good separation would indicate that the query
may be ambiguous, with two different semantic meanings, while a size of 3-5
likely means that a number of sub-topics have been identified in the
results. In a way these clusters can be seen as a dynamic facet, but it is
still shallow since it only operates on the returned documents. On the other
hand, it does not require any additional knowledge about the documents other
than the information that is returned. This could also be extended by using
topic labelling to present the user with a single term or phrase, instead of
a tag cloud.
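
To illustrate how the "most central terms" of each cluster might be found, here is a minimal sketch. The scoring is my own assumption (term frequency inside the cluster, damped for terms that occur in many clusters) and not something presented at the conference; `central_terms` is a hypothetical helper, and documents are assumed to be tokenised already.

```python
import math
from collections import Counter


def central_terms(cluster_docs, top_n=3):
    """Return the top_n highest-scoring terms per cluster.

    cluster_docs: {cluster_id: [token list, token list, ...]}
    Score (an assumption): in-cluster term frequency times
    log((1 + number of clusters) / number of clusters containing the term).
    """
    tf = {c: Counter(t for doc in docs for t in doc)
          for c, docs in cluster_docs.items()}
    n_clusters = len(tf)
    clusters_with = Counter(t for counts in tf.values() for t in counts)

    result = {}
    for c, counts in tf.items():
        scored = {t: f * math.log((1 + n_clusters) / clusters_with[t])
                  for t, f in counts.items()}
        # Highest score first; ties are broken alphabetically.
        result[c] = sorted(scored, key=lambda t: (-scored[t], t))[:top_n]
    return result


clusters = {
    "A": [["jaguar", "cat", "jungle"], ["jaguar", "cat", "prey"]],
    "B": [["jaguar", "car", "engine"], ["jaguar", "car", "dealer"]],
}
print(central_terms(clusters, top_n=2))
# {'A': ['cat', 'jungle'], 'B': ['car', 'dealer']}
```

The query term "jaguar" appears in both clusters and is therefore ranked low, which is what you want in a tag cloud meant to tell the clusters apart. Taking only the first term of each list gives a crude single-term label.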

Regarding the conference itself, I found it to be a pleasant and professionally run event with lots of in-depth topics and nice evening activities, including a historical tour of Dublin.

The European Conference on Information Retrieval (ECIR) 2011 took place in Dublin last week, 18-21 April. In this blog post I will try to highlight some of the papers and talks from the conference that caught my attention, and back this up with what other attendees said about them.

First, I was intrigued by the session on evaluation for IR and especially the topic of crowdsourcing. In my opinion, the paper A Methodology for Evaluating Aggregated Search Results, which also got the prize for best student paper, was among the most pedagogically presented ones. It deals with the task of incorporating search results from a number of different sources, called verticals, into Web search results. Using a small number of human judgements for a given query, the authors present a way to evaluate any possible permutation of verticals in the result presentation. I think that this methodology should be adopted in the world of Enterprise search, since that is exactly where we crawl, index and present information from a number of different sources: Web, databases, file shares, etc. The prerequisites are minimal and low cost, but the return, in terms of user experience, seems quite high.

Amazon Mechanical Turk, also known as "Artificial Artificial Intelligence", is a marketplace for crowdsourcing. For a ridiculously small sum of money it provides a way to perform evaluation, relevance assessment or any other task for which you need humans to give you judgements. Leaving aside ethical issues, two papers at the conference presented ways to utilize this service for IR tasks.

Evgeniy Gabrilovich from Yahoo! Research, who won the Karen Spärck Jones Award for 2010, gave a very interesting keynote talk on Computational Advertising. Until now, it had never struck me how hard advertising in Information Retrieval systems actually is. I liked one of his points on the future of ads: by using product feeds, one can automatically create product descriptions via Text Summarization and Natural Language Generation and index these, thus avoiding bid words.

Another interesting and very pedagogically presented paper was about the gensim package by Radim Řehůřek. I definitely think we can use it in some of our projects. In general, text categorization and IR for social networks were the dominant tracks. In one of the social network tracks, Oscar Täckström presented a neat way of discovering fine-grained sentiment where some coarse-grained supervision is available. It made me keen to try it for any of our customers where sentiment analysis is required.

Thorsten Joachims, the last of the keynote speakers, gave a very inspiring talk on The Value of User Feedback. He put forward the idea of designing retrieval systems for feedback. Instead of just looking at the click logs after the fact, one can think of a system that uses click feedback to learn, thus creating a better ranker for a given query and a given user need. Within a single session, we can use click feedback to disambiguate the query and deliver results on the fly that are of immediate benefit to the users.

Unfortunately, I probably missed other interesting presentations, but with two parallel sessions and several workshops there was a limit to what I could devour. What surprised me, though, was that there were very few papers from industry. We do try to solve exactly the same problems and tackle the same issues as academia. We at Findwise have constantly flagged the huge benefit of good, relevant metadata for achieving better search performance, which was also touched upon in the paper “Topic Classification in Social Media using Metadata from Hyperlinked Objects”.

It was really great to visit Dublin and attend ECIR 2011. It was an inspiring conference and I do believe that at the next ECIR we, from Findwise, can be on the podium, sharing our knowledge and hands-on experience of Enterprise search and IR.

During 2011 a large number of search conferences will take place all over the world. Some of them are dedicated to search, whereas others discuss the topic in relation to specific products, information management, usability, etc.

Here are a few that might be of interest for those of you looking to be inspired and broaden your knowledge. Within a few weeks we will compile all the research related conferences – there are quite a few of them out there!
If you think anything is missing, please post a comment.

Webcoast
Main focus: A web event that is an unconference, meaning that the attendees themselves create the program by presenting on topics of their own expertise and interest.
March 18-20, Gothenburg, Sweden