Abstract

We study the problem of web search result diversification in the case where intent-based relevance scores are available. A diversified search result is intended to satisfy the information needs of users who may have different intents. In this context, we first analyze the properties of an intent-based metric, ERR-IA, which measures relevance and diversity together. We argue that it is a better metric than some previously proposed intent-aware metrics and show that it correlates better with abandonment rate. We then propose an algorithm that reranks web search results by optimizing an objective function corresponding to this metric, and we evaluate it on shopping-related queries.

Keywords

Web search, Ranking, Relevance, Diversification

S. Ji and C. Liao contributed to the paper while working at Yahoo! Labs.

For CBM, the proof is a bit more involved, but it is intuitively clear that CBM has a diminishing return property: if a relevant document is placed in position i, the probability of examination in positions j > i will be low, and the added value of placing another relevant document in position j is lower than if the document in position i were not relevant. \(\square\)

We first need the following lemma:

Lemma 1

\(\forall x_i \in [0,1], \forall k\leq n,\) we have:

$$ \sum_{i=k}^n x_i \prod_{j=k}^{i-1} (1-x_j) \leq 1. $$

The lemma can easily be proved by induction: it is true for k = n, and if it is true for a given k, then it is also true for k − 1:
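The induction step can be sketched as follows. The notation \(S_k\) for the left-hand side of the lemma is introduced here only for convenience and does not appear in the original text. Splitting off the term i = k − 1 and factoring \((1-x_{k-1})\) out of the remaining terms gives:

$$ S_{k-1} = \sum_{i=k-1}^n x_i \prod_{j=k-1}^{i-1} (1-x_j) = x_{k-1} + (1-x_{k-1})\, S_k \leq x_{k-1} + (1-x_{k-1}) = 1. $$

The inequality uses the induction hypothesis \(S_k \leq 1\) together with \(1-x_{k-1} \geq 0\); the base case is \(S_n = x_n \leq 1\).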

We have \(f_{CBM}(x_1,\dots,x_n) = \sum_k \varphi(k) x_k \prod_{j=1}^{k-1} (1-x_j)\). We ignore for simplicity the function \(\mathcal{R}\): since it is a monotonic function, it will not affect the sign of the second derivatives. Let us fix two ranks p < q:
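A reconstruction of the computation from the definition of \(f_{CBM}\) above is sketched below; it assumes in addition that \(\varphi\) is nonnegative. Only the terms with \(k \geq q\) depend on both \(x_p\) and \(x_q\), so:

$$ \begin{aligned} \frac{\partial^2 f_{CBM}}{\partial x_p\, \partial x_q} &= \prod_{\substack{j<q \\ j\neq p}} (1-x_j) \left[ -\varphi(q) + \sum_{k=q+1}^n \varphi(k)\, x_k \prod_{j=q+1}^{k-1} (1-x_j) \right] \\ &\leq \varphi(q) \prod_{\substack{j<q \\ j\neq p}} (1-x_j) \left[ -1 + \sum_{k=q+1}^n x_k \prod_{j=q+1}^{k-1} (1-x_j) \right] \\ &\leq 0. \end{aligned} $$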

The first inequality holds because \(\varphi\) is decreasing, while the second comes from applying the lemma with k = q + 1.
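The argument can also be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the discount \(\varphi(k) = 1/k\) is an assumption chosen as one example of a decreasing, nonnegative function. Because \(f_{CBM}\) is linear in each \(x_k\) separately, the second-order difference between the values 0 and 1 in coordinates p and q equals the mixed partial derivative exactly.

```python
import random


def f_cbm(x, phi):
    """sum_k phi(k) * x_k * prod_{j<k} (1 - x_j), with ranks k starting at 1."""
    total = 0.0
    not_stopped = 1.0  # running product of (1 - x_j) over earlier ranks
    for k, xk in enumerate(x, start=1):
        total += phi(k) * xk * not_stopped
        not_stopped *= 1.0 - xk
    return total


def mixed_difference(x, p, q, phi):
    """Second-order difference of f_cbm in coordinates p and q (0-indexed).

    f_cbm is multilinear, so this equals the mixed partial derivative.
    """
    def at(vp, vq):
        y = list(x)
        y[p], y[q] = vp, vq
        return f_cbm(y, phi)

    return at(1.0, 1.0) - at(1.0, 0.0) - at(0.0, 1.0) + at(0.0, 0.0)


def phi(k):
    """Assumed discount function: decreasing and nonnegative."""
    return 1.0 / k


def lemma_lhs(x, k):
    """Left-hand side of Lemma 1 for a 1-indexed starting rank k."""
    return f_cbm(x[k - 1:], lambda _: 1.0)


def check(trials=1000, n=6, seed=0, tol=1e-12):
    """Return True if Lemma 1 and the sign of the mixed derivative hold on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        for k in range(1, n + 1):
            if lemma_lhs(x, k) > 1.0 + tol:
                return False
        for p in range(n):
            for q in range(p + 1, n):
                if mixed_difference(x, p, q, phi) > tol:
                    return False
    return True
```

Calling `check()` samples random relevance vectors in \([0,1]^n\) and reports whether any of them violates the lemma or yields a positive mixed derivative. A passing run is a sanity check, not a proof.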

Proof of proposition 4

Let us assume, without loss of generality, that the first topic is the most likely one. Then, let us consider the ranking consisting of n relevant documents for that topic. The value of AP will be 1 for that topic and 0 for the other topics (because we assumed that a document is relevant to at most one topic). Thus the value of AP-IA is \(p_1\). We will now show that the value of any other ranking is less than or equal to \(p_1\).

Let \(r_j^i\) be the relevance values for an arbitrary ranking and let \(r^i\) denote the vector \((r_1^i,\dots,r_n^i)\). Because \(f_{AP}\) is supermodular, we can apply several times the reverse inequality from the definition (6) and get:
