A probability ranking principle for interactive information retrieval

Abstract

The classical Probability Ranking Principle (PRP) forms the theoretical basis for probabilistic Information Retrieval (IR) models, which have dominated IR theory for about 20 years. However, the assumptions underlying the PRP often do not hold, and its view is too narrow for interactive information retrieval (IIR). In this article, a new theoretical framework for interactive retrieval is proposed. The basic idea is that during IIR, a user moves between situations. In each situation, the system presents to the user a list of choices, about which s/he has to decide, and the first positive decision moves the user to a new situation. Each choice is associated with a number of cost and probability parameters. Based on these parameters, an optimum ordering of the choices can be derived: the PRP for IIR. The relationship of this rule to the classical PRP is described, and issues for further research are pointed out.

Acknowledgments

I wish to thank the Glasgow IR group, especially Keith van Rijsbergen, for their hospitality and fruitful discussions when staying with them in August 2007, while I was writing this article. The suggestions by the three anonymous reviewers were very helpful in improving the initial version of this paper.

Appendix: Definition of the event space

Let \(S = \{s_0, s_1, s_2, \ldots\}\) denote a (possibly infinite) set of situations. In each situation \(s_i \in S\), we have a set of choices \(C_i=\{c_{i1}, c_{i2}, \ldots, c_{i,n_i}\}\) with \(c_{ij} \in S\), i.e. choices are a partial mapping \(c{:}\,S \times \mathbb{N} \to S\). Our event space is situation-specific, and we make no assumptions about how the event space changes when the user moves to a new situation. Let \(U\) denote all uses of our system, and let \(U_i\subseteq U\) be the set of all those uses that arrive at situation \(s_i\) (e.g. in a Web search engine, all uses starting with the same query will lead to the same situation, provided that no additional information is available). Now our event space is \(\Omega = C_i \times U_i\). Unfortunately, we have judgments only about a subset \(J \subseteq C_i\times U_i\) of the elements of the event space, because the user leaves the situation as soon as he accepts a choice. Associated with each element of \(J\), we have the acceptance decision of the user, which can be modelled as a relation \(A \subseteq J \subseteq C_i\times U_i\). Furthermore, not all of the accepted choices will turn out to be right from the user’s point of view (in that case he will return to situation \(s_i\), which we would model as another use). Thus, right decisions are a subset \(R\subseteq A\) of the accepted ones.

The probabilistic parameters we define now are all independent of the actual uses; the system knowledge about the use is implicitly represented by the actual situation. Let \(X\) denote a random variable ranging over \(\Omega\), and \(Z\) a variable ranging over \(U_i\). Then we define \(p_{ij}=P(X\in A \mid X=(c_{ij},Z)\land X\in J)\) as the probability that a use in situation \(s_i\) will accept choice \(c_{ij}\).
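To make the definition of \(p_{ij}\) concrete, the following minimal sketch estimates it by relative frequency within a single situation \(s_i\). It assumes that \(J\) and \(A\) are available as plain sets of (choice, use) pairs; the function name `estimate_acceptance` and the sample identifiers are illustrative and do not come from the article, which does not prescribe an estimation method.

```python
from collections import defaultdict


def estimate_acceptance(judged, accepted):
    """Relative-frequency estimate of p_ij for one situation s_i.

    judged:   set of (choice, use) pairs, playing the role of J
    accepted: set of (choice, use) pairs, playing the role of A (A must be a subset of J)
    Returns a dict mapping each judged choice to |A restricted to it| / |J restricted to it|.
    """
    if not accepted <= judged:
        raise ValueError("A must be a subset of J")
    n_judged = defaultdict(int)
    n_accepted = defaultdict(int)
    for choice, _use in judged:
        n_judged[choice] += 1
    for choice, _use in accepted:
        n_accepted[choice] += 1
    # Choices that were never judged get no estimate at all.
    return {choice: n_accepted[choice] / n_judged[choice] for choice in n_judged}


# Hypothetical log: u1 accepts c1 immediately (and so never judges c2),
# u2 rejects c1 and accepts c2, u3 rejects both.
J = {("c1", "u1"), ("c1", "u2"), ("c2", "u2"), ("c1", "u3"), ("c2", "u3")}
A = {("c1", "u1"), ("c2", "u2")}
estimates = estimate_acceptance(J, A)
```

Note that the estimate for each choice is conditioned on the choice having been judged, which mirrors the condition \(X\in J\) in the definition: uses that left the situation before reaching a choice contribute nothing to its estimate.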

The only independence assumption we now have to make is the following: the probability of a user accepting a choice \(c_{ij}\) is independent of the choices he rejected before. In most cases, this supposition will be fairly valid (e.g. ranked list of documents, or list of expansion terms). Please note that this assumption is much weaker than that of the classical PRP, where independence of both positive and negative relevance judgments is assumed. With this presupposition, we exclude any sequence effects, i.e. changing the order of the choices being presented does not affect their probability of being accepted. More formally, the independence assumption can be written as follows:
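The displayed formula is not reproduced at this point. One form consistent with the definitions of \(p_{ij}\), \(J\) and \(A\) above is sketched below; it is a reconstruction under the assumption that \(Y_1, \ldots, Y_m\) stand for the \(m\) choices of the same use \(Z\) that were judged and rejected before \(c_{ij}\). The index \(m\) and the notation \(J\setminus A\) for "judged but not accepted" are introduced here for illustration and may differ from the original typesetting.

```latex
\[
P\bigl(X\in A \mid X=(c_{ij},Z)\land X\in J\bigr)
 = P\bigl(X\in A \mid X=(c_{ij},Z)\land X\in J
     \land Y_1\in J\setminus A \land \cdots \land Y_m\in J\setminus A\bigr)
\]
```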

(Here the \(Y_k\)’s are random variables ranging over all the choices of the same use \(Z\).) Furthermore, let \(q_{ij}=P(X\in R \mid X=(c_{ij},Z)\land X\in A)\) denote the probability that acceptance of this choice is not revised later.
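A consequence of the independence assumption can be worked through numerically. Since the first positive decision moves the user on, the choice at rank \(r\) is the first one accepted with probability \(p_{ir}\prod_{k<r}(1-p_{ik})\), and that acceptance is additionally right with probability \(q_{ir}\). The sketch below computes these quantities for a given presentation order; the function name `first_acceptance` and the sample values are assumptions made for illustration, not part of the article.

```python
def first_acceptance(p, q):
    """Walk a list of choices in presentation order.

    p[r]: probability that the choice at rank r is accepted when judged (p_ij)
    q[r]: probability that an acceptance at rank r is not revised later (q_ij)

    Returns (first, right, none_accepted) where
      first[r]      = probability that rank r is the first accepted choice,
      right[r]      = probability that rank r is the first accepted choice and is right,
      none_accepted = probability that every choice in the list is rejected.
    Relies on the independence assumption: earlier rejections do not change p[r].
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    reach = 1.0  # probability that the user gets as far as judging the current rank
    first, right = [], []
    for p_r, q_r in zip(p, q):
        first.append(reach * p_r)
        right.append(reach * p_r * q_r)
        reach *= 1.0 - p_r
    return first, right, reach


# Illustrative values only.
first, right, none_accepted = first_acceptance([0.5, 0.5], [0.5, 1.0])
```

The entries of `first` together with `none_accepted` sum to one, which is a quick sanity check that the list is treated as a sequence of mutually exclusive outcomes.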
