Abstract

Let \({\cal D}\) be a collection of string documents of n characters in total. The top-k document retrieval problem is to preprocess \({\cal D}\) into a data structure that, given a query (P, k), can return the k documents of \({\cal D}\) most relevant to pattern P. The relevance of a document d for a pattern P is given by a predefined ranking function w(P, d). Linear-space solutions with optimal query time already exist for this problem.

In this paper we consider a novel problem, document selection queries, which aim to report the kth document most relevant to P (instead of reporting all top-k documents). We present a data structure using \(O(n \log^{\varepsilon} n)\) space, for any constant \(\varepsilon > 0\), answering selection queries in time \(O(\log k / \log\log n)\), and a linear-space data structure answering queries in time \(O(\log k)\), given the locus node of P in a (generalized) suffix tree of \({\cal D}\). We also prove that it is unlikely that a succinct-space solution for this problem exists with poly-logarithmic query time.
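
To make the difference between the two query types concrete, the following brute-force sketch contrasts a top-k query with a selection query. It is only an illustration of the query semantics, not the data structures of this paper: it scans every document on each query and uses no suffix tree. The function names, the choice of term frequency as the ranking function w(P, d), the restriction to documents that contain P, and the tie-breaking by document index are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Sequence


def count_occurrences(pattern: str, doc: str) -> int:
    """Assumed ranking function w(P, d): number of (possibly overlapping)
    occurrences of pattern in doc (term frequency)."""
    if not pattern:
        return 0
    count = 0
    start = doc.find(pattern)
    while start != -1:
        count += 1
        start = doc.find(pattern, start + 1)  # step by 1 to allow overlaps
    return count


def rank_documents(
    docs: Sequence[str],
    pattern: str,
    w: Callable[[str, str], int] = count_occurrences,
) -> List[int]:
    """Indices of documents containing the pattern, most relevant first.
    Ties are broken by smaller index (an assumption for determinism)."""
    scored = [(w(pattern, d), i) for i, d in enumerate(docs)]
    scored = [(s, i) for s, i in scored if s > 0]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored]


def top_k(docs: Sequence[str], pattern: str, k: int) -> List[int]:
    """Top-k query: report all of the k most relevant documents."""
    return rank_documents(docs, pattern)[:k]


def select_kth(docs: Sequence[str], pattern: str, k: int) -> Optional[int]:
    """Selection query: report only the kth most relevant document
    (1-based), or None if fewer than k documents contain the pattern."""
    ranked = rank_documents(docs, pattern)
    return ranked[k - 1] if 1 <= k <= len(ranked) else None


if __name__ == "__main__":
    docs = ["banana", "bandana", "cabana", "apple"]
    print(top_k(docs, "an", 2))       # the two most relevant documents
    print(select_kth(docs, "an", 3))  # only the third most relevant one
```

This baseline spends time proportional to the total collection size per query, whereas the structures described above answer a selection query in time depending only on k and n once the locus of P is known.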