Abstract

Selective search is a distributed retrieval technique that reduces
the computational cost of large-scale information retrieval.
By partitioning the collection into topical shards, and using a
resource selection algorithm to identify a subset of shards to search,
selective search allows retrieval effectiveness to be maintained
while evaluating fewer postings, often reducing
querying cost by 90% or more.
However, there has been only limited attention given to the
interaction between dynamic pruning algorithms and topical
index shards.
We demonstrate that the WAND dynamic pruning algorithm is more
effective on topical index shards than it is on randomly organized
index shards, and that the savings generated by selective search and
WAND are additive.
We also compare two methods for applying WAND to topical shards:
searching each shard with a separate top-k heap and threshold; and
sequentially passing a shared top-k heap and threshold from one
shard to the next, in the order established by a resource selection
mechanism.
Separate top-k heaps provide low query latency,
whereas a shared top-k heap provides higher throughput.
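
The difference between the two heap strategies can be illustrated with a minimal sketch. It is not WAND itself: it assumes each document already carries a precomputed score upper bound, and it skips any document whose bound cannot beat the current top-k threshold, which is the pruning idea WAND relies on. The function names, the tuple layout, and the toy shard data are all hypothetical and chosen only for illustration.

```python
import heapq

def search_shard(shard, k, heap):
    """Scan one shard, updating `heap` (a min-heap of (score, doc_id)) in place.

    Each shard entry is (doc_id, upper_bound, score), an assumed toy layout.
    Returns how many documents had to be fully scored.
    """
    scored = 0
    for doc_id, upper_bound, score in shard:
        # The threshold is the k-th best score so far; no threshold until the heap is full.
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        if upper_bound <= threshold:
            continue  # pruned: this document cannot enter the top k
        scored += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > threshold:
            heapq.heapreplace(heap, (score, doc_id))
    return scored

def separate_heaps(shards, k):
    """Each shard gets its own heap and threshold; results are merged afterwards."""
    scored, candidates = 0, []
    for shard in shards:
        heap = []
        scored += search_shard(shard, k, heap)
        candidates.extend(heap)
    return heapq.nlargest(k, candidates), scored

def shared_heap(shards, k):
    """One heap and threshold are passed from shard to shard in the given order."""
    heap, scored = [], 0
    for shard in shards:
        scored += search_shard(shard, k, heap)
    return sorted(heap, reverse=True), scored

# Toy shards, listed in the order a resource selection step might rank them.
shards = [
    [(1, 9.0, 8.0), (2, 7.0, 6.5), (3, 5.0, 4.0)],
    [(4, 6.0, 5.5), (5, 3.0, 2.0), (6, 8.5, 7.0)],
    [(7, 4.0, 3.5), (8, 2.0, 1.0), (9, 6.2, 6.0)],
]
separate_top, separate_scored = separate_heaps(shards, k=2)
shared_top, shared_scored = shared_heap(shards, k=2)
```

On this toy data both strategies return the same top-2 documents, but the shared heap fully scores 3 documents where the separate heaps score 8, because the threshold carried over from earlier shards prunes most of the later ones. The separate-heap searches do not depend on one another, so they could in principle run concurrently, whereas the shared heap forces the shards to be visited one after another.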