Abstract

Comparative evaluations of information retrieval systems are often
carried out using standard test corpora, together with the sample
topics and pre-computed relevance judgments that accompany them.
To keep experimental costs under control, partial relevance
judgments are used rather than exhaustive ones, admitting a degree
of uncertainty into the per-topic effectiveness scores being
compared.
Here we explore the design options that must be considered when
planning such an experimental evaluation, with emphasis on how
effectiveness scores are inferred from partial information.
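As an illustration of why the inference method matters, the sketch below computes average precision under two common conventions for handling unjudged documents: treating them as non-relevant (the trec_eval convention), or removing them from the ranking before scoring (condensed-list scoring). The function and data names are hypothetical, not taken from the paper; this is a minimal sketch, not the authors' method.

```python
def average_precision(ranking, qrels, condensed=False):
    """Average precision from partial relevance judgments.

    ranking: list of document ids, best first.
    qrels: dict mapping judged doc id -> 1 (relevant) or 0 (non-relevant);
           unjudged documents are simply absent from the dict.
    condensed=False: unjudged documents count as non-relevant.
    condensed=True:  unjudged documents are dropped before scoring.
    """
    if condensed:
        # Condensed-list scoring: evaluate only the judged documents.
        ranking = [d for d in ranking if d in qrels]
    num_rel = sum(qrels.values())
    if num_rel == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if qrels.get(doc, 0) == 1:
            hits += 1
            total += hits / rank
    return total / num_rel


# Hypothetical example: d2 and d4 are unjudged.
qrels = {"d1": 1, "d3": 1, "d5": 0}
run = ["d1", "d2", "d3", "d4", "d5"]
print(average_precision(run, qrels))                  # unjudged as non-relevant
print(average_precision(run, qrels, condensed=True))  # condensed-list scoring
```

The gap between the two scores for the same run is one concrete form of the uncertainty that partial judgments admit into per-topic effectiveness scores.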