Abstract

The proliferation of the World Wide Web has brought information
retrieval (IR) techniques to the forefront of search technology. To
the average computer user, ``searching'' now means using IR-based
systems for finding information on the WWW or in other document
collections. IR query evaluation methods and workloads differ
significantly from those found in database systems. In this paper, we
focus on three such differences. First, due to the inherent fuzziness
of the natural language used in IR queries and documents, an
additional degree of flexibility is permitted in evaluating queries.
Second, IR query evaluation algorithms tend to have access patterns
that cause problems for traditional buffer replacement policies.
Third, IR search is often an iterative process, in which a query is
repeatedly refined and resubmitted by the user. Based on these
differences, we develop two complementary techniques to improve the
efficiency of IR queries: 1) Buffer-aware query evaluation, which
alters the query evaluation process based on the current contents of
buffers; and 2) Ranking-aware buffer replacement, which
incorporates knowledge of the query processing strategy into
replacement decisions. In a detailed performance study, we show that
either of these techniques alone yields significant performance
benefits and that, in many cases, combining them yields further
improvements.
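
As an illustration only, the two techniques might be sketched as below. This is a minimal sketch under stated assumptions, not the algorithms evaluated in this paper: the names `Page`, `buffer_aware_order`, and `ranking_aware_victim` are hypothetical, the buffer is modeled as a plain set of inverted-list pages, and we assume that pages later in a term's list, and terms with lower weight, contribute less to the final ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """One buffered page of an inverted list (hypothetical model)."""
    term: str   # term whose inverted list this page belongs to
    index: int  # position of the page within that list


def buffer_aware_order(query_terms, list_pages, buffered):
    """Buffer-aware evaluation sketch: process first the terms whose
    inverted lists need the fewest disk reads, given the buffer contents.

    list_pages maps each term to the number of pages in its list.
    Ties are broken alphabetically so the order is deterministic.
    """
    def misses(term):
        return sum(
            1 for i in range(list_pages[term])
            if Page(term, i) not in buffered
        )
    return sorted(query_terms, key=lambda t: (misses(t), t))


def ranking_aware_victim(buffered, term_weight):
    """Ranking-aware replacement sketch: evict the page assumed to matter
    least to ranking, i.e. lowest term weight, then latest page in its list.

    term_weight is an assumed per-term importance score (e.g. an
    IDF-like value); terms missing from it are treated as weight 0.
    """
    return min(
        buffered,
        key=lambda p: (term_weight.get(p.term, 0.0), -p.index),
    )


# Small usage example with made-up data.
buffered = {Page("web", 0), Page("web", 1), Page("search", 0)}
list_pages = {"web": 2, "search": 3, "buffer": 2}
order = buffer_aware_order(["buffer", "search", "web"], list_pages, buffered)
victim = ranking_aware_victim(buffered, {"web": 0.5, "search": 2.0})
print(order, victim)
```

In this sketch the two functions share only the buffer contents, which reflects the complementary nature of the techniques: one adapts evaluation to what is buffered, the other adapts what stays buffered to how queries are evaluated.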