inverted index for a
collection of documents is a structure that stores, for each term (word)
occurring somewhere in the collection, information about the locations where it
occurs. In particular, for each term t, the index contains an inverted list I_t
consisting of a number of index postings. Each posting in I_t contains
information about the occurrences of t in one particular document d, usually
the ID of the document (the docID), the number of occurrences of t in d (the
frequency), and possibly other information about the locations of the
occurrences within the document and their contexts. The postings in each list
are usually sorted by docID.
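
The structure described above can be sketched in a few lines of Python. This is only a minimal illustration, not how a production engine stores its index: the function name `build_inverted_index`, the sample documents, and the naive lowercase whitespace tokenizer are all assumptions made for the example. Each posting here holds only the docID and the frequency, and omits positions and context.

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to a list of (doc_id, frequency) postings sorted by doc_id.

    `docs` is a dict of {doc_id: text}. Tokenization is a plain lowercase
    whitespace split, which is an assumption made for this sketch.
    """
    index = defaultdict(list)
    # Visiting docIDs in increasing order keeps every inverted list sorted.
    for doc_id in sorted(docs):
        counts = Counter(docs[doc_id].lower().split())
        for term, freq in counts.items():
            index[term].append((doc_id, freq))
    return dict(index)


# Hypothetical three-document collection.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the fox and the dog",
}
index = build_inverted_index(docs)
print(index["the"])  # postings for the term "the"
print(index["fox"])  # postings for the term "fox"
```

Because documents are processed in docID order, no separate sort of each list is needed, which is one simple way to get the docID ordering mentioned above.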

Search engines come in a number of configurations that
reflect the applications they are designed for. Web search engines, such as Google
and Yahoo!, must be able to capture, or crawl, many terabytes of data, and then
provide subsecond response times to millions of queries submitted every day from
around the world. The “big issues” in the design of search engines include the ones
identified for information retrieval: effective ranking algorithms, evaluation, and
user interaction. There are, however, a number of additional critical features of
search engines that result from their deployment in large-scale, operational environments.
Foremost among these features is the performance of the search engine in terms of
measures such as response time, query throughput, and indexing speed. Response
time is the delay between submitting a query and receiving the result list, throughput
measures the number of queries that can be processed in a given time, and indexing
speed is the rate at which text documents can be transformed into indexes for searching.
An index is a data structure that improves the speed of search. The design of indexes
for search engines is one of the major topics in this blog.
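
To make response time and throughput concrete, here is a rough sketch of how they could be measured for any search function. The names `measure_queries` and `toy_search`, and the tiny in-memory corpus, are hypothetical and exist only for this example. A real benchmark would also need many more queries, warm-up runs, and concurrent clients.

```python
import time


def measure_queries(search, queries):
    """Run each query once; return (per-query latencies, queries per second)."""
    latencies = []
    start = time.perf_counter()
    for query in queries:
        t0 = time.perf_counter()
        search(query)
        # Response time: delay between submitting a query and getting results.
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    # Throughput: number of queries processed per unit of time.
    throughput = len(queries) / elapsed if elapsed > 0 else float("inf")
    return latencies, throughput


# Hypothetical stand-in for a real engine: a linear scan over a tiny corpus.
corpus = ["the quick brown fox", "the lazy dog", "the fox and the dog"]


def toy_search(query):
    return [i for i, text in enumerate(corpus) if query in text.split()]


latencies, throughput = measure_queries(toy_search, ["fox", "dog", "cat"])
print(f"mean response time: {sum(latencies) / len(latencies):.6f} s")
print(f"throughput: {throughput:.0f} queries/s")
```

Indexing speed could be measured the same way, by timing the index construction step and dividing the amount of text processed by the elapsed time.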