IndexReader is an abstract class, providing an interface for accessing an
index. Search of an index is done entirely through this abstract interface,
so that any subclass which implements it is searchable.

Concrete subclasses of IndexReader are usually constructed with a call to
one of the static open() methods, e.g. open(Directory, boolean).

For efficiency, in this API documents are often referred to via
document numbers, non-negative integers which each name a unique
document in the index. These document numbers are ephemeral: they may change
as documents are added to and deleted from an index. Clients should thus not
rely on a given document having the same number between sessions.

An IndexReader can be opened on a directory for which an IndexWriter is
already open, but in that case the reader cannot be used to delete documents from the index.

NOTE: for backwards API compatibility, several methods are not listed
as abstract, but have no useful implementations in this base class and
instead always throw UnsupportedOperationException. Subclasses are
strongly encouraged to override these methods, but in many cases may not
need to.

NOTE: as of 2.4, it's possible to open a read-only
IndexReader using the static open methods that accept the
boolean readOnly parameter. Such a reader has better
concurrency as it's not necessary to synchronize on the
isDeleted method. You must specify false if you want to
make changes with the resulting IndexReader.

NOTE: IndexReader instances are completely thread
safe, meaning multiple threads can call any of their methods
concurrently. If your application requires external
synchronization, you should not synchronize on the
IndexReader instance; use your own
(non-Lucene) objects instead.

commit()
Commit changes resulting from delete, undeleteAll, or
setNorm operations.
If an exception is hit, then either no changes or all
changes will have been committed to the index
(transactional semantics).


commit(Map<String,String> commitUserData)
Commit changes resulting from delete, undeleteAll, or
setNorm operations.
If an exception is hit, then either no changes or all
changes will have been committed to the index
(transactional semantics).

hasChanges

IndexReader

protected IndexReader()

Method Detail

getRefCount

public int getRefCount()

Expert: returns the current refCount for this reader

incRef

public void incRef()

Expert: increments the refCount of this IndexReader
instance. RefCounts are used to determine when a
reader can be closed safely, i.e. as soon as there are
no more references. Be sure to always call a
corresponding decRef(), in a finally clause;
otherwise the reader may never be closed. Note that
close() simply calls decRef(), which means that
the IndexReader will not really be closed until decRef() has been called for all outstanding
references.
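The borrow-and-release discipline described for incRef() can be sketched without Lucene as follows. RefCountedReader and RefCountUsage are hypothetical names for a minimal model that assumes the opener holds the first reference; it only mirrors the documented contract (close() delegates to decRef(), and resources are released once the count reaches zero).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a ref-counted reader; this is NOT Lucene's IndexReader.
class RefCountedReader {
    // Assumption: whoever opens the reader holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    int getRefCount() { return refCount.get(); }

    boolean isClosed() { return closed; }

    void incRef() {
        if (closed) throw new IllegalStateException("reader is already closed");
        refCount.incrementAndGet();
    }

    void decRef() {
        int rc = refCount.decrementAndGet();
        if (rc == 0) {
            closed = true; // last reference released: free resources here
        } else if (rc < 0) {
            throw new IllegalStateException("decRef() called too many times");
        }
    }

    // Mirrors the documented behavior: close() simply calls decRef().
    void close() { decRef(); }
}

class RefCountUsage {
    // Borrow pattern: incRef() before use, matching decRef() in a finally clause.
    static int useReader(RefCountedReader reader) {
        reader.incRef();
        try {
            return reader.getRefCount(); // placeholder for real search work
        } finally {
            reader.decRef();
        }
    }
}
```

Because close() is just another decRef(), a reader that some other component has borrowed stays open until that component releases it too.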

open

Returns an IndexReader reading the index in the given
Directory. You should pass readOnly=true, since it
gives much better concurrent performance, unless you
intend to do write operations (delete documents or
change norms) with the reader.

Parameters:

directory - the index directory

readOnly - true if no changes (deletions, norms) will be made with this IndexReader

open

Expert: returns an IndexReader reading the index in the given
IndexCommit. You should pass readOnly=true, since it
gives much better concurrent performance, unless you
intend to do write operations (delete documents or
change norms) with the reader.

Parameters:

commit - the commit point to open

readOnly - true if no changes (deletions, norms) will be made with this IndexReader

open

Expert: returns an IndexReader reading the index in
the given Directory, with a custom IndexDeletionPolicy. You should pass readOnly=true,
since it gives much better concurrent performance,
unless you intend to do write operations (delete
documents or change norms) with the reader.

Parameters:

directory - the index directory

deletionPolicy - a custom deletion policy (only used
if you use this reader to perform deletes or to set
norms); see IndexWriter for details.

readOnly - true if no changes (deletions, norms) will be made with this IndexReader

open

Expert: returns an IndexReader reading the index in
the given Directory, with a custom IndexDeletionPolicy. You should pass readOnly=true,
since it gives much better concurrent performance,
unless you intend to do write operations (delete
documents or change norms) with the reader.

Parameters:

directory - the index directory

deletionPolicy - a custom deletion policy (only used
if you use this reader to perform deletes or to set
norms); see IndexWriter for details.

readOnly - true if no changes (deletions, norms) will be made with this IndexReader

termInfosIndexDivisor - Subsamples which indexed
terms are loaded into RAM. This has the same effect as IndexWriter.setTermIndexInterval(int), except that the interval
must be set at indexing time, whereas this divisor can be
set per reader. When set to N, one in every
N*termIndexInterval terms in the index is loaded into
memory. By setting this to a value > 1 you can reduce
memory usage, at the expense of higher latency when
loading a TermInfo. The default value is 1. Set this
to -1 to skip loading the terms index entirely.
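As a rough worked example of the termInfosIndexDivisor arithmetic, the sketch below estimates how many index terms are held in RAM. It is a back-of-the-envelope model, not Lucene code: TermIndexEstimate is a hypothetical name, the model assumes the sampled terms are every (N*termIndexInterval)th term starting from the first, so real counts may differ slightly, and the interval of 128 used below is only an example value.

```java
// Back-of-the-envelope estimate, not Lucene code: how many index terms land in RAM.
class TermIndexEstimate {
    /**
     * @param totalTerms        number of unique terms in the index
     * @param termIndexInterval interval chosen at indexing time
     * @param divisor           termInfosIndexDivisor; -1 means "do not load the terms index"
     */
    static long loadedIndexTerms(long totalTerms, int termIndexInterval, int divisor) {
        if (divisor == -1) {
            return 0; // terms index skipped entirely
        }
        if (divisor < 1 || termIndexInterval < 1) {
            throw new IllegalArgumentException("divisor and interval must be >= 1 (or divisor == -1)");
        }
        long stride = (long) termIndexInterval * divisor; // keep one of every N*termIndexInterval terms
        return (totalTerms + stride - 1) / stride;        // ceiling division
    }
}
```

For 1,000,000 unique terms and an interval of 128, a divisor of 1 keeps roughly 7,813 index terms in memory, while a divisor of 4 keeps roughly 1,954, about a quarter of the RAM for the terms index.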

open

Expert: returns an IndexReader reading the index in
the given Directory, using a specific commit and with
a custom IndexDeletionPolicy. You should pass
readOnly=true, since it gives much better concurrent
performance, unless you intend to do write operations
(delete documents or change norms) with the reader.

open

Expert: returns an IndexReader reading the index in
the given Directory, using a specific commit and with
a custom IndexDeletionPolicy. You should pass
readOnly=true, since it gives much better concurrent
performance, unless you intend to do write operations
(delete documents or change norms) with the reader.

Parameters:

deletionPolicy - a custom deletion policy (only used
if you use this reader to perform deletes or to set
norms); see IndexWriter for details.

readOnly - true if no changes (deletions, norms) will be made with this IndexReader

termInfosIndexDivisor - Subsamples which indexed
terms are loaded into RAM. This has the same effect as IndexWriter.setTermIndexInterval(int), except that the interval
must be set at indexing time, whereas this divisor can be
set per reader. When set to N, one in every
N*termIndexInterval terms in the index is loaded into
memory. By setting this to a value > 1 you can reduce
memory usage, at the expense of higher latency when
loading a TermInfo. The default value is 1. Set this
to -1 to skip loading the terms index entirely.

reopen

Refreshes an IndexReader if the index has changed since this instance
was (re)opened.

Opening an IndexReader is an expensive operation. This method can be used
to refresh an existing IndexReader to reduce these costs. This method
tries to only load segments that have changed or were created after the
IndexReader was (re)opened.

If the index has not changed since this instance was (re)opened, then this
call is a no-op and returns this instance. Otherwise, a new instance is
returned. The old instance is not closed and remains usable.

If the reader is reopened, it is safe to make changes
(deletions, norms) with the new reader even though the old
and new readers share resources internally. All shared
mutable state obeys "copy on write" semantics to ensure
the changes are not seen by other readers.

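You can determine whether a reader was actually reopened by comparing the
old instance with the instance returned by this method: if both references
point to the same object, the index had not changed and no new reader was opened.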
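The comparison is by reference, and the old reader is closed only if a different instance came back. The toy model below shows that idiom end to end without depending on Lucene; ToyIndex, ToyReader and ReopenIdiom are hypothetical classes that only imitate the documented contract (the same instance is returned when nothing changed, and the old instance is left open).

```java
// Hypothetical model, not Lucene code: an index whose version bumps on every commit.
class ToyIndex {
    private long version;
    long version() { return version; }
    void commit() { version++; }
}

// Hypothetical reader that remembers the index version it was opened on.
class ToyReader {
    private final ToyIndex index;
    private final long openedVersion;
    private boolean closed;

    ToyReader(ToyIndex index) {
        this.index = index;
        this.openedVersion = index.version();
    }

    // Same contract as described: this instance if unchanged, otherwise a new one.
    // The old instance is left open.
    ToyReader reopen() {
        return index.version() == openedVersion ? this : new ToyReader(index);
    }

    void close() { closed = true; }
    boolean isClosed() { return closed; }
}

class ReopenIdiom {
    // Compare references; close the old reader only if a different one came back.
    static ToyReader refresh(ToyReader reader) {
        ToyReader newReader = reader.reopen();
        if (newReader != reader) {
            reader.close();
        }
        return newReader;
    }
}
```

The conditional close matters: closing unconditionally would close the very reader you are about to keep using whenever the index had not changed.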

reopen

Expert: reopen this reader on a specific commit point.
This always returns a readOnly reader. If the
specified commit point matches what this reader is
already on, and this reader is already readOnly, then
this same instance is returned; if it is not already
readOnly, a readOnly clone is returned.

clone

On cloning a reader with pending changes (deletions,
norms), the original reader transfers its write lock to
the cloned reader. This means only the cloned reader
may make further changes to the index, and commit the
changes to the index on close, but the old reader still
reflects all changes made up until it was cloned.

Like reopen(), it's safe to make changes to
either the original or the cloned reader: all shared
mutable state obeys "copy on write" semantics to ensure
the changes are not seen by other readers.
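The two clone() rules above (the write lock moves to the clone when there are pending changes, and shared state is copied only on the first write) can be illustrated with a small toy model. ToyWriteLock and CowReader are hypothetical; the model reproduces the observable behavior described here, not the way Lucene implements it internally.

```java
import java.util.BitSet;

// Hypothetical model of the observable clone() rules; not Lucene's internals.
class ToyWriteLock {
    Object owner; // reader currently allowed to write, or null
}

class CowReader {
    private final ToyWriteLock lock; // one lock per index
    private BitSet deleted;          // may be shared with a clone
    private boolean shared;          // if true, copy `deleted` before the next write
    private boolean hasChanges;

    CowReader(ToyWriteLock lock, int maxDoc) {
        this(lock, new BitSet(maxDoc));
    }

    private CowReader(ToyWriteLock lock, BitSet deleted) {
        this.lock = lock;
        this.deleted = deleted;
    }

    CowReader cloneReader() {
        CowReader copy = new CowReader(lock, deleted); // share state, do not copy yet
        shared = true;
        copy.shared = true;
        if (hasChanges && lock.owner == this) {
            lock.owner = copy;      // pending changes: the write lock moves to the clone
            copy.hasChanges = true; // the clone is now responsible for committing them
        }
        return copy;
    }

    void deleteDocument(int doc) {
        if (lock.owner == null) {
            lock.owner = this;
        } else if (lock.owner != this) {
            throw new IllegalStateException("write lock is held by another reader");
        }
        if (shared) { // copy on first write so other readers never see this change
            deleted = (BitSet) deleted.clone();
            shared = false;
        }
        deleted.set(doc);
        hasChanges = true;
    }

    boolean isDeleted(int doc) { return deleted.get(doc); }
}
```

After cloning reader a with a pending deletion, the clone b can delete more documents without a seeing them, while a can no longer write because it handed over the lock.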

directory

Returns the directory associated with this index. The default
implementation returns the directory specified by subclasses when
delegating to the IndexReader(Directory) constructor, or throws an
UnsupportedOperationException if one was not specified.

getVersion

If this reader is a near real-time reader
(i.e., obtained by a call to IndexWriter.getReader(), or by calling reopen()
on a near real-time reader), then this method returns
the version of the last commit done by the writer.
Note that even as further changes are made with the
writer, the version will not change until a commit is
completed. Thus, you should not rely on this method to
determine when a near real-time reader should be
reopened. Use isCurrent() instead.

isCurrent

If this reader is a near real-time reader
(i.e., obtained by a call to IndexWriter.getReader(), or by calling reopen()
on a near real-time reader), then this method checks if
either a new commit has occurred, or any new
uncommitted changes have taken place via the writer.
Note that even if the writer has only performed
merging, this method will still return false.

In any event, if this returns false, you should call
reopen() to get a new reader that sees the
changes.
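The near real-time isCurrent() behavior described above, where any commit, uncommitted change, or even a pure merge makes the reader stale, can be modeled with a simple change counter. ToyNrtWriter and ToyNrtReader are hypothetical stand-ins, not IndexWriter or IndexReader, and the counter is an assumption made purely for illustration.

```java
// Hypothetical model, not Lucene code: the writer counts every change, merges included.
class ToyNrtWriter {
    private long changeCount;
    void addDocument() { changeCount++; }
    void merge() { changeCount++; }  // even a pure merge changes the index
    void commit() { changeCount++; }
    long changeCount() { return changeCount; }
    ToyNrtReader getReader() { return new ToyNrtReader(this); }
}

class ToyNrtReader {
    private final ToyNrtWriter writer;
    private final long seen; // writer state this reader was opened on

    ToyNrtReader(ToyNrtWriter writer) {
        this.writer = writer;
        this.seen = writer.changeCount();
    }

    // Current only if nothing (commit, uncommitted change, or merge) happened since opening.
    boolean isCurrent() { return writer.changeCount() == seen; }
}
```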

getTermFreqVectors

Return an array of term frequency vectors for the specified document.
The array contains a vector for each vectorized field in the document.
Each vector contains terms and frequencies for all terms in a given vectorized field.
If no such fields existed, the method returns null. The term vectors that are
returned may either be of type TermFreqVector
or of type TermPositionVector if
positions or offsets have been stored.

Parameters:

docNumber - document for which term frequency vectors are returned

Returns:

array of term frequency vectors. May be null if no term vectors have been
stored for the specified document.

getTermFreqVector

Return a term frequency vector for the specified document and field. The
returned vector contains terms and frequencies for the terms in
the specified field of this document, if the field had the storeTermVector
flag set. If term vectors were stored with positions or offsets, a
TermPositionVector is returned.

Parameters:

docNumber - document for which the term frequency vector is returned

field - field for which the term frequency vector is returned.

Returns:

term frequency vector. May be null if the field does not exist in the specified
document or its term vector was not stored.

document

NOTE: for performance reasons, this method does not check if the
requested document is deleted, and therefore asking for a deleted document
may yield unspecified results. Usually this is not required; however, you
can call isDeleted(int) with the requested document ID to verify
the document is not deleted.
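A caller that walks all document numbers therefore has to do the deletion check itself. The toy class below (ToyStoredFields, hypothetical, not Lucene code) keeps the same division of labor: document(int) returns whatever is stored, and the caller consults isDeleted(int) first. Using maxDoc() as the upper bound of document numbers is an assumption of the model.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for a reader's stored fields plus deletions; not Lucene code.
class ToyStoredFields {
    private final List<String> docs;
    private final BitSet deleted = new BitSet();

    ToyStoredFields(List<String> docs) { this.docs = new ArrayList<>(docs); }

    int maxDoc() { return docs.size(); }
    boolean isDeleted(int n) { return deleted.get(n); }
    void delete(int n) { deleted.set(n); }

    // Like the documented method, this does NOT check whether n is deleted.
    String document(int n) { return docs.get(n); }

    // Caller-side guard: skip deleted numbers before loading.
    List<String> liveDocuments() {
        List<String> live = new ArrayList<>();
        for (int n = 0; n < maxDoc(); n++) {
            if (!isDeleted(n)) {
                live.add(document(n));
            }
        }
        return live;
    }
}
```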

document

Get the Document at the nth position. The FieldSelector may be used to determine
what Fields to load and how they should
be loaded. NOTE: If this Reader (more specifically, the underlying
FieldsReader) is closed before the lazy
Field is loaded, an exception may be
thrown. If you want the value of a lazy
Field to be available after closing, you
must explicitly load it or fetch the Document again with a new loader.

NOTE: for performance reasons, this method does not check if the
requested document is deleted, and therefore asking for a deleted document
may yield unspecified results. Usually this is not required; however, you
can call isDeleted(int) with the requested document ID to verify
the document is not deleted.

Parameters:

n - Get the document at the nth position

fieldSelector - The FieldSelector to use to determine what
Fields should be loaded on the Document. May be null, in which case
all Fields will be loaded.

setNorm

Expert: Resets the normalization factor for the named field of the named
document. The norm represents the product of the field's boost and its length normalization. Thus, to preserve the length normalization
values when resetting this, one should base the new value upon the old.
NOTE: If this field does not store norms, then
this method call will silently do nothing.
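Because the norm is the product of boost and length normalization, changing only the boost means dividing out the old boost and multiplying in the new one. The sketch below shows that arithmetic on plain floats; NormMath is hypothetical, and it ignores the lossy single-byte encoding Lucene applies to stored norms, so round-tripping through a real index will not be exact.

```java
// Hypothetical helper: re-derive a norm for a new boost while keeping length normalization.
class NormMath {
    // norm = boost * lengthNorm, so lengthNorm = oldNorm / oldBoost.
    static float rescaleNorm(float oldNorm, float oldBoost, float newBoost) {
        if (oldBoost == 0f) {
            throw new IllegalArgumentException("old boost must be non-zero to recover lengthNorm");
        }
        float lengthNorm = oldNorm / oldBoost;
        return lengthNorm * newBoost;
    }
}
```

For example, a field with a length normalization of 0.5 and a boost of 2.0 has a norm of 1.0; raising the boost to 3.0 gives a new norm of 1.5.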

terms

Returns an enumeration of all the terms in the index. The
enumeration is ordered by Term.compareTo(). Each term is greater
than all that precede it in the enumeration. Note that after
calling terms(), TermEnum.next() must be called
on the resulting enumeration before calling other methods such as
TermEnum.term().

terms

Returns an enumeration of all terms starting at a given term. If
the given term does not exist, the enumeration is positioned at the
first term greater than the supplied term. The enumeration is
ordered by Term.compareTo(). Each term is greater than all that
precede it in the enumeration.
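The two positioning rules differ: terms() starts before the first term, so next() must come first, while terms(Term) starts on the first term that is greater than or equal to the given one. A hypothetical cursor over a sorted set makes the difference concrete; ToyTermEnum is not Lucene's TermEnum, and plain strings stand in for Term here, whereas real Term ordering compares the field name first and then the text.

```java
import java.util.Iterator;
import java.util.NavigableSet;

// Hypothetical cursor mimicking the two positioning rules above; not Lucene's TermEnum.
final class ToyTermEnum {
    private final Iterator<String> it;
    private String current; // null while unpositioned or exhausted

    // Like terms(): unpositioned, so next() must be called before term().
    ToyTermEnum(NavigableSet<String> terms) {
        this.it = terms.iterator();
    }

    // Like terms(Term): already positioned on the first term >= from.
    ToyTermEnum(NavigableSet<String> terms, String from) {
        this.it = terms.tailSet(from, true).iterator();
        next();
    }

    boolean next() {
        current = it.hasNext() ? it.next() : null;
        return current != null;
    }

    String term() { return current; }
}
```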

termDocs

Returns an enumeration of all the documents which contain
term. For each document, the document number and the frequency of
the term in that document are provided, for use in
search scoring. If term is null, then all non-deleted
docs are returned with freq=1.
Thus, this method implements the mapping:

Term => <docNum, freq>*

The enumeration is ordered by document number. Each document number
is greater than all that precede it in the enumeration.
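The Term => <docNum, freq>* mapping is an inverted index. The in-memory model below (ToyPostings, hypothetical, not Lucene code) builds one from token lists and answers termDocs in ascending document order, returning {docNum, freq} pairs, skipping deleted documents, and treating a null term as "every live document with freq=1".

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory inverted index illustrating Term => <docNum, freq>*.
class ToyPostings {
    // term -> (docNum -> freq); TreeMap keeps doc numbers in ascending order
    private final Map<String, TreeMap<Integer, Integer>> index = new TreeMap<>();
    private final BitSet deleted = new BitSet();
    private int maxDoc;

    int addDocument(String... tokens) {
        int doc = maxDoc++;
        for (String t : tokens) {
            index.computeIfAbsent(t, k -> new TreeMap<>()).merge(doc, 1, Integer::sum);
        }
        return doc;
    }

    void delete(int doc) { deleted.set(doc); }

    // {docNum, freq} pairs ordered by docNum; a null term means "all live docs, freq=1".
    List<int[]> termDocs(String term) {
        List<int[]> out = new ArrayList<>();
        if (term == null) {
            for (int d = 0; d < maxDoc; d++) {
                if (!deleted.get(d)) out.add(new int[] {d, 1});
            }
            return out;
        }
        TreeMap<Integer, Integer> postings = index.getOrDefault(term, new TreeMap<>());
        for (Map.Entry<Integer, Integer> e : postings.entrySet()) {
            if (!deleted.get(e.getKey())) out.add(new int[] {e.getKey(), e.getValue()});
        }
        return out;
    }
}
```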

termPositions

Returns an enumeration of all the documents which contain
term. For each document, in addition to the document number
and frequency of the term in that document, a list of all of the ordinal
positions of the term in the document is available. Thus, this method
implements the mapping:

Term => <docNum, freq, <pos_1, pos_2, ..., pos_freq>>*

deleteDocument

Deletes the document numbered docNum. Once a document is
deleted it will not appear in TermDocs or TermPositions enumerations.
Attempts to read its fields with the document(int)
method will result in an error. The presence of this document may still be
reflected in the docFreq(org.apache.lucene.index.Term) statistic, though
this will be corrected eventually as the index is further modified.

deleteDocuments

Deletes all documents that have a given term indexed.
This is useful if one uses a document field to hold a unique ID string for
the document. Then to delete such a document, one merely constructs a
term with the appropriate field and the unique ID string as its text and
passes it to this method.
See deleteDocument(int) for information about when this deletion will
become effective.
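The unique-ID pattern and the lagging docFreq statistic can be seen together in a small model. ToyDeletes and its methods are hypothetical; "field:text" strings stand in for Term, and purgeDeleted() is a crude stand-in for the later index modification (such as a merge) that eventually corrects the statistic.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, not Lucene code: delete-by-term with lazily corrected statistics.
class ToyDeletes {
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // "field:text" -> doc numbers
    private final BitSet deleted = new BitSet();
    private int maxDoc;

    // Adds a document whose "id" field holds the given unique ID string.
    int addWithId(String id) {
        int doc = maxDoc++;
        postings.computeIfAbsent("id:" + id, k -> new TreeSet<>()).add(doc);
        return doc;
    }

    // Marks every document containing field:text as deleted; returns how many were newly deleted.
    int deleteDocuments(String field, String text) {
        int count = 0;
        for (int doc : postings.getOrDefault(field + ":" + text, new TreeSet<>())) {
            if (!deleted.get(doc)) {
                deleted.set(doc);
                count++;
            }
        }
        return count;
    }

    // Counts postings whether or not the documents are deleted, like a stale docFreq.
    int docFreq(String field, String text) {
        return postings.getOrDefault(field + ":" + text, new TreeSet<>()).size();
    }

    int numDocs() { return maxDoc - deleted.cardinality(); }

    // Stand-in for a later merge that physically drops deleted documents from postings.
    void purgeDeleted() {
        for (Set<Integer> docs : postings.values()) {
            docs.removeIf(d -> deleted.get(d));
        }
    }
}
```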

main

Prints the filename and size of each file within a given compound file.
Add the -extract flag to extract files to the current working directory.
In order to make the extracted version of the index work, you have to copy
the segments file from the compound index into the directory where the extracted files are stored.

listCommits

Returns all commit points that exist in the Directory.
Normally, because the default is KeepOnlyLastCommitDeletionPolicy, there would be only
one commit point. But if you're using a custom IndexDeletionPolicy then there could be many commits.
Once you have a given commit, you can open a reader on
it by calling open(IndexCommit,boolean).
There must be at least one commit in
the Directory, else this method throws IOException. Note that if a commit is in
progress while this method is running, that commit
may or may not be in the returned array.

getSequentialSubReaders

Expert: returns the sequential sub readers that this
reader is logically composed of. For example,
IndexSearcher uses this API to drive searching by one
sub reader at a time. If this reader is not composed
of sequential child readers, it should return null.
If this method returns an empty array, that means this
reader is a null reader (for example a MultiReader
that has no sub readers).

NOTE: You should not try using sub-readers returned by
this method to make any changes (setNorm, deleteDocument,
etc.). While this might succeed for one composite reader
(like MultiReader), it will most likely lead to index
corruption for other readers (like DirectoryReader obtained
through open(org.apache.lucene.store.Directory)). Use the parent reader directly.
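Searching one sub reader at a time implies translating each sub reader's local document numbers back to top-level numbers. A minimal sketch, assuming the sub readers are laid out back to back in the order returned, so each one's first top-level number is the sum of the maxDoc values before it; SubReaderMap is a hypothetical helper, not a Lucene class.

```java
// Hypothetical helper, not Lucene code: map top-level doc numbers onto sequential sub readers.
class SubReaderMap {
    private final int[] starts; // starts[i] = first top-level doc number of sub reader i

    SubReaderMap(int[] maxDocs) {
        if (maxDocs.length == 0) throw new IllegalArgumentException("no sub readers");
        starts = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i] = base; // sub readers are laid out one after another
            base += maxDocs[i];
        }
    }

    int docBase(int sub) { return starts[sub]; }

    // Index of the last sub reader whose start is <= doc (this skips empty sub readers).
    int subIndex(int doc) {
        int lo = 0, hi = starts.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= doc) lo = mid; else hi = mid - 1;
        }
        return lo;
    }

    // Converts a hit found in sub reader `sub` back to a top-level doc number.
    int toTopLevel(int sub, int localDoc) { return starts[sub] + localDoc; }
}
```

With sub readers of sizes 3, 0 and 2, top-level document 4 belongs to the third sub reader, where it is local document 1.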

getDeletesCacheKey

getUniqueTermCount

Returns the number of unique terms (across all fields)
in this reader.
This method returns a long, even though internally
Lucene cannot handle more than 2^31 unique terms, to allow for
a possible future in which this limitation is removed.