A TaxonomyReader which retrieves stored taxonomy information from a
Directory.

Reading from the on-disk index on every method call is too slow, so this
implementation employs caching: some methods cache recent requests and their
results, while other methods prefetch all the data into memory and then
provide answers directly from in-memory tables. See the documentation of
individual methods for comments on their performance.

WARNING: This API is experimental and might change in incompatible ways in the next release.

setCacheSize

Currently, if the given size is smaller than the current size of
a cache, the cache will not shrink and will instead remain limited to
its current size.

Parameters:

size - the new maximum cache size, in number of entries.

setDelimiter

public void setDelimiter(char delimiter)

setDelimiter changes the character that the taxonomy uses in its
internal storage as a delimiter between category components. Do not
use this method unless you really know what you are doing.

If you do use this method, make sure you call it before any other
method that actually queries the taxonomy. Moreover, make sure you
always pass the same delimiter for all LuceneTaxonomyWriter and
LuceneTaxonomyReader objects you create.

getOrdinal

getOrdinal() returns the ordinal of the category given as a path.
The ordinal is the category's serial number, an integer which starts
with 0 and grows as more categories are added (note that once a category
is added, it can never be deleted).

If the given category wasn't found in the taxonomy, INVALID_ORDINAL is
returned.

getParent

getParent() returns the ordinal of the parent category of the category
with the given ordinal.

When a category is specified as a path name, finding the path of its
parent is as trivial as dropping the last component of the path.
getParent() is functionally equivalent to calling getPath() on the
given ordinal, dropping the last component of the path, and then calling
getOrdinal() to get an ordinal back. However, implementations are
expected to provide a much more efficient implementation:

getParent() should be a very quick method, as it is used during the
facet aggregation process in faceted search. Implementations will most
likely want to serve replies to this method from a pre-filled cache.

If the given ordinal is the ROOT_ORDINAL, an INVALID_ORDINAL is returned.
If the given ordinal is a top-level category, the ROOT_ORDINAL is returned.
If an invalid ordinal is given (negative or beyond the last available
ordinal), an ArrayIndexOutOfBoundsException is thrown. However, it is
expected that getParent will only be called for ordinals which are
already known to be in the taxonomy.
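The equivalence described above can be illustrated with a minimal sketch. This is a toy in-memory model, not the reader's implementation: the class name, the fixed sample data, the '/' delimiter and the value -1 for INVALID_ORDINAL are all assumptions made for the example.

```java
public class ParentLookupSketch {
    // Assumed values for illustration only.
    public static final int ROOT_ORDINAL = 0;
    public static final int INVALID_ORDINAL = -1;

    // Toy taxonomy: ordinal -> path. The root is the empty path.
    public static final String[] PATHS = {"", "a", "a/b", "c"};
    // Pre-filled table: ordinal -> parent ordinal.
    public static final int[] PARENTS = {INVALID_ORDINAL, 0, 1, 0};

    /** Linear search standing in for a real path-to-ordinal lookup. */
    public static int getOrdinal(String path) {
        for (int i = 0; i < PATHS.length; i++) {
            if (PATHS[i].equals(path)) {
                return i;
            }
        }
        return INVALID_ORDINAL;
    }

    /** Slow route: path of the ordinal, drop the last component, look up the result. */
    public static int slowParent(int ordinal) {
        if (ordinal == ROOT_ORDINAL) {
            return INVALID_ORDINAL; // the root has no parent
        }
        String path = PATHS[ordinal];
        int slash = path.lastIndexOf('/');
        String parentPath = slash < 0 ? "" : path.substring(0, slash);
        return getOrdinal(parentPath);
    }

    /** Fast route: a single array read from the pre-filled table. */
    public static int fastParent(int ordinal) {
        return PARENTS[ordinal];
    }

    public static void main(String[] args) {
        for (int i = 0; i < PATHS.length; i++) {
            System.out.println(i + ": slow=" + slowParent(i) + " fast=" + fastParent(i));
        }
    }
}
```

Both routes agree for every ordinal in the toy data, but the fast route needs no string handling and no lookup, which is why a pre-filled table suits the aggregation use case. Like the real method, fastParent throws ArrayIndexOutOfBoundsException for an out-of-range ordinal.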

getParentArray

public int[] getParentArray()

getParentArray() returns an int array of size getSize() listing the
ordinal of the parent category of each category in the taxonomy.

The caller can hold on to the array it got indefinitely - it is
guaranteed that no-one else will modify it. The other side of the
same coin is that the caller must treat the array it got as read-only
and not modify it, because other callers might have gotten the
same array too, and getParent() calls are also answered from the
same array.

The getParentArray() call is extremely efficient, merely returning
a reference to an array that already exists. For a caller that plans
to call getParent() for many categories, using getParentArray() and
the array it returns is a somewhat faster approach because it avoids
the overhead of method calls and volatile dereferencing.

If you use getParentArray() instead of getParent(), remember that
the array you got is (naturally) not modified after a refresh(),
so you should always call getParentArray() again after a refresh().
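The following sketch shows why a held array goes stale. It uses a hypothetical ToyReader, not the real class, and assumes that a refresh publishes a new, larger array and leaves the old one untouched, which is what the read-only guarantee above implies.

```java
import java.util.Arrays;

public class StaleParentArrayDemo {
    /** Hypothetical stand-in for a taxonomy reader. */
    public static class ToyReader {
        // Toy taxonomy: 0 = root, 1 = "a", 2 = "a/b". -1 marks "no parent".
        private volatile int[] parents = {-1, 0, 1};

        public int[] getParentArray() {
            return parents;
        }

        /** Simulates a refresh that discovers one new top-level category. */
        public void refresh() {
            int[] old = parents;
            int[] grown = Arrays.copyOf(old, old.length + 1);
            grown[old.length] = 0; // the new category's parent is the root
            parents = grown;       // publish a new array; the old one is not modified
        }
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader();
        int[] held = reader.getParentArray();
        reader.refresh();
        System.out.println(held.length);                    // prints 3
        System.out.println(reader.getParentArray().length); // prints 4
    }
}
```

Code that keeps using the held array would not see the fourth category, so it should fetch the array again after each refresh.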

refresh

refresh() re-reads the taxonomy information if there were any changes to
the taxonomy since this instance was opened or last refreshed. Calling
refresh() is more efficient than close()ing the old instance and opening a
new one.

If there were no changes since this instance was opened or last refreshed,
then this call does nothing. Note, however, that this is still a relatively
slow method (as it needs to verify whether there have been any changes on
disk to the taxonomy), so it should not be called too often needlessly. In
faceted search, the taxonomy reader's refresh() should be called only after
a reopen() of the main index.

Refreshing the taxonomy might fail in some cases, for example
if the taxonomy was recreated since this instance was opened or last refreshed.
In this case an InconsistentTaxonomyException is thrown,
suggesting that in order to obtain up-to-date taxonomy data a new
TaxonomyReader should be opened. Note: This TaxonomyReader
instance remains unchanged and usable in this case, and the application can
continue to use it, and should still call Closeable.close() when it is no longer needed.

It should be noted that refresh() is similar in purpose to
IndexReader.reopen(), but the two methods behave differently. refresh()
refreshes the existing TaxonomyReader object, rather than opening a new one
in addition to the old one as reopen() does. The reason is that in a
taxonomy, one can only add new categories and cannot modify or delete
existing categories; therefore, there is no reason to keep an old snapshot
of the taxonomy open - refreshing the taxonomy to the newest data and using
this new snapshot in all threads (whether new or old) is fine. This saves
us needing to keep multiple copies of the taxonomy open in memory.

getSize

getSize() returns the number of categories in the taxonomy. Because
categories are numbered consecutively starting with 0, the taxonomy
contains ordinals 0 through getSize()-1.

Note that the number returned by getSize() is often slightly higher
than the number of categories inserted into the taxonomy; this is
because when a category is added to the taxonomy, its ancestors
are also added automatically (including the root, which always gets
ordinal 0).
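A small toy model can make the size difference concrete. The class below is hypothetical and only imitates the behavior described above; the '/' delimiter and the method names are assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class TaxonomySizeSketch {
    // path -> ordinal; ordinals are handed out consecutively
    private final Map<String, Integer> ordinals = new HashMap<>();

    public TaxonomySizeSketch() {
        ordinals.put("", 0); // the root (empty path) always gets ordinal 0
    }

    /** Adds a category and, first, any ancestors that are still missing. */
    public int addCategory(String path) {
        Integer existing = ordinals.get(path);
        if (existing != null) {
            return existing;
        }
        int slash = path.lastIndexOf('/');
        addCategory(slash < 0 ? "" : path.substring(0, slash));
        int ordinal = ordinals.size(); // next free ordinal, taken after ancestors exist
        ordinals.put(path, ordinal);
        return ordinal;
    }

    public int getSize() {
        return ordinals.size();
    }

    public static void main(String[] args) {
        TaxonomySizeSketch taxonomy = new TaxonomySizeSketch();
        taxonomy.addCategory("a/b/c");
        taxonomy.addCategory("a/d");
        // Two categories were inserted, but the root, "a" and "a/b" were added too.
        System.out.println(taxonomy.getSize()); // prints 5
    }
}
```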

getChildrenArrays

getChildrenArrays() returns a TaxonomyReader.ChildrenArrays object whose
arrays can be used together to efficiently enumerate the children of any
category.

The caller can hold on to the object it got indefinitely - it is
guaranteed that no-one else will modify it. The other side of the
same coin is that the caller must treat the object which it got (and
the arrays it contains) as read-only and not modify it, because
other callers might have gotten the same object too.

Implementations should have O(getSize()) time for the first call or
after a refresh(), but O(1) time for further calls. In neither case
should there be a need to read new data from disk. These guarantees
are most likely achieved by calculating this object (based on the
getParentArray()) when first needed, and later (if the taxonomy was not
refreshed) returning the same object (without any allocation or copying)
when requested.

The reason we have one method returning one object, rather than two
methods returning two arrays, is to avoid race conditions in a multi-
threaded application: We want to avoid the possibility of returning one
new array and one old array, as those could not be used together.
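One way such an object could be derived from the parent array in a single O(getSize()) pass is sketched below. The text above does not spell out what the object contains, so the two arrays used here (a youngest-child array and an older-sibling array), their names, and the value -1 for "none" are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChildrenArraysSketch {
    public static final int NONE = -1; // assumed marker for "no such category"

    // Both arrays live in one object, so they are always a consistent pair.
    private final int[] youngestChild; // parent ordinal -> its highest-ordinal child
    private final int[] olderSibling;  // ordinal -> next older child of the same parent

    /** Builds both arrays in one pass over the parent array. */
    public ChildrenArraysSketch(int[] parents) {
        int n = parents.length;
        youngestChild = new int[n];
        olderSibling = new int[n];
        Arrays.fill(youngestChild, NONE);
        Arrays.fill(olderSibling, NONE);
        // Ordinal 0 is the root and has no parent, so start at 1.
        for (int i = 1; i < n; i++) {
            olderSibling[i] = youngestChild[parents[i]];
            youngestChild[parents[i]] = i;
        }
    }

    /** Enumerates the children of a category, youngest first. */
    public List<Integer> children(int ordinal) {
        List<Integer> result = new ArrayList<>();
        for (int c = youngestChild[ordinal]; c != NONE; c = olderSibling[c]) {
            result.add(c);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy taxonomy: 0 = root, 1 = "a", 2 = "a/b", 3 = "c", 4 = "a/d"
        int[] parents = {NONE, 0, 1, 0, 1};
        ChildrenArraysSketch arrays = new ChildrenArraysSketch(parents);
        System.out.println(arrays.children(0)); // prints [3, 1]
        System.out.println(arrays.children(1)); // prints [4, 2]
    }
}
```

Because the pair is built and handed out as one object, a caller can never combine a child array from before a refresh with a sibling array from after it.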

getRefCount

incRef

public void incRef()

Expert: increments the refCount of this TaxonomyReader instance.
RefCounts are used to determine when a taxonomy reader can be closed
safely, i.e. as soon as there are no more references.
Be sure to always call a corresponding decRef(), in a finally clause;
otherwise the reader may never be closed.
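The incRef()/decRef() pairing can be sketched as follows. ToyReader is a hypothetical stand-in, and two details are assumptions for the example: that a new reader starts with a reference count of 1, and that the reader is closed when the count reaches 0.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountSketch {
    /** Hypothetical reference-counted reader. */
    public static class ToyReader {
        private final AtomicInteger refCount = new AtomicInteger(1); // assumed initial count
        private volatile boolean closed = false;

        public void incRef() {
            refCount.incrementAndGet();
        }

        public void decRef() {
            if (refCount.decrementAndGet() == 0) {
                closed = true; // last reference released: safe to close
            }
        }

        public int getRefCount() {
            return refCount.get();
        }

        public boolean isClosed() {
            return closed;
        }
    }

    /** Uses the reader while holding a reference; returns the count seen during use. */
    public static int useReader(ToyReader reader) {
        reader.incRef();
        try {
            return reader.getRefCount(); // stand-in for real work with the reader
        } finally {
            reader.decRef(); // runs even if the work above throws
        }
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader();
        System.out.println(useReader(reader));    // prints 2
        System.out.println(reader.getRefCount()); // prints 1
        reader.decRef();                          // owner releases its own reference
        System.out.println(reader.isClosed());    // prints true
    }
}
```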