2.2.3.1.3. Local index

The clustered implementation with local indexes is built upon the same
strategy: a volatile in-memory index buffer combined with delayed flushing
to persistent storage.

As this implementation is designed for a clustered environment, it has
additional mechanisms for data delivery within the cluster. The actual text
extraction jobs are done on the same node that performs the content operation
(for example, a write operation). Prepared "documents" (the Lucene term for a
block of data ready for indexing) are replicated to the cluster nodes and
processed by the local indexes, so each cluster instance has the same index
content. When a new node joins the cluster, it has no initial index, so one
must be created. There are several supported ways of doing this.
The simplest is to copy the index manually, but this is not the intended
approach. If no initial index is found, JCR uses the automated scenarios. They are
controlled via configuration (see the index-recovery-mode parameter)
and offer either full re-indexing from the database or copying the index from
another cluster node.

To use the cluster-ready strategy based on local indexes, where each node has
its own copy of the index on the local file system, the following configuration
must be applied. The indexing directory must point to a folder on the local
file system, and "changesfilter-class" must be set to
"org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexChangesFilter".

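A minimal sketch of such a query-handler configuration might look as follows.
The "changesfilter-class" value and the index-recovery-mode parameter come from
this section; the "index-dir" property name, the SearchIndex handler class and
the directory path are assumptions used for illustration and should be checked
against the actual workspace configuration.

```xml
<!-- Sketch only: the handler class, the "index-dir" name and the path are assumptions -->
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
   <properties>
      <!-- Hypothetical folder on this node's local file system -->
      <property name="index-dir" value="/var/data/jcr/index/repository/production"/>
      <!-- Selects the local-index cluster strategy described above -->
      <property name="changesfilter-class"
                value="org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexChangesFilter"/>
      <!-- How a node without an initial index obtains one -->
      <property name="index-recovery-mode" value="from-coordinator"/>
   </properties>
</query-handler>
```
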
A common use case for all cluster-ready applications is hot
joining and leaving of processing units. All nodes that join the cluster
for the first time or after some downtime must
be brought into a synchronized state.

When dealing with shared value
storages, databases and indexes, cluster nodes are synchronized
at all times. However, synchronization is an issue when the local index
strategy is used. If a new node joins the cluster with no index, the index is
retrieved or recreated. A node can also be restarted, in which case its index
is not empty. Usually an existing index is assumed to be up to date, but it
can be outdated.

JCR offers a mechanism called RecoveryFilters that automatically retrieves the
index for the joining node on startup. This feature is a set of filters that
can be defined via the QueryHandler configuration.
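
As an illustration, a single filter could be declared in the QueryHandler
properties as sketched here. The property name "index-recovery-filter" is an
assumption that this section does not confirm; the value is one of the filter
classes described in this section.

```xml
<!-- Assumed property name; the value is the recovery filter class to apply on startup -->
<property name="index-recovery-filter"
          value="org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFilter"/>
```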

If any one of the filters fires, the index is re-synchronized. Note that
DocNumberRecoveryFilter is used when no filter is
configured. So, if resynchronization should be blocked or strictly
required on start, ConfigurationPropertyRecoveryFilter can be
used.

This feature uses the standard index recovery mode defined by the
previously described index-recovery-mode parameter (either "from-indexing" or
"from-coordinator", the default value):

<property name="index-recovery-mode" value="from-coordinator"/>

There are several filter implementations:

org.exoplatform.services.jcr.impl.core.query.lucene.DummyRecoveryFilter: Always
returns true, for cases when the index must be forcibly
resynchronized (recovered) each time;

org.exoplatform.services.jcr.impl.core.query.lucene.SystemPropertyRecoveryFilter:
Returns the value of the system property
"org.exoplatform.jcr.recoveryfilter.forcereindexing". Index
recovery can thus be controlled from the top, using system properties,
without changing the configuration;

org.exoplatform.services.jcr.impl.core.query.lucene.ConfigurationPropertyRecoveryFilter: Returns
the value of the QueryHandler configuration property
"index-recovery-filter-forcereindexing", so index recovery can be
controlled from the configuration separately for each workspace;

org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFilter: Checks
the number of documents in the index on the coordinator side and on the
local side, and returns true if they differ. The advantage of this filter
over the others is that it skips re-indexing for workspaces whose
index was not modified. For example, suppose there are 10 repositories with 3
workspaces in each one, and only one of them is heavily used in the cluster:
frontend/production. With this filter, only the workspaces that have really
changed are re-indexed, without affecting the other
indexes, which greatly reduces the startup time.
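
To force or block resynchronization for a single workspace,
ConfigurationPropertyRecoveryFilter could be configured roughly as follows.
The "index-recovery-filter-forcereindexing" property is named in the filter
description above; the "index-recovery-filter" property name and the
"true"/"false" value format are assumptions and should be verified before use.

```xml
<!-- Assumed property name for selecting the filter class -->
<property name="index-recovery-filter"
          value="org.exoplatform.services.jcr.impl.core.query.lucene.ConfigurationPropertyRecoveryFilter"/>
<!-- Assumed value format: "true" would force recovery on start, "false" would block it -->
<property name="index-recovery-filter-forcereindexing" value="true"/>
```

Because these properties belong to a workspace's QueryHandler configuration,
each workspace can carry its own setting.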