This Directory implementation mimics the semantics of the traditional filesystem and RAM-based directories, so it can work as a drop-in replacement for existing applications using Lucene. It provides reliable index sharing together with other Infinispan features such as node autodiscovery, automatic failover and rebalancing, and optional transactions, and it can be backed by traditional storage solutions such as a filesystem, databases, or cloud storage engines.

The implementation extends Lucene's org.apache.lucene.store.Directory, so it can be used to store the index in cluster-wide shared memory, making it easy to distribute the index. Compared to rsync-based replication, this solution is suited to use cases in which your application makes frequent changes to the index and you need them quickly distributed to all nodes, with configurable consistency levels, synchronicity and guarantees, total elasticity and autodiscovery. Changes applied to the index can also optionally participate in a JTA transaction; since version 5, XA transactions with recovery are supported.

Two different LockFactory implementations are provided to guarantee that only one IndexWriter at a time makes changes to the index, implementing the same semantics as when opening an index on a local filesystem. As with other Lucene Directories, you can override the LockFactory if you prefer an alternative implementation.

Lucene compatibility

The current version was developed and compiled against Lucene 3.5.0; it has also been tested with Lucene versions 3.0.x through 3.4.0, 2.9.x, and the older 2.4.1.

How to use it

To create a Directory instance:
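A minimal sketch, assuming the Infinispan 5.x API; the configuration file name and the "indexExample" index name are illustrative:

```java
import org.apache.lucene.store.Directory;
import org.infinispan.Cache;
import org.infinispan.lucene.InfinispanDirectory;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

public class DirectoryExample {
    public static void main(String[] args) throws Exception {
        // Start an embedded cache manager from an XML configuration file
        // (file name is illustrative)
        EmbeddedCacheManager cacheManager = new DefaultCacheManager("infinispan-config.xml");
        Cache<?, ?> cache = cacheManager.getCache();

        // The second argument is the indexName: a unique key identifying this index
        Directory directory = new InfinispanDirectory(cache, "indexExample");

        // ... use the Directory with an IndexWriter or IndexSearcher ...
    }
}
```

The resulting Directory can then be passed to any Lucene API expecting a Directory, such as an IndexWriter or an IndexSearcher.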

The indexName is a unique key identifying your index. It plays the same role as the path does for filesystem-based indexes: you can create several different indexes by giving them different names. When you use the same indexName in another instance connected to the same network (or instantiated on the same machine, which is useful for testing), the instances will join, form a cluster and share all content.

New nodes can be added or removed dynamically, making service administration very easy and well suited for cloud environments: it's simple to react to load spikes, as adding more memory and CPU power to the search system is done by simply starting more nodes.

Limitations

As when using an IndexWriter on a filesystem-based Directory, even in the clustered edition only one IndexWriter can be opened across the whole cluster. Hibernate Search, which has integrated with this Lucene Directory since version 3.3, sends index change requests over a JMS queue or a JGroups channel. Other valid approaches are to proxy the remote IndexWriter, or to design your application so that only one node attempts to write to the index. Reading (searching) is of course possible in parallel, from any number of threads on each node; changes applied through the single IndexWriter affect the results seen by all threads on all nodes within a very short time.

Configuration

This works with local-only configurations as well as with any clustering mode supported by Infinispan. A transaction manager is not mandatory, but batching must be enabled. An example configuration:
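A minimal sketch of such a configuration, assuming the Infinispan 5.x XML schema (element names may differ between versions):

```xml
<infinispan>
   <default>
      <!-- Batching is required by the Lucene Directory -->
      <invocationBatching enabled="true" />
      <clustering mode="distribution">
         <sync />
      </clustering>
   </default>
</infinispan>
```

The clustering element is optional: omit it for a local-only configuration.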

As explained in more detail in the javadocs of org.infinispan.lucene.InfinispanDirectory, it's possible for it to use more than a single cache, with specific configurations for different purposes. When using readlocks, make sure not to enable transactions on that cache.

Demo

A simple command-line demo of its capabilities is distributed with Infinispan under demos/lucene-directory; make sure you grab the "Binaries, server and demos" package from the download page, which contains all demos.

Start several instances, then try adding text in one instance and searching for it in another. The configuration is not tuned at all, but it should work out of the box without any changes. If your network interface has multicast enabled, the demo will cluster across the local network with other instances.

Maven dependencies

All you need is org.infinispan:infinispan-lucene-directory:
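A sketch of the dependency declaration; the version placeholder is illustrative, so substitute the Infinispan version matching your deployment:

```xml
<dependency>
   <groupId>org.infinispan</groupId>
   <artifactId>infinispan-lucene-directory</artifactId>
   <version>${infinispan.version}</version>
</dependency>
```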

Early versions of the Infinispan Lucene Directory required a transaction manager; this is no longer needed, but you can still optionally use one to wrap all changes you make to the index in a transaction.

Using a CacheLoader

Using a CacheLoader, you can have the index content backed by permanent storage; you can use a shared store for all nodes or one per node. See CacheLoaders for more details.
When using a CacheLoader to store a Lucene index, configure the CacheLoader with async=true to get the best write performance.
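As an illustration, a loader configured for asynchronous writes might look like the following sketch, assuming the Infinispan 5.x XML schema and a FileCacheStore (the location value is illustrative):

```xml
<loaders>
   <loader class="org.infinispan.loaders.file.FileCacheStore"
           fetchPersistentState="true">
      <properties>
         <property name="location" value="/var/lucene-index-store" />
      </properties>
      <!-- Write to the store asynchronously for best write performance -->
      <async enabled="true" />
   </loader>
</loaders>
```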

Storing the index in a database

It might be useful to store the Lucene index in a relational database; the database alone would be very slow, but Infinispan can act as an efficient cache between the application and the JDBC interface, making this configuration useful in both clustered and non-clustered setups.
When storing indexes in a JDBC database, it's suggested to use the JdbcStringBasedCacheStore, which will need this attribute:
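A sketch, assuming the Infinispan 5.x loader schema; the key2StringMapperClass property points the store at the key mapper shipped with the Lucene Directory (connection settings omitted):

```xml
<loader class="org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore"
        fetchPersistentState="true">
   <properties>
      <property name="key2StringMapperClass"
                value="org.infinispan.lucene.LuceneKey2StringMapper" />
   </properties>
</loader>
```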