Bio

I'm working in the Hibernate and Infinispan teams at Red Hat, caring about Lucene
integration in products we support, striving to make it easier to use and to
integrate in well known APIs and patterns, and finally to make it scale better;
I love clean and well performing code.
I've been an early adopter of cloud deployments scaling Lucene to a large number
of requests on Amazon EC2 using Hibernate Search, and after that I worked with
JIRA to make it clusterable via Infinispan.
I've lived in Holland, Italy, the Dominican Republic, Chile, Portugal and the UK;
I love OSS and socializing with other developers to improve any and all OSS projects.

The master branch is also very active! Expect a new Beta release of version 5.8 with support for Elasticsearch 5+ later this week.

Why?

We backported various small fixes which should be welcome but are of low impact. The big deal is HSEARCH-2691, as you might fail to notice this problem until testing under load, which is quite inconvenient.

Big thanks to Andrej Golovnin, who spotted the problem and shared a patch; I suspect it wasn’t easy to find the problem.

Also thanks to Osamu Nagano, who pointed out the importance of this fix and suggested backporting it urgently.

How to get these releases

All versions are available for download on Hibernate Search’s web site.

Ideally use a modern build tool to fetch it from Maven Central; these are the coordinates:

Simplified JNDI configuration

If you integrated any external component into Hibernate Search using JNDI,
for example a JMS queue or an Infinispan cache, this configuration was simplified.

You will no longer need to set Hibernate Search specific configuration
properties such as how to set the InitialContext for JNDI lookups:
only configure Hibernate ORM, and Hibernate Search will inherit the same
settings.
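As a minimal sketch of what this means in practice, the JNDI environment is declared once among the Hibernate ORM properties. `hibernate.jndi.class` and `hibernate.jndi.url` are Hibernate ORM's JNDI settings; the factory class name, the URL and the helper class below are made-up placeholders.

```java
import java.util.Properties;

final class JndiSettingsSketch {

    // Builds persistence properties with the JNDI environment declared once,
    // at the Hibernate ORM level. Both values are hypothetical placeholders.
    static Properties ormSettings() {
        Properties settings = new Properties();
        settings.setProperty("hibernate.jndi.class",
                "com.example.naming.ExampleInitialContextFactory");
        settings.setProperty("hibernate.jndi.url",
                "remote://naming.example.com:4447");
        // No Hibernate Search specific JNDI properties are added here:
        // as described above, Hibernate Search inherits the ORM settings.
        return settings;
    }

    public static void main(String[] args) {
        ormSettings().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The same two entries could equally live in `persistence.xml` or `hibernate.properties`; the point is only that they are set in one place.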

Come along in Gallery Hall at 12:55 on Thursday 11th of May to see a quick
demo of Hibernate OGM used to migrate a JPA application from using a relational
database to using a fast, scalable and highly available in-memory data grid.

Hot Rod support?

Hot Rod is the protocol used by "intelligent clients" of an Infinispan Server, which implies
the client is smart enough to implement a series of performance optimisation tricks; for example
it is able to connect to the most suited server in the cluster depending on the data
being requested (or written), greatly reducing the latency of operations.
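To illustrate the idea of picking the most suited server, here is a deliberately simplified sketch of key-aware routing. It is not the algorithm Infinispan actually uses: the segment count, the server names and the `ownerOf` helper are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

final class KeyAwareRoutingSketch {

    // Maps a key to one of a fixed number of segments, then maps the segment
    // to a server. A simplified stand-in, NOT Infinispan's real algorithm.
    static String ownerOf(String key, List<String> servers, int segments) {
        int segment = Math.floorMod(key.hashCode(), segments);
        return servers.get(segment % servers.size());
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("server-a", "server-b", "server-c");
        for (String key : Arrays.asList("user:1", "user:2", "order:42")) {
            // The client computes the owner locally and contacts it directly,
            // avoiding an extra hop through an arbitrary node.
            System.out.println(key + " -> " + ownerOf(key, servers, 256));
        }
    }
}
```

Because every client applies the same function to the same key, reads and writes for that key land on the same server without any coordination between clients.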

While Infinispan is best known as a high-performance key/value store, when it comes to remote
clients the recommended approach is to encode your data in Google Protocol Buffers (Protobuf).
This allows your storage schema to evolve without breaking the decoding of existing
data, allows server-side queries and functions to interpret the stored data, and allows
interoperability with Hot Rod clients from other programming languages.
For example, it makes it possible to read the POJOs we write using the Java client from a Python client, and
have the data converted into reasonable Python objects.

You’ll need to write and maintain a matching Protobuf Schema for all your entities

You’ll have to set up Protostream and configure it as Marshaller for the Hot Rod client

Most importantly, it requires new skills.

You’ll have to learn how a Protobuf Schema is best defined, and how to use Protostream.
Finally, you’ll have to learn the Hot Rod API and how to make the most of its
advanced flags to tune each operation, and consider carefully how you want to represent relations.

Use something familiar instead?

Hibernate OGM can automate the tedious parts, and let you focus on what matters: storing your objects.

The most notable limitation is that you will have to manually create
the Cache definitions that it will need in your Infinispan Server configuration, as
at this time this is an operation which can’t be performed over Hot Rod (but the friendly
Infinispan team is working on a solution).

Another limitation is that at this time we don’t support running JPQL queries on this backend.

You can find many more interesting details in our Infinispan integration reference guide; I’ve included a section to help you choose between Infinispan Embedded
and Infinispan Remote (over Hot Rod).

I’m proud to announce our team is a bit larger nowadays, and more contributors are volunteering too, so we managed to increase the development pace. Today we release versions 5.6.0.Beta3, 5.7.0.Alpha1 and 5.5.5.Final.

Version 5.6.0.Beta3

the latest version of our main development branch, with experimental Elasticsearch integration.

Version 5.7.0.Alpha1

essentially the same as 5.6.0.Beta3, but compatible with Hibernate ORM version 5.2.x.

Version 5.5.5.Final

a maintenance release of our stable branch.

A 5.7 preview released when 5.6 isn’t out yet?

Let me explain: this unusual decision was taken to accommodate the needs of you all.

The 5.6 series is creating a lot of anticipation, with the Elasticsearch integration being a very welcome new feature. It’s meant to be experimental, as we won’t break our APIs yet while all integration needs are analyzed. Still, it’s taking a bit longer than expected, and even though it’s an experimental feature we don’t want to rush it: we need to finish it properly.

In the meantime the Hibernate ORM project released the 5.2.x series, and several users have been asking for a Hibernate Search version compatible with it. We could not upgrade our 5.6 series yet, as then people using an older Hibernate ORM would not be able to play with the Elasticsearch integration.

So now that 5.6 is in good shape - we decided the next release will be a candidate release - we felt we could already publish a 5.7 version, which is exactly the same but in a new branch made compatible with the very latest Hibernate ORM.

How is the Elasticsearch integration coming?

It’s maturing at high speed. The biggest obstacles have been resolved, so we’re definitely looking for more feedback at this point; as mentioned, the next version will be a candidate release.

Hibernate Search now has a proper Sorting API: watch this space as we’ll publish a dedicated blog about it, or get a peek at the
query sorting paragraph in the documentation.

This is an important milestone, as it makes sorting queries on Elasticsearch possible through our DSL.

Notes on compatibility

This version is compatible with Apache Lucene versions from 5.3.x to 5.5.x, and with Hibernate ORM versions 5.0.x and 5.1.x.

Compatibility with Hibernate ORM 5.2.x is not a reality yet - we expect to see that materialize in early October.
Compatibility with Lucene 6.x is scheduled for Hibernate Search 6.0, which will take longer - probably early 2017.

Finally, the Elasticsearch version we used for all development and testing of this release was 2.3.1.
We will soon upgrade this to the latest version, and discuss strategies to test against multiple versions.

we didn’t do much performance testing, so it’s probably not as efficient as it could be.

Relax the expected Elasticsearch version

it’s being tested with version 2.3.1 but we have plans to support a wider range of versions.

Explicit refresh requests

we plan to add methods to issue an index reader refresh request, as the changes pushed to Elasticsearch are not immediately visible by default.

Your Feedback!

we think it’s in pretty good shape; it would be great for more people to try it out and let us know what is missing and how it’s working for you.

Notable differences between using embedded Lucene vs. Elasticsearch

Unless you reconfigure Hibernate Search to use an async worker, when using the Lucene backend the changes to the index are applied immediately after you commit a transaction, and any subsequent search will "see" the changes.
On Elasticsearch the default is different: changes received by the cluster only become "visible" to searches after a refresh interval (1 second by default).

You can reconfigure Hibernate Search to force a refresh of indexes after each write operation by using the hibernate.search.default.elasticsearch.refresh_after_write configuration setting.

This setting defaults to false as that’s the recommended setting for optimal performance on Elasticsearch.
You might want to set this to true to make it simpler to write unit tests, but you should take care not to rely on the synchronous
behaviour in your production code.
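A minimal sketch of how this could be wired up so that only tests enable the setting. The property name is the one given above; the `forTests` flag and the helper class are hypothetical and only serve the example.

```java
import java.util.Properties;

final class RefreshAfterWriteSketch {

    static final String REFRESH_AFTER_WRITE =
            "hibernate.search.default.elasticsearch.refresh_after_write";

    // Returns the Hibernate Search settings to merge into the persistence
    // properties. The forTests flag is a hypothetical switch for this sketch.
    static Properties searchSettings(boolean forTests) {
        Properties settings = new Properties();
        if (forTests) {
            // Trade indexing throughput for immediate visibility of changes.
            settings.setProperty(REFRESH_AFTER_WRITE, "true");
        }
        // In production the property stays unset, so the default (false) applies.
        return settings;
    }

    public static void main(String[] args) {
        System.out.println("tests:      " + searchSettings(true));
        System.out.println("production: " + searchSettings(false));
    }
}
```

Keeping the override out of the production settings makes it harder to start depending on the synchronous behaviour by accident.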

Improvements for embedded Lucene users

While working on Elasticsearch, we also made some performance improvements which benefit users of the
traditional embedded Lucene backend.

Special thanks to Andrej Golovnin, who contributed several patches to reduce allocation of objects on the hot path and improve overall performance.

Having fixed several issues and tasks since the previous milestone, it’s time to publish our third milestone
towards Elasticsearch integration: Hibernate Search version 5.6.0.Alpha3 is now available!

Migration from Hibernate Search 5.5.x

Even if you’re not interested in the new Elasticsearch support, you might want to try out this version as
it benefits from Apache Lucene 5.5.0.

If you ignore the new features and want to simply use Lucene in embedded mode the migration is easy,
and as usual we are maintaining notes regarding relevant API changes in the
Migration Guide to Hibernate Search 5.6.

Elasticsearch support progress

You can now use the Analyzers from Elasticsearch

Multiple operations will now be sent to Elasticsearch as a single batch to improve both performance and consistency

Spatial indexing and querying is now feature complete

We’ll wait for Elasticsearch to be "green" before attempting to use it at boot

Many improvements in the query translation

Error capture and reporting was improved

The MassIndexer is working now, but is not yet using efficient bulk operations

We also updated to use Apache Lucene 5.5 - the latest stable release of our favourite search engine -
as of course we’re not abandoning our traditional users!
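To give an idea of what sending multiple operations as a single batch looks like on the wire, the sketch below renders a list of index operations in the newline-delimited format accepted by the Elasticsearch `_bulk` endpoint. This is an illustration only, not the code Hibernate Search uses internally; the `IndexOp` type, the index and type names, and the documents are made up.

```java
import java.util.Arrays;
import java.util.List;

final class BulkBatchSketch {

    // One pending index operation. Values are assumed to need no JSON escaping.
    static final class IndexOp {
        final String index;
        final String type;
        final String id;
        final String documentJson;

        IndexOp(String index, String type, String id, String documentJson) {
            this.index = index;
            this.type = type;
            this.id = id;
            this.documentJson = documentJson;
        }
    }

    // Renders each operation as an action line followed by the document line,
    // each terminated by a newline, as the _bulk endpoint expects.
    static String toBulkBody(List<IndexOp> ops) {
        StringBuilder body = new StringBuilder();
        for (IndexOp op : ops) {
            body.append("{\"index\":{\"_index\":\"").append(op.index)
                .append("\",\"_type\":\"").append(op.type)
                .append("\",\"_id\":\"").append(op.id)
                .append("\"}}\n");
            body.append(op.documentJson).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<IndexOp> pending = Arrays.asList(
                new IndexOp("books", "book", "1", "{\"title\":\"First title\"}"),
                new IndexOp("books", "book", "2", "{\"title\":\"Second title\"}"));
        // One HTTP request carrying this body replaces two separate requests.
        System.out.print(toBulkBody(pending));
    }
}
```

Fewer round trips explain the performance gain; grouping the operations of one transaction into one request is also what helps consistency.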

What is the Hibernate Search / Elasticsearch integration?

Both Hibernate Search and Elasticsearch can do much more, but for the purpose of explaining this integration at a high level I’ll simplify their definitions as:

Hibernate Search

is a popular extension of the super popular Hibernate ORM framework, which makes it easy to index and query your entities using Apache Lucene.

Elasticsearch

is a very popular REST server which wraps the capabilities of Apache Lucene and makes it easier to scale this service horizontally.

Until today, when using Hibernate Search you’d be using Apache Lucene directly, in what we will now call "embedded mode".
In this mode a query is executed by the same process as your application; and while indexing happens in the background, the overhead of such
processing is still incurred on the same server and within the same JVM process as your Hibernate powered application.

With the Elasticsearch integration, rather than indexing your entities directly by managing the Lucene resources, we will
be sending RPCs to an Elasticsearch cluster to achieve a very similar purpose: after all it is also Lucene based, so the
feature match is extremely close!

This means that we’re able to transparently map all the current features to this new alternative backend,
and by so doing give you more architectural choices at minimum required changes in your applications:
the goal is that for most users the differences will be mostly in configuration details.

When using Elasticsearch we will need to send RPCs over the network to run queries and index updates,
but on the other hand you benefit from microservices-style decoupling and all the nice features
that Elasticsearch can provide in terms of running and managing a horizontally scalable cluster.
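A rough sketch of what "differences mostly in configuration" could look like. The property names are given as I recall them from the Hibernate Search reference documentation and should be treated as assumptions to verify against the version you use; the index path, host URL and helper class are placeholders.

```java
import java.util.Properties;

final class BackendSwitchSketch {

    // Embedded mode: Lucene indexes are managed inside the application's JVM.
    // Property names are assumptions; check the reference documentation.
    static Properties embeddedLucene(String indexBase) {
        Properties settings = new Properties();
        settings.setProperty("hibernate.search.default.directory_provider", "filesystem");
        settings.setProperty("hibernate.search.default.indexBase", indexBase);
        return settings;
    }

    // Remote mode: index updates and queries are sent to an Elasticsearch cluster.
    // Property names are assumptions; check the reference documentation.
    static Properties elasticsearch(String hostUrl) {
        Properties settings = new Properties();
        settings.setProperty("hibernate.search.default.indexmanager", "elasticsearch");
        settings.setProperty("hibernate.search.default.elasticsearch.host", hostUrl);
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(embeddedLucene("/var/lucene/indexes"));
        System.out.println(elasticsearch("http://127.0.0.1:9200"));
    }
}
```

In both cases the entity mappings and the application code stay the same; only the settings handed to Hibernate at bootstrap differ.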

Elasticsearch integration status

This is literally being developed right now, so do not expect this to be feature complete or reliable enough to run
in a production system. Still, we already have a great set of features working so it’s a nice time to start
playing with it and hopefully provide some feedback.

Updating the indexes

As with Lucene in embedded mode, the indexes are updated automatically when you create or update
entities which are mapped to Hibernate Search using the same annotations already familiar from our
traditional index mapping (see Mapping entities to the index structure).

Running a query on an Elasticsearch mapped entity

In many cases the existing way (see Querying) of running queries should work:
we do automatically translate the most common types of Apache Lucene queries and many of the queries generated by the Hibernate Search DSL.

On top of translating Lucene queries, you can directly create Elasticsearch queries by using either its String format or a JSON format:
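To give an idea of what the two forms look like, here is a minimal sketch that builds a comparable single-term search as a query string and as a JSON body in the Elasticsearch query DSL. The field name, the search term and the helper class are hypothetical, and how the resulting string is handed to Hibernate Search is deliberately left out.

```java
final class ElasticsearchQueryFormsSketch {

    // Lucene-style query string, e.g. title:lucene
    static String asQueryString(String field, String term) {
        return field + ":" + term;
    }

    // A comparable search expressed as a JSON "match" query.
    // Inputs are assumed to need no JSON escaping.
    static String asJson(String field, String term) {
        return "{\"query\":{\"match\":{\"" + field + "\":\"" + term + "\"}}}";
    }

    public static void main(String[] args) {
        System.out.println(asQueryString("title", "lucene"));
        System.out.println(asJson("title", "lucene"));
    }
}
```

The string form is compact and convenient for simple searches, while the JSON form gives access to the full Elasticsearch query DSL.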

Remaining work ahead

This is an early preview, and while we’re proud of some of the progress there are several areas which still need much coding.
On the other hand, implementing some of these is not very hard: this might be the perfect time to join the project.

Please check JIRA and the mailing lists for updates, but at the time of writing at least the following features are known to be missing or incomplete:

Analyzer definitions are not being applied

Spatial queries need more work

Filters can’t be applied yet

Faceting is mostly implemented

Scheduled index optimisation is not applied

Query timeouts

Delete by queries

Resolution for Date type mapping is ignored

Scrolling on large results won’t work yet

MoreLikeThis queries

Mixing Lucene based indexes and Elasticsearch based indexes

Performance and efficiency will only be looked at once basic feature development is complete.

API Changes

In the 5.x series we will keep backward compatibility.

That might come at the cost of a less than perfect Hibernate Search / Elasticsearch integration, API-wise.
This is something we will address in the 6.x series. But our focus is on offering the right set of features and getting feedback in 5.x before improving the APIs.

In a nutshell, 6.x will depend on how you use this feature in 5.6.

How to get this release

Downloads from SourceForge are available as well, but these don’t contain the Elasticsearch integration components yet.
Similarly, the WildFly modules do not include the new Elasticsearch extensions yet.