Elasticsearch for Apache Hadoop 5.0.0-alpha2 and 2.3.1 released

I am pleased to announce the release of Elasticsearch for Apache Hadoop (aka ES-Hadoop) 5.0.0-alpha2 and 2.3.1.

IMPORTANT: As the version number indicates, 5.0.0 is alpha software that will only work with Elasticsearch 5.0.0-alpha2. Please test it, but do not use it in production.

Now that we have that out of the way, let’s see what these releases bring to the table.

What’s new?

Version alignment

ES-Hadoop 5.0.0-alpha2 joins the 5.0 Elastic release train and adds support for Elasticsearch 5.0.0-alpha1 and alpha2 while maintaining support for Elasticsearch 1.x and 2.x.
Attentive users might notice that there was no ES-Hadoop 5.0.0-alpha1 release. This was on purpose: to minimize confusion about compatibility between products, ES-Hadoop is aligning itself with the global versioning of the Elastic stack.

Removed support for ‘old’ library versions

In ES-Hadoop 5.0.0-alpha2, the requirements for the various libraries have been raised to clean up the code base and remove cruft.
This includes (but is not limited to):

Eliminate integration for Spark SQL alpha 1.0-1.2

Spark SQL was released in Spark 1.0 through 1.2 as an alpha component and became stable in Spark 1.3. Along the way, however, the Spark SQL API changed significantly (moving away from SchemaRDD to DataFrame). In ES-Hadoop 5.0.0-alpha2, support for Spark SQL 1.0-1.2 is being removed.
The core/RDD support for Spark 1.0 is still present; however, with the imminent release of Spark 2.0, it is likely the version requirement will be raised (probably to Spark 1.2 or 1.3).

Remove HDFS repository plugin

As the HDFS repository plugin is now part of Elasticsearch proper, it has been removed from the ES-Hadoop project. Users of Elasticsearch 2.x can still use it as part of ES-Hadoop 2.x.
Note that the HDFS plugin in Elasticsearch 5.x is not just conveniently packaged but also better integrated: for example, there is no need to disable the JVM SecurityManager (an option that is no longer available anyway).
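
For readers who have not registered an HDFS snapshot repository before, here is a minimal sketch of the request involved. It assumes the repository type hdfs with the uri and path settings described in the plugin documentation; the repository name, NameNode address and path are placeholders, and the code only builds the request rather than sending it to a cluster.

```python
import json


def hdfs_repository_request(name, uri, path):
    """Build the REST call that registers an HDFS snapshot repository.

    Returns (method, endpoint, body) and does not contact any cluster.
    """
    body = {
        "type": "hdfs",
        "settings": {
            "uri": uri,    # address of the HDFS NameNode (placeholder below)
            "path": path,  # directory inside HDFS that holds the snapshots
        },
    }
    return "PUT", "/_snapshot/" + name, json.dumps(body, sort_keys=True)


# Hypothetical values, chosen only for illustration.
method, endpoint, body = hdfs_repository_request(
    "my_hdfs_repository",
    "hdfs://namenode:8020/",
    "elasticsearch/repositories/my_hdfs_repository",
)
print(method, endpoint)
print(body)
```

The resulting method, endpoint and body can be handed to any HTTP client pointed at a node that has the plugin installed.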

Bump Hive compatibility to 1.0

Hive 1.0 has been out for quite a while and the majority of distros have already moved to it. As such, support for Hive 0.13 and Hive 0.14 (two releases that were plagued by snafus) has now been dropped, cleaning up the code base.

Keep compatibility with JVM 1.7

Currently ES-Hadoop 5.0.0-alpha2 can still be used on JVM 1.7. This means users running old Hadoop distros or Scala 2.10 can upgrade to ES-Hadoop without concern. Note that Elasticsearch 5.0 itself does require JDK 1.8; however, as ES-Hadoop is a REST client, there are no hard JVM dependencies between the two - decoupling FTW!

What about 2.3.1?

ES-Hadoop 2.3.1 accompanies the 5.0.0-alpha2 release, introducing a few, but important, enhancements:

Rework field escaping

A bug report in the Spark module triggered a review and subsequent rework of the way mapping fields are passed on internally. No API has changed; however, at least in Spark, users should now be able to use field names with rare characters (such as %).
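
The release does not spell out the mechanism, so the following is only a generic illustration and not ES-Hadoop's internal code. When field names are packed into a single delimited string, characters that double as separators or escape markers have to be encoded on the way in and decoded on the way out. The helper names and the choice of percent-encoding are assumptions made for this sketch.

```python
from urllib.parse import quote, unquote


def pack_fields(names, sep=","):
    """Join field names into one string, percent-encoding each name first."""
    # safe="" makes quote() encode every reserved character, including
    # the separator and '%' itself, so the packed string stays unambiguous.
    return sep.join(quote(name, safe="") for name in names)


def unpack_fields(packed, sep=","):
    """Reverse pack_fields: split on the separator, then decode each name."""
    return [unquote(part) for part in packed.split(sep)] if packed else []


# Hypothetical field names, including one with '%' and one with the separator.
fields = ["discount%", "first,last", "plain"]
packed = pack_fields(fields)
print(packed)
print(unpack_fields(packed))
```

Without the encoding step, a name containing the separator would be split into two fields, and a literal % could be mistaken for the start of an escape sequence when decoding.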

HDFS repository upgrade

The HDFS repository plugin has been upgraded to Elasticsearch 2.3.2.

Better error messages

Some of the error messages at start-up have been improved to provide more guidance, especially for new users.