Elasticsearch for Apache Hadoop 2.0 RC1 released

As the label implies, this release brings the current development iteration close to fruition. Since the last release, exactly one month ago, several improvements have been made:

Index time/date-based formatting

If you are dealing with time-based data (such as logs), es-hadoop can dynamically determine and format the target index/type based on the data being processed, entry by entry. This works transparently across libraries (Map/Reduce, Cascading, Hive, Pig) or, if you prefer, directly on raw JSON:

es.resource="my-collection/{timestamp:YYYY.MM.dd}"
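
To make the per-entry resolution concrete, here is a rough Python sketch of the idea. It is an illustration only, not es-hadoop's implementation: the `resolve_resource` helper, the token table, and the sample entry are all hypothetical, and only the three date tokens used in the pattern above are handled.

```python
import re
from datetime import datetime

# Hypothetical, minimal mapping from the date tokens used in the pattern
# above to strftime codes; a real formatter supports far more tokens.
_TOKENS = {"YYYY": "%Y", "MM": "%m", "dd": "%d"}

# Matches {field} or {field:format} placeholders in a resource pattern.
_PLACEHOLDER = re.compile(r"\{([^}:]+)(?::([^}]+))?\}")


def _to_strftime(fmt):
    """Translate the small token subset above into a strftime format."""
    for token, code in _TOKENS.items():
        fmt = fmt.replace(token, code)
    return fmt


def resolve_resource(pattern, entry):
    """Return the concrete index/type for one entry (illustrative only)."""
    def substitute(match):
        field, fmt = match.group(1), match.group(2)
        value = entry[field]
        if fmt is None:
            # Plain {field} placeholder: use the value as-is.
            return str(value)
        # Assumes the field holds an ISO-8601 timestamp string.
        return datetime.fromisoformat(value).strftime(_to_strftime(fmt))
    return _PLACEHOLDER.sub(substitute, pattern)


entry = {"timestamp": "2014-05-13T10:15:00", "message": "user logged in"}
print(resolve_resource("my-collection/{timestamp:YYYY.MM.dd}", entry))
```

Each entry is routed by its own timestamp, so a single job can write to many daily indices without any up-front partitioning of the input.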

Support for update scripting

The update and upsert functionality has been extended to allow the use of scripts and to mirror the Elasticsearch update API. Furthermore, the script parameters can be extracted dynamically, at runtime, from the data being processed. As you would expect, this works consistently across raw JSON and the Hadoop libraries.
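
As a conceptual sketch of what extracting script parameters from the data means, the Python snippet below builds an update request body for one entry, in the shape accepted by the Elasticsearch 1.x update API (`script`, `params`, `upsert`). The `build_update_body` helper, the script text, and the field names are hypothetical, and using the entry itself as the `upsert` document is an assumption made for illustration; this does not show how es-hadoop is configured or implemented.

```python
import json


def build_update_body(script, param_fields, entry):
    """Build an update request body for a single entry (illustrative only).

    param_fields maps a script parameter name to the name of the field in
    the entry whose value should be passed to the script.
    """
    # Pull each parameter value out of the entry being processed.
    params = {name: entry[field] for name, field in param_fields.items()}
    return {
        "script": script,
        "params": params,
        # Assumption: if the document does not exist yet, index the entry.
        "upsert": entry,
    }


entry = {"id": "page-42", "hits": 3}
body = build_update_body("ctx._source.hits += count", {"count": "hits"}, entry)
print(json.dumps(body, sort_keys=True))
```

The script stays the same for every entry, while the parameter values change with the data being written.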

Upgrade to the latest Apache Hive and Pig

We are actively monitoring the releases in the Hadoop ecosystem. es-hadoop 2.0 RC1 is not only compatible with the latest Apache Pig (0.12.1) and Apache Hive (0.13.0); it also supports the newly introduced types (like char) while preserving backwards compatibility.

Increase the version from 1.3 to 2.0

While reviewing the list of changes since the transformation of Elasticsearch for Apache Hadoop started, we quickly realized the version needs to reflect the plethora of new features and functionality added. And thus 1.3 became 2.0.

In addition to all these updates, es-hadoop has been extensively tested across various Hadoop distributions to ensure full compatibility; whatever your environment, we want to make sure es-hadoop works flawlessly.
Besides bug fixes, the new release contains improved error and logging messages (especially for connectivity and network issues) to help you diagnose and recover from problems faster.