Elasticsearch for Apache Hadoop 5.5.0

I am excited to announce the release of Elasticsearch for Apache Hadoop (aka ES-Hadoop) 5.5.0, built against Elasticsearch 5.5.0.

What's new?

Support for Kerberos in Elasticsearch HDFS Repository Plugin

Back when 5.0 was released, the HDFS Repository was moved out of the ES-Hadoop project and into the Elasticsearch project. At that time, the plugin was rewritten to work under Elasticsearch's security model. Support for 'Kerberized' HDFS clusters was dropped because of the overwhelming work required to make the client gel with the installed security policy. With 5.5.0, that's all a thing of the past: Kerberos is once again supported in the HDFS Repository Plugin! This has also been backported to the 5.4 branch as of 5.4.1.

New Delimiter for Index Formatting

When specifying index or type names using the dynamic multi-resource feature, users can supply an optional date format for a date field that is extracted. In certain cases, the colon character (:) caused problems when a resource template with a date format in the index name was parsed as a path. To address this issue, 5.5 and up accept both the colon character (:) and the pipe character (|) as delimiters for date formats. The colon character is considered deprecated in 5.5 and will eventually be removed in ES-Hadoop 6.0.
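For anyone with existing resource patterns that use the colon form, the change amounts to swapping the first colon inside a `{field:format}` placeholder for a pipe. The snippet below is a minimal sketch of that rewrite in Python. The helper name `to_pipe_delimiter` and the sample patterns are hypothetical and exist only for illustration; this is not code from ES-Hadoop, and it assumes placeholders are written with curly braces as in the multi-resource syntax.

```python
import re

# Matches a placeholder of the form {field:format}. The field part may not
# contain a colon or a pipe, so placeholders that already use the pipe
# delimiter (or have no date format at all) are left untouched.
_COLON_PLACEHOLDER = re.compile(r"\{([^{}:|]+):([^{}]+)\}")


def to_pipe_delimiter(resource: str) -> str:
    """Rewrite {field:format} placeholders as {field|format}.

    Hypothetical migration helper; only the first colon in a placeholder is
    treated as the delimiter, so formats such as HH:mm survive intact.
    """
    return _COLON_PLACEHOLDER.sub(r"{\1|\2}", resource)


if __name__ == "__main__":
    # Sample pattern (assumed field name and date format).
    print(to_pipe_delimiter("logs-{@timestamp:yyyy.MM.dd}/doc"))
```

Patterns without a date format, such as `{media_type}/doc`, pass through unchanged, which makes the helper safe to run over a mixed set of job configurations.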

TTL and Timestamps

Support for TTL and Timestamp is being removed in ES 6.0. In ES-Hadoop 5.5, we log warnings at job start when TTL or Timestamp mapping settings are present in the configuration. In ES-Hadoop 6.0, those warnings will become runtime errors when executing against Elasticsearch 6.0 or above. For the sake of backwards compatibility, these settings will remain in the project in order to support their use in Elasticsearch 5.x and below.
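To make the rule above concrete, here is a minimal Python sketch of the version-dependent check as described for ES-Hadoop 6.0: warn when the target cluster is 5.x or below, fail when it is 6.0 or above. It assumes the relevant settings are `es.mapping.ttl` and `es.mapping.timestamp`; the function name `check_deprecated_mappings` and the use of a plain dictionary for job settings are hypothetical, and the connector's actual implementation may differ.

```python
import warnings

# Settings assumed to be affected by the TTL/Timestamp removal.
DEPRECATED_SETTINGS = ("es.mapping.ttl", "es.mapping.timestamp")


def check_deprecated_mappings(settings: dict, es_major_version: int) -> list:
    """Return the deprecated settings found in a job configuration.

    Hypothetical sketch: raises ValueError when the target Elasticsearch
    major version is 6 or higher, otherwise emits one warning per setting.
    """
    found = [key for key in DEPRECATED_SETTINGS if key in settings]
    if not found:
        return found
    if es_major_version >= 6:
        raise ValueError(
            "Settings not supported on Elasticsearch 6.0+: " + ", ".join(found)
        )
    for key in found:
        warnings.warn(f"{key} is deprecated and unsupported in Elasticsearch 6.0")
    return found


if __name__ == "__main__":
    job_settings = {"es.resource": "logs/doc", "es.mapping.ttl": "ttl_field"}
    print(check_deprecated_mappings(job_settings, es_major_version=5))
```

Running a check like this over existing job configurations before upgrading is one way to find jobs that will start failing once both the connector and the cluster are on 6.0.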

Deprecations

Hadoop 1.x

Support for Hadoop 1.x in the Elasticsearch-Hadoop connector is deprecated in 5.5 and will be completely removed in 6.0. This means that Elasticsearch-Hadoop 6.0 and above will no longer work against Hadoop 1.x (pre-YARN) based distributions. In striving to maintain the strongest integrations for the most popular Hadoop ecosystem components, we're moving forward with supporting only Hadoop versions 2.2 and above, which vendors have been bundling in their distributions for years. Most vendors no longer support Hadoop 1.x, and projects like Spark will no longer depend on Hadoop 1.x libraries going forward. We recommend that users on a Hadoop 1.x based distribution upgrade to a distribution based on Hadoop 2.2 or higher in order to keep receiving bug fixes and enhancements to ES-Hadoop.

Elasticsearch on YARN Beta

The Elasticsearch on YARN project (Documentation, GitHub) is deprecated in 5.5 and will be completely removed in 6.0. Elasticsearch on YARN was an experiment in deploying Elasticsearch on top of YARN, the resource management platform introduced in Hadoop 2.0. The project was never recommended for production use and has been in beta status since its inception. The core limitation is YARN's lack of official support for long-running services, and there has unfortunately been no prioritized innovation in the Hadoop community on this front over the last few years. Because such support is a requirement for Elasticsearch to achieve production-level stability on YARN, we are instead focusing our efforts on enhancing other areas of ES-Hadoop.

For any questions about the above deprecations, feel free to join the discussion on our forums.