This release continues the push on security functionality in particular: HDFS encryption (jointly developed with Intel under Project Rhino) is now recommended for production use. This feature alone should justify an upgrade for security-minded users, and an improved CDH upgrade wizard makes that process easier.

Security

Folder-level HDFS encryption (along with the storage and management of encryption zone keys, and access to them) is now a production-ready feature (HDFS-6134). It integrates with Navigator Key Trustee so that encryption keys can be stored securely and separately from the data, with the enterprise access and audit controls required to pass most security compliance audits, such as PCI.
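As a rough illustration of what "folder-level" means in practice, the sketch below assembles the three command-line steps typically used to set up one encryption zone: create a key, create an empty directory, and mark that directory as a zone. The `hadoop key create`, `hdfs dfs -mkdir`, and `hdfs crypto -createZone` commands are the standard HDFS tools; the helper function, the key name `finance_key`, and the path `/data/finance` are hypothetical and chosen only for this example. It assumes a key provider (KMS) is already configured for the cluster.

```python
import shlex


def encryption_zone_commands(key_name, zone_path):
    """Return the CLI invocations (as argv lists) that set up one encryption zone.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    return [
        # 1. Create the zone key in the cluster's configured key provider.
        ["hadoop", "key", "create", key_name],
        # 2. Create the directory; it must be empty to become a zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Mark the directory as an encryption zone tied to that key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before anyone runs them.
    for argv in encryption_zone_commands("finance_key", "/data/finance"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Once a zone exists, files written under it are encrypted and decrypted transparently for authorized clients. Creating the zone normally requires HDFS superuser privileges, while key creation is governed by the key provider's own access controls.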

The Cloudera Manager Agent can now be run as a single configured user when running as root is not permitted.

In Apache Sentry (incubating), data can now be shared across Impala, Apache Hive, Search, and other access methods such as MapReduce using only Sentry permissions.

A Sentry bug that affected CDH 5.2 upgrades has been patched (SENTRY-500).

Data Management and Governance

In Cloudera Navigator 2.2, policies are now generally available and enabled by default. Policies let you set, monitor, and enforce data curation rules, retention guidelines, and access permissions. They also let you notify partner products, such as profiling and data preparation tools, whenever there are relevant changes to metadata.

Navigator 2.2’s REST API now supports user-defined relations. Using these new APIs, you can augment Navigator’s automatically generated lineage with your own column-level lineage. This is particularly useful for custom MapReduce jobs that run on structured data sources.

Navigator 2.2 also features many top-requested enhancements, including metadata search auto-suggest and a number of other usability improvements.

Cloud Deployments

Cloudera Enterprise 5.3 is now a first-class citizen with respect to deployments on Microsoft Azure.

Apache Hadoop gets a new S3-native filesystem, S3A, for improved performance on AWS (HADOOP-10400).

Real-Time Architecture

Apache Flume now includes an Apache Kafka Channel for tighter Kafka-Flume integration (FLUME-2500).

New or Updated Open Source Components

Apache Spark 1.2

Hue 3.7

Impala 2.1

Other notables: Oracle JDK 1.8 is now supported, Impala now does incremental computation of table and column statistics (IMPALA-1122), and Apache Avro has new date, time, timestamp, and duration binary types (AVRO-739).
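To make the new Avro types more concrete, here is a minimal sketch of how they appear in a schema. It assumes the names used in the current Avro specification, where these types are expressed as logical types annotating an underlying primitive or fixed type (`date`, `time-millis`, `timestamp-millis`, `duration`). The `AuditEvent` record, its field names, and the two helper functions are hypothetical, and the example uses only the Python standard library rather than an Avro client.

```python
import json
import struct
from datetime import date

EPOCH = date(1970, 1, 1)

# Hypothetical record schema using the date/time logical types.
AUDIT_EVENT_SCHEMA = {
    "type": "record",
    "name": "AuditEvent",
    "fields": [
        # Days since the Unix epoch, stored as an int.
        {"name": "event_date", "type": {"type": "int", "logicalType": "date"}},
        # Milliseconds after midnight, stored as an int.
        {"name": "event_time", "type": {"type": "int", "logicalType": "time-millis"}},
        # Milliseconds since the Unix epoch, stored as a long.
        {"name": "recorded_at", "type": {"type": "long", "logicalType": "timestamp-millis"}},
        # Months, days, and milliseconds packed into a 12-byte fixed.
        {"name": "retention", "type": {"type": "fixed", "name": "Retention",
                                       "size": 12, "logicalType": "duration"}},
    ],
}


def date_to_avro_days(d):
    """Convert a calendar date to the int a 'date' logical type stores."""
    return (d - EPOCH).days


def encode_duration(months, days, millis):
    """Pack a duration as three little-endian unsigned 32-bit integers."""
    return struct.pack("<III", months, days, millis)


if __name__ == "__main__":
    print(json.dumps(AUDIT_EVENT_SCHEMA, indent=2))
    print(date_to_avro_days(date(2015, 1, 1)))
```

Because each logical type is layered on an existing primitive, readers that do not recognize the annotation can still read the underlying int, long, or fixed value.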

Over the next few weeks, we’ll publish blog posts that cover some of these features in detail. In the meantime: