Announcing Apache Ranger 0.5.0

As YARN drives Hadoop’s emergence as a business-critical data platform, enterprises require more stringent data security capabilities. Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster, providing a platform for centralized security policy administration across the core enterprise security requirements of authorization, audit and data protection.

On June 10th, the community announced the release of Apache Ranger 0.5.0. With this release, the community took major steps to extend security coverage for the Hadoop platform and deepen its existing security capabilities. Apache Ranger 0.5.0 addresses 194 JIRA issues and delivers many new features, fixes and enhancements. Among these improvements, the following features are notable:

Centralized administration, authorization and auditing for Solr, Kafka and YARN

Apache Ranger key management store (KMS)

Hooks for dynamic policy conditions

Metadata protection in Hive

Support for queries on audit data using Solr

Optimization of auditing at source

Pluggable architecture for Apache Ranger (Ranger Stacks)

This blog provides an overview of the new features and how they integrate with other Hadoop services, and previews the focus areas the community has planned for upcoming releases.

Centralized Administration, Authorization and Auditing for Solr, Kafka and YARN

Administrators can now use Apache Ranger’s centralized platform to manage access policies for Solr (at the collection level), Kafka (at the topic level) and YARN (capacity scheduler queues). These centralized authorization and auditing capabilities add to what was previously available for HDFS, HBase, Hive, Knox and Storm. As a precursor to this release, the Hortonworks security team worked closely with the community to build authentication support (Kerberos) and authorization APIs in Apache Solr and Apache Kafka.

Administrators can now apply security policies to protect Kafka topics and ensure that only authorized users are able to publish to or consume from a topic. Similarly, Ranger can be used to control query access at the Solr collection level, ensuring that sensitive data in Apache Solr is secured in production environments. Apache Ranger’s integration with the YARN ResourceManager enables administrators to control which applications can be submitted to a queue and prevent rogue applications from consuming YARN resources.
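As a sketch of what such a policy looks like, the snippet below builds a Ranger-style policy payload granting users access to a Kafka topic. The field names follow the general shape of Ranger’s public policy REST API, but the service name "kafka_dev", the endpoint you would POST this to, and the exact field set should all be treated as assumptions for this example rather than a verified 0.5.0 API reference.

```python
import json

def kafka_topic_policy(topic, users, accesses=("publish", "consume")):
    """Build an illustrative Ranger policy dict granting users access to a Kafka topic."""
    return {
        "service": "kafka_dev",                     # assumed Kafka service instance name
        "name": "topic-%s-access" % topic,
        "resources": {"topic": {"values": [topic]}},
        "policyItems": [{
            "users": list(users),
            "accesses": [{"type": a, "isAllowed": True} for a in accesses],
        }],
    }

policy = kafka_topic_policy("clickstream", ["etl_user"])
print(json.dumps(policy, indent=2))
```

In a live cluster this JSON would be submitted to the Ranger Admin server, which distributes it to the Kafka plugin for enforcement.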

Apache Ranger Key Management Store (KMS)

In this release, HDP takes a major step forward in meeting enterprises’ security and compliance requirements by introducing transparent data encryption (TDE) for HDFS files, combined with an open source Hadoop KMS embedded in Ranger. Ranger now gives security administrators the ability to manage keys and authorization policies for KMS.

This HDFS encryption feature, combined with KMS access policies maintained in Ranger, prevents rogue Linux or Hadoop administrators from accessing data and supports segregation of duties for both data access and encryption. You can find more details on TDE in this blog.

Hooks for dynamic policy conditions

As enterprises’ Hadoop deployments mature, there is a need to move from static role-based access control to access based on dynamic rules. Examples would be granting access based on time of day (9am to 5pm), on geography (access only if logged in from a particular location) or even on data values.

In Apache Ranger 0.5.0, the community took the first step toward a true ABAC (attribute-based access control) model by introducing hooks to manage dynamic policies, thereby providing a framework for users to control access based on dynamic rules. Users can now specify their own conditions and rules (similar to a UDF) as part of service definitions, and these conditions can vary by service (HDFS, Hive, etc.). In the future, based on community feedback, Apache Ranger may include some of these conditions out of the box.
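To make the idea concrete, here is a minimal sketch of the decision logic behind a “business hours” dynamic condition. Real Ranger condition evaluators are Java classes registered in a service definition; this Python version only models the rule itself, and the request shape is invented for illustration.

```python
from datetime import time

def business_hours_condition(request, start=time(9, 0), end=time(17, 0)):
    """Allow access only when the request's timestamp falls inside working hours.

    `request` is a hypothetical dict carrying the access context; a real
    Ranger evaluator would read this from the access request object.
    """
    return start <= request["access_time"] <= end

request = {"user": "analyst", "access_time": time(10, 30)}
print(business_hours_condition(request))  # True: 10:30 is within 9am-5pm
```

A geography- or data-value-based condition would follow the same pattern: inspect attributes of the request and return an allow/deny decision.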

Metadata Protection in Hive

The following Hive metadata commands now return information only for objects the user is authorized to access:

SHOW DATABASES

SHOW TABLES

DESCRIBE

SHOW COLUMNS
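The effect of this filtering can be sketched as follows: the result of a metadata command is pruned to the objects the user holds privileges on. Ranger performs this inside Hive’s authorizer; the in-memory policy table below is invented purely for illustration.

```python
# Hypothetical per-user table privileges; in reality these come from
# Ranger policies evaluated by the Hive plugin.
POLICIES = {
    "alice": {"sales", "customers"},
    "bob": {"sales"},
}

def show_tables(user, all_tables):
    """Return only the tables the given user is authorized to see."""
    allowed = POLICIES.get(user, set())
    return sorted(t for t in all_tables if t in allowed)

tables = ["sales", "customers", "payroll"]
print(show_tables("bob", tables))    # ['sales']
print(show_tables("alice", tables))  # ['customers', 'sales']
```

A user with no matching policy simply sees an empty result rather than an error, so the existence of unauthorized objects is not leaked.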

Support Queries for Audit Data Using Solr

Currently, the Apache Ranger UI provides the ability to run interactive queries against audit data stored in an RDBMS. This release introduces support for storing and querying audit data in Solr. This functionality removes the dependency on a database for auditing and gives users visibility into audit data through dashboards built on the Banana UI. We recommend that users enable audit writing to both Solr and HDFS, and purge the data in Solr at regular intervals.
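For readers who want to query the audit store directly rather than through the UI, the helper below builds a Solr select URL for one user’s audit events. The collection name "ranger_audits" and the field names (`reqUser`, `result`, `evtTime`) match common Ranger audit setups, but treat them as assumptions here and check your own Solr schema.

```python
from urllib.parse import urlencode

def audit_query_url(solr_base, user, denied_only=True, rows=20):
    """Build a Solr select URL for a user's Ranger audit events."""
    q = "reqUser:%s" % user
    if denied_only:
        q += " AND result:0"   # in Ranger audits, result:0 marks denied access
    params = {"q": q, "rows": rows, "sort": "evtTime desc", "wt": "json"}
    return "%s/ranger_audits/select?%s" % (solr_base, urlencode(params))

url = audit_query_url("http://solr:8886/solr", "etl_user")
print(url)
```

Fetching that URL returns the newest denied-access events for the user as JSON, which is essentially what the Banana dashboards visualize.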

Optimization of Auditing at Source

Auditing all events or jobs in Hadoop generates a high volume of audit data. Apache Ranger 0.5.0 provides the ability to summarize audit data at the source for a given time period, by user, resource accessed and action. This reduces audit volume and noise, lessening the impact on the underlying storage and improving performance.
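The shape of this reduction can be sketched as a simple aggregation: raw events within a window collapse into per-(user, resource, action) counts. The event fields below are invented for illustration; Ranger’s actual summarizer runs inside the plugin before audits are shipped to storage.

```python
from collections import Counter

def summarize(events):
    """Collapse raw audit events into counts keyed by (user, resource, action)."""
    counts = Counter((e["user"], e["resource"], e["action"]) for e in events)
    return [
        {"user": u, "resource": r, "action": a, "count": n}
        for (u, r, a), n in counts.items()
    ]

events = [
    {"user": "alice", "resource": "/data/sales", "action": "read"},
    {"user": "alice", "resource": "/data/sales", "action": "read"},
    {"user": "bob", "resource": "/data/hr", "action": "write"},
]
print(summarize(events))  # 3 raw events collapse into 2 summary rows
```

Repeated identical accesses, which dominate audit volume on busy clusters, thus cost one summary row plus a counter instead of one record each.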

Pluggable Architecture for Apache Ranger (Ranger Stacks)

As part of this release, the Ranger community worked extensively to revamp the Apache Ranger architecture. As a result of this effort, Apache Ranger 0.5.0 now provides a pluggable architecture for policy administration and enforcement. Using a “single pane of glass,” end-users can configure and manage their security across all components of their Hadoop stack and extend it to their entire big data environment.

Apache Ranger 0.5.0 enables customers and partners to easily add a new “service” to support a new component or data engine. Each service is defined and configured in JSON.

Users can create a custom service as a plug-in for any data store, and build and manage services centrally for their big data and BI applications.
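A minimal sketch of such a JSON service definition is shown below, for a hypothetical key-value store. The structure (resources, access types, an implementation class) follows the general shape of Ranger service definitions, but the component name "kvstore" and every field value here are invented for illustration, not taken from a shipped Ranger stack.

```python
import json

# Illustrative Ranger service definition for a made-up "kvstore" engine.
service_def = {
    "name": "kvstore",
    "implClass": "org.example.ranger.services.kvstore.RangerServiceKVStore",  # hypothetical
    "resources": [
        {"itemId": 1, "name": "bucket", "type": "string",
         "level": 10, "recursiveSupported": False},
    ],
    "accessTypes": [
        {"itemId": 1, "name": "get"},
        {"itemId": 2, "name": "put"},
    ],
}
print(json.dumps(service_def, indent=2))
```

Once such a definition is registered with the Ranger Admin server, the UI can render policy screens for the new service without any UI code changes, which is the point of the pluggable design.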

Preview of Features to Come

The Apache Ranger release would not have been possible without contributions from the dedicated community members who have done a great job understanding the needs of the user community and delivering on them. Based on demand from the user community, we will continue to focus our efforts in three primary areas:

Security policies based on global data classification “tags”

Expanding encryption support to HBase and Hive

Ease of installation and use, through better Apache Ambari integration
