Introduction

The integration of Sentry and HDFS permissions automatically keeps HDFS ACLs in sync with the privileges configured with Sentry. This feature offers the easiest way to share data between Hive, Impala, and other components such as MapReduce, Spark, and Pig, while setting permissions for that data with a single set of rules through Sentry. Hive and Impala keep the ability to set permissions on views as well as tables, while access to data from outside Hive and Impala (for example, reading files directly from HDFS) requires table permissions. HDFS permissions for some or all of the files that belong to tables defined in the Hive Metastore are then controlled by Sentry.

The integration consists of three components:

An HDFS NameNode plugin

A Sentry-Hive Metastore plugin

A Sentry Service plugin

With synchronization enabled, Sentry translates permissions on databases and tables into the corresponding HDFS ACLs on the underlying files in HDFS. For example, if a user group is assigned to a Sentry role that has SELECT permissions on a particular table, that user group also has read access to the HDFS files that are part of that table. Similarly, if a user group is assigned to a Sentry role that has SELECT permissions on a database, that user group also has read access to the HDFS files that are part of that database. In both cases, when you list those files in HDFS, the permissions appear as HDFS ACLs.

Note that when Sentry was enabled, the hive user and group were given ownership of all files and directories in the Hive warehouse (/user/hive/warehouse), so the resulting synchronized Sentry permissions reflect that ownership. If you skipped that step, Sentry permissions are based on the existing Hive warehouse ACLs; Sentry does not automatically grant ownership to the hive user.

The mapping of Sentry privileges to HDFS ACLs is as follows:

SELECT privilege -> Read access on the file.

INSERT privilege -> Write access on the file.

ALL privilege -> Read and Write access on the file.
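To make the mapping above concrete, the following Python sketch models how grants to roles, combined with role-to-group assignments, would surface as named-group ACL entries on a table's files. It is a hypothetical illustration only: the `Grant` class, the `acl_entries_for` function, and the role, group, and object names are invented for this example, and this is not Sentry's implementation. Only the read and write bits from the mapping are modeled; the execute position is always rendered as `-`, and real ACL entries may differ there.

```python
from dataclasses import dataclass

# Hypothetical model of the documented mapping; not Sentry's actual code.
# Each privilege maps to (read, write) access on the table's files.
PRIVILEGE_TO_ACCESS = {
    "SELECT": (True, False),  # read access
    "INSERT": (False, True),  # write access
    "ALL": (True, True),      # read and write access
}


@dataclass(frozen=True)
class Grant:
    """A privilege granted to a Sentry role on a Hive object (hypothetical)."""
    role: str
    privilege: str  # "SELECT", "INSERT", or "ALL"
    obj: str        # e.g. "sales_db.orders"


def acl_entries_for(obj, grants, role_groups):
    """Return sorted 'group:<name>:<perms>' entries for one Hive object.

    role_groups maps each role name to the set of groups assigned to it.
    When several grants reach the same group, their access is combined.
    The execute bit is not modeled and is always rendered as '-'.
    """
    access = {}
    for grant in grants:
        if grant.obj != obj:
            continue
        read, write = PRIVILEGE_TO_ACCESS[grant.privilege]
        for group in role_groups.get(grant.role, ()):
            old_read, old_write = access.get(group, (False, False))
            access[group] = (old_read or read, old_write or write)
    return [
        f"group:{group}:{'r' if r else '-'}{'w' if w else '-'}-"
        for group, (r, w) in sorted(access.items())
    ]


grants = [
    Grant("analyst_role", "SELECT", "sales_db.orders"),
    Grant("etl_role", "INSERT", "sales_db.orders"),
    Grant("etl_role", "SELECT", "sales_db.orders"),
]
role_groups = {"analyst_role": {"analysts"}, "etl_role": {"etl"}}
print(acl_entries_for("sales_db.orders", grants, role_groups))
# ['group:analysts:r--', 'group:etl:rw-']
```

In this sketch, a group that holds both SELECT and INSERT ends up with the same read and write access as ALL; taking the union of a group's grants is a modeling assumption, not a documented Sentry rule.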

Note that you must explicitly specify the path prefix of the Hive warehouse (default: /user/hive/warehouse) and any other directories that must be managed by Sentry. This procedure is described in the Enabling the HDFS-Sentry Plugin section below.

Important:

When you install Sentry, Sentry performs a full snapshot of the Hive Metastore. This causes Hive Metastore canary test failures while the snapshot is being synchronized. Once the snapshot is complete, the canary test stabilizes.

With synchronization enabled, you can no longer set HDFS permissions on the Sentry-managed files. Permissions for those files can be set only through Sentry, and when examined through HDFS they appear as HDFS ACLs. A configurable set of users, such as hive and impala, automatically have full access to the files. This automatically meets a key requirement of using Sentry with Hive and Impala: these processes must have full access to the underlying data files in order to regulate permissions on them.

Tables and databases that are not associated with Sentry (that is, those on which no user has Sentry privileges) retain their original ACLs.

Synchronized privileges are not persisted to HDFS. This means that when this feature is disabled, HDFS privileges will return to their original values.

Setting HDFS ACLs on Sentry-managed paths will not affect the original HDFS ACLs. That is, if you set an ACL for a Hive object that also falls under the Sentry-managed path prefixes,
no action will be taken. If the path does not point to a Hive object managed by Sentry, HDFS ACLs will be set as expected.

Removing HDFS ACLs from paths works the same way. If you attempt to remove an ACL associated with a Hive object managed by Sentry, no action is taken. In all other cases, the ACL is removed as expected.
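The set and remove rules above, together with the requirement that a managed path lie under a Sentry Synchronization Path Prefix, can be summarized as a small decision function. This Python sketch is hypothetical: the function names and example paths are invented, the real check happens inside the NameNode plugin, and treating prefixes as whole path components (so that /user/hive/warehouse2 does not match /user/hive/warehouse) is an assumption of the sketch.

```python
import posixpath


def is_under(path, roots):
    """True if `path` equals one of `roots` or lies beneath it.

    Matching is done on whole path components (an assumption of this sketch),
    so /user/hive/warehouse2 is not under /user/hive/warehouse.
    """
    path = posixpath.normpath(path)
    for root in roots:
        root = posixpath.normpath(root)
        if path == root or path.startswith(root.rstrip("/") + "/"):
            return True
    return False


def is_sentry_managed(path, sync_prefixes, hive_object_locations):
    """A path is Sentry-managed only if it is under a sync prefix AND
    belongs to a Hive object (a database or table location)."""
    return is_under(path, sync_prefixes) and is_under(path, hive_object_locations)


def acl_change_outcome(path, sync_prefixes, hive_object_locations):
    """Outcome of setting or removing an HDFS ACL on `path`, per the rules above."""
    if is_sentry_managed(path, sync_prefixes, hive_object_locations):
        return "no action"  # the Sentry-synchronized ACLs stay in effect
    return "applied"        # ordinary HDFS ACL behavior


prefixes = ["/user/hive/warehouse"]
hive_locations = ["/user/hive/warehouse/sales_db.db", "/data/external/clicks"]
for p in ["/user/hive/warehouse/sales_db.db/orders/part-00000",
          "/user/hive/warehouse/scratch",
          "/data/external/clicks"]:
    print(p, "->", acl_change_outcome(p, prefixes, hive_locations))
```

With these example inputs, only the file inside sales_db.db is left untouched. The scratch directory is under a sync prefix but holds no Hive object, and the external table location is a Hive object outside every prefix, so ordinary HDFS ACL changes apply to both.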

With HDFS-Sentry sync enabled, if the NameNode plugin is unable to communicate with the Sentry Service, affected HDFS files continue to use a cached copy of the synchronized ACLs for a configurable period of time, after which their permissions fall back to the Hive System User and Hive System Group (for example, hive:hive). To change this timeout, add the sentry.authorization-provider.cache-stale-threshold.ms parameter to the hdfs-site.xml Safety Valve in Cloudera Manager; its value is specified in milliseconds. The default timeout is 60 seconds (60000 ms), but on large clusters you can increase it to anywhere from several minutes to a few hours, as needed.
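The fallback rule and the unit conversion for the threshold can be sketched in Python. The function names are hypothetical, the sketch ignores how the plugin actually tracks its last successful contact with the Sentry Service, and treating the boundary as inclusive is an assumption. The helper converts a desired timeout into the millisecond value for sentry.authorization-provider.cache-stale-threshold.ms.

```python
DEFAULT_STALE_THRESHOLD_MS = 60_000  # documented default: 60 seconds


def stale_threshold_ms(minutes=0, hours=0):
    """Convert a timeout to milliseconds for the .ms property."""
    return (hours * 60 + minutes) * 60 * 1000


def acl_source(now_ms, last_sentry_contact_ms, threshold_ms=DEFAULT_STALE_THRESHOLD_MS):
    """Which permissions an affected file shows while Sentry is unreachable.

    Hypothetical model: cached synchronized ACLs until the threshold elapses
    (inclusive boundary assumed), then the Hive System User and Group
    (hive:hive by default).
    """
    if now_ms - last_sentry_contact_ms <= threshold_ms:
        return "cached synchronized ACLs"
    return "hive:hive fallback"


print(stale_threshold_ms(minutes=30))  # 1800000
print(acl_source(now_ms=90_000, last_sentry_contact_ms=0))  # hive:hive fallback
print(acl_source(now_ms=90_000, last_sentry_contact_ms=0,
                 threshold_ms=stale_threshold_ms(minutes=5)))  # cached synchronized ACLs
```

Ninety seconds without contact exceeds the 60-second default, so the sketch reports the hive:hive fallback; raising the threshold to five minutes keeps the cached ACLs in effect for the same outage.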

The HDFS-Sentry plugin does not support column-level access control for access from Spark SQL.

This documentation refers to the Hive System User and Hive System Group as the hive user or hive:hive. The default Hive
System User and Hive System Group are hive:hive. However, the user and group can be changed. You can verify the user and group in Cloudera Manager. Open the Hive
service, click the Configuration tab, and search for the System User and System Group properties. These
properties define the Hive user and group.

Prompting HDFS ACL Changes

Privileges granted on URIs do not affect the HDFS-Sentry plugin. Therefore, you cannot manage all of your HDFS ACLs with the HDFS-Sentry plugin; you must continue to use standard HDFS ACLs for data outside of Hive.

HDFS ACL changes are triggered on:

Hive DATABASE object LOCATION (HDFS) when a role is granted to the object

Hive TABLE object LOCATION (HDFS) when a role is granted to the object

HDFS ACL changes are not triggered by:

Hive URI LOCATION (HDFS) when a role is granted to a URI

Hive SERVER object when a role is granted to the object. HDFS ACLs are not updated if a role is assigned to the SERVER. In standard Sentry interactions, child objects inherit the privileges granted on the SERVER, but the plugin does not propagate those privileges down to the HDFS ACLs of the child objects.

Permissions granted on views. Views are not synchronized as objects in the HDFS file system.
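The two lists above amount to a simple rule: only grants on databases and tables change HDFS ACLs, and they change them at the object's LOCATION. The Python sketch below encodes that rule; the `ObjectType` enum, the `locations_to_update` function, and the example paths are hypothetical, and views are modeled as a separate type only to mirror the list.

```python
from enum import Enum


class ObjectType(Enum):
    """Kinds of Sentry grant targets mentioned in the lists above (hypothetical)."""
    SERVER = "server"
    DATABASE = "database"
    TABLE = "table"
    VIEW = "view"
    URI = "uri"


# Only DATABASE and TABLE grants trigger HDFS ACL changes.
ACL_TRIGGERS = frozenset({ObjectType.DATABASE, ObjectType.TABLE})


def locations_to_update(grants):
    """Return the sorted HDFS LOCATIONs whose ACLs a batch of grants would change.

    `grants` is an iterable of (ObjectType, location) pairs; location may be
    None for objects without one. SERVER grants are not pushed down to child
    objects, and URI and VIEW grants are ignored by the plugin.
    """
    return sorted({loc for kind, loc in grants if kind in ACL_TRIGGERS and loc})


batch = [
    (ObjectType.SERVER, None),
    (ObjectType.DATABASE, "/user/hive/warehouse/sales_db.db"),
    (ObjectType.TABLE, "/user/hive/warehouse/sales_db.db/orders"),
    (ObjectType.VIEW, None),
    (ObjectType.URI, "/data/raw"),
]
print(locations_to_update(batch))
# ['/user/hive/warehouse/sales_db.db', '/user/hive/warehouse/sales_db.db/orders']
```

Only the database and table grants produce ACL updates; the SERVER, VIEW, and URI grants in the same batch leave HDFS ACLs unchanged.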

Enabling the HDFS-Sentry Plugin

Locate the Sentry Synchronization Path Prefixes property or search for it by typing its name in the Search box.

Edit the Sentry Synchronization Path Prefixes property to list the HDFS path prefixes where Sentry permissions should be enforced. You can specify multiple HDFS path prefixes. By default, this property points to /user/hive/warehouse, and it must always be non-empty. If you are using a non-default location for the Hive warehouse, make sure you add it to the list of path prefixes. HDFS privilege synchronization does not occur for tables and databases located outside the HDFS paths listed here.

Important:

Sentry manages only paths that store Hive objects. If a path is listed under Sentry Synchronization Path Prefixes but contains no Hive object, Sentry does not manage permissions for that path.

Click Save Changes.

Restart the cluster. Note that it may take an additional two minutes after cluster restart for privilege synchronization to take effect.

Testing the Sentry Synchronization Plugins

The following tasks will help you ensure that Sentry-HDFS synchronization has been enabled and configured correctly:

For a folder that has been enabled for the plugin, such as the Hive warehouse, try accessing the files in that folder from outside Hive and Impala. To do this, you need to know which tables and databases those HDFS files belong to and the Sentry permissions on those tables. You can view or modify the Sentry permissions on those tables with one of the following tools:

(Recommended) Hue's Security application

HiveServer2 CLI

Impala CLI

Access the tables and databases directly in HDFS. For example:

List files inside the folder and verify that the file permissions shown in HDFS (including ACLs) match what was configured in Sentry.

Run a MapReduce, Pig, or Spark job that accesses those files. Any tool other than HiveServer2 and Impala will do.
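For the listing check, `hdfs dfs -getfacl -R <path>` prints the ACL of each file under a path. The Python sketch below runs that command and parses its output into named-group entries so you can compare them with the Sentry grants. The function names and the example group are hypothetical, and the parser assumes the usual getfacl layout (`# file:` headers followed by `type:name:perms` entries, optionally with a trailing `#effective:` annotation when an ACL mask restricts an entry); default ACL entries are skipped.

```python
import subprocess


def getfacl(path):
    """Return the output of `hdfs dfs -getfacl -R <path>` (needs the hdfs CLI on PATH)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-getfacl", "-R", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def named_group_entries(getfacl_output):
    """Parse getfacl output into {file: {group: perms}} for named group entries.

    When an entry carries '#effective:<perms>', the effective permissions are
    kept, since that is what the mask actually allows.
    """
    acls = {}
    current = None
    for raw in getfacl_output.splitlines():
        line = raw.strip()
        if line.startswith("# file:"):
            current = line[len("# file:"):].strip()
            acls[current] = {}
            continue
        if current is None or not line or line.startswith("#"):
            continue  # owner/group headers and blank lines
        entry, _, comment = line.partition("#")
        parts = entry.strip().split(":")
        # Keep only named access entries such as "group:analysts:r-x";
        # "group::r-x" (the owning group) and "default:..." entries are skipped.
        if len(parts) == 3 and parts[0] == "group" and parts[1]:
            perms = parts[2]
            if comment.startswith("effective:"):
                perms = comment[len("effective:"):].strip()
            acls[current][parts[1]] = perms
    return acls


def missing_read_access(acls, group):
    """Files on which `group` has no named ACL entry granting read access."""
    return sorted(
        path for path, groups in acls.items()
        if not groups.get(group, "---").startswith("r")
    )
```

For example, `missing_read_access(named_group_entries(getfacl("/user/hive/warehouse/sales_db.db")), "analysts")` would list the files on which a group granted SELECT through Sentry still lacks read access; an empty list means the HDFS ACLs agree with the grant.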

If this documentation includes code, including but not limited to, code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required
notices. A copy of the Apache License Version 2.0 can be found here.