Impala Security

Impala includes a fine-grained authorization framework for Hadoop, based on Apache Sentry.
Sentry authorization was added in Impala 1.1.0. Together with the Kerberos
authentication framework, Sentry raises Hadoop security to the level required by
highly regulated industries such as healthcare, financial services, and government. Impala also includes
an auditing capability, added in Impala 1.1.1: Impala generates audit data that
cluster-management components focused on governance can consume, filter, and visualize.

The Impala security features have several objectives. At the most basic level, security prevents
accidents or mistakes that could disrupt application processing, delete or corrupt data, or reveal data to
unauthorized users. More advanced security features and practices can harden the system against malicious
users trying to gain unauthorized access or perform other disallowed operations. The auditing feature
provides a way to confirm that no unauthorized access occurred and to detect whether any such attempts were
made. Together, these features are critical for production deployments in large organizations that handle
important or sensitive data. They also set the stage for multi-tenancy, where multiple applications run
concurrently and are prevented from interfering with each other.

The material in this section presumes that you are already familiar with administering secure Linux systems.
That is, you should know the general security practices for Linux and Hadoop, and their associated commands
and configuration files. For example, you should know how to create Linux users and groups, manage Linux
group membership, set Linux and HDFS file permissions and ownership, and designate the default permissions
and ownership for new files. You should be familiar with the configuration of the nodes in your Hadoop
cluster, and know how to apply configuration changes or run a set of commands across all the nodes.

The security features are divided into these broad categories:

authorization

Which users are allowed to access which resources, and what operations are they allowed to perform?
Impala relies on the open source Sentry project for authorization. By default (when authorization is not
enabled), Impala does all read and write operations with the privileges of the impala
user, which is suitable for a development/test environment but not for a secure production environment.
When authorization is enabled, Impala uses the OS user ID of the user who runs
impala-shell or another client program, and associates various privileges with each
user. See Enabling Sentry Authorization for Impala for details about setting up and managing
authorization.
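To make the difference between the two modes concrete, here is a minimal Python sketch. It is a toy model, not Impala or Sentry code. The privilege table, the `Request` type, and the user and table names are all hypothetical, and real Sentry policies grant privileges to roles mapped to groups rather than directly to users. The sketch shows only which identity governs a request: with authorization disabled there is no per-user check, while with it enabled the client's OS user ID is looked up.

```python
from dataclasses import dataclass

# Hypothetical privilege table: OS user ID -> set of (operation, resource) pairs.
# Real Sentry policies go through roles and groups; this flattens that for clarity.
PRIVILEGES: dict[str, set[tuple[str, str]]] = {
    "analyst": {("SELECT", "sales.orders")},
}


@dataclass
class Request:
    os_user: str    # OS user ID of whoever runs impala-shell or another client
    operation: str  # e.g. "SELECT" or "INSERT"
    resource: str   # e.g. "db.table"


def is_allowed(req: Request, authorization_enabled: bool) -> bool:
    """Toy model of whose privileges govern a request."""
    if not authorization_enabled:
        # Every operation runs with the impala user's privileges, so no per-user
        # check happens (assuming the impala user can reach the underlying data).
        return True
    # With authorization enabled, the client's own OS user ID is checked.
    return (req.operation, req.resource) in PRIVILEGES.get(req.os_user, set())


insert_req = Request("analyst", "INSERT", "sales.orders")
print(is_allowed(insert_req, authorization_enabled=False))  # True
print(is_allowed(insert_req, authorization_enabled=True))   # False
```

The same INSERT succeeds in the default mode and is refused once authorization is on, which is why the default is acceptable for development or testing but not for a secure production cluster.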

authentication

How does Impala verify the identity of the user to confirm that they really are allowed to exercise the
privileges assigned to that user? Impala relies on the Kerberos subsystem for authentication. See
Enabling Kerberos Authentication for Impala for details about setting up and managing authentication.

auditing

What operations were attempted, and did they succeed or not? This feature provides a way to look back and
diagnose whether attempts were made to perform unauthorized operations. You can use this information to track
down suspicious activity and to see where changes are needed in authorization policies. The audit data
produced by this feature can be collected and presented in a user-friendly form by cluster-management
software. See Auditing Impala Operations for details about setting up and managing
auditing.
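As an illustration of the kind of filtering that cluster-management software performs on audit data, the following Python sketch counts denied attempts per user. The record layout (one JSON object per line with `user`, `authorization_failure`, and `sql_statement` fields) is an assumption made for this example and should not be taken as the exact Impala audit log schema. The sample records are invented.

```python
import json
from collections import Counter

# Hypothetical, simplified audit records: one JSON object per line.
# Field names are illustrative assumptions, not the documented Impala schema.
SAMPLE_LOG = """\
{"user": "analyst", "authorization_failure": false, "sql_statement": "SELECT * FROM sales.orders"}
{"user": "analyst", "authorization_failure": true, "sql_statement": "INSERT INTO sales.orders VALUES (1)"}
{"user": "intern", "authorization_failure": true, "sql_statement": "DROP TABLE sales.orders"}
{"user": "intern", "authorization_failure": true, "sql_statement": "SELECT * FROM hr.salaries"}
"""


def parse_records(text: str) -> list[dict]:
    """Parse one JSON record per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def denied_attempts_by_user(records: list[dict]) -> Counter:
    """Count records flagged as authorization failures, grouped by user."""
    return Counter(r["user"] for r in records if r.get("authorization_failure"))


records = parse_records(SAMPLE_LOG)
print(denied_attempts_by_user(records).most_common())  # [('intern', 2), ('analyst', 1)]
```

A user with repeated denials is a candidate for investigation, or a sign that the authorization policy needs adjusting for a legitimate workload.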