How To Configure Authentication for Amazon S3

There are several ways to integrate Amazon S3 storage with Cloudera clusters. The right approach depends on your use case and on other factors, including whether the cluster has been deployed
on Amazon EC2 instances and whether those instances were deployed using an IAM role, as might be the case for clusters that serve a single user or a small team with comparable privileges. Clusters
deployed to support many different users with various privilege levels on Amazon S3 need to use AWS credentials and have privileges to the target data set up in Sentry. See How to Configure AWS Credentials for details.

Authentication through the S3 Connector Service

Starting with CDH/Cloudera Manager 5.10, integration with Amazon S3 from Cloudera clusters has been simplified. Specifically, the S3 Connector Service
automates the authentication process to Amazon S3 for Impala, Hive, and Hue, the components used for business-analytical use cases designed to run on persistent multi-tenant clusters.

The S3 Connector Service transparently and securely distributes the AWS credentials needed by the cluster for Amazon S3 storage. Access to the underlying Impala tables is controlled by
Sentry role-based permissions. The S3 Connector Service runs on a secure cluster only, that is, a cluster configured to use Kerberos for authentication and Sentry for role-based authorization.

Note: Of the items listed in the screenshot below, only the Sentry service and Kerberos enabled messages are actual requirements. The other
messages are for informational purposes only.

In Cloudera Manager 5.11, the S3 Connector Service setup wizard is launched automatically during the AWS Credential setup process when you select the path to add the S3 Connector
Service.

Authentication through Advanced Configuration Snippets

Before release 5.10 and the introduction of the S3 Connector Service, using Amazon S3 storage with the cluster involved adding the credentials to the core-site.xml configuration file (through Cloudera Manager's Advanced Configuration Snippet mechanism). This approach is not recommended. AWS credentials provide read and write
access to data stored on Amazon S3, so they should be kept secure at all times.

Never share the credentials with other cluster users or services.

Do not store the credentials in cleartext in any configuration file. When possible, use Hadoop's credential provider to encrypt the credentials and store them in a local JCEKS (Java Cryptography Extension
KeyStore) file.
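As a minimal sketch of the credential provider approach, the shell functions below store the two S3A keys in a JCEKS keystore with the hadoop credential command and then point a Hadoop command at that keystore through the hadoop.security.credential.provider.path property. The function names, the keystore URI, and the bucket name are illustrative assumptions, not values from this guide.

```shell
# Sketch: store S3A credentials in a Hadoop credential provider (JCEKS keystore).
# The function names and the URIs passed to them are illustrative assumptions.
store_s3a_credentials() {
  provider_uri="$1"   # for example: jceks://hdfs/user/alice/s3.jceks (hypothetical)

  # Without -value, the command prompts for each secret, so the keys do not
  # end up in the shell history or the process list.
  hadoop credential create fs.s3a.access.key -provider "$provider_uri" &&
  hadoop credential create fs.s3a.secret.key -provider "$provider_uri"
}

# List an S3 path using the keystore instead of cleartext keys.
s3a_ls_with_keystore() {
  provider_uri="$1"
  target="$2"         # for example: s3a://example-bucket/ (hypothetical)
  hadoop fs -Dhadoop.security.credential.provider.path="$provider_uri" -ls "$target"
}
```

A typical call sequence would be store_s3a_credentials followed by s3a_ls_with_keystore, both given the same keystore URI. Access to the keystore file itself still has to be restricted with file system permissions.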

Important: Cloudera recommends using this approach for single-user clusters on secure
networks only—networks that allow access only to authorized users, all of whom are also authorized to use the S3 credentials.

To enable CDH services to access Amazon S3, AWS credentials can be specified using the fs.s3a.access.key and fs.s3a.secret.key properties.

The process of adding AWS credentials is generally the same as that detailed in configuring server-side
encryption for Amazon S3, that is, using the Cloudera Manager Admin Console to add the properties and values to the core-site.xml configuration file (Advanced
Configuration Snippet). However, Cloudera strongly discourages adding AWS credentials to core-site.xml. A somewhat
more secure approach is to use temporary credentials, which include a session token and expire after a limited time,
narrowing the window within which a stolen key can be used.

Important: Cloudera recommends using this approach only for single-user clusters on secure
networks—networks that allow access only to authorized users, all of whom are also authorized to use the S3 credentials.

To connect to Amazon S3 using temporary credentials obtained from STS, submit them as command-line arguments with the Hadoop job.
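The following sketch shows one way to pass temporary credentials as command-line arguments. It assumes the access key, secret key, and session token returned by STS are already exported in the environment. The variable names, the function name, and the bucket path are illustrative choices, while the fs.s3a.* properties and the TemporaryAWSCredentialsProvider class come from the Hadoop S3A connector.

```shell
# Sketch: list an S3 path with temporary credentials obtained from AWS STS.
# Assumes the three values are already exported in the environment; the
# variable names and the function name are illustrative choices.
s3a_ls_with_sts() {
  target="$1"   # for example: s3a://example-bucket/data/ (hypothetical)
  hadoop fs \
    -Dfs.s3a.access.key="$AWS_ACCESS_KEY_ID" \
    -Dfs.s3a.secret.key="$AWS_SECRET_ACCESS_KEY" \
    -Dfs.s3a.session.token="$AWS_SESSION_TOKEN" \
    -Dfs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
    -ls "$target"
}
```

The same -D options can precede other Hadoop commands that accept generic options, such as distcp. Arguments given on the command line can be visible to other users of the host while the command runs, which is one reason this approach is limited to single-user clusters.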

Creating a Table in a Bucket

To create a table in a bucket, a user must have Sentry permissions on the S3 database and bucket URI. Cloudera recommends that you create a database specifically for the purpose of
creating tables. Then, grant the user's role ALL permissions on the database and the URI.

It is possible to give the user ALL permissions on the server to allow the user to create external tables, but that approach is not recommended because it is not secure.

To allow a user to create tables in a bucket, complete the following steps:

1. Create a database where you want to create the tables.

2. Grant ALL on the database to the user's role.

3. Grant ALL on the bucket URI to the user's role.

4. The user must create the table with a reference to the database.
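The steps above can be sketched as shell functions that send the SQL statements through beeline. This is an illustration only: the HIVE_JDBC_URL variable, the function names, and the database, role, bucket, and column names used here are hypothetical, and the grants must be issued by a user with Sentry administrative privileges.

```shell
# Sketch: the Sentry grants and a table definition, issued through beeline.
# HIVE_JDBC_URL and all database, role, bucket, and table names are hypothetical.
run_sql() {
  beeline -u "$HIVE_JDBC_URL" -e "$1"
}

# Database creation and grants, run as a Sentry administrator.
grant_bucket_access() {
  db="$1"; role="$2"; bucket_uri="$3"
  run_sql "CREATE DATABASE IF NOT EXISTS $db;" &&
  run_sql "GRANT ALL ON DATABASE $db TO ROLE $role;" &&
  run_sql "GRANT ALL ON URI '$bucket_uri' TO ROLE $role;"
}

# Table creation, run as the user: the table name is qualified with the database.
create_table_in_bucket() {
  db="$1"; table="$2"; location="$3"
  run_sql "CREATE TABLE $db.$table (id INT, payload STRING) LOCATION '$location';"
}
```

For instance, an administrator might call grant_bucket_access with s3_tables, analyst_role, and s3a://example-bucket/, after which a user holding that role could call create_table_in_bucket with s3_tables, events, and s3a://example-bucket/events/.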
