Create an AWSCredentialsProvider for EMRFS

In some cases, you may want to allow users to create an Amazon EMR cluster that accesses and analyzes data stored in Amazon S3, but the data is restricted in a way that makes access using EMRFS difficult. This difficulty arises because the credentials that Amazon EMR obtains through the EC2 instance profile are not sufficient to access the data. To provide credentials, you can define a custom credentials provider class, which implements both the AWSCredentialsProvider and the Hadoop Configurable interfaces. This approach is more restrictive than expanding access by modifying Amazon S3 bucket policies or other IAM policies. Creating a custom credentials provider helps ensure that only those EMR clusters configured to use it have access to the data in Amazon S3.

For a detailed explanation of this solution, see Securely Analyze Data from Another AWS Account with EMRFS in the AWS Big Data blog. The blog post includes a tutorial that walks you through the process end-to-end, from creating IAM roles to launching the cluster. It also provides a Java code sample that implements the custom credentials provider class.
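The blog post contains the authoritative implementation. As a rough sketch of the shape such a class takes, the provider below assumes a cross-account role with AWS STS and delegates credential retrieval to it. The class name, the Hadoop configuration property my.custom.role.arn, and the session name are assumptions for illustration, not values from the blog post:

```java
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;

// Sketch of a custom EMRFS credentials provider. EMRFS instantiates the
// class, then calls setConf() because the class implements Configurable,
// so the Hadoop configuration is available before credentials are requested.
public class MyCustomCredentialsProvider implements AWSCredentialsProvider, Configurable {

    private Configuration conf;
    private AWSCredentialsProvider delegate;

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
        // Hypothetical property holding the ARN of the role in the
        // account that owns the restricted data.
        String roleArn = conf.get("my.custom.role.arn");
        this.delegate = new STSAssumeRoleSessionCredentialsProvider
                .Builder(roleArn, "emrfs-cross-account-session")
                .build();
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public AWSCredentials getCredentials() {
        // Temporary credentials for the assumed role; the STS provider
        // refreshes them automatically before they expire.
        return delegate.getCredentials();
    }

    @Override
    public void refresh() {
        delegate.refresh();
    }
}
```

Package this class into a JAR file and make it available on the cluster (for example, by copying it into place with a bootstrap action) so that EMRFS can load it. Running this code requires the AWS SDK for Java (v1), the Hadoop libraries, and an IAM role that trusts the cluster's EC2 instance profile.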

Customize the emrfs-site classification to specify the class that you
implement in the JAR file. For more information about specifying configuration
objects to customize applications, see Configuring Applications
in the Amazon EMR Release Guide.
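For example, a configuration object along these lines sets the EMRFS custom credentials provider property to the class implemented in the JAR file (the class name here is a placeholder):

```json
[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.customAWSCredentialsProvider": "MyCustomCredentialsProvider"
    }
  }
]
```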

The following example demonstrates a create-cluster command that launches a Hive cluster with common configuration parameters, and also includes:

A bootstrap action that runs the script copy_jar_file.sh, which is saved to mybucket in Amazon S3.

An emrfs-site classification that specifies a custom credentials provider, defined in the JAR file as MyCustomCredentialsProvider.

Note

Linux line continuation characters (\) are included for readability. They can be removed
or used in Linux commands. For Windows, remove them or replace with a caret (^).
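A sketch of such a command follows. The bucket name, key pair name, release label, and instance settings are placeholders to adapt to your environment, not prescribed values:

```shell
# Launch a Hive cluster whose EMRFS access uses the custom
# credentials provider packaged in the JAR copied by the
# bootstrap action. All names below are illustrative.
aws emr create-cluster \
--applications Name=Hive \
--bootstrap-actions '[{"Path":"s3://mybucket/copy_jar_file.sh","Name":"Copy JAR file"}]' \
--service-role EMR_DefaultRole \
--ec2-attributes '{"InstanceProfile":"EMR_EC2_DefaultRole","KeyName":"MyKeyPair"}' \
--release-label emr-5.36.0 \
--instance-type m5.xlarge \
--instance-count 3 \
--configurations '[{"Classification":"emrfs-site","Properties":{"fs.s3.customAWSCredentialsProvider":"MyCustomCredentialsProvider"}}]' \
--name "EMRFS custom credentials provider cluster"
```

Running this command requires the AWS CLI configured with permissions to launch EMR clusters, and the referenced roles, key pair, and S3 objects must already exist in your account.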