CDH

This is the documentation for CDH 4.7.1.

Step 7: Configure Secure HDFS

When following the instructions in this section to configure the properties in the hdfs-site.xml
file, keep the following important guidelines in mind:

The properties for each daemon (NameNode, Secondary NameNode, and DataNode) must
specify both the HDFS and HTTP principals, as well as the path to the HDFS keytab
file.
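As a minimal sketch of this guideline, the NameNode entries in hdfs-site.xml might look like the following. The keytab path /etc/hadoop/conf/hdfs.keytab is an assumed location, and the property shown for the HTTP principal (dfs.namenode.kerberos.internal.spnego.principal) should be checked against the property names your release expects. The Secondary NameNode and DataNode take analogous entries under their own property prefixes.

```xml
<!-- NameNode security settings (illustrative sketch) -->
<property>
  <name>dfs.namenode.keytab.file</name>
  <!-- Assumed location: replace with the path to your HDFS keytab -->
  <value>/etc/hadoop/conf/hdfs.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <!-- HDFS principal for the NameNode -->
  <value>hdfs/_HOST@YOUR-REALM.COM</value>
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <!-- HTTP principal for the NameNode web endpoints -->
  <value>HTTP/_HOST@YOUR-REALM.COM</value>
</property>
```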

The Kerberos principals for the NameNode, Secondary NameNode, and DataNode are
configured in the hdfs-site.xml file. The same hdfs-site.xml file, containing all
three of these principals, must be installed on every host machine in the cluster.
It is not sufficient to configure the NameNode principal on the NameNode host
machine only. For example, the DataNode must know the principal name of the
NameNode in order to send heartbeats to it, because Kerberos authentication is
bi-directional: each side verifies the identity of the other.

The special string _HOST in the properties is replaced at run time
by the fully qualified domain name of the host machine where the daemon is running.
This requires reverse DNS to work properly on all hosts configured this
way. You may use _HOST only as the entire second component
of a principal name. For example, hdfs/_HOST@YOUR-REALM.COM is valid, but
hdfs._HOST@YOUR-REALM.COM and hdfs/_HOST.example.com@YOUR-REALM.COM are not.
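To make the substitution concrete, the sketch below shows a DataNode principal that uses _HOST correctly. The host name dn1.example.com in the comments is hypothetical and only illustrates what the expanded principal would look like.

```xml
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <!-- _HOST is the entire second component, so this form is valid.
       On a hypothetical host dn1.example.com it would expand to
       hdfs/dn1.example.com@YOUR-REALM.COM -->
  <value>hdfs/_HOST@YOUR-REALM.COM</value>
</property>
<!-- Invalid forms, for comparison (do not use):
       hdfs._HOST@YOUR-REALM.COM              (_HOST is not a separate component)
       hdfs/_HOST.example.com@YOUR-REALM.COM  (_HOST is only part of the second component) -->
```

Because one value expands differently on each host, the same hdfs-site.xml file can be distributed unchanged to every machine.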

When performing the _HOST substitution for the Kerberos principal
names, the NameNode determines its own hostname from the configured value of
fs.default.name, whereas the DataNodes determine their
hostnames from the result of reverse DNS resolution on the DataNode hosts.
Likewise, the JobTracker uses the configured value of
mapred.job.tracker to determine its hostname, whereas the
TaskTrackers, like the DataNodes, use reverse DNS.
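As an illustration of the NameNode case, consider a core-site.xml entry like the one sketched below. The host name nn.example.com and port 8020 are assumptions made for this example. Under the behavior described above, the NameNode would expand hdfs/_HOST@YOUR-REALM.COM to hdfs/nn.example.com@YOUR-REALM.COM, so the host name configured here should be the fully qualified name that appears in the NameNode's principal.

```xml
<!-- core-site.xml (illustrative sketch; host and port are hypothetical) -->
<property>
  <name>fs.default.name</name>
  <!-- The NameNode takes its hostname for _HOST substitution from this value -->
  <value>hdfs://nn.example.com:8020</value>
</property>
```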

The port numbers in dfs.datanode.address and
dfs.datanode.http.address for the DataNode
must be below 1024. Only a process started with root privileges can bind to
ports below 1024, so this is part of the security mechanism that makes it
impossible for a user to run a map task that impersonates a DataNode.
The NameNode and Secondary NameNode may use any port numbers,
but the default port numbers are a good choice.
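A minimal sketch of the DataNode port settings follows. Ports 1004 and 1006 are values commonly used on secure clusters and are shown here as an assumption; any free ports below 1024 satisfy the requirement.

```xml
<!-- DataNode ports must be privileged (below 1024) on a secure cluster -->
<property>
  <name>dfs.datanode.address</name>
  <!-- Data transfer port; 1004 is an example value -->
  <value>0.0.0.0:1004</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <!-- HTTP port; 1006 is an example value -->
  <value>0.0.0.0:1006</value>
</property>
```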

To configure secure HDFS: Add the following properties to the hdfs-site.xml file on
every machine in the cluster. Replace the example values shown below with the correct settings for your
site: the path to the HDFS keytab, YOUR-REALM.COM, the fully qualified domain name of the NameNode (NN), and
the fully qualified domain name of the Secondary NameNode (2NN).
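A partial sketch of such entries, covering block access tokens and the Secondary NameNode, might look like this. The keytab path is an assumed location, YOUR-REALM.COM is a placeholder, and the property names should be verified against your release before use. Entries that carry the fully qualified domain names of the NN and 2NN are not shown in this sketch.

```xml
<!-- General HDFS security setting -->
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>

<!-- Secondary NameNode security settings (illustrative sketch) -->
<property>
  <name>dfs.secondary.namenode.keytab.file</name>
  <!-- Assumed location: replace with the path to your HDFS keytab -->
  <value>/etc/hadoop/conf/hdfs.keytab</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@YOUR-REALM.COM</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/_HOST@YOUR-REALM.COM</value>
</property>
```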