Integrating Hadoop Security with Active Directory

Considerations when using an Active Directory KDC

Performance:

As your cluster grows, so will the volume of Authentication Service (AS) and Ticket Granting Service (TGS) interaction between the services on each cluster server. Consider evaluating
the volume of this interaction against the Active Directory domain controllers you have configured for the cluster before rolling this feature out to a production environment. If cluster performance
suffers, over time it might become necessary to dedicate a set of AD domain controllers to larger deployments.

Network Proximity:

By default, Kerberos uses UDP for client/server communication. Often, AD services are in a different network than project application services such as Hadoop. If the domain controllers
supporting a cluster for Kerberos are not in the same subnet, or they're separated by a firewall, consider using the udp_preference_limit = 1 setting in the
[libdefaults] section of the krb5.conf used by cluster services. Cloudera strongly recommends against using AD
domain controller (KDC) servers that are separated from the cluster by a WAN connection, as latency in this service will significantly impact cluster performance.

Process:

Troubleshooting the cluster's operations, especially for Kerberos-enabled services, will need to include AD administration resources. Evaluate your organizational processes for engaging
the AD administration team, and how to escalate in case a cluster outage occurs due to issues with Kerberos authentication against AD services. In some situations it might be necessary to enable Kerberos event logging to address desktop and KDC issues within windows environments.

Also note that if you decommission any Cloudera Manager roles or nodes, the related AD accounts will need to be deleted manually. This is required because Cloudera Manager will not
delete existing entries in Active Directory.

Important: With CDH 5.1 and higher, clusters managed by Cloudera Manager 5.1 (and higher) do not require a local MIT KDC and are able to
integrate directly with an Active Directory KDC. Cloudera recommends you use a direct-to-AD setup. For instructions, see Enabling Kerberos Authentication Using the Wizard.

If direct integration with AD is not currently possible, use the following instructions to configure a local MIT KDC to trust your AD server:

Run an MIT Kerberos KDC and realm local to the cluster and create all service principals in this realm.

where the <enc_type_list> parameter specifies the types of encryption this cross-realm krbtgt principal will support: either AES, DES, or RC4 encryption. You can specify multiple
encryption types using the parameter in the command above, what's important is that at least one of the encryption types corresponds to the encryption type found in the tickets granted by the KDC in
the remote realm. For example:

Note: The cross-realm krbtgt principal that you add in this step must have at least one entry that uses the same encryption
type as the tickets that are issued by the remote KDC. If no entries have the same encryption type, then the problem you will see is that authenticating as a principal in the local realm will allow
you to successfully run Hadoop commands, but authenticating as a principal in the remote realm will not allow you to run Hadoop commands.

On All of the Cluster Hosts

Verify that both Kerberos realms are configured on all of the cluster hosts. Note that the default realm and the domain realm should remain set as the MIT Kerberos realm which is local
to the cluster.

To properly translate principal names from the Active Directory realm into local names within Hadoop, you must configure the hadoop.security.auth_to_local setting in the core-site.xml file on all of the cluster machines. The following example translates all principal names
with the realm AD-REALM.CORP.FOO.COM into the first component of the principal name only. It also preserves the standard translation for the default realm (the cluster
realm).

If this documentation includes code, including but not limited to, code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required
notices. A copy of the Apache License Version 2.0 can be found here.