Cluster Hosts and Role Assignments

This topic describes suggested role assignments for a CDH cluster managed by Cloudera Manager. The actual assignments you choose for your deployment
can vary depending on the types and volume of workloads, the services deployed in your cluster, hardware resources, configuration, and other factors.

When you install CDH using the Cloudera Manager installation wizard, Cloudera Manager attempts to spread the roles among cluster hosts (except for roles assigned to Edge hosts) based on
the resources available on the hosts. You can change these assignments on the Customize Role Assignments page that appears in the wizard. You can also change and add
roles later using Cloudera Manager. See Role Instances.

CDH Cluster Hosts and Role Assignments

Master hosts run Hadoop master processes such as the HDFS NameNode and YARN ResourceManager.

Utility hosts run other cluster processes that are not master processes, such as Cloudera Manager and the Hive Metastore.

Edge hosts are client access points for launching jobs in the cluster. The number of Edge hosts required varies depending on the type and size of the
workloads.

Worker hosts primarily run DataNodes and other distributed processes such as Impalad.

Important: Cloudera recommends that you always enable high availability when CDH is used in a production environment.
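High availability for ZooKeeper, JournalNode, and Kudu master roles depends on a majority of instances remaining available, which is one reason the high-availability layouts in this topic place those roles on an odd number of hosts. The following minimal Python sketch illustrates the arithmetic. The function names are hypothetical and are not part of Cloudera Manager or CDH.

```python
def majority(ensemble_size: int) -> int:
    """Return the smallest number of members that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Return how many members can fail while a majority is still running."""
    return ensemble_size - majority(ensemble_size)


if __name__ == "__main__":
    # Print the majority size and failure tolerance for small ensembles.
    for size in (1, 2, 3, 4, 5):
        print(size, majority(size), tolerated_failures(size))
```

An ensemble of four tolerates the same single failure as an ensemble of three, so the extra host adds cost without adding fault tolerance.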

Table: Cluster Hosts and Role Assignments

Columns: Cluster Size | Master Hosts | Utility Hosts | Edge Hosts | Worker Hosts

Very Small, without High Availability

Up to 10 worker hosts

High availability not enabled

Master Host 1:

NameNode

YARN ResourceManager

JobHistory Server

ZooKeeper

Impala StateStore

Kudu master

One host for all Utility and Edge roles:

Secondary NameNode

Cloudera Manager

Cloudera Manager Management Service

Hive Metastore

HiveServer2

Impala Catalog Server

Hue

Oozie

Flume

Gateway configuration

3 - 10 Worker Hosts:

DataNode

NodeManager

Impalad

Kudu tablet server

Small, with High Availability

Up to 20 worker hosts

High availability enabled

Master Host 1:

NameNode

JournalNode

FailoverController

YARN ResourceManager

ZooKeeper

JobHistory Server

Kudu master

Master Host 2:

NameNode

JournalNode

FailoverController

YARN ResourceManager

ZooKeeper

Impala StateStore

Kudu master

Master Host 3:

Kudu master (Kudu requires an odd number of masters for HA.)

Utility Host 1:

Cloudera Manager

Cloudera Manager Management Service

Hive Metastore

Impala Catalog Server

Oozie

ZooKeeper (requires dedicated disk)

JournalNode (requires dedicated disk)

One or more Edge Hosts:

Hue

HiveServer2

Flume

Gateway configuration

3 - 20 Worker Hosts:

DataNode

NodeManager

Impalad

Kudu tablet server

Medium, with High Availability

Up to 200 worker hosts

High availability enabled

Master Host 1:

NameNode

JournalNode

FailoverController

YARN ResourceManager

ZooKeeper

Kudu master

Master Host 2:

NameNode

JournalNode

FailoverController

YARN ResourceManager

ZooKeeper

Kudu master

Master Host 3:

ZooKeeper

JournalNode

JobHistory Server

Impala StateStore

Kudu master

Fewer than 80 hosts managed by Cloudera Manager

Utility Host 1:

Cloudera Manager

Utility Host 2:

Cloudera Manager Management Service

Hive Metastore

Impala Catalog Server

Oozie

More than 80 hosts managed by Cloudera Manager

Utility Host 1:

Cloudera Manager

Utility Host 2:

Hive Metastore

Impala Catalog Server

Oozie

Utility Host 3:

Activity Monitor

Utility Host 4:

Host Monitor

Utility Host 5:

Navigator Audit Server

Utility Host 6:

Navigator Metadata Server

Utility Host 7:

Reports Manager

Utility Host 8:

Service Monitor

One or more Edge Hosts:

Hue

HiveServer2

Flume

Gateway configuration

50 - 200 Worker Hosts:

DataNode

NodeManager

Impalad

Kudu tablet server

Large, with High Availability

Up to 500 worker hosts

High availability enabled

Master Host 1:

NameNode

JournalNode

FailoverController

ZooKeeper

Kudu master

Master Host 2:

NameNode

JournalNode

FailoverController

ZooKeeper

Kudu master

Master Host 3:

YARN ResourceManager

ZooKeeper

JournalNode

Kudu master

Master Host 4:

YARN ResourceManager

ZooKeeper

JournalNode

Master Host 5:

JobHistory Server

Impala StateStore

ZooKeeper

JournalNode

We recommend no more than three Kudu masters.

Utility Host 1:

Cloudera Manager

Utility Host 2:

Hive Metastore

Impala Catalog Server

Oozie

Utility Host 3:

Activity Monitor

Utility Host 4:

Host Monitor

Utility Host 5:

Navigator Audit Server

Utility Host 6:

Navigator Metadata Server

Utility Host 7:

Reports Manager

Utility Host 8:

Service Monitor

One or more Edge Hosts:

Hue

HiveServer2

Flume

Gateway configuration

200 - 500 Worker Hosts:

DataNode

NodeManager

Impalad

Kudu tablet server

Extra Large, with High Availability

Up to 1000 worker hosts

High availability enabled

Master Host 1:

NameNode

JournalNode

FailoverController

ZooKeeper

Kudu master

Master Host 2:

NameNode

JournalNode

FailoverController

ZooKeeper

Kudu master

Master Host 3:

YARN ResourceManager

ZooKeeper

JournalNode

Kudu master

Master Host 4:

YARN ResourceManager

ZooKeeper

JournalNode

Master Host 5:

JobHistory Server

Impala StateStore

ZooKeeper

JournalNode

We recommend no more than three Kudu masters.

Utility Host 1:

Cloudera Manager

Utility Host 2:

Hive Metastore

Impala Catalog Server

Oozie

Utility Host 3:

Activity Monitor

Utility Host 4:

Host Monitor

Utility Host 5:

Navigator Audit Server

Utility Host 6:

Navigator Metadata Server

Utility Host 7:

Reports Manager

Utility Host 8:

Service Monitor

One or more Edge Hosts:

Hue

HiveServer2

Flume

Gateway configuration

500 - 1000 Worker Hosts:

DataNode

NodeManager

Impalad

Kudu tablet server
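The worker-host limits in the table above can be expressed as a simple lookup. The following Python sketch is illustrative only: the function name `suggest_tier` is hypothetical, and choosing the smallest tier that fits is an assumption rather than a rule stated in this topic. It returns `None` when no tier in the table matches.

```python
from typing import Optional

# (tier name, maximum worker hosts, high availability enabled), taken from the table.
TIERS = [
    ("Very Small, without High Availability", 10, False),
    ("Small, with High Availability", 20, True),
    ("Medium, with High Availability", 200, True),
    ("Large, with High Availability", 500, True),
    ("Extra Large, with High Availability", 1000, True),
]


def suggest_tier(worker_hosts: int, high_availability: bool = True) -> Optional[str]:
    """Return the smallest tier whose worker-host limit covers the cluster.

    Assumption: the smallest matching tier is preferred. Returns None if the
    table has no tier for the requested size and availability setting.
    """
    if worker_hosts < 1:
        raise ValueError("worker_hosts must be at least 1")
    for name, max_workers, ha_enabled in TIERS:
        if ha_enabled == high_availability and worker_hosts <= max_workers:
            return name
    return None


if __name__ == "__main__":
    print(suggest_tier(8, high_availability=False))
    print(suggest_tier(150))
```

The result is only a starting point. As noted at the beginning of this topic, the appropriate layout also depends on workloads, deployed services, and hardware.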

Allocating Hosts for Key Trustee Server and Key Trustee KMS

If you are enabling data-at-rest encryption for a CDH cluster, Cloudera recommends that you isolate the Key Trustee Server from other enterprise data hub (EDH) services by deploying the
Key Trustee Server on dedicated hosts in a separate cluster managed by Cloudera Manager. Cloudera also recommends deploying Key Trustee KMS on dedicated hosts in the same cluster as the EDH services
that require access to Key Trustee Server. This architecture enables multiple clusters to share the same Key Trustee Server and avoids having to restart the Key Trustee Server when restarting a
cluster.
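The placement recommendations above can be checked against a planned host inventory. The following Python sketch assumes a hypothetical `Host` record and a hypothetical `placement_warnings` function; neither is part of Cloudera Manager, and the cluster and host names are invented for illustration.

```python
from dataclasses import dataclass, field

KTS = "Key Trustee Server"
KMS = "Key Trustee KMS"


@dataclass
class Host:
    """Hypothetical inventory record: host name, cluster name, assigned roles."""
    name: str
    cluster: str
    roles: set = field(default_factory=set)


def placement_warnings(hosts):
    """Return warnings for layouts that depart from the recommendations."""
    warnings = []
    # Clusters that contain at least one Key Trustee Server host.
    kts_clusters = {h.cluster for h in hosts if KTS in h.roles}
    for h in hosts:
        # Key Trustee Server and Key Trustee KMS belong on dedicated hosts.
        for role in (KTS, KMS):
            if role in h.roles and len(h.roles) > 1:
                warnings.append(f"{h.name}: {role} shares a host with other roles")
        # Key Trustee Server belongs in a cluster separate from other services.
        if h.cluster in kts_clusters and KTS not in h.roles and h.roles:
            warnings.append(
                f"{h.name}: cluster {h.cluster} mixes {KTS} with other services"
            )
    return warnings


if __name__ == "__main__":
    plan = [
        Host("kts1", "security", {KTS}),
        Host("kms1", "edh", {KMS}),
        Host("worker1", "edh", {"DataNode", "NodeManager"}),
    ]
    for warning in placement_warnings(plan):
        print(warning)
```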

For production environments in general, or if you have enabled high availability for HDFS and are using data-at-rest encryption, Cloudera recommends that you enable high availability for
Key Trustee Server and Key Trustee KMS.