Kafka Setup

Hardware Requirements

Kafka can function on a fairly small amount of resources, especially with some configuration tuning. Out-of-the-box configurations can run on as little as 1 core and 1 GB of memory, with
storage scaled based on data retention requirements. These are the defaults for both the broker and Mirror Maker in Cloudera Manager version 6.x.

Brokers


CPU is rarely a bottleneck because Kafka is I/O heavy, but a moderately-sized CPU with enough threads is still important to handle concurrent connections and background
tasks.

Kafka brokers tend to have a similar hardware profile to HDFS data nodes. How you build them depends on what is important for your Kafka use cases. Use the following
guidelines:

To affect performance of these features:     Adjust these parameters:

Message retention                            Disk size
Client throughput (producer and consumer)    Network capacity
Producer throughput                          Disk I/O
Consumer throughput                          Memory
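For illustration, the parameters above correspond to broker settings such as the following. This is a hypothetical server.properties fragment; the property names are standard Apache Kafka broker settings, but the values are illustrative only:

```properties
# Retention drives disk sizing: keep messages for 7 days,
# with no per-partition size cap (-1 disables the byte limit).
log.retention.hours=168
log.retention.bytes=-1

# Thread pools that influence client throughput: network threads
# handle requests, I/O threads handle disk reads and writes.
num.network.threads=3
num.io.threads=8
```

Larger retention windows translate directly into disk capacity, while the thread counts are typically tuned upward only on brokers with many concurrent clients.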

A common choice for a Kafka broker node is 64 GB of RAM with a recommended Java heap of 4 GB. Set the heap using the Java Heap Size of Broker Kafka configuration property. Size the
CPU and disks according to the throughput and retention guidelines above.
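Outside of Cloudera Manager, the same heap size can be set through the KAFKA_HEAP_OPTS environment variable that the stock Kafka start scripts read. A minimal sketch; the 4 GB value mirrors the recommendation above:

```shell
# KAFKA_HEAP_OPTS is read by kafka-server-start.sh; setting -Xms
# equal to -Xmx avoids heap-resizing pauses at runtime.
export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"
```

The variable must be exported in the environment of the process that launches the broker, for example in the service unit or init script.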

ZooKeeper

It is common to run ZooKeeper on three nodes that are dedicated to Kafka brokers. However, for optimal performance Cloudera recommends using dedicated ZooKeeper hosts. This is
especially true for larger production environments.

Kafka Performance Considerations

The simplest recommendation for running Kafka with maximum performance is to have dedicated hosts for the Kafka brokers and a dedicated ZooKeeper cluster for the Kafka cluster. If that
is not an option, consider these additional guidelines for resource sharing with the Kafka cluster:

Do not run in VMs

It is common practice in modern data centers to run processes in virtual machines, which generally allows for better sharing of resources. However, Kafka is sufficiently sensitive to I/O
throughput that VMs interfere with the regular operation of brokers. For this reason, it is highly recommended not to use VMs for Kafka; if you run Kafka in a virtual environment, you will
need to rely on your VM vendor for help optimizing Kafka performance.

Do not run other processes with Brokers or ZooKeeper

Due to I/O contention, it is generally recommended to avoid running other resource-intensive processes on the same hosts as Kafka brokers or ZooKeeper.

Keep the Kafka-ZooKeeper Connection Stable

Kafka relies heavily on having a stable ZooKeeper connection. Putting an unreliable network between Kafka and ZooKeeper makes ZooKeeper appear offline to Kafka. To keep the
connection reliable:

Do not put Kafka/ZooKeeper nodes on separated networks

Do not put Kafka/ZooKeeper nodes on the same network with other high network loads

Operating System Requirements

SUSE Linux Enterprise Server (SLES)

Unlike CentOS, SLES limits virtual memory by default. Changing this default requires adding the following entries to the /etc/security/limits.conf
file:

    *    hard    as    unlimited
    *    soft    as    unlimited
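After editing limits.conf and starting a fresh login session, the new limit can be verified with ulimit. A quick check:

```shell
# Print the virtual-memory (address-space) limit for this shell;
# "unlimited" confirms the limits.conf change took effect.
ulimit -v
```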

Kernel Limits

There are three kernel settings that you must configure properly.

File Descriptors

You can set these in Cloudera Manager via Kafka > Configuration > Maximum Process File Descriptors. We recommend a configuration of 100000 or higher.
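The limit in effect for the current shell can be checked with ulimit, and for deployments not managed by Cloudera Manager it can be raised through limits.conf. A sketch, assuming a hypothetical kafka service user:

```shell
# Show the soft file-descriptor limit in effect for this shell;
# brokers with many partitions and client connections need 100000+.
ulimit -n

# Entries such as these in /etc/security/limits.conf raise the
# limit for a hypothetical "kafka" user (illustrative values):
#   kafka  soft  nofile  100000
#   kafka  hard  nofile  100000
```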

Max Memory Map

You must configure this in your specific kernel settings. We recommend a configuration of 32000 or higher.
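On Linux, this limit is the vm.max_map_count sysctl; Kafka memory-maps index files for every log segment, so brokers hosting many partitions can exhaust a low limit. A sketch using an illustrative value above the 32000 minimum (the write requires root):

```shell
# Read the current memory-map limit.
sysctl vm.max_map_count

# Raise it for the running kernel (requires root) and persist it
# across reboots; 65536 is an illustrative value.
sysctl -w vm.max_map_count=65536
echo "vm.max_map_count=65536" >> /etc/sysctl.conf
```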

Max Socket Buffer Size

Set the buffer size larger than any Kafka send buffers that you define.
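On Linux, the kernel ceilings are the net.core.rmem_max and net.core.wmem_max sysctls; they must exceed any socket.send.buffer.bytes or socket.receive.buffer.bytes values configured for the brokers. A sketch with illustrative values (the writes require root):

```shell
# Read the current receive/send buffer ceilings.
sysctl net.core.rmem_max net.core.wmem_max

# Raise both to 2 MB (illustrative), above typical Kafka socket
# buffer settings; requires root.
sysctl -w net.core.rmem_max=2097152
sysctl -w net.core.wmem_max=2097152
```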

If this documentation includes code, including but not limited to, code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required
notices. A copy of the Apache License Version 2.0 can be found here.