Use E-MapReduce to collect metrics from a Kafka client

This section describes how to use E-MapReduce to collect metrics from a Kafka client
to conduct effective performance monitoring.

Background

Kafka provides a collection of metrics that measure the performance of Brokers,
Consumers, Producers, Streams, and Connect. E-MapReduce uses Ganglia to collect
Kafka Broker metrics and monitor the running status of the Broker. A Kafka system
consists of two roles: the Kafka Broker and multiple Kafka clients. When a read/write
performance issue occurs, you must analyze both the Kafka Broker and the clients, so
metrics from Kafka clients are important for the analysis.

Principle

Collect Metrics for Kafka performance

Kafka supports multiple external Metrics Reporters. The JMX Reporter is built into
Kafka and enabled by default, so you can use a JMX tool to view Kafka metrics. To
collect custom metrics, you can write your own Metrics Reporter by implementing the
org.apache.kafka.common.metrics.MetricsReporter interface.
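
As an illustration of viewing metrics through JMX, the following minimal sketch reads
every readable MBean attribute that matches an ObjectName pattern. It uses only the
JDK's javax.management API. The class name JmxMetricsDump and the readMetrics helper
are hypothetical names introduced for this example. The default pattern
kafka.producer:* assumes that a Kafka producer is running in the same JVM. For a
separate process, a tool such as jconsole or a remote JMX connection would be needed
instead.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class JmxMetricsDump {

    /**
     * Reads every readable attribute of the MBeans that match the pattern.
     * Keys have the form "<object name>/<attribute name>".
     */
    public static Map<String, Object> readMetrics(MBeanServer server, String pattern) {
        ObjectName query;
        try {
            query = new ObjectName(pattern);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid ObjectName pattern: " + pattern, e);
        }

        Map<String, Object> result = new TreeMap<>();
        for (ObjectName name : server.queryNames(query, null)) {
            MBeanAttributeInfo[] attributes;
            try {
                attributes = server.getMBeanInfo(name).getAttributes();
            } catch (Exception e) {
                continue; // The MBean may have been unregistered in the meantime.
            }
            for (MBeanAttributeInfo attribute : attributes) {
                if (!attribute.isReadable()) {
                    continue;
                }
                try {
                    Object value = server.getAttribute(name, attribute.getName());
                    result.put(name + "/" + attribute.getName(), value);
                } catch (Exception e) {
                    // Skip attributes that cannot be read on this JVM.
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: the Kafka client runs in this JVM and registers its MBeans
        // in the platform MBean server under the kafka.producer domain.
        String pattern = args.length > 0 ? args[0] : "kafka.producer:*";
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        readMetrics(server, pattern).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Passing a different pattern as the first argument, for example kafka.consumer:*,
lists the attributes registered under that domain instead.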

Store Metrics

You can customize Kafka metrics, but you also need a data store to keep them for
later use and analysis. Because Kafka itself is a data store, you can store the
metrics in Kafka without using a third-party data store. Kafka can also be easily
integrated with other services. The following figure shows how metrics are collected
from a client.
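
On the client side, a custom reporter is registered through the standard Kafka client
setting metric.reporters. The following minimal sketch builds such a producer
configuration with plain java.util.Properties. The reporter class name
com.example.metrics.KafkaTopicMetricsReporter and the key example.metrics.topic are
hypothetical placeholders, not the names used by the E-MapReduce package. Kafka passes
the client configuration to each reporter's configure method, so a reporter could read
a custom key like this one to decide where to write metrics.

```java
import java.util.Properties;

public class ClientMetricsConfig {

    // Hypothetical reporter class; replace it with your MetricsReporter implementation.
    static final String REPORTER_CLASS = "com.example.metrics.KafkaTopicMetricsReporter";

    /** Builds producer properties that register a custom Metrics Reporter. */
    public static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Standard Kafka client setting: a comma-separated list of reporter classes.
        props.setProperty("metric.reporters", REPORTER_CLASS);
        // Hypothetical custom key that the reporter would read in configure().
        props.setProperty("example.metrics.topic", "client-metrics");
        return props;
    }

    public static void main(String[] args) {
        // Print the resulting configuration; the broker address is a placeholder.
        producerProps("localhost:9092")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The returned Properties object can be passed to a KafkaProducer constructor once the
reporter class is on the application's classpath.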

Prerequisites

Restrictions

Only Java applications are supported.

Only clients of Kafka 0.10 or later are supported.

You do not need to compile the code yourself. E-MapReduce has published the JAR
package to Maven, and you can download the latest version from the download link.

In this section, we use E-MapReduce to automatically create a Kafka cluster. For more
information, see Create a cluster.

The Kafka cluster is deployed in a VPC in the China (Hangzhou) region. The master
instance group is configured with a public IP address and an internal IP address. The
following figure shows the details.