Hadoop–Ganglia Integration using Hadoop Metrics2 Framework

In our previous post, we detailed why Ganglia is a good tool for monitoring clusters. However, when monitoring a Hadoop cluster you often need more detail on CPU, disk, memory, and per-node network statistics than the generic Ganglia configuration can provide. For those who need more finely tuned monitoring, Hadoop provides a framework that records internal statistics and publishes them to an external destination, such as a file or Ganglia. In fact, Hadoop now supports an implementation of the Metrics2 Framework for Ganglia. In this post we’ll discuss the design of the Hadoop Metrics2 Framework and how it enables Ganglia metrics.

Features

The Hadoop Metrics2 Framework supports multiple metrics output plugins running in parallel. It allows metrics plugins to be reconfigured dynamically without restarting the server, and it exports metrics via Java Management Extensions (JMX).

Design Overview

The Hadoop Metrics2 Framework consists of three major components:

1. The metric source is used to generate metrics.

2. The metric sink is used to consume the metrics produced by the metric sources.

3. The metric system is used to periodically poll the metric sources and to pass the resulting metric records to the metric sinks.

Implementing and Configuring Components

A metric source class must implement the following interface:

org.apache.hadoop.metrics2.MetricsSource

A metric sink must implement this interface:

org.apache.hadoop.metrics2.MetricsSink

The basic syntax to configure metric system components is:

<prefix>.(source|sink).<instance>.<option>

Here’s a sample JobTracker configuration that sends metrics to a file sink:

jobtracker.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink

jobtracker.sink.file.filename=jobtracker-metrics.out
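
Since the goal of this post is Ganglia, it may help to see the same syntax applied to a Ganglia sink. The following is only a sketch: it assumes the GangliaSink31 class bundled with Hadoop (intended for Ganglia 3.1 and later; GangliaSink30 is the 3.0 counterpart), and the host name is a hypothetical placeholder for your own gmond or gmetad endpoint.

```properties
# Sink instance named "ganglia" under the "jobtracker" prefix.
jobtracker.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31

# How often, in seconds, the metric system pushes records to this sink.
jobtracker.sink.ganglia.period=10

# Hypothetical Ganglia endpoint; 8649 is Ganglia's default port.
jobtracker.sink.ganglia.servers=ganglia.example.com:8649
```

Because each sink is a separate named instance, this block can sit alongside the file sink above, and both will receive metrics in parallel.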

Filtering Metrics

Metrics can be filtered by source, context, record, tag, and individual metric.
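
As a rough sketch of what filtering can look like in the properties file, the fragment below restricts a file sink to one context and excludes some metrics by name. It assumes the GlobFilter class from the org.apache.hadoop.metrics2.filter package; the context name and the patterns are illustrative values, not taken from a real deployment.

```properties
# Use glob-style patterns for source, record, and metric filters.
*.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter
*.record.filter.class=${*.source.filter.class}
*.metric.filter.class=${*.source.filter.class}

# Only records from the "jvm" context reach this file sink (illustrative context).
jobtracker.sink.file.context=jvm

# Skip sources whose names match the pattern (illustrative pattern).
jobtracker.sink.file.source.filter.exclude=*Details*

# Skip individual metrics whose names match the pattern (illustrative pattern).
jobtracker.sink.file.metric.filter.exclude=*Times
```

Each filter also accepts an include option in place of exclude, which is often easier to reason about when only a handful of metrics are wanted.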
