The JIRA will be resolved when the core Hadoop components (HDFS, MapReduce) are updated to use the new framework. Updates to external components that use the existing metrics framework will be tracked by separate issues.

The patch is a combined (commons/mapreduce/hdfs) patch against yahoo-hadoop-0.20.104 (published at http://github.com/yahoo/hadoop-common/). The code should be quite stable (based on ongoing scale testing).

The trunk patch will include significant API improvements with annotations (not included in the above patch). The design notes have been updated to reflect the current code: rev2.

Note, the patch completely removes the old metrics framework, which is appropriate for our use cases (and forced porting of everything). The main question for the community is: do we want to provide a mechanism for the two frameworks to coexist in 0.22, where the metrics framework can be switched at runtime, or should we move everything to v2 in 0.22?

The answer to this question would alter the trunk patch significantly.

I prefer the latter, as the former (dual frameworks switchable at runtime per process) would cause more confusion for users (and more throwaway work for me). As an incentive for the latter, I offer to port a Ganglia 3.1+ plugin to the new framework, even though we don't use it.

Luke Lu
added a comment - 21/Aug/10 02:06

With this change, would it be possible to send job-related metrics through this code path? I.e., stuff that implements the org.apache.hadoop.mapred.Reporter interface. How about providing an example of a REST-style API for queries when a plugin is registered through the MetricsSource code path?

Eric Yang
added a comment - 17/Sep/10 23:38

The framework is designed mostly to collect and dispatch per-process metrics to monitor the overall status (aggregates) of the system. While it's certainly possible to send per-job metrics through this framework (via clever use of record and metrics/tag naming conventions), the o.a.h.mapred.Reporter/Counters stuff is not really the target use case, as the latter supports nested counter groups more naturally.

As for the API to query metrics from a MetricsSource, any generic JMX client should work, as the metrics are simply exported as attributes of an MBean (which corresponds to a MetricsSource). E.g., the jmx4perl package has a JMX-to-HTTP agent that can act as an HTTP gateway if that's what you want.
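Since the metrics are plain MBean attributes, reading them requires nothing Hadoop-specific. As a minimal illustration of the "any generic JMX client" point, the sketch below reads an attribute from the local platform MBeanServer with the standard javax.management API; against a live daemon you would open a remote MBeanServerConnection via JMXConnectorFactory and query its Hadoop MBeans instead (the exact object names vary by version, so none are assumed here).

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

public class JmxQuery {
    public static void main(String[] args) throws Exception {
        // Any generic JMX client can read metrics exported as MBean attributes.
        // Here we query the local platform MBeanServer; for a remote Hadoop
        // daemon you would get an MBeanServerConnection via JMXConnectorFactory
        // and read its metrics MBeans the same way.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        CompositeData heap = (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
        long used = (Long) heap.get("used");
        System.out.println("heap used bytes = " + used);
    }
}
```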

Luke Lu
added a comment - 18/Sep/10 00:26 @Eric Yang:

The design treats jmx and ganglia as first-class citizens. Why not a web service REST API? From a code structure perspective this is cleaner, but it breaks the existing API without real value to customers who already use the JMX or MetricsContext APIs. Is there a possibility of having a REST interface without going through another bridge? I was thinking that if the interface could be generalized, the new metrics framework could enable a metrics REST API on the namenode, jobtracker, datanode, and tasktracker in a consistent manner.

Eric Yang
added a comment - 18/Sep/10 02:18

> Why not web service REST api?

Because custom, proprietary interfaces are evil and are a calling card of NIH?

I'd rather change the things we are looking at by changing a few parameters in our JMX clients than deal with Yet Another REST Interface. Screw backward compatibility if it means sticking with something standard.

Allen Wittenauer
added a comment - 18/Sep/10 03:57

> The design is treating jmx and ganglia as first class citizen.

No, the design treats the MetricsSource, MetricsSystem and MetricsSink interfaces as first-class citizens. The default metrics system implementation happens to export every metrics source as a JMX MBean. A GangliaSink would be just another (though bundled for historical reasons) MetricsSink plugin, just like FileSink and other Y internal plugins.

> it breaks existing API without real value to the customer who already use jmx or MetricsContext api.

Currently only RPC and a portion of HDFS metrics are accessible via JMX. The new framework enables all metrics (including jvm (from JvmMetrics, which includes log4j category counts, not just system jvm stuff), mapred, ugi, etc.) to be accessible via JMX with reduced effort. The main value of the new framework is to allow parallel nonblocking MetricsSinks and dynamic reconfiguration/reloading of metrics plugins without a server restart. Both (requirements from Y ops) are impossible with the current AbstractMetricsContext approach. Y ops already enjoy rerouting metrics to different backends dynamically without restarting the namenode (which takes a long time). The whole metrics subsystem, with substantial production config options, can be restarted in a fraction of a second.
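For context on what "reconfigure without a server restart" operates on: metrics2 sinks are wired up in a hadoop-metrics2.properties file following a [prefix].[sink].[instance].[option] naming convention. The fragment below is an illustrative sketch only; the file name and period value are made up for the example.

```properties
# hadoop-metrics2.properties (illustrative sketch)
# Attach a FileSink instance named "file" to all daemons:
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
# Poll all sources every 10 seconds:
*.period=10
# Per-daemon option for the namenode's "file" sink instance:
namenode.sink.file.filename=namenode-metrics.out
```

Because sinks are plugins resolved from configuration like this, ops can point metrics at a different backend and reload the metrics subsystem without touching the daemon itself.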

> Why not web service REST api?

It's entirely possible to write a MetricsSink plugin that actually does that. But it would be redundant, as JMX is already available (as a pull-model service) and there are plenty of JMX-to-whatever tools available.

> I was thinking if the interface could be generalized...

The interface is almost as generalized as it can be. It's basically an efficient key-value pair pubsub system. All the servers export metrics in a consistent manner, albeit not via a web service, which would be a very inefficient way of doing metrics for thousands of datanodes and tasktrackers. Instead, we can have highly efficient MetricsSink plugins send compact UDP packets to hierarchical metrics aggregators.
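To make the "compact UDP packets" idea concrete, here is a hypothetical, self-contained sketch of encoding one metrics record as a single binary datagram. The encode helper and the port number are inventions for illustration; the real org.apache.hadoop.metrics2.MetricsSink interface (init/putMetrics/flush) and any actual sink's wire format are richer than this.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.util.Map;

public class UdpMetricsSketch {
    // Pack a record name plus its key-value metrics into one compact buffer.
    static byte[] encode(String record, Map<String, Long> metrics) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(record);         // record name
        out.writeInt(metrics.size()); // number of key-value pairs
        for (Map.Entry<String, Long> e : metrics.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeLong(e.getValue());
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = encode("jvm", Map.of("gcCount", 42L));
        try (DatagramSocket socket = new DatagramSocket()) {
            // An aggregator would listen on a well-known port; localhost and
            // port 8649 here are placeholders for the example.
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getLoopbackAddress(), 8649));
        }
        System.out.println("sent " + payload.length + " bytes");
    }
}
```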

Luke Lu
added a comment - 18/Sep/10 06:19

Eric Yang
added a comment - 02/Oct/10 01:42 The code looks ok. Is it standard to include the package version in the package name for Hadoop?
I prefer to have the code in org.apache.hadoop.metrics to avoid the package version in the package name.
Otherwise, +1 on the patch.

Thanks for the +1. The main reason we decided to use metrics2 is that we anticipated alternative evolution paths preferred by the community (see: http://goo.gl/Rjb1 and http://goo.gl/NLLs). It looks like the old and new metrics packages are going to coexist for a while, i.e., taking the #2 path:

1. Port all hadoop core metrics (common, hdfs and mapreduce) to the new framework.

2. Deprecate the old metrics package so that external packages (e.g. HBase etc.) can still function (in the old way).

BTW, there is already implicit versioning in hadoop as well, e.g., the mapred vs. mapreduce packages, which I think is more confusing to (newer) people as it's not immediately clear which one is the new version.

IMO, it's quite reasonable to have some package versioning scheme when there is a coexistence period for old and new packages that are not compatible.

Luke Lu
added a comment - 04/Oct/10 19:09 Hi Eric,

Luke Lu
added a comment - 13/May/11 23:34 metrics2.impl.TestSinkQueue is failing; see build #453, build #447, etc.
I knew the test was racy (it relies on Thread.yield) but it had never failed for me until the pre-commit builds for HADOOP-7289 patches. Opened HADOOP-7292 to track the test fix.
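For readers unfamiliar with why Thread.yield-based tests are racy: yield is only a scheduler hint, so the asserting thread may run before the worker has actually made progress. A generic remedy (this is not the actual TestSinkQueue code) is an explicit handoff such as a CountDownLatch:

```java
import java.util.concurrent.CountDownLatch;

// Generic sketch: instead of spinning on Thread.yield() and hoping the
// consumer thread has run (racy), block on a CountDownLatch until the
// consumer deterministically signals that it has done its work.
public class LatchHandoff {
    public static void main(String[] args) throws Exception {
        CountDownLatch consumed = new CountDownLatch(1);
        final long[] result = new long[1];
        Thread consumer = new Thread(() -> {
            result[0] = 42L;      // stand-in for "dequeue and process a metric"
            consumed.countDown(); // explicit signal, no yield/sleep guessing
        });
        consumer.start();
        consumed.await();         // blocks until the consumer has really run
        System.out.println("result = " + result[0]);
        consumer.join();
    }
}
```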

Harsh J
added a comment - 22/Sep/12 07:39 Hi Luke,
What is this parent JIRA currently waiting on, post Metrics2? Can we resolve it? If not, let us re-state what it is waiting on so work can continue. Is it just the deprecation of Metrics1 that awaits?

Metrics2 is very exciting. And it's pluggable. I think it will be widely used in the Hadoop ecosystem.

Graphite is now also a widely used tool, like Ganglia. Graphite is a simpler solution for monitoring.
Are you interested in a Graphite plugin? I have lots of experience with Graphite. Can I help you?
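For what such a plugin would have to emit: Graphite's carbon daemon accepts a plaintext protocol of one "<metric.path> <value> <unix-timestamp>" line per sample, conventionally over TCP port 2003. The helper below is a hypothetical illustration of that line format, not an actual Hadoop class:

```java
public class GraphiteFormat {
    // Graphite plaintext protocol: one line per sample,
    //   <metric.path> <value> <unix-timestamp>\n
    // typically written to the carbon daemon on TCP port 2003.
    static String line(String prefix, String name, double value, long epochSeconds) {
        return prefix + "." + name + " " + value + " " + epochSeconds + "\n";
    }

    public static void main(String[] args) {
        long now = 1354777200L; // fixed timestamp for a reproducible example
        System.out.print(line("hadoop.namenode.jvm", "gcCount", 42, now));
    }
}
```

A GraphiteSink-style plugin would build one such line per metric in each putMetrics call and stream them over a socket to carbon.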

Kay Yan
added a comment - 06/Dec/12 15:54 Hi Luke
+1 on the patch.

The majority of the Metrics2 work is already in current releases. The JIRA, I think, remains unresolved due to some incomplete minor subtasks. You can use Metrics2 today in release 2.0.2, for example.

Harsh J
added a comment - 06/Dec/12 16:01 Hi Kay,