[ https://issues.apache.org/jira/browse/HADOOP-14972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Mackrory updated HADOOP-14972:
-----------------------------------
Description:
We'd like metrics to track latencies for various operations, such as latencies broken down by request type. This may need to be done differently from the current metric types, which are just counters of type long, and it needs to be done intelligently: these measurements are very numerous, and they are interesting primarily because of outliers that are unpredictably far from normal. A few ideas on how we might implement something like this:
* An adaptive, sparse histogram type. I envision something configurable with a maximum granularity and a maximum number of bins. Initially, datapoints are tallied in bins at the maximum granularity. As we reach the maximum number of bins, bins are merged in even / odd pairs. There's some complexity here, especially to make it perform well and allow safe concurrency, but I like the ability to configure reasonable limits and retain as much granularity as possible without knowing the exact shape of the data beforehand. (A rough sketch of this idea follows the list below.)
* LongMetrics named "read_latency_600ms", "read_latency_800ms", etc. to represent bins. This was suggested to me by [~fabbri]. I initially did not like the idea of hard-coding so many bins for however many op types, but this could also be done dynamically (we just hard-code which measurements we take and with what granularity to group them, e.g. read_latency, 200 ms). The resulting dataset could be sparse and dynamic to allow for extreme outliers, but the granularity is still pre-determined.
* We could also simply track a certain number of the highest latencies, and basic descriptive
statistics like a running average, min / max, etc. Inherently more limited in what it can
show us, but much simpler and might still provide some insight when analyzing performance.
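To make the first idea more concrete, here is a rough, minimal sketch of what an adaptive, sparse histogram could look like. The class and field names, the synchronized TreeMap, and the constructor parameters are purely illustrative assumptions; this is not an existing Hadoop metrics type, and a real implementation would have to plug into the metrics2 framework and use finer-grained concurrency control.
{code:java}
import java.util.Map;
import java.util.TreeMap;

/**
 * Illustrative sketch only (hypothetical class, not part of Hadoop):
 * latency samples are tallied in sparse bins at the finest configured
 * granularity; when the number of occupied bins exceeds a limit, adjacent
 * even/odd bin pairs are merged, doubling the bin width.
 */
public class AdaptiveSparseHistogram {
  private long binWidthMs;                                   // current bin width; starts at the finest granularity
  private final int maxBins;                                 // cap on the number of occupied bins
  private final TreeMap<Long, Long> bins = new TreeMap<>();  // bin index -> sample count

  public AdaptiveSparseHistogram(long finestGranularityMs, int maxBins) {
    this.binWidthMs = finestGranularityMs;
    this.maxBins = maxBins;
  }

  /** Record one latency sample; synchronized as a stand-in for real concurrency control. */
  public synchronized void add(long latencyMs) {
    bins.merge(latencyMs / binWidthMs, 1L, Long::sum);
    while (bins.size() > maxBins) {
      mergeEvenOddPairs();
    }
  }

  /** Halve the resolution: bins 2k and 2k+1 collapse into bin k. */
  private void mergeEvenOddPairs() {
    TreeMap<Long, Long> merged = new TreeMap<>();
    for (Map.Entry<Long, Long> e : bins.entrySet()) {
      merged.merge(e.getKey() / 2, e.getValue(), Long::sum);
    }
    bins.clear();
    bins.putAll(merged);
    binWidthMs *= 2;
  }

  /** Snapshot as "bin lower bound in ms -> count", e.g. for emitting one LongMetric per bin. */
  public synchronized Map<Long, Long> snapshot() {
    TreeMap<Long, Long> out = new TreeMap<>();
    for (Map.Entry<Long, Long> e : bins.entrySet()) {
      out.put(e.getKey() * binWidthMs, e.getValue());
    }
    return out;
  }
}
{code}
A snapshot like this could be published as one LongMetric per occupied bin (which also lines up with the second idea above), and the merge step bounds memory without having to fix the bin width up front.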
> Histogram metrics types for latency, etc.
> -----------------------------------------
>
> Key: HADOOP-14972
> URL: https://issues.apache.org/jira/browse/HADOOP-14972
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
>