As of the upcoming Linkerd 1.2.0 release, we are deprecating the io.l5d.statsd telemeter.

We’ve been considering removing the statsd telemeter because it doesn’t work the way most people expect, which can cause big surprises in production. Specifically, the way that the statsd telemeter samples counter increments means that you can miss out on very important information in your runtime metrics (like failures). To avoid missing important events, the sample rate on the telemeter can be increased, but this will lead to increased latency.

the way that the statsd telemeter samples counter increments means that you can miss out on very important information in your runtime metrics (like failures)

Thanks for announcing this ahead of time. Could you please give me a bit more context on what this means? Specifically, why does this cause a loss of data?

We’re actually not ingesting metrics from Linkerd into this pipeline right now, but we were planning to in the near future. Having Linkerd’s metrics outside our “normal” metrics system is a bit problematic, so we’re keen to unify them, but now I see this is being removed…

I’m happy to give a bit more context. The statsd telemeter is sampled, which means that each event (such as a counter increment) is pushed to statsd as a message only with some probability, the sample rate. Linkerd metrics contain high-velocity stats (such as request count), which must be sampled at a very low rate to avoid an excessive number of network requests to statsd. At the same time, Linkerd metrics also contain very low-velocity stats (such as the count for an individual failure type), where any amount of sampling dramatically decreases their usefulness or may cause you to miss the event altogether.
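To make the effect concrete, here is a rough Python sketch of a sampled counter. It is not Linkerd's implementation: the function names, the 1% sample rate, and the event counts are all assumptions chosen for illustration. It only relies on the usual statsd convention that the server scales each sampled increment by 1 / sample rate.

```python
import random


def miss_probability(events: int, sample_rate: float) -> float:
    """Chance that none of `events` increments is ever sent to statsd."""
    return (1.0 - sample_rate) ** events


def simulate_sampled_counter(events: int, sample_rate: float,
                             rng: random.Random) -> float:
    """Estimate a counter as a statsd server would see it.

    Each increment is sent with probability `sample_rate` (must be > 0);
    the server then scales the number of received increments by
    1 / sample_rate.
    """
    sent = sum(1 for _ in range(events) if rng.random() < sample_rate)
    return sent / sample_rate


# Hypothetical numbers: one sample rate shared by a busy and a rare counter.
rng = random.Random(42)
sample_rate = 0.01

requests_estimate = simulate_sampled_counter(1_000_000, sample_rate, rng)
failures_estimate = simulate_sampled_counter(3, sample_rate, rng)

print(f"requests: true=1000000 estimated={requests_estimate:.0f}")
print(f"failures: true=3 estimated={failures_estimate:.0f}")
print(f"chance all 3 failures go unreported: {miss_probability(3, sample_rate):.1%}")
```

With these assumed numbers the request count comes out close to the true value, while the three failures are dropped entirely about 97% of the time. When one of them does get sampled, it is reported as 100 failures. Raising the sample rate fixes the rare counter but multiplies the number of messages sent for the busy one.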

Because of this, we recommend using the influxdb telemeter instead, which can be adapted to a statsd backend using Telegraf. I hope this helps!