Sat 17 June 2017

You've got Prometheus up and running and are eager to start instrumenting your Django application.
Don't be hasty: read the Prometheus Best Practices first.

Let's say our application uses a Python library to request a weather forecast.
Sometimes the weather API server doesn't work as expected and we find HTTP 500 status codes in our application logs.
Out of curiosity we want to know how often that happens.
What if we already have to look for another forecast provider?

The first idea could be to use a counter named "weatherapi_responses_500_total".
From the Prometheus docs:

When reporting failures, you should generally have some other metric
representing the total number of attempts.
This makes the failure ratio easy to calculate.

Alright, then we also need to count "HTTP 200 OK" API responses, say with "weatherapi_responses_200_total".
But what about other HTTP status codes? Shall we create metrics for each of them?

When you have multiple metrics that you want to add/average/sum,
they should usually be one metric with labels rather than multiple metrics.

As this is exactly our case, we should use Prometheus labels. Therefore our
metric name should be "weatherapi_responses_total" with a "code" label for the HTTP response code.
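
To make the idea concrete, here is a minimal sketch in plain Python. It is not a Prometheus client: a collections.Counter stands in for the single "weatherapi_responses_total" metric, keyed by the value of its "code" label. The helper names and the sample status codes are made up for illustration.

```python
from collections import Counter

# Hypothetical in-memory stand-in for one labeled metric,
# "weatherapi_responses_total", keyed by the value of its "code" label.
weatherapi_responses_total = Counter()


def record_response(code):
    """Count one weather API response under its HTTP status code."""
    weatherapi_responses_total[str(code)] += 1


def failure_ratio(counts, failed_code="500"):
    """Share of responses with failed_code among all counted responses."""
    total = sum(counts.values())
    return counts[failed_code] / total if total else 0.0


# Made-up sample of observed status codes, for illustration only.
for status in (200, 200, 200, 500, 404, 200, 500, 200):
    record_response(status)

print(sum(weatherapi_responses_total.values()))   # total attempts
print(failure_ratio(weatherapi_responses_total))  # share of HTTP 500s
```

Because every status code lands in the same metric, the total number of attempts is just the sum over all label values, and the failure ratio needs no second metric.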

Depending on the scenario, we might have multiple API clients in our application.
In that case we may generalize the metric to "api_responses_total" with
labels such as "code=200" and "service=weather".

That said, we should keep it sane and not overuse labels: every distinct combination of label values becomes a separate time series.

StatsD by Datadog

In the previous post
we used StatsD with statsd_exporter to forward metrics to the Prometheus server.
The StatsD protocol has no notion of labels,
but Datadog's fork introduced tags.
Moreover, statsd_exporter can convert them into Prometheus labels.
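
As a rough illustration of what a tagged metric looks like on the wire, the sketch below builds a counter datagram in Datadog's StatsD dialect using only the standard library. The function names are hypothetical, and the host and port are assumptions that depend on where your statsd_exporter listens; in a real application you would more likely use a StatsD client library.

```python
import socket


def format_counter(name, value=1, tags=None):
    """Build a counter datagram in Datadog's StatsD dialect.

    Layout: <name>:<value>|c|#<tag>:<value>,<tag>:<value>
    """
    datagram = f"{name}:{value}|c"
    if tags:
        # Sort tags so the output is deterministic.
        tag_part = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
        datagram += f"|#{tag_part}"
    return datagram


def send_counter(name, value=1, tags=None, host="127.0.0.1", port=8125):
    """Fire-and-forget the datagram over UDP (host and port are assumptions)."""
    payload = format_counter(name, value, tags).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


print(format_counter("api_responses_total", tags={"code": 200, "service": "weather"}))
# prints: api_responses_total:1|c|#code:200,service:weather
```

The part after "|#" carries the tags, which is what statsd_exporter can turn into the "code" and "service" labels.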