Scraping application metrics with Prometheus

There are two conceptually different approaches to collecting application metrics. The first is the PUSH approach: the metrics storage sits somewhere and waits until a metrics source pushes some data into it. For instance, Graphite doesn’t do any collection on its own; it waits until somebody like collectd does the delivery.

The second approach is PULL. Here, metrics sources don’t try to be smart and simply provide their readings on demand. Whoever needs those metrics can make a call, e.g. an HTTP request, in order to get some.

Prometheus collects metrics using the second approach.
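To make the PULL model concrete, here’s a minimal sketch of what a scrapable app could look like: a tiny HTTP endpoint, built with nothing but the Python standard library, that reports its current readings whenever someone asks. The metric name, port, and counter are made up for illustration.

```python
# A PULL-style metrics source: the app does nothing on its own, it just
# serves its current readings over HTTP and lets the scraper decide
# when to collect them. Metric name and port are hypothetical.
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUESTS_SERVED = 0  # a counter this app maintains internally


def render_metrics():
    # Prometheus-style text format: one "name value" pair per line.
    return "app_requests_served_total %d\n" % REQUESTS_SERVED


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global REQUESTS_SERVED
        REQUESTS_SERVED += 1
        if self.path == "/metrics":
            body = render_metrics().encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

# To serve: HTTPServer(("localhost", 8000), MetricsHandler).serve_forever()
```

A scraper would then simply GET http://localhost:8000/metrics on whatever schedule it likes; the app itself never initiates anything.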

What’s Prometheus

Prometheus is an open-source storage for time series of metrics that, unlike Graphite, actively makes HTTP calls to fetch new application metrics. Once the data is saved, you can query it using a built-in query language and render the results into graphs. However, you’ll do yourself a favor by using Grafana for all the visuals. And I’m not being mean; this is exactly what the Prometheus docs suggest.

Along with get/store/display features, Prometheus supports recording rules, which create new data feeds by processing existing ones; alerting rules, which produce an alerts data feed when they fire; and federation, which allows Prometheus to scale.

Installation

Prometheus exists as a downloadable archive with binaries, a git repository with sources, and a Docker image. As usual, using Docker is the easiest way to get it running locally: docker run -p 9090:9090 prom/prometheus. Once Prometheus is started with default settings, its web UI will be listening on port 9090:
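If you want to experiment with your own settings later on, the same image can be started with a local config file mounted over the default one. This is a sketch: the in-container path matches the image described in this post, while newer images keep the config at /etc/prometheus/prometheus.yml instead.

```shell
# Run Prometheus locally, replacing the bundled config with our own.
# Adjust the in-container path if your image version stores it elsewhere.
docker run -p 9090:9090 \
  -v "$(pwd)/prometheus.yml:/prometheus/prometheus.yml" \
  prom/prometheus
```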

Jobs, targets and instances

Prometheus needs some targets to scrape application metrics from. Such an individual target is called an instance: an app or a process that is able to provide metrics data in a format the scraper can understand. Interestingly, Prometheus can provide its own metrics and therefore can be a target for other scrapers, even for itself. In fact, this is exactly what happens by default. Just head to the current targets list at http://localhost:9090/targets and you’ll see there’s one target already:

/metrics is the conventional path for accessing a target’s data, and by default Prometheus appends it to any target URL it gets.

/prometheus/prometheus.yml, which holds the targets configuration, introduces one more concept: jobs.

Short version of prometheus.yml:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
```

A job is a collection of instances of the same type, e.g. feeds from identical replicated servers.
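For example, a job grouping several identical app servers could look like this in prometheus.yml (the job name and addresses here are made up):

```yaml
scrape_configs:
  - job_name: 'api-servers'     # one job...
    static_configs:
      - targets:                # ...several instances of the same type
          - '10.0.0.1:8080'
          - '10.0.0.2:8080'
```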

Prometheus data format

Let’s see what kind of data Prometheus deals with. In order to do that, just head to the URL displayed on the /targets page:

This is quite interesting: there are some data rows that look pretty familiar, e.g. go_gc_duration_seconds_count 10. The first component is obviously a metric name, and 10 is its value. However, there’s no timestamp, which implies Prometheus uses scrape time for that (though you can still provide your own).

But most of the data rows have some additional data in curly braces, like here: go_gc_duration_seconds{quantile="1"}. The thing is, Prometheus data can be multidimensional. Along with a metric name and its value, you can assign arbitrary key-value pairs, called labels, to it. Those can be anything: host name, metric source ({cpu='1'}), metric type ({method='POST'}), etc.

What’s more, Prometheus silently adds a few labels to metrics on its own: the names of the instance and the job that the metric belongs to.

What’s even more, it also produces two synthetic series for every target it scrapes: up{...} 1|0, which shows whether or not the target is online, and scrape_duration_seconds. That’s very convenient for monitoring whether targets are still alive.
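Putting it together, the data for a single target as stored in Prometheus might look something like this (the series names and values are illustrative; up and scrape_duration_seconds are the synthetic series Prometheus adds after each scrape):

```
go_gc_duration_seconds{quantile="1"} 0.011
go_gc_duration_seconds_count 10
up{instance="localhost:9090",job="prometheus"} 1
scrape_duration_seconds{instance="localhost:9090",job="prometheus"} 0.003
```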

Prometheus query language

When you have data with more than one dimension, you’ll probably need a query language for it. As it happens, Prometheus has a pretty powerful one. Going into its details is way beyond the scope of this post, but to give you a feeling for how it looks, here are some examples:

Using a series name, e.g. go_gc_duration_seconds, is already a query that returns all metrics from that series;

Providing a series name with a label, e.g. collectd_cpu{cpu='0'}, will only return values that have that label. Regular expressions also work (=~).

Using functions and time ranges can do some crazy stuff. E.g. the following query will return the max CPU-0 value for the last minute: max_over_time(collectd_cpu{cpu='0'}[1m])
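Building on those, here are two slightly more involved queries over a hypothetical counter named http_requests_total (the series and labels are made up for illustration):

```
rate(http_requests_total{method='POST'}[5m])

sum by (instance) (rate(http_requests_total[5m]))
```

The first returns the per-second rate of POST requests, averaged over the last five minutes; the second computes the same rate for all requests and aggregates it per instance.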

You can experiment with Prometheus queries directly in its web UI at localhost:9090/graph:

Scraping targets

So far we’ve seen some monitoring data taken from Prometheus itself. Are there any other sources of metrics data for it? Well, obviously.

Firstly, there are some apps, like etcd or Kubernetes, that provide their metrics in a format which Prometheus understands. Etcd actually serves them at the /metrics path, so it works with Prometheus out of the box.

Secondly, there are exporters for many, many applications. Those behave like small web servers that convert internal application metrics to the proper format and serve it at the /metrics path. And the number of such exporters is quite impressive, starting with collectd, Apache and RabbitMQ exporters, and ending with a somewhat exotic exporter for the Edison development board.

Finally, sometimes it’s impossible for Prometheus to make an HTTP call to a certain target. It might be hidden behind a firewall, or too fragile to accept the call. But if the target can make a call out, there’s the pushgateway tool, which accepts requests from other agents, accumulates the data it receives, and then serves it at the /metrics path so that Prometheus can pull the data from there.
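For instance, a batch job could push a reading to a Pushgateway like this. This is a sketch assuming a Pushgateway listening at localhost:9091; the metric name and job label are made up:

```shell
# Push one reading to the Pushgateway's /metrics/job/<job_name> endpoint.
# Prometheus then scrapes the gateway itself as a regular target.
echo "nightly_backup_duration_seconds 42" \
  | curl --data-binary @- http://localhost:9091/metrics/job/nightly_backup
```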

Creating graphs

Right where you run your data queries in Prometheus, there’s also a tab called ‘Graph’ that picks up query results and tries its best to render them:

However, its ‘best’ is not quite enough. Grafana, on the other hand, has supported Prometheus for quite a while, and it’s fairly easy to use them together.

Conclusion

Prometheus is a metrics collector and storage that doesn’t wait until the data finds its way to it, and instead makes the call first. Unlike rrdtool or Graphite, it operates on multidimensional data, which allows storing not only the measurement itself but also some of its details. E.g. for RESTful service statistics, that could include the HTTP method, IP address, web method name, user agent, or even a user id.

The Prometheus query language, on the other hand, is a powerful tool for making use of that data. Along with basic select and filter operations, it provides quite a number of functions and aggregation rules.

Graphing is probably the weakest side of Prometheus. However, seeing how easy it is to connect it to Grafana, that’s not really a problem.

4 thoughts on “Scraping application metrics with Prometheus”

This is my endpoint: “http://10.9.64.47:9100/metrics”. When I try to wget 10.9.64.47:9100/metrics from the Prometheus server, I can see the request and the response from the other server. But in the Prometheus dashboard, under targets, all servers are showing “down”. Moreover, when I try to wget 10.9.64.47:9100/metricsxyg or anything other than /metrics, it works; only /metrics is not working. Could you please give any suggestions as to what the issue is?

Hi Srikanth,
Sorry, I can’t build up a full picture from your description, so I can only guess. If you can wget /metrics from the host where prometheus is installed, but prometheus itself doesn’t get any data, do you have any http logs on the receiving side (10.9.64.47) to check whether the prometheus call ever reaches it? Does the call URL look right to you? Does prometheus itself log any warnings?