
Prometheus is a monitoring, alerting and statistics collection tool [1]. It provides a multi-dimensional data model (time series identified by a metric name and key/value pairs) and query capabilities similar to Graphite's. Collection happens via a pull model over HTTP, which makes it a good fit for microservices environments. As long as a service exposes its metrics over a RESTful API, Prometheus can scrape them, store them, query them and alert on them. For graphing and visualisation Prometheus integrates with Grafana, which can be used to build dashboards.
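To make the pull model concrete, here is a minimal sketch of a service that Prometheus could scrape. It is only an illustration: the `service` label, the `/metrics` path and port 8000 are assumptions, label values are not escaped, and a real service would normally use an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_sample(name, labels, value):
    """Format one sample as a line of the Prometheus text exposition format."""
    if labels:
        # Labels are rendered as key="value" pairs inside curly braces.
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves a single metric at /metrics (path and label are assumptions)."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        line = render_sample("test_metric", {"service": "demo"}, 1)
        body = (line + "\n").encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block and serve metrics until interrupted (the port is arbitrary)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and pointing a scrape target at that port should be enough for Prometheus to start collecting `test_metric` with its label.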

In this post I'll deploy the Prometheus server, the alerting module called Alertmanager, the Node exporter, which exports various low-level server stats, Grafana as a front-end and exim4 for sending email alerts.

Since Prometheus is a Go binary, let's install the dependencies, build the server binary and make a Docker container to run the service in:

In order to monitor the general health of a node (CPU, memory, uptime, etc.) Prometheus needs to contact an HTTP endpoint to collect the information for that node. One way to do this is by using the Node exporter [2] - a simple RESTful API that returns various server statistics.
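What such an endpoint returns is plain text, one sample per line. The sketch below parses that format into a dictionary, assuming lines carry no trailing timestamp; the sample payload is illustrative and not captured from a real exporter, so the metric names in it may differ from what your exporter version reports.

```python
def parse_exposition(text):
    """Parse text exposition lines into {series: float}, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # "# HELP" and "# TYPE" lines are metadata, not samples.
        if not line or line.startswith("#"):
            continue
        # Assumption: no timestamp, so the last field is the value.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


# Illustrative payload only; real exporters return many more series.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.25
node_memory_MemFree 1048576
"""

parsed = parse_exposition(SAMPLE)
```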

Browse to port 3000 on the Grafana host, click on the Grafana logo, then click on "Data Sources" in the sidebar, then "Add New", and select "Prometheus" as the type. Set the appropriate Prometheus endpoint, e.g. http://localhost:9090/

Now that we have a server and a custom service that we monitor and collect data for, let's configure alerting based on that data. I'll use the Alertmanager service [3] to send an email if the test_metric key returns anything but 1:

In order for Prometheus to send alerts to Alertmanager we need to create an alert rule, add it to the Prometheus config file, then restart Prometheus specifying the Alertmanager endpoint for the integration to happen. The config should look like the following:

With this, the Prometheus server is integrated with Alertmanager and ready to send alerts. To trigger an alert, kill the netcat session to simulate a failure, or change the returned value to something other than 1.

Aside from the Prometheus UI we can query for metrics directly using the API:
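As a rough sketch of what that looks like from a script, the following builds an instant query against the `/api/v1/query` endpoint and unpacks the JSON response. The helper names are hypothetical, the base URL is whatever your server listens on, and only vector results are handled.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(base, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urlencode({"query": expr})
    return base.rstrip("/") + "/api/v1/query?" + params


def extract_values(payload):
    """Return (labels, float value) pairs from an instant vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [
        (item["metric"], float(item["value"][1]))
        for item in payload["data"]["result"]
    ]


def instant_query(base, expr):
    """Run the query against a live server (requires network access)."""
    with urlopen(query_url(base, expr)) as resp:
        return extract_values(json.load(resp))
```

For example, `instant_query("http://localhost:9090", "test_metric")` should return the current value of `test_metric` for every matching series, assuming the server from the earlier steps is running on that address.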