Centralized metrics with Stackdriver Prometheus Exporter

Jan 6, 2019 •
Cristina Negrean •
1263 words
7 minute read •

Background

Do you package your web applications in Docker container images and run those in a Google Kubernetes Engine (GKE) cluster? Then most likely your application depends on at least a few GCP services. As you might already know, Google does not currently charge for monitoring data when it comes to GCP metrics. Therefore, if you want to monitor operations of Google Cloud Platform (GCP) services such as Compute Engine, Cloud SQL, Cloud Dataflow and Cloud Pub/Sub, Google's Cloud Monitoring service - Stackdriver - is a perfect choice.

The above-mentioned free allotment does not apply to your application's custom metrics. Besides, if you write applications in Java on top of Spring Boot, you may well have heard of Micrometer, the vendor-neutral application metrics facade. Micrometer provides built-in support for the most popular full-metrics solution for Kubernetes: Prometheus. Prometheus is also open source and community driven, and while Google claims that Stackdriver won't lock developers into a particular cloud provider, a read through the Stackdriver Kubernetes Monitoring documentation currently shows that the feature is a Beta release, limited to GKE only, and positioned as an opt-in add-on.

Another interesting fact about Google's approach to monitoring is that it recently open-sourced OpenCensus, a new set of libraries for collecting metrics and distributed traces, which provides a way to expose Java application metrics in the Prometheus data format. One could wire OpenCensus into a Java application to expose an HTTP metrics endpoint for Prometheus to scrape, similar to Spring Boot's Actuator Prometheus endpoint use case. That said, Stackdriver still cannot ingest metrics in the Prometheus data format, so you would need to use Stackdriver's Prometheus integration - currently limited to 1,000 Prometheus-exported metric descriptors per GCP project - in order to centralize all metrics from your GCP services and Kubernetes resources, including your containerized Java applications running inside the Kubernetes pods.

At Trifork, as part of the weareblox project, we’ve made various GCP services metrics available to our Prometheus monitoring setup, and I thought it might be interesting to share the steps taken.

Prerequisites: This blog post assumes that you already have access to the Google Cloud Platform.
If you don't, there is a GCP Free Tier which gives you free resources to learn about Google Cloud Platform (GCP) services. It's possible to create a sandbox GCP project and enable Google Cloud Monitoring by logging in to Stackdriver and adding a workspace for your GCP project.
When you create a GCP project from scratch, you need:

Tooling: Google Cloud SDK and Helm must be installed on your machine. Before installing the server portion of Helm, tiller, make sure you're authenticated and connected to the right Kubernetes cluster. To authenticate to a Kubernetes cluster, you can use the command gcloud container clusters get-credentials <your-cluster-name>. To connect to the right cluster, you can issue the command kubectl config use-context <your-cluster-name>. The Prometheus Operator for Kubernetes should also already be running in your Kubernetes cluster. If you are just starting with the Prometheus Operator, I recommend the Prometheus Operator quick-install how-to guide from Sysdig.
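The preparation steps above can be sketched as the following command sequence (the cluster name is a placeholder, and this assumes Helm 2, where tiller is installed with helm init):

```shell
# Fetch GKE credentials and add a context for the cluster to your kubeconfig
gcloud container clusters get-credentials <your-cluster-name>

# Make sure kubectl points at the right cluster
kubectl config use-context <your-cluster-name>

# Install tiller, the server-side portion of Helm (Helm 2 only)
helm init
```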

1. Decide which GCP metric types you want to export

The extensive list of metric types Stackdriver currently supports is documented here. In the context of the weareblox project, the list looks as follows:

# The comma-separated list of prefixes to gather metrics for
STACKDRIVER_EXPORTER_MONITORING_METRICS_TYPE_PREFIXES: "dataflow.googleapis.com/job,pubsub.googleapis.com/subscription,pubsub.googleapis.com/topic,pubsub.googleapis.com/snapshot,compute.googleapis.com/instance/uptime,compute.googleapis.com/instance/cpu,compute.googleapis.com/instance/network,cloudsql.googleapis.com/database"

API calls to Stackdriver are currently charged at $0.01 per 1,000 calls, with the first 1 million calls free. You can monitor the associated billing costs in the Prometheus UI.

2. Install Helm chart

To get the GCP metrics listed above into Prometheus' time-series monitoring datastore, we need a few things:

When installing the stackdriver-exporter Helm chart, the environment variable values used to configure the Kubernetes Pod can be overridden on the command line; it is also required to set the GCP project ID:
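An installation command could look like the following sketch. The chart name stable/stackdriver-exporter and the parameter names stackdriver.projectId, stackdriver.metrics.typePrefixes and stackdriver.metrics.interval are assumptions based on the chart as published at the time; verify them against the chart's values.yaml before use.

```shell
# Install the exporter, overriding the project ID and the metric type
# prefixes from step 1 (Helm 2 syntax; list abbreviated for readability)
helm install stable/stackdriver-exporter \
  --name stackdriver-exporter \
  --set stackdriver.projectId=<your-gcp-project-id> \
  --set stackdriver.metrics.typePrefixes="dataflow.googleapis.com/job\,pubsub.googleapis.com/subscription" \
  --set stackdriver.metrics.interval="5m"
```

Note that commas inside a --set value must be escaped with a backslash, since Helm otherwise treats them as separators between values.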

As an example, the current state of the stackdriver_dataflow_job_dataflow_googleapis_com_job_system_lag metric corresponds to the state of the Stackdriver metric job/system_lag five minutes ago. A rule expression checks whether the alert condition is being met. System lag is the current maximum duration, in seconds, that an item of data has been awaiting processing.
The offset you need to set matches the value configured for the stackdriver.metrics.interval parameter during the Helm chart installation.
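Such an alerting rule could be sketched as follows; the alert name, threshold and labels here are illustrative, not taken from the project:

```yaml
groups:
  - name: dataflow.rules
    rules:
      - alert: DataflowSystemLagTooHigh
        # Offset by 5m to match the exporter's stackdriver.metrics.interval,
        # so the query lines up with the most recently exported sample
        expr: stackdriver_dataflow_job_dataflow_googleapis_com_job_system_lag offset 5m > 60
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Dataflow job system lag has been above 60s for 10 minutes"
```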

To sum up, whatever tooling or approach you choose to centralize your metrics monitoring, there will always be pros and cons, so a proper evaluation of what your use case requires is important.

As an example, I like Grafana most when it comes to graphing capabilities. It also integrates out of the box with both Prometheus and Stackdriver metrics data sources.

Prometheus however supports templating in the annotations and labels of alerts, giving it an edge when it comes to writing alerting rules.