Prometheus 2.0 finds an interesting way to wow the crowd

It’s been almost one year since Prometheus 1.0 was released, and Prometheus 2.0 has found a different way to wow the crowd: a new storage layer meant to dramatically increase monitoring scalability for Kubernetes and other distributed systems.

Prometheus 1.0 promised a stable API and user interface, and it set out to solve a lot of problems. According to Björn Rabenstein, an engineer at SoundCloud and a Prometheus core developer, “the one common problem that is usually part of the mix has been nailed by Jamie Wilkinson in Google’s Site Reliability Engineering book (O’Reilly 2016): ‘We need monitoring systems that allow us to alert for high-level service objectives, but retain the granularity to inspect individual components as needed’.”

Prometheus 2.0 has a different ace up its sleeve: the new storage layer, which is the focus of the blog post announcing the release.

Prometheus 2.0: The problem solver

Time series databases have their share of challenges. Fabian Reinartz, engineer/tech lead at CoreOS, explained in the blog post that “a time series system collects data points over time, linked together into a series. Each data point is a numeric value associated with a timestamp. A series is defined by a metric name and labeled dimension and serves to partition raw metrics into fine-grained measurements.”

For example, the total number of received requests can be measured and divided by request path, method, and server instance into the following series:

This is where Prometheus comes into play: it allows users to query those series by their metric name and by specific labels. Querying for requests_total{path="/status"} will return the first two series in the above example. The number of data points returned for those series can be constrained to an arbitrary time range, for example the last two hours or the last 30 seconds.
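To make the selector semantics concrete, here is a minimal Python sketch (not Prometheus code) that treats each series as a metric name plus a label set and returns the series whose labels match every equality matcher. The label values and the `select` helper are hypothetical, and only equality matching is modeled; PromQL selectors also support `!=`, `=~`, and `!~`.

```python
from typing import Dict, List

# Hypothetical model: a series is identified by its metric name plus its labels.
# Prometheus stores the metric name under the reserved label "__name__".
Series = Dict[str, str]

# Illustrative requests_total series (label values are made up for the example).
SERIES: List[Series] = [
    {"__name__": "requests_total", "path": "/status", "method": "GET", "instance": "10.0.0.1:80"},
    {"__name__": "requests_total", "path": "/status", "method": "POST", "instance": "10.0.0.3:80"},
    {"__name__": "requests_total", "path": "/", "method": "GET", "instance": "10.0.0.2:80"},
]


def select(series: List[Series], metric: str, matchers: Dict[str, str]) -> List[Series]:
    """Return series with the given metric name whose labels equal every matcher."""
    wanted = {"__name__": metric, **matchers}
    return [s for s in series if all(s.get(k) == v for k, v in wanted.items())]


# Roughly what the selector requests_total{path="/status"} asks for.
for s in select(SERIES, "requests_total", {"path": "/status"}):
    print(s["method"], s["instance"])
```

This sketch scans every series on each lookup, which is exactly the cost that grows with the number of series; avoiding that scan is the job of the index discussed later in the article.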

Any storage layer that collects metric series at the rates and volumes seen in cloud environments will face two key design challenges:

Vertical and horizontal

Reinartz suggested thinking of the data as a two-dimensional plane: the vertical dimension represents all of the stored series, while the horizontal dimension represents time, across which samples are spread.

[Figure: Challenge no. 1, vertical and horizontal]

Prometheus periodically collects new data points for all series; therefore, it must perform vertical writes at the right end of the time axis. When querying, however, users might want to access rectangles of arbitrary area anywhere across the plane.

As a result, time series read access patterns differ from write patterns, so any useful storage layer for such a system must deliver high performance for both.
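The mismatch can be shown with a toy in-memory model. This is a hypothetical sketch, not how Prometheus lays out data: each scrape appends one sample to every series at the newest timestamp (a vertical write), while a query reads an arbitrary rectangle, meaning a subset of series over a time range. The `ToyPlane` class and its methods are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp in ms, value)


class ToyPlane:
    """Series on the vertical axis, time on the horizontal axis."""

    def __init__(self) -> None:
        self.data: Dict[str, List[Sample]] = defaultdict(list)

    def scrape(self, ts: int, values: Dict[str, float]) -> None:
        # Vertical write: one new sample per series, all at the same (latest) timestamp.
        for name, value in values.items():
            self.data[name].append((ts, value))

    def query(self, names: List[str], start: int, end: int) -> Dict[str, List[Sample]]:
        # Horizontal read: a rectangle of selected series over [start, end].
        return {
            n: [(t, v) for t, v in self.data.get(n, []) if start <= t <= end]
            for n in names
        }


plane = ToyPlane()
for i in range(5):
    ts = i * 15_000  # a scrape every 15 seconds
    plane.scrape(ts, {"a": float(i), "b": float(i * 2), "c": 1.0})

print(plane.query(["a", "b"], 15_000, 45_000))
```

Writes touch every series but only at the tail, while reads cut across many series over a span of time; a layout that is cheap for one pattern is not automatically cheap for the other.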

Series churn

According to Reinartz, “series are associated with deployment units, such as a Kubernetes pod. When a pod starts, new series are added into our data pool. If a pod shuts down, its series stops receiving new samples, but the existing data for the pod remains available. A new series begins for the pod automatically spun up to replace the terminated pod. Auto-scaling and rolling updates for continuous deployment cause this instance churn to happen orders of magnitude more often than in conventional environments.”

Even though Prometheus usually collects data points for a roughly fixed number of active series, the total number of series in the database grows linearly over time. This is why the series plane in dynamic environments like Kubernetes clusters typically looks like this:

[Figure: Challenge no. 2, series churn]
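A small simulation illustrates why the active set stays flat while the total keeps growing. It rests on simplifying assumptions of its own: a fixed number of pods, every rollout replacing all of them with newly named pods, and each pod exporting a fixed number of series. The `simulate_churn` function is hypothetical.

```python
from typing import List, Set, Tuple


def simulate_churn(pods: int, series_per_pod: int, rollouts: int) -> List[Tuple[int, int]]:
    """Return (active_series, total_series_ever_written) after each rollout.

    Hypothetical model: every rollout replaces all pods, and each replacement
    pod gets a new name, so its series are new identities. The old pods'
    series stop receiving samples but remain in the database.
    """
    seen: Set[Tuple[str, int]] = set()
    history: List[Tuple[int, int]] = []
    next_pod = 0
    for _ in range(rollouts):
        current = [f"pod-{next_pod + i}" for i in range(pods)]
        next_pod += pods
        active = {(pod, metric) for pod in current for metric in range(series_per_pod)}
        seen |= active
        history.append((len(active), len(seen)))
    return history


for n, (active, total) in enumerate(simulate_churn(pods=10, series_per_pod=50, rollouts=4), start=1):
    print(f"rollout {n}: active={active} total={total}")
```

Active series stay at 500 throughout, while the total grows by 500 with every rollout, which is the linear growth the index has to cope with.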

An index is the answer to finding queried series efficiently, but an index that works well for five million series may not be suitable when dealing with 200 million or more.

New storage layer design to the rescue

Reinartz wrote that “the Prometheus 1.x storage layer deals well with the vertical write pattern” but revealed that “environments imposing more and more series churn started to expose a few shortcomings in the index.”

The team is therefore revisiting earlier design decisions that can reduce predictability and lead to inefficient resource consumption in such massively dynamic environments. The new storage layer aims to address these shortcomings, making it even easier to run Prometheus in environments like Kubernetes and preparing Prometheus for the workloads of the future.

Sample compression

The sample compression feature of the existing storage layer was an important part of Prometheus’s early success. Reinartz explained that when Prometheus collects a few hundred thousand data points per second, and each raw data point occupies 16 bytes of storage, this can quickly fill a hard drive.
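A back-of-the-envelope sketch makes the scale concrete. A Prometheus sample consists of an int64 millisecond timestamp and a float64 value, which accounts for the 16 bytes; the ingestion rate of 200,000 samples per second is an assumed figure within "a few hundred thousand".

```python
# Raw sample size: an int64 millisecond timestamp plus a float64 value.
RAW_BYTES_PER_SAMPLE = 8 + 8
SECONDS_PER_DAY = 24 * 60 * 60


def raw_bytes_per_day(samples_per_second: int) -> int:
    """Uncompressed bytes written per day at a given ingestion rate."""
    return samples_per_second * RAW_BYTES_PER_SAMPLE * SECONDS_PER_DAY


# 200,000 samples/s is an assumed rate within "a few hundred thousand".
print(raw_bytes_per_day(200_000) / 1e9)  # 276.48 (GB per day)
```

At that rate an uncompressed store would write more than a quarter of a terabyte per day, before any index or metadata.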

However, samples within the same series tend to be very similar, and this can be exploited to compress them efficiently. “Batch compressing chunks of many samples of a series, in memory, squeezes each data point down to an average 1.37 bytes of storage,” he wrote.
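Why similar samples compress well can be sketched with the two transforms popularized by Facebook's Gorilla time series database, which Prometheus's chunk encodings draw on: delta-of-delta for timestamps and XOR of consecutive float64 bit patterns. This is an illustration, not Prometheus's encoder: it only measures how much redundancy the transforms expose and omits the variable-length bit packing that turns small numbers into few bytes. All function names are hypothetical.

```python
import struct
from typing import List


def delta_of_deltas(timestamps: List[int]) -> List[int]:
    """Second-order differences; zero whenever the scrape interval is constant."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]


def float_bits(x: float) -> int:
    """Reinterpret a float64 as its 64-bit integer pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_with_previous(values: List[float]) -> List[int]:
    """XOR each value's bits with the previous value's bits."""
    bits = [float_bits(v) for v in values]
    return [b ^ a for a, b in zip(bits, bits[1:])]


def meaningful_bits(x: int) -> int:
    """Bits left after stripping leading and trailing zeros (0 for identical values)."""
    if x == 0:
        return 0
    trailing = (x & -x).bit_length() - 1
    return x.bit_length() - trailing


ts = [1_000, 16_000, 31_000, 46_000, 61_001]  # 15 s scrapes, last one 1 ms late
vals = [42.0, 42.0, 42.0, 43.0, 43.0]
print(delta_of_deltas(ts))                                    # [0, 0, 1]
print([meaningful_bits(x) for x in xor_with_previous(vals)])  # [0, 0, 1, 0]
```

Regular scrapes turn almost every timestamp into a zero, and unchanged or slowly changing values leave XOR results with few meaningful bits, which is what a bit-level encoder can store compactly.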

Because the compression scheme works well, the team decided to keep it in the design of the new version 2 storage layer.

Time sharding

The team realized that rectangular query patterns are served well by a rectangular storage layout. The new storage layer divides storage into blocks, each of which holds all the series from a range of time. Furthermore, each block acts as a standalone database.

As a result, a query only needs to examine the subset of blocks within the requested time range, and this layout also addresses the problem of series churn. If the database only considers a fraction of the total plane, query execution time naturally decreases, according to Reinartz.
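A minimal sketch of block selection, assuming each block covers a half-open time range `[mint, maxt)` in milliseconds and that blocks are two hours wide (an assumption for the example). The `Block` class and `blocks_for_range` helper are hypothetical; in the design described above, each block would also carry its own series data and index, which are omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Stand-in for a standalone block covering samples with mint <= t < maxt (ms)."""
    mint: int
    maxt: int


def blocks_for_range(blocks: List[Block], start: int, end: int) -> List[Block]:
    """Return only the blocks whose time range overlaps the query range [start, end]."""
    return [b for b in blocks if b.mint <= end and start < b.maxt]


TWO_HOURS = 2 * 60 * 60 * 1000
blocks = [Block(i * TWO_HOURS, (i + 1) * TWO_HOURS) for i in range(12)]  # one day

# A query over the last 30 minutes of the day touches a single block.
now = 12 * TWO_HOURS - 1
hit = blocks_for_range(blocks, now - 30 * 60 * 1000, now)
print([(b.mint, b.maxt) for b in hit])
```

Series that stopped receiving samples before the queried range live only in older blocks, so churn from earlier deployments does not slow down a query over recent data.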

Another benefit of this layout is that it makes it easy to delete old data (an expensive procedure in the old storage layer). Once a block’s time range completely falls behind the configured retention boundary, it can be dropped entirely.
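Retention then becomes a filter over whole blocks. The sketch below is hypothetical (the `apply_retention` helper and the half-open `[mint, maxt)` ranges are assumptions); a real implementation would also remove the dropped blocks from disk, which is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Block:
    mint: int  # inclusive start of the block's time range (ms)
    maxt: int  # exclusive end of the block's time range (ms)


def apply_retention(blocks: List[Block], now: int, retention: int) -> Tuple[List[Block], List[Block]]:
    """Split blocks into (kept, dropped).

    A block is dropped only when its whole range lies before the retention
    boundary; a block straddling the boundary is kept intact.
    """
    boundary = now - retention
    kept = [b for b in blocks if b.maxt > boundary]
    dropped = [b for b in blocks if b.maxt <= boundary]
    return kept, dropped


H = 60 * 60 * 1000  # one hour in milliseconds
blocks = [Block(i * 2 * H, (i + 1) * 2 * H) for i in range(6)]  # 12 hours of 2-hour blocks
kept, dropped = apply_retention(blocks, now=12 * H, retention=5 * H)
print([(b.mint // H, b.maxt // H) for b in dropped])  # [(0, 2), (2, 4), (4, 6)]
```

The block spanning hours 6 to 8 straddles the 7-hour boundary and is kept whole, so deletion never has to rewrite data inside a block.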

The Index

Reducing the queried data set is very efficient, but it makes improving the overall index increasingly crucial, because series churn is expected to only intensify, Reinartz explained.

Series should be queried by their metric names and labels, which are completely arbitrary, user-configured, and vary by application and by use case. A column index like those in common SQL databases cannot be used, so the new Prometheus storage layer borrows the inverted index concept used in full-text search engines. Its benefit is that it can efficiently retrieve documents by matching any of the words inside them. The new storage layer treats each series descriptor as a tiny document; the series name and each label pair are then words in that document.
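A minimal sketch of the inverted-index idea, under assumptions of its own: series receive sequential integer IDs, and only equality matchers are supported. Each label pair (a "word") maps to a sorted posting list of series IDs (the "documents"), and a selector is answered by intersecting posting lists. `InvertedIndex`, `lookup`, and `intersect` are hypothetical names, not the Prometheus TSDB API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Label = Tuple[str, str]


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Merge-style intersection of two sorted ID lists in O(len(a) + len(b))."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


class InvertedIndex:
    """Maps each label pair to the sorted IDs of the series containing it."""

    def __init__(self) -> None:
        self.postings: Dict[Label, List[int]] = defaultdict(list)
        self.series: List[Dict[str, str]] = []

    def add(self, labels: Dict[str, str]) -> int:
        sid = len(self.series)  # IDs only increase, so every posting list stays sorted
        self.series.append(labels)
        for pair in labels.items():
            self.postings[pair].append(sid)
        return sid

    def lookup(self, matchers: Dict[str, str]) -> List[int]:
        """IDs of series matching all equality matchers."""
        lists = [self.postings.get(pair, []) for pair in matchers.items()]
        if not lists:
            return list(range(len(self.series)))
        result = lists[0]
        for other in lists[1:]:
            result = intersect(result, other)
        return result


idx = InvertedIndex()
idx.add({"__name__": "requests_total", "path": "/status", "method": "GET"})
idx.add({"__name__": "requests_total", "path": "/status", "method": "POST"})
idx.add({"__name__": "requests_total", "path": "/", "method": "GET"})
idx.add({"__name__": "errors_total", "path": "/status", "method": "GET"})
print(idx.lookup({"__name__": "requests_total", "path": "/status"}))  # [0, 1]
```

The cost of a lookup depends on the lengths of the posting lists involved rather than on the total number of series, which is what makes this approach attractive as churn pushes series counts into the hundreds of millions.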

Gabriela Motroc is editor of JAXenter.com and JAX Magazine. Before working at Software & Support Media Group, she studied International Communication Management at the Hague University of Applied Sciences.