
Monitoring and measuring reactive application with Dropwizard Metrics

In the previous article we wrote some simple indexing code that hammers ElasticSearch with thousands of concurrent requests. The only way to monitor the performance of our system was an old-school logging statement.

It's fine, but on a production system, we'd rather have some centralized monitoring and charting solution for gathering various metrics. This becomes especially important once you have hundreds of different applications in thousands of instances. Having a single graphical dashboard, aggregating all important information, becomes crucial. We need two components in order to collect some metrics:

publishing metrics

collecting and visualizing them

Publishing metrics using Dropwizard Metrics

In Spring Boot 2, Dropwizard Metrics was replaced by Micrometer. This article uses the former; the next one will show the latter in practice. In order to take advantage of Dropwizard Metrics we must inject MetricRegistry, or specific metrics, into our business classes.
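As a rough illustration of that injection, here is a minimal sketch of a class that receives a MetricRegistry and looks up the metrics it needs. The class name and the metric names indexSuccess and indexFailure are assumptions made for this example; only indexTimer and indexConcurrent are mentioned in the text.

```java
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

// Hypothetical business class: names are illustrative, not taken from the original project.
public class IndexerMetrics {

    final Timer indexTimer;
    final Counter indexConcurrent;
    final Counter successes;
    final Counter failures;

    // The registry is injected (for example by Spring) and creates each metric on first lookup.
    public IndexerMetrics(MetricRegistry registry) {
        this.indexTimer = registry.timer("indexTimer");
        this.indexConcurrent = registry.counter("indexConcurrent");
        this.successes = registry.counter("indexSuccess");   // assumed name
        this.failures = registry.counter("indexFailure");    // assumed name
    }
}
```

Looking up the same name twice returns the same metric instance, so several classes can share one counter through the registry.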

The first helper method increments the number of successes and failures every time a request completes. Moreover, it logs and swallows errors so that a single error or timeout does not interrupt the whole import process.
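A helper of this kind might be sketched as follows. This is an approximation under assumptions, not the original listing: the class and method names are hypothetical, the counters are created directly instead of being taken from a registry, and logging goes to System.err to keep the example free of a logging dependency.

```java
import com.codahale.metrics.Counter;
import reactor.core.publisher.Mono;

// Hypothetical helper class illustrating success/failure counting on a Mono.
public class SuccessFailureCounting {

    final Counter successes = new Counter();
    final Counter failures = new Counter();

    // Bumps one of the two counters when the wrapped Mono terminates.
    <T> Mono<T> countSuccFail(Mono<T> input) {
        return input
                .doOnError(e -> failures.inc())
                .doOnSuccess(x -> successes.inc());
    }

    // Logs the error and turns it into an empty Mono so the surrounding stream keeps going.
    <T> Mono<T> swallowErrors(Mono<T> input) {
        return input
                .doOnError(e -> System.err.println("Unable to index: " + e))
                .onErrorResume(e -> Mono.empty());
    }
}
```

Counting happens before the error is swallowed, so a failed request still shows up in the failure counter even though downstream subscribers only see an empty result.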

Another helper method increments the indexConcurrent metric when a new request is sent and decrements it once a result or error arrives. This metric keeps going up and down, showing the number of in-flight requests.
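One possible shape for such an in-flight counter is shown below. It is a sketch: the class and method names are assumptions, and it uses doFinally so that a cancelled request also decrements the counter, which may differ from what the original code did.

```java
import com.codahale.metrics.Counter;
import reactor.core.publisher.Mono;

// Hypothetical helper class tracking the number of in-flight requests.
public class ConcurrencyCounting {

    final Counter indexConcurrent = new Counter();

    <T> Mono<T> countConcurrent(Mono<T> input) {
        return input
                // a request counts as in-flight from the moment of subscription
                .doOnSubscribe(s -> indexConcurrent.inc())
                // runs exactly once on completion, error or cancellation
                .doFinally(signal -> indexConcurrent.dec());
    }
}
```

Nothing is counted when the Mono is merely assembled; the counter moves only when somebody subscribes.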

The final helper method is the most complex. It measures the total time of indexing, i.e. the time between the request being sent and the response being received. As a matter of fact, it's quite generic: it simply calculates the total time between a subscription to an arbitrary Mono<T> and its completion. Why does it look so weird? Well, the basic Timer API is very simple:

indexTimer.time(() -> someSlowCode())

It simply takes a lambda expression and measures how long it took to invoke it. Alternatively, you can create a small Timer.Context object that remembers when it was created. When you call Context.stop() it reports this measurement.
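For blocking code, the Timer.Context variant can be sketched like this. The class name, the method name and the someSlowCode parameter are placeholders chosen for the example.

```java
import com.codahale.metrics.Timer;

// Hypothetical example of timing a blocking call with an explicit Timer.Context.
public class TimerContextExample {

    static void timeBlocking(Timer indexTimer, Runnable someSlowCode) {
        // The context remembers the moment it was created.
        final Timer.Context context = indexTimer.time();
        try {
            someSlowCode.run();
        } finally {
            // Records the elapsed time in the timer, even if the code threw.
            context.stop();
        }
    }
}
```

This works because the start and the stop happen in the same method on the same thread, which is exactly what an asynchronous stream does not guarantee.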

With asynchronous streams it's much harder. The start of a task (denoted by subscription) and its completion typically happen across thread boundaries, in different places in the code. What we can do is lazily create a new Context object (see: fromCallable(indexTimer::time)) and, when the wrapped stream completes, stop the Context (see: input.doOnSuccess(x -> time.stop())). These helper methods are then composed around the actual indexing call.
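The two fragments quoted above, fromCallable(indexTimer::time) and input.doOnSuccess(x -> time.stop()), fit together roughly as in the sketch below. The surrounding class, the instrument method and the use of transform for composition are assumptions made for illustration; the original may have composed the helpers differently.

```java
import com.codahale.metrics.Counter;
import com.codahale.metrics.Timer;
import reactor.core.publisher.Mono;

// Hypothetical class showing a timed Mono and how helpers could be chained.
public class MeasuredIndexing {

    final Timer indexTimer = new Timer();
    final Counter indexConcurrent = new Counter();

    // Starts a Timer.Context on subscription and stops it when the input succeeds.
    // Note: in this sketch a failed input never stops the context, so errors are not timed.
    <T> Mono<T> measure(Mono<T> input) {
        return Mono
                .fromCallable(indexTimer::time)
                .flatMap(time -> input.doOnSuccess(x -> time.stop()));
    }

    <T> Mono<T> countConcurrent(Mono<T> input) {
        return input
                .doOnSubscribe(s -> indexConcurrent.inc())
                .doFinally(signal -> indexConcurrent.dec());
    }

    // Each helper takes a Mono and returns a decorated Mono, so they chain naturally.
    <T> Mono<T> instrument(Mono<T> request) {
        return request
                .transform(this::countConcurrent)
                .transform(this::measure);
    }
}
```

Because the context is created inside fromCallable, every subscription gets its own start time; assembling the pipeline alone records nothing.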

In the next article we will learn how to compose all these methods even better and avoid some boilerplate.

Publishing and visualizing metrics

Collecting metrics on its own is not enough. We must publish aggregated metrics periodically so that other systems can consume, process and visualize them. One such combination of tools is Graphite and Grafana. But before we dive into configuring them, let's first publish metrics to the console. I find this especially useful when troubleshooting metrics or during development.
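A console reporter can be set up along these lines. The class name, the metric name and the reporting interval are assumptions for the sake of a runnable sketch.

```java
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import java.io.PrintStream;
import java.util.concurrent.TimeUnit;

// Hypothetical setup class for printing all registered metrics periodically.
public class ConsoleReporting {

    static ConsoleReporter consoleReporter(MetricRegistry registry, PrintStream out) {
        return ConsoleReporter.forRegistry(registry)
                .outputTo(out)
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        MetricRegistry registry = new MetricRegistry();
        registry.counter("indexSuccess").inc();           // assumed metric name
        ConsoleReporter reporter = consoleReporter(registry, System.out);
        reporter.start(1, TimeUnit.SECONDS);              // dump every second
        Thread.sleep(2500);
        reporter.stop();
    }
}
```

Calling reporter.report() prints a single snapshot immediately, which is handy in tests or when debugging.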

For Graphite, I'm reporting to localhost:2003, where my Docker image with Graphite + Grafana happens to be listening. Once every second all metrics are sent to this address. We can later visualize all these metrics in Grafana:
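Reporting to Graphite might be configured roughly as follows. This sketch assumes the separate metrics-graphite module is on the classpath; the class name and the "indexer" prefix are made up for the example, while the address and the one-second interval come from the text.

```java
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.graphite.Graphite;
import com.codahale.metrics.graphite.GraphiteReporter;
import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;

// Hypothetical setup class for pushing all registered metrics to Graphite.
public class GraphiteReporting {

    static GraphiteReporter graphiteReporter(MetricRegistry registry, String host, int port) {
        final Graphite graphite = new Graphite(new InetSocketAddress(host, port));
        return GraphiteReporter.forRegistry(registry)
                .prefixedWith("indexer")                  // assumed prefix, not from the article
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build(graphite);
    }

    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        // Sends every registered metric to localhost:2003 once per second.
        graphiteReporter(registry, "localhost", 2003).start(1, TimeUnit.SECONDS);
    }
}
```

Building the reporter does not open a connection; the socket is used only when a report is actually sent.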

The top diagram displays the indexing time distribution (from 50th to 99.9th percentile). Using this diagram you can quickly discover what is the typical performance (P50) as well as (almost) worst case performance (P99.9). The logarithmic scale is unusual but in this case allows us to see both low and high percentiles. The bottom diagram is even more interesting. It combines three metrics:

rate (requests per second) of successful index operations

rate of failed operations (red bar, stacked on top of the green one)

current concurrency level (right axis): number of in-flight requests

This diagram shows the system throughput (RPS), failures and concurrency. Too many failures or an unusually high concurrency level (many operations awaiting a response) might be a sign of some issues with your system. The dashboard definition is available in the GitHub repository.

In the next article, we will learn how to migrate from Dropwizard Metrics to Micrometer. A very pleasant experience!
