Announcing our biggest release so far

November 17, 2016 by Ivo Mägi

The last few months have been busy at Plumbr. We have been polishing the next major release, which is capable of monitoring applications built using microservices and deployed in large clusters. In parallel, we introduced several other significant changes to the product. As of today, we are happy to announce the general availability of the release. In the next sections I will walk you through the new features and changes.

New concepts in Plumbr

First and foremost – the key goal of Plumbr is still the same. We are all about monitoring the end user interactions with the application.

Each interaction is linked to the application or service endpoint used and to the identity of the user performing it.

In addition, when the interactions end up being too slow or failing altogether, Plumbr exposes the root cause for the poor user experience.

However, there are two new concepts and lots of smaller changes helping you to get even more out of our monitoring solution.

Supporting distributed transactions.

When your application architecture includes multiple JVMs servicing the same transaction, Plumbr is now capable of linking together the parts of that transaction across all the JVMs involved in processing it.

To illustrate the concept, let us take a look at the following example, where a user interacts with the front end via the http://billing.my.com interface, unaware that a total of five JVMs in the back end are servicing the request:

With the new Plumbr, you no longer need to manually track the transaction as it flows through the different nodes. All nodes participating in the transaction are linked to the transaction as child spans. In addition, the references to the user initiating the transaction, service consumed and root causes detected will be linked to the parent transaction directly.

Your experience when analyzing an individual user interaction that consumes multiple microservices in the back end would match the situation in the following screenshot:

The solution is currently limited to HTTP-based integrations, for which Plumbr injects a custom HTTP header into the outbound request. This header includes a transactionID used to link individual spans together. This means that when the call between JVMs is carried out via other protocols (RMI, JMS, etc.), Plumbr is not yet able to build the link.
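To make the header-based linking more concrete, here is a minimal sketch of the general idea. It is not Plumbr's actual implementation: the header name `X-Example-Transaction-Id`, the class and the method names are all made up for illustration, and headers are modeled as a plain map rather than a real HTTP client.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class TransactionHeaderSketch {
    // Hypothetical header name, chosen for this example only.
    static final String HEADER = "X-Example-Transaction-Id";

    /** Returns the transaction ID carried by an inbound request, or starts a new one. */
    static String resolveTransactionId(Map<String, String> inboundHeaders) {
        // Simplification: real HTTP header names are case-insensitive.
        String existing = inboundHeaders.get(HEADER);
        return existing != null ? existing : UUID.randomUUID().toString();
    }

    /** Returns a copy of the outbound headers with the transaction ID added. */
    static Map<String, String> withTransactionId(Map<String, String> outboundHeaders,
                                                 String transactionId) {
        Map<String, String> copy = new HashMap<>(outboundHeaders);
        copy.put(HEADER, transactionId);
        return copy;
    }

    public static void main(String[] args) {
        // Front-end JVM: the inbound request has no header, so a new ID is created.
        String id = resolveTransactionId(new HashMap<>());
        // The front end calls a back-end service and passes the ID along.
        Map<String, String> outbound = withTransactionId(new HashMap<>(), id);
        // Back-end JVM: it reads the same ID, so its span can be tied to the parent.
        String seenByBackend = resolveTransactionId(outbound);
        System.out.println(id.equals(seenByBackend)); // prints "true"
    }
}
```

Because every JVM along the way sees the same ID, the spans it records can be attached to one parent transaction. A call made over a protocol that does not carry the header would simply start a new, unlinked ID in this sketch.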

Introducing Applications

So far, Plumbr has been all about monitoring individual JVMs. As deployments often involve whole clusters of JVMs servicing inbound traffic for the same application, we have introduced a new concept called (surprise!) the Application.

An Application is defined as an entry point for transactions. All transactions arriving at the same entry point belong to the same application. For web applications, this means that all transactions arriving at the Plumbr-monitored deployment via the same URL belong to the same application.

The change means that the information from complex deployments is exposed in a unified manner. This is best explained by looking at the following example:

The inbound requests arriving at any node in the front-end cluster are all identified as transactions belonging to the same application (http://billing.my.com in the example). The identification is based on the domain the requests arrive at. All transactions arriving at the same domain are bound to the same application.

In a sense, the Application will largely replace the concept of the JVM you have been used to so far. Better yet, it is both more powerful and more convenient to work with.
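The domain-based grouping described above can be sketched in a few lines. This is only an illustration under the assumption that the application key is the lower-cased host of the request URL; the class and method names are hypothetical and do not reflect how Plumbr implements the lookup.

```java
import java.net.URI;
import java.util.Locale;

final class ApplicationResolverSketch {
    /** Derives an application key from the domain of the request URL. */
    static String applicationFor(String requestUrl) {
        String host = URI.create(requestUrl).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + requestUrl);
        }
        // Host names are case-insensitive, so normalize before grouping.
        return host.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Two requests hitting different paths (and possibly different nodes)
        // of the same domain end up in the same application.
        String a = applicationFor("http://billing.my.com/invoices/42");
        String b = applicationFor("http://billing.my.com/payments?user=7");
        System.out.println(a);           // prints "billing.my.com"
        System.out.println(a.equals(b)); // prints "true"
    }
}
```

Note that the node that happens to serve the request plays no role in the key, which is what lets a whole front-end cluster show up as a single application.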

Changes & improvements

User interface

The general concept is still the same. The Plumbr user interface allows you to see the interactions your end users performed, either via the familiar Service, User and Root Cause views or via the two new concepts, Application and Job.

However, when exploring the new UI, you will notice that Plumbr now gives you more information in every list of items. It is now possible to add secondary dimensions to all lists. This allows you to get answers to questions like “Show me which services this particular user has accessed last week” or “Show me which users all these lock contention issues are impacting”.

When investigating a single entity, the experience of a particular user, the behavior of a specific service or application, or the impact of a root cause can be visualized in several ways, such as transactions or users over time, percentiles, or latency distribution:

The single transaction view is also somewhat different, as it includes the spans from multiple nodes participating in the transaction:

Dashboard.

The dashboard is also different from what you have gotten used to. It is now configurable per user: you can pick the widgets to be displayed on the dashboard and arrange them the way you like. In addition, the widgets can be configured to display data only from a particular application.

These two changes mean that teams responsible for different applications can each create a dashboard containing only the information relevant to that team.

Decoupling Jobs from Services.

Four months ago, Plumbr started monitoring scheduled jobs running inside the JVMs. The concept has been a success, as it exposes a lot more insight from the monitored JVMs. The packaging of Jobs, however, has not been handled well so far: in the current UI, transactional services and scheduled jobs look similar and are exposed together. This creates confusion, as scheduled jobs are completely different from transactional services.

To solve the problem, Plumbr now separates transactional services from scheduled jobs. With the change, the direct impact of transactions on user experience is clearer, and the visibility into scheduled jobs is preserved.

Change in the JVM concept.

With the introduction of applications, JVMs got somewhat demoted. For example, JVMs no longer expose users accessing the JVM nor services published from the JVM. This information is now present in the application view instead.

JVMs are still present and still expose valuable information, such as jobs launched in the JVM, root causes detected in the JVM and the technical telemetry harvested from the particular JVM.

Downtimes

The concept of downtime was redesigned in two respects:

The new concept introduced is application downtime. This effectively replaces the current JVM downtime, but is more powerful, as it supports clustered environments and does not create false positive alerts in dynamically scaled environments. The application downtime event is created when all the JVMs accepting inbound traffic for the application are down. So when you have three clustered nodes servicing the inbound traffic and one of the nodes is taken down for maintenance, no application downtime is registered. Only when all three nodes are down is application downtime registered and are alerts created.

Individual JVM downtime is now a non-event until the downtime starts impacting user experience or causes application downtime. When the JVM is an entry point for inbound traffic, the JVM downtime can cause application downtime, as described in the previous paragraph. When the stopped JVM is not an entry point and is servicing transactions in the back end, the downtime is not explicitly exposed until it starts impacting end user experience.
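The application downtime rule can be expressed as a small predicate. The following is a sketch under stated assumptions rather than Plumbr's actual logic: the class and method names are hypothetical, each entry-point JVM is reduced to a single "is it up" flag, and an application with no known entry-point JVMs is treated as not down, which is a choice made for this example only.

```java
import java.util.List;

final class ApplicationDowntimeSketch {
    /**
     * An application is down only when every JVM accepting its inbound
     * traffic is down. Back-end JVMs are deliberately not part of the input.
     */
    static boolean isApplicationDown(List<Boolean> entryPointJvmUp) {
        // Assumption: no known entry points means there is nothing to report.
        if (entryPointJvmUp.isEmpty()) {
            return false;
        }
        return entryPointJvmUp.stream().noneMatch(Boolean::booleanValue);
    }

    public static void main(String[] args) {
        // Three clustered nodes, one taken down for maintenance: no downtime.
        System.out.println(isApplicationDown(List.of(true, true, false)));   // prints "false"
        // All three nodes down: application downtime is registered.
        System.out.println(isApplicationDown(List.of(false, false, false))); // prints "true"
    }
}
```

Evaluating the cluster as a whole is what avoids false positives when single nodes are stopped on purpose, for example during scale-down.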

Alerts

Alerts sent out by Plumbr were redesigned to accommodate the other changes. First and foremost, you can no longer register alerts for individual JVM downtime events. Instead, application downtime serves as the equivalent source for alerts.

The introduction of the application also made it possible to subscribe to alerts from a particular application only. For example, use cases like the following are now possible:

As a product owner, I want to receive alerts whenever any of the services in the “Production Billing” application are not performing as expected.

As a member of the QA team, I want to receive alerts in the Slack channel “#billing QA” whenever the “Test Billing” services are not performing as expected.

Minor changes in alerts include the removal of the possibility to subscribe to alerts from a single service. The feature was barely used, so we removed it. In addition, alerts now communicate the number of users impacted, besides the impact in transactions.

Take-away

If you are currently using Plumbr, we encourage you to start exploring the more powerful new Plumbr available at https://app.plumbr.io. Log in using the same credentials you have been using so far. Note that this version of our product is compatible only with Agents 16.10.03 and newer, so you might need to upgrade your Agents to benefit from all the new goodies.

If you have not yet used Plumbr to monitor your JVM-based applications, go ahead and grab your free trial. Make the end user experience transparent and start fixing the root causes of poor performance today!