OpenTracing with Platform Instrumentation for Enhanced Visibility

Distributed tracing is an essential component of microservices-based architectures. Moving to a microservices-based architecture brings unique challenges with respect to visibility across services, and this is where distributed tracing helps. So far, the primary focus of distributed tracing has been application-specific instrumentation, such as the latency of function calls.

There are scenarios where a combination of application-specific and platform-specific instrumentation data is needed to identify an issue definitively. For example, with a memory-intensive workload, memory capacity may still be available while memory bus bandwidth is starved, which degrades performance. Having this data available in the context of the application makes it much easier to identify the root cause of the performance issue.

Our initial work focused on making platform instrumentation data available to orchestration engines so they can make optimal placement decisions. You can read more about our previous work here: https://goo.gl/ZU9d9V

Recently our focus has been to add support for platform instrumentation data in distributed tracers.

Based on a few internal prototypes, we approached the OpenTracing community with our idea, and today we have initial support for platform metrics in OpenTracing and the Zipkin backend.
Kudos to Hemant Shaw, who worked with the community to get this done. And of course a big thanks to the maintainers, namely Yuri Shkuro, Ben Sigelman and Bas Van Beek, for their help in making this happen.

The following diagram gives an overview of the components involved when using OpenTracing with a Zipkin backend.

At a minimum, these steps are required to capture platform instrumentation data in your application with OpenTracing. Currently only Go (‘golang’) applications are supported. Pull requests to add support for other languages are welcome.
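As a rough illustration of the general idea, the sketch below reads a set of platform counters before and after a traced operation and attaches the differences to the span as tags. It is not the OpenTracing or Zipkin API: the Span type, the CounterReader type, the TraceWithPlatformData function, the span name and the fake counter source are all hypothetical names introduced for this example, and the event names are only illustrative. A real setup would read hardware counters and report through an actual tracer.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a hypothetical stand-in for a tracer span. It only records tags.
type Span struct {
	Name string
	Tags map[string]uint64
}

// CounterReader is a hypothetical source of platform counters, keyed by
// event name. A real reader would query hardware performance counters.
type CounterReader func() map[string]uint64

// TraceWithPlatformData runs op inside a span and tags the span with the
// change in each platform counter observed while op was running.
func TraceWithPlatformData(name string, read CounterReader, op func()) *Span {
	span := &Span{Name: name, Tags: make(map[string]uint64)}
	before := read()
	op()
	after := read()
	for event, end := range after {
		// A counter missing from the first reading is treated as starting at 0.
		span.Tags[event] = end - before[event]
	}
	return span
}

func main() {
	// Fake counter source, for illustration only: values grow on every read.
	var reads uint64
	fake := func() map[string]uint64 {
		reads++
		return map[string]uint64{
			"cpu-cycles":   1000 * reads,
			"cache-misses": 10 * reads,
		}
	}

	span := TraceWithPlatformData("catalogue.list", fake, func() {
		// The application work being traced would go here.
	})

	// Print the tags in a stable order.
	events := make([]string, 0, len(span.Tags))
	for event := range span.Tags {
		events = append(events, event)
	}
	sort.Strings(events)
	for _, event := range events {
		fmt.Printf("%s %s=%d\n", span.Name, event, span.Tags[event])
	}
}
```

Recording the difference between two readings, rather than the raw counter value, keeps each span's platform data scoped to the work done inside that span.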

The application trace with platform data looks like the following in Zipkin:

My environment is a heterogeneous Docker swarm cluster made up of Intel (x86_64) and Power (ppc64le) nodes. I took the sock-shop microservices demo application and modified the ‘catalogue’ service to collect platform instrumentation data.