Intro

SAP CPI provides tooling that is helpful when investigating certain performance issues with an integration flow – in particular, verbose logging of integration flow steps, which not only collects details of each step execution, but also visualizes the executed path and the steps’ sequence on an integration flow diagram. This is useful if we need to identify the most time-consuming steps, as log records contain the start and end time of each step execution. In script steps, the developer can utilize the Message Processing Log API and complement standard log entries with custom logs that record major milestones of the script logic – including some insight into the performance of those logical blocks, based on recorded timestamps.

There are cases when logging is not sufficient for analyzing and troubleshooting integration flow performance issues – for example, when additional log entries cannot be added programmatically, or when a more precise investigation of the runtime is required. Standard integration flow steps, or nested calls invoked from script steps, are typical examples.

What tools would commonly be used in such cases for on-premise hosted Java applications? Collecting a series of thread dumps, then inspecting these snapshots of the application’s runtime behavior and observing how threads performed over time, is a good technique to start with. If more detailed information based on continuous runtime inspection is required, sampling and profiling techniques come in handy.

What about SAP CP hosted Java applications? Thread dumps can be collected there using the JMX Console, and profiling is possible using the Java Profiling service in the Neo environment. Then, what about CPI? Its core integration services are based on a Java framework, are executed in a JVM runtime, and are deployed in the Neo environment. The above techniques – collection of thread dumps and profiling – are not yet possible using the Web UI of CPI. Let’s see how we can address and mitigate this to some extent, and get helpful insight into an integration flow at runtime – namely, trigger thread dumps programmatically and inject this logic into an integration flow.

Overview

In the context of CPI runtime nodes, we are limited in how we can access JVM native interfaces – in particular, we can execute custom code within the JVM that makes use of them (for example, by means of custom scripts in integration flows), but we cannot attach to the running JVM of a runtime node using a management interface such as JMX or the JVM Tool Interface. Taking this into account, in this blog I will focus on demonstrating how to invoke thread dumps from within an integration flow. The following approach is used: within an integration flow, a custom script is placed before the flow steps whose behavior we need to inspect, so that the collected thread dumps cover those steps’ execution. The script implements logic that creates a new thread, which triggers a series of thread dumps with a given number of dumps in the series and a given interval between thread snapshots.
It is important that the thread responsible for triggering thread dumps is created as a separate thread and is decoupled from the integration flow – meaning that the custom script in the integration flow only defines what logic needs to be executed to trigger and record thread dumps, and then creates and starts a new thread that executes that logic.

In the demo, I will use a simple integration flow that is started by an HTTP request. For the sake of brevity, the integration flow is a pass-through scenario (it doesn’t contain any logic except the script we focus on) that calls a custom web-based application on the receiver side. The called application intentionally delays its response, so that we can simulate an end-to-end delay caused by long request execution in the called backend application. In real-life scenarios, the delay can originate from a variety of flow steps – execution of complex script steps, execution of mapping steps against large messages, calls to remote systems, etc.

Collected thread dumps are recorded to the runtime node’s log (don’t get confused – I’m not using the message processing log here) in plain text.

To make usage of the integration flow more convenient, a router step has been added, so that based on a custom header passed in the request, it can be decided whether the integration flow shall follow the normal path, or the route that invokes the script and triggers thread dumps. That can turn out to be helpful if you are testing the integration flow and need to make the thread dump collection feature conditional:

As the demo uses an HTTP request as the flow trigger, I have chosen custom HTTP headers as the mechanism for passing the number of thread dumps to be collected and the interval between thread snapshots. For example, the below request invokes collection of a series of 5 thread dumps, triggered every 2000 ms (2 seconds):
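The original request is shown only as a screenshot, so as an illustration, the script could read the two parameters from the message headers along these lines. This is a minimal Java sketch (the same code runs unchanged in a CPI Groovy script); the header names and default values are my own choice for illustration, not fixed by the blog or by CPI:

```java
import java.util.Map;

public class DumpParams {
    // Hypothetical header names - adjust to whatever your integration flow uses.
    // Defaults mirror the example in the text: 5 dumps, 2000 ms apart.
    static int count(Map<String, Object> headers) {
        return Integer.parseInt(String.valueOf(headers.getOrDefault("ThreadDumpCount", "5")));
    }

    static long intervalMs(Map<String, Object> headers) {
        return Long.parseLong(String.valueOf(headers.getOrDefault("ThreadDumpInterval", "2000")));
    }
}
```

In a CPI script, the map would come from `message.getHeaders()`; using `getOrDefault` keeps the flow working even when the headers are omitted.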

Depending on the particular use case and integration flow configuration, the way these parameters are delivered to the integration flow and utilized by the script can vary. The same goes for the destination where collected thread dumps are stored – alternative options can be used instead of the log, too.

Implementation and collection of thread dumps

ThreadMXBean (which is part of the management interface of the JVM) is used to retrieve a snapshot of all currently alive threads and information about each of them. Periodically repeated invocation of ThreadMXBean.dumpAllThreads() helps us retrieve a series of such snapshots. In this way, we mimic the logic that would commonly be executed by externally attached JVM administration and monitoring tools – such as the JMX Console.
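Taking a single snapshot boils down to two standard JDK calls; here is a minimal Java illustration (the same calls work verbatim from a Groovy script):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class SnapshotDemo {
    // Returns one textual snapshot of all currently alive threads.
    public static String snapshot() {
        ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
        StringBuilder builder = new StringBuilder();
        // (true, true): include locked monitors and locked synchronizers per thread
        for (ThreadInfo info : threadMXBean.dumpAllThreads(true, true)) {
            builder.append(info); // implicitly uses ThreadInfo.toString()
        }
        return builder.toString();
    }
}
```

Calling `snapshot()` in a loop with a `Thread.sleep()` between iterations yields the series of snapshots described above.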

Next, this piece of logic needs to be placed into a separate thread, and that thread started from the script. This is achieved using the generic Java approach: a new thread is created, the logic that needs to be executed by the thread (here – invocation of thread dumping and recording of thread dumps in the log) is placed into its run() method, and finally the new thread is started – which implicitly calls its run() implementation.
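A sketch of that pattern in plain Java (runnable as-is from a Groovy script), assuming the dump count and interval arrive as parameters – in the actual flow they would come from the message headers, and the output would go to the node’s log rather than to standard output:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DumpTrigger {
    // Creates and starts a thread that records 'count' thread dumps, 'intervalMs'
    // apart, leaving the integration flow's own thread free to continue immediately.
    public static Thread trigger(final int count, final long intervalMs) {
        Thread dumper = new Thread(new Runnable() {
            @Override
            public void run() {
                ThreadMXBean bean = ManagementFactory.getThreadMXBean();
                for (int i = 0; i < count; i++) {
                    StringBuilder builder = new StringBuilder();
                    for (ThreadInfo info : bean.dumpAllThreads(true, true)) {
                        builder.append(info);
                    }
                    System.out.println(builder); // stand-in for writing to the node's trace
                    try {
                        Thread.sleep(intervalMs);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        });
        dumper.start(); // implicitly invokes run() on the new thread
        return dumper;
    }
}
```

Because `trigger()` returns right after `start()`, the message processing continues while the dumps are being collected in the background – exactly the decoupling described above.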

In the presented demo, I use an implementation based on an anonymous class embedded directly in the processData() method of the script, as the logic is not complex and the implementation remains readable. In more complex and advanced scenarios, it is highly recommended to refactor this so that Runnable implementations are decoupled and kept separate from the script’s core message processing method.

Now it is time to navigate to the CPI Monitoring view, go to System Log Files (this view can also be accessed via the following URL path in the CPI tenant: /itspaces/shell/monitoring/SystemLogs) and retrieve the generated log – which is essentially the default trace:

After cleanup, the formatted log looks like the one below:

As can be seen from the screenshot, we got thread dumps and are set to analyse them. Almost, but not completely yet. Note that the stack traces only show the last 8 elements (stack frames) for each thread. While this might be negligible in some cases, there are cases where we need to explore the whole (or a major part of the) call stack of a thread, and not only the most recent frames pushed onto the thread’s stack – highly nested calls (driven by a complex sequence of calls involving different layers of the application) and functions with a deep level of recursion are a few good reasons for this.
If you are satisfied with the topmost 8 stack frames per thread call stack and this limitation doesn’t cause issues for thread dump interpretation and analysis, the remaining part of the blog is unlikely to bring value and will instead introduce unnecessary complexity. But if the full stack trace is required, let’s look at how we can address this issue and eliminate the limitation.

The mentioned limitation comes from stringifying the content of the thread info when appending it to a string builder:

threadMXBean.dumpAllThreads(true, true).each {
    builder.append(it)   // implicitly invokes ThreadInfo.toString() on each entry
}

The method ThreadInfo.toString() in the Java SDK is implemented in such a way that it sets a hard limit on the number of stack frames produced in the output, and this limit is defined by a private constant equal to 8. To overcome it, we can provide our own implementation of the toString() logic to be used for thread info stringification – one that passes all available stack frames of the thread to the output. Custom toString() implementations are fairly common in the Java world, so I will not go into details of this approach – instead, I will replace the initially used method with a part of the original implementation of ThreadInfo.toString() (the full implementation produces more verbose output – here, I only use a part of it for illustration purposes), slightly adjusting it to utilize Groovy capabilities and make the implementation more compact. As a result, the code snippet provided above is replaced with the one below:
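The blog presents its replacement snippet as a screenshot, so the exact code is not reproduced here. As a hedged reconstruction of the same idea, here is a simplified Java take on ThreadInfo.toString() – thread header plus ALL stack frames, with no 8-frame cutoff (the original blog uses a more compact Groovy variant):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

public class FullStackFormatter {
    // Simplified variant of ThreadInfo.toString(): same header format,
    // but iterates over every stack frame instead of stopping at 8.
    public static String format(ThreadInfo info) {
        StringBuilder sb = new StringBuilder(
                "\"" + info.getThreadName() + "\" Id=" + info.getThreadId()
                + " " + info.getThreadState() + "\n");
        for (StackTraceElement frame : info.getStackTrace()) {
            sb.append("\tat ").append(frame).append("\n");
        }
        return sb.toString();
    }

    // Drop-in replacement for the earlier loop: format() instead of toString().
    public static String dumpAll() {
        StringBuilder builder = new StringBuilder();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
            builder.append(format(info)).append("\n");
        }
        return builder.toString();
    }
}
```

The only change versus the original snippet is that `builder.append(it)` becomes `builder.append(format(it))`; the thread creation and scheduling logic stays untouched.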

Now thread dumps contain the full call stack for each alive thread – exactly what we were looking for!

The preparatory work – collection of thread dumps – is now complete, and that concludes the blog. In a real-life scenario, this is merely a starting point for analysis that shall give insight into what the node’s JVM executed at the time of thread dump collection, and reveal possible performance issues related to specific threads or the applications that created them.