Where it all started

Recently, I was investigating a performance issue in a JRuby application, namely AsciidoctorJ with the Asciidoctor diagram extension enabled.
The same code was almost two times slower in JRuby 9 than in JRuby 1.7.

Dominic, who reported this issue, has created a short-living application to reproduce it.
The application was written in Clojure and was able to generate a flame graph for a particular function:

If you are not familiar with Clojure, don’t worry, the important thing to remember is that this application can generate a Flame graph for a given function.
Here, the function is execute 100 times readDocumentHeader.

I was really impressed because typically if you want to generate a Flame graph, you will have to:

package your application,

attach an agent or start a profiler for a given pid,

and then convert the raw output to a nice SVG visualization

Here we can just run the application and it will generate a Flame graph for a given function.
Don’t get me wrong, using the technique described above is working great but in this case it’s not really practical because we don’t want to include the warm up, and the application is not running for long.

Please note that clj-async-profiler is only supported on GNU/Linux and MacOS.
If you are running GNU/Linux, you need to do ensure that perf_event_paranoid is set to 1:

cat /proc/sys/kernel/perf_event_paranoid

If it prints 2, it means you are running a newer Linux kernel.
You will have to set it to 1 to allow async-profiler to use kernel profiling data:

$ echo 1 | sudo tee /proc/sys/kernel/perf_event_paranoid

This library is a really convenient wrapper around the excellent async-profiler and FlameGraph.
With this library, you can just call a function in your application and it will generate a Flame graph!

From Clojure to Java

While I was trying to pinpoint the root cause of performance issue, I though maybe it could be an interaction between Clojure dynamic classloader and JRuby.
In order to remove this assumption, I decided to write the same application in Java to compare the two Flame graphs.

When running this application, it will output the path to the generated SVG, for instance:

/tmp/clj-async-profiler/results/flamegraph-2019-02-25-11-56-25.svg

You can then open this file in your browser.

Final outcome

In the end, Clojure had nothing to do with this performance issue.
We’ve discovered that the root cause was actually an overhead in the JRuby Kernel#require_relative method: github.com/jruby/jruby/issues/5618.
We’ve also made a few changes in the Asciidoctor diagram extension to call require_relative less frequently.

Now AsciidoctorJ with the Asciidoctor diagram extension enabled is faster on JRuby 9 than on JRuby 1.7!
And we also learned in the process how to easily generate a Flame graph using clj-async-profiler 🍻