Daniel Mitterdorfer

I have been working at Elastic for more than two years, where I focus on benchmarking, performance analysis and performance tuning for Elasticsearch. In my daily work I deal with many people: our users, various engineering teams, support, consulting, marketing, sales, product management and our developer relations team. As you might imagine, that is a lot of balls to juggle.

Structuring Work

Often I get requests that I cannot work on immediately, so I need a way to organize the incoming stream of tasks. I use a variation of Personal Kanban that I have implemented in Trello.

The board consists of several columns (from left to right):

Backlog: I will not work on these in the immediate future.

Next: Ready for work. I expect to start working on them soon.

Doing: I am actively working on these items at the moment.

Blocked: I have started working on these but need to wait for some external event (e.g. for somebody else to finish a task).

Done this week: This helps me to see progress.

Handling Incoming Tasks

I think the structure is pretty simple to grasp, but another important part is how I interact with the board.

There are several streams from which I get new tasks:

I get an idea for a new feature, an improvement, a blog post etc.

A user has a question which requires research.

One of my colleagues approaches me on Slack with a request or a question.

A new ticket is raised on GitHub (and I intend to work on it soon).

A benchmark run reveals an interesting result and I need to investigate the cause.

Whenever this happens, I create a new Trello card. Depending on the urgency, the card goes either into the backlog or the "Next" column, or I add it to "Doing" and start working on it immediately. Thanks to this external memory I will never forget anything, even if I do not get to it for a long time.

Suppose, for example, that I get an idea for a feature that I could demo for Rally (our macrobenchmarking tool for Elasticsearch):

As you can see from the screenshot, a card consists of more than just its title. Comments help me organize my thoughts around the topic and document areas that I need to look into.

There are several more features in Trello that I use, e.g.:

Due dates: useful for tasks that have a hard deadline (like submitting proposals for conference talks).

Checklists: helpful when a task consists of many subtasks.

Progress Reporting

At the end of each week, we send a brief update to the rest of the team; I have set up a notification to remind me. I then kick off a script that reads my board and produces a draft report within seconds. Before sending it, I just prune the list so it only includes tasks that are relevant to the rest of the team. Finally, I archive all cards in the "Done this week" column. The whole process takes at most one minute per week. Another important aspect: because everything is externalized again, I don't need to spend any brain cycles on it.
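For the curious, the core of such a script is tiny. Here is a minimal sketch of the idea in Java (the actual script differs in details; the list id argument and the TRELLO_KEY/TRELLO_TOKEN environment variables are assumptions, and a real implementation would use a proper JSON library):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WeeklyReport {
    public static void main(String[] args) throws Exception {
        // Trello's REST API: GET /1/lists/{id}/cards returns all cards of a list as JSON.
        String listId = args[0];                       // id of the "Done this week" list
        String key = System.getenv("TRELLO_KEY");      // Trello API key (assumed)
        String token = System.getenv("TRELLO_TOKEN");  // Trello API token (assumed)
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://api.trello.com/1/lists/" + listId
                        + "/cards?fields=name&key=" + key + "&token=" + token))
                .build();
        String json = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        // Crude extraction of each card title; good enough for a draft report.
        Matcher m = Pattern.compile("\"name\":\"(.*?)\"").matcher(json);
        System.out.println("Done this week:");
        while (m.find()) {
            System.out.println("* " + m.group(1));
        }
    }
}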

Why is main() not showing up in the profile? The reason is that Xprof is a "flat" profiler. We are used to hierarchical profilers that show complete stack traces, but this profiler only records the top-most method on the stack to avoid the overhead of stack-walking. As this example is very simple, the difference is not immediately apparent; we'll see it in a more complex example later.
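A contrived example makes this concrete. Consider a program whose main() delegates all work to a single hot method (names are made up for illustration):

public class FlatDemo {
    public static void main(String[] args) {
        // main() itself does almost nothing; all CPU time is burned in hotLoop().
        System.out.println(hotLoop());
    }

    private static long hotLoop() {
        long sum = 0;
        for (long i = 0; i < 5_000_000_000L; i++) {
            sum += i;
        }
        return sum;
    }
}

Run with java -Xprof FlatDemo, virtually all ticks are attributed to hotLoop(): whenever a sample is taken, hotLoop() is the top-most frame, and main() practically never is. A hierarchical profiler would instead show main() with hotLoop() as its callee.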

If you pay close attention, you'll see that the number of ticks does not match the reported time period: over 45 seconds it should capture 4,500 samples, given a sampling interval of 10 milliseconds. A quick experiment (just running Thread#sleep(); see the sketch after the table) shows that the sampling interval on my machine is actually about 12 milliseconds.

Sleep duration [ms]    Total samples
5,000                  421
10,000                 830
20,000                 1,665
40,000                 3,328
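The experiment itself fits in a couple of lines; a minimal version might look like this (run with java -Xprof SleepProbe <millis>, then divide the sleep duration by the reported tick count):

public class SleepProbe {
    public static void main(String[] args) throws InterruptedException {
        // Sleep for the duration given on the command line (in milliseconds).
        // Under -Xprof, virtually all samples land in Thread.sleep(), so the
        // total tick count is easy to read off the profile.
        Thread.sleep(Long.parseLong(args[0]));
    }
}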

Before we move on, let's see what additional information we can get from Xprof.

And finally, we can even get low-level info about hot methods with -XX:+ProfilerRecordPC:

Printing compiled methods with PC buckets having more than 15 ticks
===================================================================
[CodeBlob (0x000000010c4b7890)]
Framesize: 8
Could not load hsdis-amd64.dylib; library not loadable; PrintAssembly is disabled

Apparently we need the Java disassembler library hsdis, which you need to build yourself (see the build instructions for your platform in the JITWatch wiki). With -XX:+ProfilerRecordPC and export LD_LIBRARY_PATH=/path/to/hsdis we indeed get some assembly. If you prefer Intel assembly syntax to AT&T, just add -XX:+UnlockDiagnosticVMOptions -XX:PrintAssemblyOptions=intel.

Remember that this shows a flat profile. So every method printed here was at some point in time the top-most method on the stack. We do not see anything about the caller-callee relationship of these methods.

We can see that the profiler differentiates between interpreted, compiled and JVM stub samples. For an application in steady state, we'd expect the majority of samples in compiled code, which is the case here. Most samples also point to Lucene, which makes sense considering that Lucene ultimately does the indexing work.

When we terminate the application, we'll see a global summary, and the bytecode ticks summary now shows some statistics as well.

Summary

Due to its low overhead (it only captures a flat profile, so there is no need to walk the stack), you can enable it in production.

It is already available on every (HotSpot) JVM.

However, I think the lack of information about it on the Internet is telling. In my opinion it has serious disadvantages:

You can neither activate it on an already running JVM, nor force it to dump the samples it has captured.

There is no flag that allows you to write the output to a dedicated file.

It only provides a very limited view: output is dumped per thread and the profile is flat, so you have no clue which code paths called a hot method.

I'd always prefer a "proper" profiler like Java Flight Recorder, Honest Profiler or YourKit during development. If you need to quickly analyze a problem in production, periodically taking thread dumps is usually helpful (either manually or automated, e.g. with stcap). An alternative to the latter may be -Xprof, especially if you are in an extremely limited environment and want to quickly get at least some information out of your application.
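If you cannot install any extra tooling at all, a crude substitute is a small watchdog you embed in the application itself. A sketch using the standard ThreadMXBean API (not production code):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpLoop {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        while (true) {
            // dumpAllThreads(lockedMonitors, lockedSynchronizers): skip lock
            // details to keep the overhead low.
            for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                // Note: ThreadInfo#toString() truncates very deep stack traces.
                System.out.print(info);
            }
            Thread.sleep(5_000L); // one dump every five seconds
        }
    }
}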

I admit it: I use NoScript, and I block trackers like Google Analytics and all sorts of ads. If you browse the web this way, you see strange things: in the best case layouts are completely broken, and in some cases pages do not show up at all. And no, I am not talking about interactive pages like Google Maps; I am talking about content-heavy pages. To me, the authors of such pages are just downright sloppy.

As a counter-statement, I have now eliminated most waste on my own page ("let he who is without sin cast the first stone"). From today on, this page does NOT use:

JavaScript

Cookies

Page tracking

Categories

Comments

What is left? The essentials: HTML, CSS and images. There are some old presentations that I made with reveal.js, but the rest of this page will not see JavaScript any time soon.

The other day I had to analyze search latency spikes in Elasticsearch for a customer. When latency spikes are involved, one of the first things I want to know is how "noisy" the host is on which the problem occurs.

A great tool that helps with this analysis is Gil Tene's jHiccup. It records hiccups in the application and also in a control process. One thing that bugged me about jHiccup for a long time is that you need Microsoft Excel to visualize the data. However, I don't own an MS Office license, and as much as I like jHiccup, I won't buy one just for that.

So I thought about alternatives. At Elastic, we store our benchmark results in Elasticsearch, too, and visualize them in Kibana. As I wanted to analyze multiple data sources anyway, it made sense to import the data into Elasticsearch. jHiccup comes with a tool called jHiccupLogProcessor that can convert jHiccup files to CSV. After a few unsatisfying experiences with CSV-to-JSON converters, I hacked together a small Python script that converts jHiccup's CSV file format to Elasticsearch bulk requests and imported the file with curl.
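The core transformation is simple. Here is a sketch of the idea (shown in Java for illustration, although the script itself is Python; the exact column names depend on jHiccupLogProcessor's output, so the header row is simply reused as the JSON field names):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;

public class Csv2Bulk {
    public static void main(String[] args) throws Exception {
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]));
             PrintWriter out = new PrintWriter(args[1])) {
            // reuse the CSV header row as JSON field names
            String[] fields = in.readLine().split(",");
            String line;
            while ((line = in.readLine()) != null) {
                String[] values = line.split(",");
                // each document in a bulk request is preceded by an action line
                out.println("{\"index\":{}}");
                StringBuilder doc = new StringBuilder("{");
                for (int i = 0; i < fields.length; i++) {
                    if (i > 0) {
                        doc.append(',');
                    }
                    // assumes purely numeric values, as in jHiccup's log
                    doc.append('"').append(fields[i].trim()).append("\":").append(values[i].trim());
                }
                out.println(doc.append('}'));
            }
        }
    }
}

The resulting file can then be pushed with something like curl -XPOST 'http://localhost:9200/hiccups/hiccup/_bulk' --data-binary @bulk.json.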

This approach worked but was cumbersome. Apart from that, I figured other people might benefit from an import tool for Elasticsearch, too. So I spiced up the script a little and the result is hiccup2es. Although it is really basic, it is a start and lets you import data into Elasticsearch in one step. It even creates the index for you.

It is quite easy to use (I hope). Suppose you have a jHiccup CSV log called "hiccuplog.csv" that has been created by jHiccupLogProcessor. Then you can import the data into your local Elasticsearch cluster in one step:

python3 hiccup2es.py --input-file=hiccuplog.csv --create-index

That's it! hiccup2es assumes a lot of defaults; run hiccup2es --help to find out what you can configure.

After the data is imported, you can easily create a nice-looking visualization in Kibana. Here is a hiccup log from a benchmark I ran on my local machine:

jcmd is one of those neat tools that only a few people seem to know about. It is a command line tool that ships with every Oracle JDK installation and provides basic information about a running Java process, which makes it practical for unobtrusively inspecting a running production instance.

How to Use jcmd

To use jcmd, we need to specify the process id of a running Java process. When jcmd is invoked without any arguments, it prints a list of all running Java processes (process id plus main class).

The supported commands vary from process to process (e.g. due to VM options). For this example, there are four categories of commands (concrete invocations follow after the list):

JFR: commands that control Java Flight Recorder, which is really great for monitoring an application and deserves its own post (or even a series of posts). Note that while you may use Java Flight Recorder during development, you need to obtain a separate license from Oracle to use it on production systems.

VM: Basic information about the running JVM instance

GC: Basic information about GC behavior and commands to force a GC or finalization

Others: help, options to control the Java management agent and taking thread dumps
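To make this concrete, here are a few invocations from these categories (the exact set of supported commands depends on the JVM version; <pid> is the id of the target process):

jcmd <pid> help                  # list all commands the target process supports
jcmd <pid> VM.flags              # the flags the JVM was started with
jcmd <pid> GC.class_histogram    # per-class heap histogram
jcmd <pid> Thread.print          # take a thread dump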

There is also one command that is not listed here, called "PerfCounter.print". It prints JVM-internal performance counters, which contain lots of detail on the behavior of components like the classloader, the JIT and the GC. To get started, the commands listed above should do, but if you are really curious, just try it.

A Practical Example

Suppose an application takes ages to start on a production system although it worked fine in development. To determine what's going on, we can take a thread dump:
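For a main thread that is just sleeping, the relevant entry of the dump from jcmd <pid> Thread.print typically looks roughly like this (addresses, class names and line numbers are illustrative):

"main" #1 prio=5 os_prio=31 tid=0x00007fd0e3001000 nid=0x1303 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at SlowStarter.main(SlowStarter.java:4)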

Can you see what's going on? The main thread is in TIMED_WAITING state, which means it is sleeping. Although this is not a very realistic example, such a thread dump once helped me find a weird startup issue in production: the logging framework of an application was misconfigured to log everything on DEBUG level, but without one of our application's file appenders specified. Therefore, no log statement ever reached the application's log files, and it seemed that the application was not doing anything for minutes. A few thread dumps revealed the problem quickly: most of them showed a debug statement being issued deep in library code. We corrected the log configuration and the application started smoothly again.

Summary

jcmd supports basic inspections in limited environments such as a production machine. Sure, there are more sophisticated tools like jconsole, Java Mission Control or third-party tools like AppDynamics. However, for a quick glimpse jcmd is quite handy, as it comes with every JDK installation and requires no graphical interface.