Java tips, observations, bugs and problems from the world of Spring, Weblogic, Oracle, MySQL and many other technologies...

Wednesday, 27 November 2013

Investigating Memory Leaks Part 1 - Writing Leaky Code

I found this little problem the other day: there's this server that runs for a while and then falls over. It's then restarted by its startup script and the whole process repeats itself. This doesn't sound that bad, as it isn't business critical, although there is a significant loss of data, so I decided to take a closer look and find out exactly what's going wrong. The first thing to note is that the server passes all its unit tests and a whole bunch of integration tests. It runs well in all test environments using test data, so what's going wrong in production? It's easy to guess that in production it's probably under a heavier load than in test, or than had been allowed for in its design, and is therefore running out of resources; but what resources, and where? That's the tricky question.

In order to demonstrate how to investigate this problem, the first thing to do is to write some leaky sample code, and I'm going to use the Producer Consumer pattern to do this because it lets me demonstrate the big problem with it.

To demonstrate leaky code I need, as usual, a highly contrived scenario, and in this scenario imagine that you work for a stockbroker on a system that records its sales of stocks and shares in a database. Orders are taken and placed in a queue by a simple thread. Another thread then picks up each order from the queue and writes it to the database. The Order POJO is very straightforward and looks like this:
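The original Order listing didn't survive the import, so here's a minimal sketch of the kind of thing it would contain; the field names and types are my assumption, not the original code:

```java
import java.util.Date;

/** A simple value object representing a single stock order. */
public class Order {

    private final long id;       // unique order id
    private final String stock;  // ticker symbol, e.g. "IBM"
    private final int quantity;  // number of shares
    private final double price;  // price per share
    private final Date date;     // time the order was placed

    public Order(long id, String stock, int quantity, double price, Date date) {
        this.id = id;
        this.stock = stock;
        this.quantity = quantity;
        this.price = price;
        this.date = date;
    }

    public long getId() { return id; }
    public String getStock() { return stock; }
    public int getQuantity() { return quantity; }
    public double getPrice() { return price; }
    public Date getDate() { return date; }
}
```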

The second class is OrderRecord, which is responsible for taking orders from the queue and writing them to the database. The problem is that it takes significantly longer to write the orders to the database than it does to produce them. This is simulated by the long, one second, sleep in my recordOrder(…) method.

/**
 * Record the order in the database
 *
 * This is a dummy method
 *
 * @param order
 *            The order
 * @throws InterruptedException
 */
public void recordOrder(Order order) throws InterruptedException {
    TimeUnit.SECONDS.sleep(1);
}

}

The result is obvious: the OrderRecord thread just can't keep up and the queue will get longer and longer until the JVM runs out of heap space and falls over. That's the big problem with the Producer Consumer pattern: the consumer has to be able to keep up with the producer.
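A cut-down sketch makes the mechanics of the leak clear; the class and method names here are my own, not the original listings. A fast producer fills an unbounded LinkedBlockingQueue while the slow consumer (standing in for the one-second database write) barely dents it, so the backlog only ever grows:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class LeakySketch {

    // Unbounded queue: nothing ever pushes back on the producer.
    static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    /** Produce 'count' orders as fast as possible. */
    static void produce(int count) {
        for (int i = 0; i < count; i++) {
            queue.add("order-" + i);
        }
    }

    /** Consume one order, slowly - the stand-in for the database write. */
    static void consumeOne() throws InterruptedException {
        queue.take();
        TimeUnit.MILLISECONDS.sleep(10); // far slower than production
    }

    public static void main(String[] args) throws InterruptedException {
        produce(1_000);  // the producer races ahead...
        consumeOne();    // ...while the consumer plods along,
        // ...so the backlog only ever grows in the real, endless loop
        System.out.println("orders still queued: " + queue.size());
    }
}
```

In the real server both sides loop forever, so the queue, and with it the heap, grows until the JVM throws an OutOfMemoryError.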

Just to prove the point I've added a third class, OrderMonitor, which prints the queue size every couple of seconds so that you can see things going wrong.

The one thing that I really hate about Java is the fact that it's SO difficult to run any program from the command line. You have to figure out what the classpath is, which options and properties need setting, and what the main class is. Surely it must be possible to think of a way of simply typing java programName and having the JVM figure out where everything is, especially if we start using convention over configuration: how hard can it be?
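The actual command line from the post didn't survive, but the shape of it is the usual classpath-plus-main-class incantation; the directory and class names below are hypothetical stand-ins:

```shell
# Run the leaky sample app from its compiled classes directory
# (paths and the main class name are illustrative, not the original's)
java -cp target/classes com.example.producerconsumer.OrderManager
```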

You can also monitor the leaky app by attaching jconsole. If you're running it remotely, then you'll need to add the following options to the command line above (picking your own port number):
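The options themselves are missing from this copy of the post; the standard JMX remote-management system properties look like this (the port number is just an example, and disabling authentication and SSL is only acceptable on a trusted network):

```shell
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=8999 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false
```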

…and if you take a look at the amount of heap used you’ll see it gradually increasing as the queue gets bigger.

If a kilobyte of memory leaks away, then you'll probably never spot it; if a gigabyte of memory leaks, the problem will be obvious. So, all that's left to do for the moment is to sit back and wait for some memory to leak away before moving on to the next stage of the investigation. More on that next time...