When you develop a Java EE application with specific performance requirements, you have to verify that these requirements are met before each release. A Hudson job that runs a set of test measurements every night on a specific hardware platform is an obvious way to do this.

You can check the achieved timings and compare them with the given requirements. If the measured values deviate from the requirements too much, you can either break the build or at least send an email to the team.

But how do you measure the execution time of your code? The first thought might be to insert time-measuring code at thousands of places in your code base. This is not only a lot of work, it also affects the performance of your code, because the time measurements now run in production as well. To get rid of the many insertions, you might use an aspect-oriented programming (AOP) framework that weaves in the measurement code at compile time. That way you end up with at least two versions of your application: one with and one without the additional overhead. Measuring performance at a production site still requires redeploying your code, and you have to decide at compile time which methods you want to observe.

Java EE therefore comes with an easy-to-use alternative: interceptors. This is where the inversion of control pattern plays out its advantages. Because the application server invokes your bean methods and web service calls, it can easily intercept these invocations and give you a way to add code before and after each one.

Using interceptors is then fairly easy. You can either add an annotation to your target method or class that references your interceptor implementation or you can add the interceptor using the deployment descriptor:
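As a rough sketch of the timing pattern that runs on a plain JDK, the following uses a `java.lang.reflect.Proxy` as a stand-in for the container. It is not a Java EE interceptor: in a real one, the same logic would sit in a method annotated with `@AroundInvoke`, and `InvocationContext.proceed()` would take the place of `method.invoke(...)`. The names `GreetingService` and `TimingHandler` are made up for this illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical business interface, standing in for a bean's interface.
interface GreetingService {
    String greet(String name);
}

// Times every call made through the proxy, the way an interceptor would.
class TimingHandler implements InvocationHandler {
    private final Object target;
    final List<Long> elapsedNanos = new ArrayList<>();

    TimingHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.nanoTime();            // before the try block
        try {
            return method.invoke(target, args);    // role of InvocationContext.proceed()
        } catch (InvocationTargetException e) {
            throw e.getCause();                    // surface the target's own exception
        } finally {
            elapsedNanos.add(System.nanoTime() - start);  // recorded on success and failure
        }
    }

    // Creates a timed proxy for the given interface type.
    <T> T proxyFor(Class<T> type) {
        Object proxy = Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, this);
        return type.cast(proxy);
    }
}
```

Wrapping a service is then a matter of `new TimingHandler(realService).proxyFor(GreetingService.class)`; every call through the returned object adds one entry to `elapsedNanos`.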

Before the try block and in the finally block we add our code for the time measurement. As can be seen from the code above, we also need some in-memory location where we can store the most recent measurement values in order to compute, for example, a mean value and the deviation from it. In this example we have a simple ring storage implementation that overwrites old values after some time.
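A ring storage of this kind might look roughly like the sketch below. The class name `RingStorage`, the fixed capacity, and the choice of the population standard deviation are assumptions made for the illustration.

```java
// Fixed-size ring storage: once it is full, the oldest value is overwritten.
class RingStorage {
    private final long[] values;
    private int next;   // index of the slot that is written next
    private int count;  // number of valid entries, at most values.length

    RingStorage(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        values = new long[capacity];
    }

    synchronized void add(long value) {
        values[next] = value;
        next = (next + 1) % values.length;
        if (count < values.length) {
            count++;
        }
    }

    synchronized double mean() {
        if (count == 0) {
            return 0.0;
        }
        double sum = 0;
        for (int i = 0; i < count; i++) {
            sum += values[i];
        }
        return sum / count;
    }

    // Population standard deviation of the values currently stored.
    synchronized double standardDeviation() {
        if (count == 0) {
            return 0.0;
        }
        double mean = mean();
        double squared = 0;
        for (int i = 0; i < count; i++) {
            double diff = values[i] - mean;
            squared += diff * diff;
        }
        return Math.sqrt(squared / count);
    }
}
```

The methods are synchronized because an interceptor may be invoked from several request threads at once.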

But how do we expose these values to the outside world? Since many other values of the application server are exposed over the JMX interface, we can implement a simple MXBean interface as shown in the following code snippet:
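A minimal sketch of such an MXBean, using only the JDK's `javax.management` classes, could look like this. The interface and class names and the helper methods are hypothetical. The interface is nested only so that the sketch fits into a single file; JMX requires the MXBean interface itself to be public.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class TimingMonitoring {

    // The "MXBean" suffix marks this as an MXBean interface; it must be public.
    public interface TimingStatisticsMXBean {
        double getMean();   // exposed as attribute "Mean"
        long getCount();    // exposed as attribute "Count"
    }

    static class TimingStatistics implements TimingStatisticsMXBean {
        private long count;
        private double sum;

        synchronized void record(long millis) {
            count++;
            sum += millis;
        }

        @Override
        public synchronized double getMean() {
            return count == 0 ? 0.0 : sum / count;
        }

        @Override
        public synchronized long getCount() {
            return count;
        }
    }

    // Registers the bean with the platform MBean server under the given name.
    static ObjectName register(TimingStatistics stats, String objectName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);
            server.registerMBean(stats, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("could not register MXBean", e);
        }
    }

    // Reads an attribute back through JMX, as a client such as jconsole would.
    static Object readAttribute(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException("could not read attribute " + attribute, e);
        }
    }
}
```

An object name such as `com.example.perf:type=TimingStatistics` (made up here) determines where the bean shows up in the MBean tree.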

Now we can start the jconsole and query the exposed MXBean for the mean value:

Writing a small JMX client application that writes the sampled values into, for example, a CSV file enables you to process these values later and compare them with subsequent measurements. This gives you an overview of how your application’s performance evolves.

Conclusion: With interceptors, it is easy to add performance measurement capabilities dynamically, through the deployment descriptor, to an existing Java EE application. If you expose the measured values over JMX, you can process them further afterwards.

When you design a new API you have to make a lot of decisions. These decisions are based on a number of design principles. Joshua Bloch has summarized some of them in his presentation “How to Design a Good API and Why it Matters”. The main principles he mentions are:

Easy to learn

Easy to use

Hard to misuse

Easy to read and maintain code that uses it

Sufficiently powerful to satisfy requirements

Easy to extend

Appropriate to audience

As we see from the list above, Joshua Bloch puts his emphasis on readability and usage. A point that is completely missing from this listing is performance. But can performance impact your design decisions at all?

To answer this question let’s try to design a simple use case in form of an API and measure its performance. Then we can have a look at the results and decide whether performance considerations have an impact on the API or not. As an example we take the classic use case of loading a list of customers from some service/storage. What we also want to consider is the fact that not all users are allowed to perform this operation. Hence we will have to implement some kind of permission check. To implement this check and to return this information back to the caller, we have multiple ways to do so. The first try would look like this one:
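A sketch of what such a first attempt might look like follows. Apart from `Customer` and `PermissionDeniedException`, which the text mentions, all names are assumptions. The exception is assumed to be unchecked, and a plain boolean in the toy implementation replaces the user lookup from a container or ThreadLocal.

```java
import java.util.Collections;
import java.util.List;

class Customer {
    final String name;

    Customer(String name) {
        this.name = name;
    }
}

// Assumed to be unchecked here; the text does not say which kind it is.
class PermissionDeniedException extends RuntimeException {
    PermissionDeniedException(String message) {
        super(message);
    }
}

interface CustomerService {
    // Returns the customers or throws if the current user lacks the permission.
    List<Customer> loadCustomers() throws PermissionDeniedException;
}

// Toy implementation: the flag stands in for a real permission check.
class InMemoryCustomerService implements CustomerService {
    private final boolean permitted;

    InMemoryCustomerService(boolean permitted) {
        this.permitted = permitted;
    }

    @Override
    public List<Customer> loadCustomers() {
        if (!permitted) {
            throw new PermissionDeniedException("not allowed to load customers");
        }
        return Collections.singletonList(new Customer("Alice"));
    }
}
```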

Here we model an explicit exception for the case in which the caller does not have the right to retrieve the list of customers. The method returns a list of Customer objects, and we assume that the user can be retrieved from some container or ThreadLocal implementation and does not have to be passed to each method.
The method signature above is easy to use and hard to misuse. Code that uses this method is also easy to read:
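Calling code for this variant might read roughly as follows. The types are repeated in a reduced form so that the snippet compiles on its own, and `CustomerOverview` is a hypothetical caller.

```java
import java.util.List;

class Customer { }

class PermissionDeniedException extends RuntimeException { }

interface CustomerService {
    List<Customer> loadCustomers();
}

class CustomerOverview {
    // The follow-up action runs only if no PermissionDeniedException was thrown.
    static String describe(CustomerService service) {
        try {
            List<Customer> customers = service.loadCustomers();
            return customers.size() + " customer(s) loaded";
        } catch (PermissionDeniedException e) {
            return "permission denied";
        }
    }
}
```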

The reader immediately sees that a list of customers is loaded and that we perform some follow-up action only if we don’t get a PermissionDeniedException. But in terms of performance, exceptions cost CPU time, because the JVM has to stop normal code execution and walk up the stack to find the position where execution continues. This is particularly costly given the architecture of modern processors, which eagerly execute code sequences in pipelines. So would it be better in terms of performance to introduce another way of informing the caller about the missing permission?

The first idea would be to create another method in order to check the permission before calling the method that eventually throws an exception. The caller code would then look like this:
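A sketch of this check-first variant is shown below. The method name `mayLoadCustomers()` and the toy implementation are assumptions, and the exception is again assumed to be unchecked, which is what allows the caller to drop the try/catch block.

```java
import java.util.Collections;
import java.util.List;

class Customer { }

class PermissionDeniedException extends RuntimeException { }

interface CustomerService {
    boolean mayLoadCustomers();      // hypothetical name for the permission check
    List<Customer> loadCustomers();  // still throws if called without permission
}

// Toy implementation: the flag stands in for a real permission check.
class InMemoryCustomerService implements CustomerService {
    private final boolean permitted;

    InMemoryCustomerService(boolean permitted) {
        this.permitted = permitted;
    }

    @Override
    public boolean mayLoadCustomers() {
        return permitted;
    }

    @Override
    public List<Customer> loadCustomers() {
        if (!permitted) {
            throw new PermissionDeniedException();
        }
        return Collections.singletonList(new Customer());
    }
}

class CustomerOverview {
    // Two calls for one use case; forgetting the first one brings the exception back.
    static String describe(CustomerService service) {
        if (!service.mayLoadCustomers()) {
            return "permission denied";
        }
        return service.loadCustomers().size() + " customer(s) loaded";
    }
}
```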

The code is still readable, but we have introduced another method call that also costs runtime. But now we are sure that the exception won’t be thrown; hence we can omit the try/catch block. This code now violates the principle “Easy to use”, as we now have to invoke two methods for one use case instead of one. You have to pay attention not to forget the additional call for each retrieval operation. With regard to the whole project, your code will be cluttered with hundreds of permission checks.

Another idea to avoid the exception is to pass an empty list to the API call and let the implementation fill it. The return value can then be a boolean indicating whether the user had permission to execute the operation, which distinguishes a missing permission from a list that is empty because no customers were found. This resembles C or C++ programming, where the caller manages the memory of the structures that the callee uses, and it costs the construction of an empty list even if you do not have permission to retrieve the list at all:
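In code, this variant might look like the following sketch; the names are again assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class Customer { }

interface CustomerService {
    // Fills target and returns true if permitted; returns false and leaves it untouched otherwise.
    boolean loadCustomers(List<Customer> target);
}

class CustomerOverview {
    static String describe(CustomerService service) {
        List<Customer> customers = new ArrayList<>();  // allocated even if the call is denied
        if (!service.loadCustomers(customers)) {
            return "permission denied";
        }
        return customers.size() + " customer(s) loaded";
    }
}
```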

A last approach to returning two pieces of information to the caller is to introduce a new class that holds, next to the returned list of customers, a boolean flag indicating whether the user had permission to perform the operation:
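Such a holder class could be sketched like this; `CustomerResult` and its factory methods are hypothetical names.

```java
import java.util.Collections;
import java.util.List;

class Customer { }

// Simple data holder carrying both pieces of information.
final class CustomerResult {
    private final boolean permitted;
    private final List<Customer> customers;

    private CustomerResult(boolean permitted, List<Customer> customers) {
        this.permitted = permitted;
        this.customers = customers;
    }

    static CustomerResult denied() {
        return new CustomerResult(false, Collections.<Customer>emptyList());
    }

    static CustomerResult of(List<Customer> customers) {
        return new CustomerResult(true, customers);
    }

    boolean isPermitted() {
        return permitted;
    }

    List<Customer> getCustomers() {
        return customers;
    }
}

interface CustomerService {
    CustomerResult loadCustomers();
}
```

A caller has to check `isPermitted()` before interpreting an empty list from `getCustomers()`, which is the awkward part of this design.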

Again we have to create additional objects that cost memory and performance, and we also have to deal with an additional class that has nothing more to do than to serve as a simple data holder to provide the two pieces of information. Although this approach is again easy to use and creates readable code, it creates additional overhead in order to maintain the separate class and has some kind of awkward means to indicate that an empty list is empty because of the missing permission.

Having introduced these different approaches, it is now time to measure their performance, once for the case in which the caller has the permission and once for the case in which the caller does not. The following table shows the results for the first case with 1.000.000 repetitions:

| Measurement | Time [ms] |
| --- | --- |
| testLoadCustomersWithExceptionWithPermission | 33 |
| testLoadCustomersWithExceptionAndCheckWithPermission | 34 |
| testLoadCustomersWithReturnClassWithPermission | 41 |
| testLoadCustomersWithListAsParameterWithPermission | 66 |

As expected, the two approaches that introduce an additional class or pass an empty list are slower than the approaches that use an exception. Even the approach that uses a dedicated method call to check for the permission is not much slower than the one without it.
The following table now shows the results for the case where the caller does not have the permission to retrieve the list:

| Measurement | Time [ms] |
| --- | --- |
| testLoadCustomersWithExceptionNoPermission | 1187 |
| testLoadCustomersWithExceptionAndCheckNoPermission | 5 |
| testLoadCustomersWithReturnClassNoPermission | 4 |
| testLoadCustomersWithListAsParameterNoPermission | 5 |

Not surprisingly, the approach in which a dedicated exception is thrown is much slower than the other approaches. The magnitude of this impact is much larger than one might expect. But from the table above we already know the solution for this case: if you expect a lot of permission-denied cases, just introduce another method that can be used to check for the permission ahead of the call. The large difference in runtime between the other approaches with and without permission can be explained by the fact that I returned an ArrayList with one Customer object when the caller had the permission; hence the loadCustomer() calls were a bit more expensive than when the user did not have it.
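The kind of measurement behind these numbers can be approximated with a very simple timing loop. The sketch below is not the benchmark used for the tables above; it only contrasts throwing and catching an exception with returning a flag, and all names in it are made up. It does no JIT warm-up, so its figures are only a rough indication; a harness such as JMH would give more reliable results.

```java
class MiniBenchmark {
    // Written by both variants so that each call has an observable side effect.
    static int deniedCount;

    // Runs the action the given number of times and returns the elapsed milliseconds.
    static long measureMillis(int repetitions, Runnable action) {
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            action.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Signals the missing permission by throwing and catching an exception.
    static void denyByException() {
        try {
            throw new IllegalStateException("permission denied");
        } catch (IllegalStateException e) {
            deniedCount++;
        }
    }

    // Signals the missing permission through the return value.
    static boolean denyByFlag() {
        deniedCount++;
        return false;
    }
}
```

Calling `MiniBenchmark.measureMillis(1_000_000, MiniBenchmark::denyByException)` and the same with `MiniBenchmark::denyByFlag` gives two timings that can be compared.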

Conclusion: When performance is a critical factor, you also have to consider it when designing a new API. As we have seen from the measurements above, this may lead to solutions that violate common principles of API design like “easy to use” and “hard to misuse”.

Inspired by Tomasz Nurkiewicz’s blog post about how aggressive the inlining capability of the Java Virtual Machine is (original blog post), I asked myself what impact the refactorings of the famous GeneratePrimes class in Robert C. Martin’s book ‘Clean Code’ (see page 71) have on performance. I therefore set up a small benchmark project that can be found here on GitHub: https://github.com/siom79/generate-primes-cleancode-benchmark.

Under src/main/java you will find the two classes as they are published in the book and under src/test/java a unit test that runs the generatePrimes() method of the two classes with different values for the argument maxValue, ranging from 10 to 1.000.000.000. The unit test prints out the measured time in milliseconds like this:

The logging statement beginning with OneMethod belongs to the old-style implementation, which computes all prime numbers up to the given maximum value in a single method, whereas the statement beginning with PlentyMethods belongs to the refactored version, which spreads the same algorithm over many methods. As you will notice, the ratio between the two implementations converges to 1.00. This means that after some time all private methods of the refactored implementation have been inlined and no longer cause any runtime overhead. Up to 100.000 the one-method implementation is faster; afterwards (interestingly, except for the 1.000.000 measurement) the ratio of the runtimes is 1. Surprisingly, the first measurement shows that the implementation with many methods is even faster than the old-style implementation.