Micro-Tuning Step-by-Step

Micro-tuning is a term often used to mean speeding up small sections of code out of context: profiling and analyzing that code, then applying some of the many available techniques to make it run faster.

In contrast, macro-tuning looks at the program in context, and tries to improve performance by altering the algorithms, data structures, or interactions between components or subsystems. There is not always a clear-cut distinction between these tuning methodologies; micro-tuning is often a part of macro-tuning and the alteration of algorithms and data structures is also a normal part of micro-tuning. In this article, I will step through micro-tuning a short method, using some standard techniques.

The Problem

A colleague of mine recently reminded me about an old discussion in one of the performance forums. The originator of the discussion was talking about a comparison between two programs with the same functionality, one written in C and one in Java. The full program was never given, but one function was provided: a checkInteger method that returns true only if the string passed to it satisfies all of the following conditions.

Is an integer. (If it is not, a NumberFormatException is thrown and caught, and the catch block returns false.)

Is not the empty string.

Is an integer greater than 10.

Is an integer between 2 and 100,000, inclusive.

Has a first digit of 3.
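The original listing was never reproduced here, so the following is only a hypothetical reconstruction of a method performing exactly those checks, in that order. The name checkInteger matches the call used later in this article; everything else is an assumption:

```java
public class IntegerChecker {

    // Hypothetical reconstruction of the method under discussion; the
    // original listing is not available. The redundant tests (empty-string
    // check after a successful parse, overlapping range checks) are kept
    // deliberately, since the point is that this is a naive port.
    public static boolean checkInteger(String testInteger) {
        try {
            // A non-integer string makes this throw NumberFormatException.
            int i = Integer.parseInt(testInteger);
            if (testInteger.length() == 0)
                return false;               // not the empty string
            if (i <= 10)
                return false;               // greater than 10
            if (i < 2 || i > 100000)
                return false;               // between 2 and 100,000 inclusive
            if (testInteger.charAt(0) != '3')
                return false;               // first digit is 3
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(checkInteger("34567"));  // prints "true": all checks pass
        System.out.println(checkInteger("24567"));  // prints "false": first digit not 3
        System.out.println(checkInteger("abc"));    // prints "false": not an integer
    }
}
```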

It looks like a silly test, and the implementation is clearly a naive port into Java. Nevertheless, my colleague wondered just how much it could be optimized. Optimizing this test is an interesting exercise. Anyone who has done even a small amount of performance tuning in Java will immediately see several huge optimizations that could be applied. However, I'm going to proceed as if I can't immediately see any potential optimizations, and work out what I can do to speed up the method. If you like a challenge, write down the optimizations you expect to see applied before reading on, and check whether you are proved right by the end of the article.

The actual optimizations applied are quite dependent on the data. The data for the test was never discussed, so I'll use several data sets, to give you an idea of how performance is dependent on data. I'll use one data set with numbers that would only return true from the method, a second data set with numbers that would return true and false equally often, and a third data set that includes non-number elements.

Where to Start

Micro-tuning should always start with a baseline measurement. The most basic tool in the tuner's armory is the System.currentTimeMillis() method, which returns the current time in milliseconds, as a long data item. This method returns the time offset from some starting time, but we don't actually care what that starting time is because we will use the currentTimeMillis() method to measure a time difference, the elapsed time between the start and end of our test:

long time = System.currentTimeMillis();
checkInteger("34567");
time = System.currentTimeMillis() - time;
System.out.println(
    "The time taken to execute the checkInteger method was "
    + time + " milliseconds");

When executed, this measurement says that the method took zero milliseconds. The method runs fast enough that the resolution of the timer we have available is not fine enough to give a reliable or useful measurement. In practice, the method must have been previously identified as a bottleneck in the application (presumably because it is called very often), otherwise there would be no further need to tune the method.

For our micro-tuning exercise to proceed, we will need to repeat the test a sufficient number of times to make it measurable. It is important that the repeated measurement is representative of how the method is used in the application, i.e., that the data used to repeatedly test the method represents the data used in the application. Many methods execute at different speeds depending on the data passed to them, and ours is one such method. That is not apparent from the single measurement above, but we can see, for example, that the method first converts a string into an integer, and the time taken for that task will inevitably depend on the number of characters in the string. Any general-purpose conversion algorithm should be faster converting the string "1" into the integer 1 than converting the string "789123456" into the integer 789123456, because of the number of characters that need to be iterated over. Our method also has other speed variations that depend on the data; for example, different integers will cause different numbers of tests to be performed by the method.
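A minimal repetition harness might look like the following sketch. The embedded checkInteger is a hypothetical stand-in (the original listing was never reproduced), and the data set and repeat count are illustrative, not the ones used for the measurements in this article:

```java
public class TimingHarness {

    // Hypothetical stand-in for the method under test.
    static boolean checkInteger(String s) {
        try {
            int i = Integer.parseInt(s);
            return s.length() > 0 && i > 10
                && i >= 2 && i <= 100000 && s.charAt(0) == '3';
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A representative data set: mix strings the way the application would.
        String[] data = {"34567", "3", "99999", "not-a-number"};
        int repeats = 1000000;  // chosen so the run lasts long enough to measure

        long time = System.currentTimeMillis();
        boolean sink = false;   // keep the results live so the JIT compiler
                                // cannot eliminate the calls entirely
        for (int r = 0; r < repeats; r++)
            for (String s : data)
                sink ^= checkInteger(s);
        time = System.currentTimeMillis() - time;

        System.out.println("Elapsed: " + time + " ms (sink=" + sink + ")");
    }
}
```

Dividing the elapsed time by the total number of calls then gives an average per-call cost.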

As we discussed earlier, I'll use three data sets to obtain our baseline:

Dataset 1: All strings return true.

Dataset 2: 50% of strings return true, and 50% return false.

Dataset 3: One third of the strings return true, one third are numbers that return false, and one third are non-numbers that also return false.

When taking measurements, I like to repeat test runs three times and take the average of the results. I usually make the repeat factor in the individual looped tests high enough that I'm measuring at least a few seconds. I also make sure that there are no other significant processes running on the computer, and that I leave the test in the foreground the whole time it is running. These steps minimize the possibility that CPU time will be allocated to anything other than the test while it is running, making the test results more consistent. In addition, I personally prefer to use normalized results: I divide all result measurements by the first result and multiply by 100 to give a percentage result. Naturally, this means that the first result is always 100%. My baseline results are shown in Table 1 (all tests were run with the SDK 1.4.0 JVM in server mode).

Table 1: Baseline measurements

             Dataset 1    Dataset 2    Dataset 3
baseline          100%        84.1%       540.0%
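The normalization step is simple enough to sketch directly. The raw timings below are hypothetical, chosen only so that the resulting percentages match those in Table 1:

```java
public class Normalize {

    // Convert raw timings (in milliseconds) into percentages of the first
    // result, so the first entry always becomes 100%.
    static double[] normalize(double[] rawMillis) {
        double[] pct = new double[rawMillis.length];
        for (int i = 0; i < rawMillis.length; i++)
            pct[i] = rawMillis[i] / rawMillis[0] * 100.0;
        return pct;
    }

    public static void main(String[] args) {
        // Hypothetical raw timings for the three data sets.
        double[] raw = {2500.0, 2102.5, 13500.0};
        for (double p : normalize(raw))
            System.out.printf("%.1f%%%n", p);  // 100.0%, 84.1%, 540.0%
    }
}
```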

As well as giving us a baseline, these measurements immediately produce something interesting: the Dataset 3 result is much larger than the other two results. We'll follow up on this point shortly. Baseline measurements normally show some useful information, if only to provide the average execution time for one call to the method. That average, multiplied by the number of iterations you expect to execute when running the application for real, gives you an idea of how expensive the method currently is. For example, if a million iterations take one second, and you expect 100,000 calls to the method every second at peak application usage, then this one method already takes one tenth of the time available to execute the method and all other methods. This may provide sufficient performance, or it may be a serious bottleneck, depending on the application.
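The budget arithmetic in that example can be made concrete; all the figures here come from the example in the paragraph above:

```java
public class Budget {
    public static void main(String[] args) {
        // From the example: a million iterations take one second, and we
        // expect 100,000 calls per second at peak application usage.
        double iterations = 1000000;
        double elapsedSeconds = 1.0;
        double perCallSeconds = elapsedSeconds / iterations;  // one microsecond

        double peakCallsPerSecond = 100000;
        // 100,000 calls/s * 1 microsecond/call = 0.1 s of every second,
        // i.e. one tenth of the available time.
        double fractionOfBudget = perCallSeconds * peakCallsPerSecond;

        System.out.println("Per call: " + (perCallSeconds * 1e6) + " microseconds");
        System.out.println("Fraction of each second spent in this method: "
            + (fractionOfBudget * 100) + "%");
    }
}
```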