Don’t return from within finally

Earlier today, I was debugging some seemed to be some pretty straight-forward code. The code effectively was the following function. What would you expect the output of this class to be?

public static void main( String [] args ) {

System.out.println( foo() );

}

public int foo() {

try {

throw new RuntimeException();

} catch ( RuntimeException e ) {

System.err.println( "Caught" );

throw e;

} finally {

System.err.println( "Finally" );

return 1;

}

}

If you expected an exception to be percolated out as I was, you are wrong. It actually turns out that the above returns 1 and ignores the exception that was just raised. Just something subtle to keep an eye out for.

How do you quickly sort 84 million unique Integers?

For the last few weeks, I’ve been working on increasing the performance of various areas at work and one of the areas that kept on hitting my profiles has been sorting integers. All of us with a Computer Science degree have sorted integers before and we know how to cost algorithms with Big-O notation, but most of the time, the default sort in Java, C#, and other languages is “good enough” that we don’t really think of it.

When it starts to show up on a profile though, sometimes that is time to start thinking about it. As with many things, algorithms based on N usually perform well when N is reasonably sized, so you do not always notice the issue until N becomes interesting or the number of times you call the algorithm increases.

Whenever diagnosing performance problems, it is important to characterize whether it is one or more big calls to the method that causes the entire time sink or whether it is the sheer number of calls to that method. In either of these problems, I think a reasonable first thing to ask is whether that call is really necessary or is it at the right location? More specifically, if you are not running binary searches or other algorithms or use-cases which require the values to be sorted, why sort them at all? If for no other reason, this causes you to think through how to code is or could be used and sometimes leads to a different solution.

In my particular case, there were a few large sorts that were taking all the time. Each of these sorts were sorting approximately 84 million integers via the JDK standard Arrays.sort() or the Colt IntArrayList.sort() method and taking approximately 10 seconds per sort. Depending on your background, 10 seconds may be an eternity or a dream, so suffice it to say that the entire processing required for this component had to be less than 3 seconds, approximately half of that had already been accounted for.

So assuming it is required, how do you sort faster? An obvious approach to server engineers is to multithread this operation, such as sorting chunks of that memory in parallel and then merging them together, or perhaps a multi-threaded version of quick-sort, and I am sure that my team expected me to go hide in a corner for a while and code up one of these approaches...

But I didn’t. You see, there’s something particularly interesting about the integers we were sorting. First, they are all positive, and second, there aren’t any duplicate integers. So instead of sorting them, can we do anything else with them that we can infer the sort and have more memory locality?

Well, one way to do it would be to use a set of bits. Say that the largest number in the array was 200 million, you can allocate 200 million bits and then sort the numbers using an algorithm like this:

for each N in array

set bit N

while more bits

find next set bit

set array[i] to that bit

increment i

Before we go further, you will note that the above algorithm goes through the array twice, and that may seem bad to you, however, there is another thing to keep in mind when assessing this algorithm, which is memory locality, and the fact that especially in the second loop, you are effectively iterating sequentially through two arrays which are probably in the L1 or L2 cache already, potentially making a big performance improvement if you have performance-grade processors.

The following sample code implements this in Java using the BitSet class. I’ve personally not gone through the code to see if it is the most optimal implementation for what I am doing with it, but after this change, the performance was in the acceptable range.

Performance across machines will vary. On my work servers with the latest Xeons, it averages around 800ms per sort, compared to an average of 10 seconds. On my home server, it is a little slower but averages around 9 seconds, verses 23 seconds for the QuickSort algorithm.

Now the above algorithm is not optimal for all sizes. In my tests, the magical point was somewhere around one million rows where the performance of the default Arrays.sort() and the above crossed over, so we do a quick size check and either go to the standard QuickSort or to the BitSet-based sorting.

Those familiar with the BitSet API may mention that I could gain better performance by telling the BitSet algorithm the number of bits I am interested in using and allowing it to pre-allocate the memory for that. Unfortunately though, I did not have that information available for most areas in the code and it was not feasible to guess in this area of the code. What was super interesting though is that doing a pre-scan of the entire array to find the maximum number did not change the runtime of the algorithm significantly. In fact, the TP99 of the two were off by around 100 milliseconds in my test. This could definitely be an area to improve the above algorithm going forward.

If the above algorithm was not fast enough for you still, there is still room for improvement. First, you can look at pre-allocating the array. Second, the second loop of the algorithm is easily parallelizable and perhaps the BitSet implementation for this use-case could be simplified.

As I mentioned above, the algorithm only works for unique, positive integers. If you have negative numbers, you could possibly use a second BitSet object for the absolute value of the negative numbers and iterate it in reverse (applying the sign obviously) which would also work. Duplicates obviously don’t work though, and if you have that problem, ask yourself if you really need the sort at all or whether you need the duplicates, and if you do, consider a parallel sorting algorithm.

Rounding numbers with Java’s NumberFormat...

Take a look at the following numbers and round them to the nearest whole number:

2

2.1

2.2

2.3

2.4

2.49

2.5

2.51

2.6

2.7

2.8

2.9

3

I’m pretty confident that you will get the same answers as my 4th grader, which is that the numbers from 1-6 result in a 2 and 7 onwards results in 3, right? Well, given the following code though, you’d be surprised:

Your surprise would come from the fact that the 2.5 rounds down to 2. Why? Well, reading the DecimalFormat javadocs closely, you will notice that the default rounding policy is HALF_EVEN, in which the above behaviour is explained. To change that, simply use format.setRoundingMode( RoundingMode.HALF_UP ); or one of the other policies available in the rounding mode.

Why this is the default behaviour is puzzling to me, but I’ll just add it to the list, which includes the Math.abs() handling of Integer.MIN_VALUE and non-thread-safe SimpleDateFormat (which, for the record, I do not believe that the code suggested in that blog post is guaranteed to be thread-safe either... since the DateFormat class is declared to be non-threadsafe also).

Internet Explorer 8 Compatibility Mode...

Earlier today, myscenicdrives.com released our road trip planner which I am really excited about. There are several more features that I want to build into it, but this was a really nice time to release it and get some feedback from real users.

For the last week, we have been doing some usability testing, optimizations, cross browser testing, and polishing it all up before the press release earlier today. With the final push going out the door late last night, we did our final round of testing and under Internet Explorer 8, I got that dreaded, “A problem displaying [site] caused internet explorer to refresh the webpage using compatibility view.” This resulted in a huge effort to figure out where that message from coming from.

The majority of the testing was done in the latest version of all browsers, which I was really pleasantly surprised at how standards compliant Internet Explorer 9 is. While I had to do a few tweaks specifically for it, it was relatively easy and no more than what I had to do for other browsers. When I cracked up Internet Explorer 8 (via Microsoft’s virtual machine image), it required a lot more tweaks but in the end, it seemed good.

This is why I was really surprised when I loaded up the page last night and saw the compatibility view message. Unfortunately, that message does not provide any insight into what the problem might be. The Microsoft documentation on the matter states that your pages must have a DOCTYPE and validate, but they also hint that there are other undocumented reasons — this makes finding the root of the problem orders of magnitude more frustrating.

After changing things I thought might be suspect, endless googles for any hints, and etc., I started commenting out various chunks of code trying to make it go. At the end, I had a simple empty page with nothing between the opening body tag and the closing body tag, and it was still happening! But this was illuminating because right above where the body tag came out was one little optimization we had done recently.

When usability testing, we noticed that there was a lag in images being displayed which was not user-friendly. To help prevent that problem, we looked into image prefetching techniques, and fell upon the CSS technique:

body:after {

content:url(image.png);

display:none;

}

This made a world of a difference in testing. According to my research, this technique is suppose to be ignored by browsers that do not support the :after clause and so it felt safe. Since the images in question were very specific to the application being developed and the page in question is not intended to be read by search engines, we decided to inline this code on the page.

Could this be the cause? Well, in trying to narrow it down further, I did comment out the entire style-tag and sure enough, my page did not go into compatibility mode.

This fixed my compatibility-mode problem and hopefully it may help someone else out there.

Math.abs has an edge case...

Consider the following Java code:

int ai = Math.abs( i );

assert ai >= 0;

Will the assert ever fire?

Yes, it turns out that it can fire if i = Integer.MIN_VALUE, and you have been warned about this in the Math.abs javadoc, which states “Note that if the argument is equal to the value of Integer.MIN_VALUE, the most negative representable int value, the result is that same value, which is negative.’