You are here:

Maybe I'm being misled by my profiler (Netbeans), but I'm seeing some odd behavior, hoping maybe someone here can help me understand it.
I am working on an application, which makes heavy use of rather large hash tables (keys are longs, values are objects). The performance with the built in java hash table (HashMap specifically) was very poor, and after trying some alternatives -- Trove, Fastutils, Colt, Carrot -- I started working on my own.
The code is very basic using a double hashing strategy. This works fine and good and shows the best performance of all the other options I've tried thus far.
The catch is, that lookups into the hash table are the single most expensive method in the entire application -- despite the fact that other methods are called many more times, and/or do a lot more logic.
What really confuses me is the lookups are called only by one class; the calling method does the lookup and processes the results. Both are called nearly the same number of times, and the method that calls the lookup has a lot of logic in it to handle the result of the lookup, but is about 100x faster.
Below is the code for the hash lookup. It's basically just two lookups into an array (the functions that compute the hash codes, from profiling, are virtually free). I don't understand how this bit of code can be so slow, and I don't see any way of making it faster.
Note that the code simply returns the bucket matching the key, the caller is expected to process the bucket. 'size' is the hash.length/2, hash1 does lookups in the first half of the hash table, hash2 does lookups in the second half. key_index is a final int field on the hash table passed into the constructor, and the values array on the Entry objects is a small array of longs usually of length 10 or less.
Any thoughts people have on this are much appreciated.
Thanks.
public final Entry get(final long theKey) {
Entry aEntry = hash[hash1(theKey, size)];

Reply viewing options

Its just a guess, but I don't think array access is the problem here.
Profilers tend to be quite inaccurate for code which runs fast, like the method you showed.
Would be great to hear about your findinding!

I do think there is a lot of noise in there from the profiler, the method should be quite fast...really should only take a couple of ns, and the profiling instrumentation is probably greater than that.
Though, with that said, eliminating just a single array lookup from that function (changing it to a member variable access) did get a non-trivial speedup. I do think there is overhead from the array bounds checks forced by the JVM, and when the function is a simple as that one is, and is executed as often, they really start to add up.
I'm told there is a JVM option for disabling these checks, but I don't know anyone who has actually gotten it to work. Maybe its a Java 7 thing. We might try just tweaking the native code in the JVM that actually handles the array accesses just to see what kind of affect it really does have.

The fastdebug-builds (eg. download from jdk7.dev.java.net) are able to print the generated assembler.Maybe this can give some more insights.
If you're really interested whats going on, both AMD and Intel offer trial versions of their tuning products, Intel has something called VTune. It is able to exactly show where how much time is spent and why (cache misses, branch mispredicts, pipline stalls, ...)
Good luck, and please keep us up2date =)
lg Clemens
PS: Wow I hate that new forum :/

Though, with that said, eliminating just a single array lookup from that function (changing it to a member variable access) did get a non-trivial speedup. I do think there is overhead from the array bounds checks forced by the JVM, and when the function is a simple as that one is, and is executed as often, they really start to add up.

Sure, the bounds-check usually a few compares/conditional-jumps to the code, however they are quite predictable, so shouldn't consume a lot cycles.

I wonder/guess if that could have something to do with cache misses. I guess your hashtable is a lot larger than the size of your cache? How does it perform if you only add a tiny little bit of data?

mhgrove wrote:

I'm told there is a JVM option for disabling these checks, but I don't know anyone who has actually gotten it to work. Maybe its a Java 7 thing. We might try just tweaking the native code in the JVM that actually handles the array accesses just to see what kind of affect it really does have.

At least in the Oracle/OpenJDK's JVM there isn't such a switch, because it would violate the JVM spec.