Try GNU Trove: it has faster hashmaps, supports primitive types, and doesn't need to allocate Entry objects when traversing the map. You can also adapt their benchmark code to the other collections you're testing.

Edit: Updated with even more maps. Looks like IntHashMap, FastIdentityHashMap, and FastObjectHashMap have the fastest gets for the 3 purposes. As mentioned, these are all basically the same code. Anyone see anything wrong with this code?

Your FastIdentityHashMaps look like they're king. Are these maps that you made yourself?

Sure, it was kind of fun. I lifted the ~10 lines that make the charts from another benchmark. It is surprisingly easy to make charts, thanks Google! Hear that Riven? I expect not to see any more benchmarks without proper charts!

Of course, these results should be taken with a grain of salt. Memory usage isn't considered, maps are tested with only one data set, not all maps use the same double hashing that HashMap uses, iteration isn't tested, getting keys not in the map isn't tested, etc.

Actually I resurrected the IntHashMap code from an old project... I really ought to find the original code and give proper recognition! I am pretty sure it was originally lifted from here:

http://www.koders.com/java/fid128CB47B2558DD20EA15852E444D7928D1E698DD.aspx

I will add the appropriate credits to source I distribute. It has been tweaked a little since then. Also, for FastIdentityHashMap I just made the keys Objects and called hashCode(), and for FastObjectHashMap I just replaced == with equals().
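To make the difference concrete, here is a sketch (not the actual classes, and the names are mine) of the get loop for an open-addressing map, showing the single comparison that separates the two variants described above:

```java
// Hypothetical sketch of the lookup loop in an open-addressing map. The only
// difference between the "object" and "identity" variants is the key
// comparison: equals() vs ==. Both use the key's hashCode() to pick a slot.
public class MapVariantsSketch {
    // Object variant: keys that are .equals() are considered the same key.
    static int indexOfObject(Object[] keys, Object key) {
        int i = key.hashCode() & (keys.length - 1); // table length is a power of 2
        while (keys[i] != null) {
            if (keys[i].equals(key)) return i;      // FastObjectHashMap style
            i = (i + 1) & (keys.length - 1);        // linear probe
        }
        return -1;
    }

    // Identity variant: only the exact same instance matches.
    static int indexOfIdentity(Object[] keys, Object key) {
        int i = key.hashCode() & (keys.length - 1);
        while (keys[i] != null) {
            if (keys[i] == key) return i;           // FastIdentityHashMap style
            i = (i + 1) & (keys.length - 1);
        }
        return -1;
    }
}
```

With this layout, two distinct String instances with equal contents are found by the object variant but not by the identity variant.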

I added HashMapV2 from a recent mailing list discussion (linked in first post). It does amazingly well on Android for puts and holds its own for gets. It is probably more complete/robust than FastObjectHashMap (it is certainly fancier). Too bad it is GPL.

First thanks for putting this together. I will definitely be looking at the other types of maps as a result of this. But could you bring back the ordering? It's now hard to compare implementations.

Using your code I made a few changes and ran my own benchmarks. This was only between the IdentityMap, HashMap, CachingHashMap and FastMap (tbh I couldn't be bothered to go get them all).

* I upped the number of iterations to run per test from 3 to 60.
* It stores all the times and then takes the average at the end (so you get an average time over 60 runs).
* The gc call was removed from inside test iterations (I thought this was unfair, because it gives maps some breathing space which they don't get in real life).
* I added a gc call before the start of each test, to encourage the garbage collector to run and so not pass garbage on to the next test.
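The changes above can be sketched as a small harness (a minimal reconstruction, not the actual benchmark code; the workload and map are placeholders):

```java
// Minimal sketch of the modified harness: 60 timed runs per test, a single
// System.gc() before the test starts (not between iterations), and the
// average taken at the end, so GC cost during the runs stays in the timing.
import java.util.HashMap;
import java.util.Map;

public class MapBenchSketch {
    static final int RUNS = 60;

    static double averageMillis(Runnable test) {
        System.gc(); // encourage collection so garbage isn't passed on from the last test
        long[] times = new long[RUNS];
        for (int run = 0; run < RUNS; run++) {
            long start = System.nanoTime();
            test.run(); // no gc() inside the loop: maps get no breathing space
            times[run] = System.nanoTime() - start;
        }
        long total = 0;
        for (long t : times) total += t;
        return total / (double) RUNS / 1_000_000.0;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<>();
        double ms = averageMillis(() -> {
            map.clear();
            for (int i = 0; i < 10_000; i++) map.put(i, i);
        });
        System.out.println("HashMap put avg over " + RUNS + " runs: " + ms + " ms");
    }
}
```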

The main goal was to try to benchmark longer-term performance, including the cost of garbage collection on top. At the very least this is more relevant for me (as I've personally found GC to sometimes become an issue). Here are the results:

When I scanned through the values as they were coming out, I noticed that the HashMap would have an iteration that ran 5 to 10 times slower than the others. None of the other maps had these outlier values, and I'm presuming it must be the GC kicking in; this is what slows it down. It's also interesting that mine is slower than the HashMap for get, but faster for put and when those times are combined. Which type of map people should use really depends on the situation.

To run the benchmark I also added the -Xmx512m command-line option so it ran with enough memory, but I found that if I also added -Xms512m it improved the performance of the HashMap:

Presumably there is no (or very little) garbage collecting going on as a result.

I'll post a zip of my code later today so you can more easily run all the tests.

GC is unpredictable, so I think it is better to remove it entirely rather than have it affect some of the tests sometimes. We can add memory usage pretty easily. I would like to find some way to reliably measure GC activity.
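One way to measure GC activity (my suggestion, not something the thread settled on) is to sample the JVM's GarbageCollectorMXBeans before and after a test and diff the collection counts and accumulated collection time:

```java
// Sketch: measuring GC activity with the standard java.lang.management API.
// Diffing these totals around a benchmark run shows how many collections the
// run triggered and how long they took.
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) count += c; // -1 means the count is unavailable
        }
        return count;
    }

    static long totalGcTimeMillis() {
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) time += t;
        }
        return time;
    }

    public static void main(String[] args) {
        long before = totalGcCount();
        for (int i = 0; i < 100_000; i++) new Object[128].clone(); // make garbage
        System.out.println("collections during test: " + (totalGcCount() - before));
        System.out.println("total GC time so far: " + totalGcTimeMillis() + " ms");
    }
}
```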

Interesting that bumping the JVM's memory gave you better performance. I'll give that a shot.

Re: identity maps, sometimes they are useful (eg, autoboxed primitive keys or when Class is your key), and they are fast. I really only care about the 3 scenarios: int keys, object keys, and identity keys. Memory usage and large data sets I'm not too worried about, mostly I just care about get. Oh, and I'd like the classes to be small (eg, not a half meg library -- I'm looking at you, Javolution).

I added a class called CachingFastMap. It is the same as FastMap (formerly known as FastObjectHashMap) except it keeps around its entry objects when entries are removed or the map is cleared. Like JL235's CachingHashMap class, this is to avoid garbage collection if you are constantly putting and clearing a map. It scores very well for puts on Android, where memory allocation (and GC!) is expensive.
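The pooling idea behind these caching maps can be sketched in a few lines (this is my own minimal version, not CachingFastMap or CachingHashMap): removed entries go on a free list and put() reuses them, so steady-state put/clear cycles allocate nothing.

```java
import java.util.ArrayDeque;

// Sketch of entry pooling: instead of letting removed entries become garbage,
// keep them on a free list and hand them back out on the next put().
public class PooledEntries<K, V> {
    static class Entry<K, V> { K key; V value; Entry<K, V> next; }

    private final ArrayDeque<Entry<K, V>> pool = new ArrayDeque<>();

    Entry<K, V> obtain(K key, V value) {
        Entry<K, V> e = pool.isEmpty() ? new Entry<>() : pool.pop();
        e.key = key;
        e.value = value;
        return e;
    }

    void free(Entry<K, V> e) { // called on remove() and clear()
        e.key = null;   // drop references so pooled entries don't pin objects
        e.value = null;
        e.next = null;
        pool.push(e);
    }

    int pooled() { return pool.size(); }
}
```

This matters most where allocation and GC are expensive, which matches the Android put results described above.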

In the "fast.maps" package you'll find the classes listed as "Nate" in the charts. All these are based on a modified version of IntHashMap from an Apache project. They are fast, minimal, and easy to customize. Eg, if you needed a ShortHashMap, a CachingLongHashMap, a different hashing algorithm, or whatever, you could modify one of these classes and have the map you need in a couple minutes.

I fixed up my classes quite a bit. You can get the source for it all here (3.1mb):http://n4te.com/temp/fast.zipNote fastutil is not included because the JAR is 13mb+! That makes Trove's 2mb look like a good bargain!

The desktop tests are run with -Xms1024m. I added Hashtable. Also, I put the sorting back until someone complains again. With so many implementations, it is hard to see what is faster when they are alphabetical.

Thanks for making the CachingFastMap. I modified your source to make a CachingFastIdentityMap by copying the code from FastIdentityMap, and it seems to have made the code slower - after profiling I find that a large amount of CPU time is used calling:


java.lang.System.identityHashCode

What's the reason for calling this? Is it OK to replace it with Object.hashCode(), which appears to be faster?

Here's my source for CachingFastIdentityMap without the calls to System.identityHashCode, replaced by Object.hashCode().

/*
 * Copyright 2002-2004 The Apache Software Foundation.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the
 * License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS"
 * BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language
 * governing permissions and limitations under the License.
 */

I'm surprised System.identityHashCode is slower. Using System.identityHashCode means that if you are using objects that have overridden hashCode and equals, even two objects that are .equals() will (probably) be put in different hash buckets. Using Object.hashCode may cause more duplicate hashes, but in practice it probably makes little difference. If your objects don't override hashCode then there is no difference.
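A tiny demonstration of the difference being discussed: two equal strings share a hashCode(), but each instance (almost always) gets its own identity hash code, so an identity-hashed map would place them in different buckets.

```java
// hashCode() is content-based for String; System.identityHashCode is
// per-instance. Equal objects therefore hash together under the first
// scheme but (almost certainly) apart under the second.
public class HashKinds {
    public static void main(String[] args) {
        String a = new String("key");
        String b = new String("key");
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        // Distinct instances: these identity hashes will almost always differ.
        System.out.println(System.identityHashCode(a) == System.identityHashCode(b));
    }
}
```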

If you have a huge number of objects in your map, you should probably run the benchmarks with a higher number of objects (uses 10,000 currently). Java's IdentityHashMap is pretty impressive. It puts the keys and values alternating in a single array (supposedly good for large data sets) and doesn't need to create or reuse entry objects.
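The alternating key/value layout can be sketched as follows (a simplified illustration of the idea, not the JDK's actual IdentityHashMap source; resizing is omitted):

```java
// Sketch of IdentityHashMap's storage scheme: one Object[] holding key at
// even index i and value at i + 1, with linear probing in steps of 2 and
// identity (==) comparison. No entry objects are ever created.
public class FlatIdentityTable {
    Object[] table = new Object[64]; // room for 32 key/value pairs

    int firstIndex(Object key) {
        // map the identity hash to an even slot (simplified spread function)
        return (System.identityHashCode(key) & (table.length / 2 - 1)) * 2;
    }

    void put(Object key, Object value) {
        int i = firstIndex(key);
        while (table[i] != null && table[i] != key)
            i = (i + 2) & (table.length - 1); // probe next pair; wraps, stays even
        table[i] = key;       // even slot: key
        table[i + 1] = value; // odd slot: value
    }

    Object get(Object key) {
        int i = firstIndex(key);
        while (table[i] != null) {
            if (table[i] == key) return table[i + 1];
            i = (i + 2) & (table.length - 1);
        }
        return null;
    }
}
```

Keeping keys and values adjacent in one array means a successful probe touches the value in the same cache line as the key, which is presumably part of why it does well on large data sets.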

I recently implemented a cuckoo hash map for my work (3 hash functions, for load factors of ~0.75). It was 5x faster or more than the standard HashMap overall, much faster for contains and remove, and about 2-3x faster than using linear probing (IdentityHashMap uses this), and this is without tuning. I am surprised that none of these implementations use cuckoo probing as far as I can tell. You don't even need Entry objects. It seems to offer big gains for primitives as well.

I have no special talents. I am only passionately curious.--Albert Einstein

Yes. get and put both take int as the type of the key, but containsKey takes long and casts it to int. There may be a good reason for doing this, but I can't see it.

Ah. I should update the source I linked to. The FastIntMap class in the zip I posted is the latest and has this fixed. The original class was LongToIntHashMap and somehow a long got left in. I wasn't using the containsKey method so didn't catch it sooner.

delt0r, cuckoo hashing looks very interesting! It looks like it is better for my needs (fast for small tables, which ironically isn't what the benchmarks are testing) and is easy to implement. Can you post your implementation, delt0r? I gave it a half assed try (not much time, have a big project to finish this weekend). I've updated the benchmarks (desktop only) in the first post and here is my class:

/*
 * Copyright 2002-2004 The Apache Software Foundation.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the
 * License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS"
 * BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language
 * governing permissions and limitations under the License.
 */

When an object is put in with the first hash and that slot is taken, this code doesn't put it there and move the existing entry; it just tries the alternate hash slot and rehashes if entries are already in both slots. This causes it to rehash a lot, but it was all I had time to do ATM. Also, I don't know if my second hash is the best idea (it is Riven's random number algorithm, I now solve all my problems with it! ). The get time is still extremely interesting! However, it seems to vary quite a bit each time I run the test. I guess sometimes more lookups take the worst-case scenario route (which would happen less often for smaller maps; the tests currently use 10,000 entries). It could also be due to my shoddy implementation.
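The simplified scheme described above can be sketched like this (hypothetical code, not the class from the zip, and the second hash is a generic stand-in mixer rather than Riven's actual algorithm):

```java
// Sketch of the simplified two-hash scheme: put() tries the first hash slot,
// then the second, and reports failure (the caller would grow and rehash)
// when both are occupied, instead of evicting and relocating the resident
// entry as real cuckoo hashing does. Lookups probe at most two slots.
public class TwoHashSketch {
    int[] keys = new int[64];
    int[] values = new int[64];
    boolean[] used = new boolean[64];

    int hash1(int k) { return k & (keys.length - 1); }

    int hash2(int k) {
        k ^= (k << 13);  // stand-in bit mixer, not the original second hash
        k ^= (k >>> 17);
        return k & (keys.length - 1);
    }

    boolean put(int key, int value) {
        int i = hash1(key);
        if (!used[i] || keys[i] == key) { keys[i] = key; values[i] = value; used[i] = true; return true; }
        i = hash2(key);
        if (!used[i] || keys[i] == key) { keys[i] = key; values[i] = value; used[i] = true; return true; }
        return false; // both slots taken: caller grows the table and rehashes everything
    }

    int get(int key, int defaultValue) {
        int i = hash1(key);
        if (used[i] && keys[i] == key) return values[i];
        i = hash2(key);
        if (used[i] && keys[i] == key) return values[i];
        return defaultValue;
    }
}
```

The frequent rehashing mentioned above falls out of that `return false` path: without eviction, two colliding keys whose alternate slots are also taken force a full rebuild.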

My implementation does not conform to the java.util.Map spec. In particular, values(), keySet() and entrySet() are *not* backed by the hash map.

2ndly, I have not done a primitives version yet.

3rdly, I use simple hash functions and rely on hashCode being good, with the hash table itself being a power of 2 so I can use simple masking rather than %, which is painfully slow. For example, the 3 hash values I use are given by the following code.
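The original snippet didn't survive the forum export, so here is a guess at its shape rather than delt0r's actual code: three indices derived from one hashCode() by mixing with a constant (the 31-bit prime from the class posted later in the thread) and masking into a power-of-two table.

```java
// Hypothetical reconstruction: three masked hash values from one hashCode().
// Because the table length is a power of two, "& mask" replaces the much
// slower "% length".
public class ThreeHashes {
    static final int PRIME = 2053368389; // the 31-bit prime from the posted class

    static int h1(int h, int mask) { return h & mask; }
    static int h2(int h, int mask) { return (h * PRIME) & mask; }
    static int h3(int h, int mask) { return ((h * PRIME) >>> 16) & mask; }

    public static void main(String[] args) {
        int mask = 1024 - 1; // table of length 1024
        int h = "example".hashCode();
        System.out.println(h1(h, mask) + " " + h2(h, mask) + " " + h3(h, mask));
    }
}
```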

Finally, I don't use entry objects. I use an Object[] table where even entries are keys and odd entries are values. I am doing a *lot* of put/remove, and this is where this table is much faster than the others I have tried (though I have not tried everything in the above list). The idea is that if the hash table is small then everything will be in cache and it will be very fast. However, it turned out to be very fast even with large sizes (millions or more) and modest load factors (0.75). In particular it has better performance on pathological cases than anything I tested; basically it has no pathological cases.

I doubt this would beat everything on the list, and I want to improve the rehashing performance (using incremental rehashing or whatever it's called).

If you still want the code I can post it. BUT be warned: it's somewhat untested, quite uncommented, and reasonably untuned/unoptimised (no profiling yet).


Thanks for the additional information delt0r. I would love to see what you have, even if it isn't perfect yet.

gouessej, because I write software to pay the bills in addition to doing it as a hobby. Using a GPL'ed library would mean open sourcing my whole app, which is not something I want to do for my commercial software.

This is a great discussion, so many good techniques. Thanks Nate for doing such an awesome job of benchmarking all this stuff.

In the map I'm using for the A* algorithm, I make the key and the object the same object, so I should probably make a map where there's only one array of keys and no entry objects (unless I'm missing something obvious). I'll give it a go and post the code once I get it working.

These classes do not conform to the general contract for Map. In particular, values(), keySet() and entrySet() return copies of the hash map at that point in time. Removing elements from these returned Collections will have no effect on the map.

We do not "rehash" values returned from Object.hashCode() as the Collections classes do. We require that stored objects have taken proper measures to ensure that they provide sufficient randomness.

Please treat this code as a work in progress. It is fast for remove and contains, and similar to other implementations for put. But as I said, I have never even run a profiler on it, and probably won't, at least not on its own; i.e. I benchmark it in the app, as it's used in the app.

Cuckoo hashing, however, has worst-case O(1) time for retrieval, contains and delete, and expected O(1) for add. This is in contrast to chaining and linear probing, which have O(n) worst-case performance.
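The reason lookups are worst-case O(1): an element can only ever live in one of its (here three) hash slots, so get/contains/remove inspect at most three positions. A sketch using the flat key/value layout and identity comparison described above (my illustration, not the actual CIdentityHashSet code; eviction and resizing are omitted, so the put helper is deliberately naive):

```java
// Sketch of cuckoo lookup over a flat Object[] table: even indices hold
// keys, odd indices hold values, and every key has exactly three candidate
// slots. get() therefore never walks a chain or a probe sequence.
public class CuckooLookupSketch {
    static final int PRIME = 2053368389;
    Object[] table = new Object[128]; // 64 key/value pairs

    int[] slots(Object key) {
        int h = System.identityHashCode(key);
        int mask = table.length / 2 - 1;
        return new int[] {
            (h & mask) * 2,
            ((h * PRIME) & mask) * 2,
            (((h * PRIME) >>> 16) & mask) * 2,
        };
    }

    Object get(Object key) {
        for (int slot : slots(key))           // at most three probes, always
            if (table[slot] == key) return table[slot + 1];
        return null;
    }

    // Naive insert for illustration: take the first free candidate slot.
    // Real cuckoo hashing would evict the resident key and relocate it.
    boolean putFirstFree(Object key, Object value) {
        for (int slot : slots(key)) {
            if (table[slot] == null || table[slot] == key) {
                table[slot] = key;
                table[slot + 1] = value;
                return true;
            }
        }
        return false;
    }
}
```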

Note that using a load factor larger than 0.75 will generally result in a resize before the limit is reached. Hence in practice the load factor will never be much higher than 0.75. In fact I think you can prove that it would be between 0.75 and 0.86 or so with 3 hash functions.


CommanderKeith, you might post your code and maybe we can come up with something that specifically fits it.

Thanks! My project isn't really bottlenecked by map gets or puts, but it was maxing out the garbage collector with too many HashMap.Entry objects, so I tried the caching-style maps you and JL235 made and they were perfect.

But delt0r's cuckoo hashing idea was too cool to refuse since:
* it doesn't use entry objects, so there's no garbage at all or any need for object pooling,
* the key is the object, so there's no double storage of references.

The only disadvantage for me, I guess, is that the hash is never stored, so contains(obj) [or get(obj)] might be a tad slower, since the hash of the object being put has to be compared to at least 1 other object's hash code, so at least 2 hash codes have to be computed. This is the main modification which speeds up the map in my case:

/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */
package astarpathfinder.core;

import java.lang.reflect.Array;
import java.util.*;

//import at.mabs.util.IdentityHashSetLinearProbe;

/**
 * Checking to see how fast a 3 way cuckoo hashing is compared to linear probing.
 *
 * The deletes should be *much* faster.
 *
 * Performance is as good or much better than linear probing except for pathological cases where its about
 * the same (5% faster). It was never slower. Over all the performance is 2-5x faster than java.util.hashSet
 * and sometimes much more (for crazy cases that probably never happen)
 *
 * @author greg ewing
 *
 * I put this into the public domain
 *
 * @param <T>
 */
public class CIdentityHashSet<T> implements Set<T> {
    private T[] table;
    private int size;
    private double loadFactor;
    private int mask = 0;
    private int threshold;
    private static int PRIME = 2053368389; // random 31 bit prime cus primes are cool

There are a few no-brainers to speed up my implementation. Lazy calculation of the hashes is one. Better hash calculations is another (get rid of the multiply, for starters). There are other things too.

However, you will probably be disappointed with the performance increase, as it would probably be close to undetectable. What I do for some of my sets is to have a special object type with 3 final int fields. Since I only need identity, these are filled with random ints at creation time; these are then my 3 hash values and I don't need to calculate anything.

It does increase performance. But only a little. The best part of cuckoo hashing is worst case O(1) performance.
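The precomputed-hash trick described above, sketched out (class and field names are mine): identity-only keys carry three random ints assigned at construction, so the three cuckoo hash values cost nothing to "compute" at lookup time.

```java
import java.util.Random;

// Keys for an identity-only cuckoo set: the three hash values are random
// ints fixed at construction, so probing never calls a hash function.
public class PrehashedKey {
    private static final Random RANDOM = new Random();
    final int hash1, hash2, hash3; // filled once, reused for every probe

    PrehashedKey() {
        hash1 = RANDOM.nextInt();
        hash2 = RANDOM.nextInt();
        hash3 = RANDOM.nextInt();
    }

    // Candidate slots in a power-of-two table of (mask + 1) entries.
    int slot1(int mask) { return hash1 & mask; }
    int slot2(int mask) { return hash2 & mask; }
    int slot3(int mask) { return hash3 & mask; }
}
```

This only works when the set is identity-based: the random fields say nothing about equals(), but for identity keys that is exactly what you want, and the hashes are as uniform as the random number generator.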

But a good implementation is nothing compared to just being smarter about how you use them in the first place.


What I do for some of my sets is to have a special object type with 3 final int fields. Since I only need identity, these are filled with random ints at creation time; these are then my 3 hash values and I don't need to calculate anything.

That's pretty clever. Thanks for the code; it's smoothed out my A* path-finder so there are no GC pauses, and it's super fast, which is fantastic.

java-gaming.org is not responsible for the content posted by its members, including references to external websites and other references that may or may not have a relation with our primarily gaming and game production oriented community. Inquiries and complaints can be sent via email to the info account of the company managing the website of java-gaming.org.