Using an identity hash map for wrapper classes together with autoboxing is a guarantee of failure. As long as your keys are within the cached range it looks like it works. But once you use keys outside the cached range (by default -128..127 for Integer) you get different objects for the same integer value.

So if you use the identity hash code you must use ==, and then you can't use autoboxing.
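A minimal sketch of the failure, assuming the JVM's default autobox cache:

```java
// Autoboxing hands back cached instances for small values but fresh
// allocations for larger ones, so identity (==) only "works" inside the cache.
public class BoxingIdentityDemo {
    public static boolean sameIdentity(int value) {
        Integer a = value; // autoboxing
        Integer b = value; // autoboxing again
        return a == b;     // identity comparison, not value equality
    }

    public static void main(String[] args) {
        System.out.println(sameIdentity(100));  // true: inside the default -128..127 cache
        System.out.println(sameIdentity(1000)); // false: two distinct Integer objects
    }
}
```

An identity-based map keyed by autoboxed values therefore behaves correctly only by accident, and only for small keys.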

The problem with code that's not a black box is that someone will use it as a black box, especially if it implements java.util.Map. I generally don't extend from that when I want special behavior. Note that I randomize my tests as well.

This code is intended to work as a drop-in replacement for HashMap or IdentityHashMap. As such you must make consistent use of Object.equals(), Object.hashCode(), System.identityHashCode() and ==. There is no way to write correct code otherwise.

Cuckoo hashing does have a limitation that means it's not a true drop-in replacement for any chaining HashMap. In particular, you cannot add three different objects that return the same hash code: the map will rehash until you run out of RAM.
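A sketch of such a pathological key (this class is illustrative, not from the map in question):

```java
// Three distinct keys sharing one hash code. A cuckoo map derives all of its
// candidate slots from this single value, so no amount of rehashing can ever
// separate them; a chaining HashMap simply chains them in one bucket.
final class FixedHashKey {
    final String name;

    FixedHashKey(String name) { this.name = name; }

    @Override public int hashCode() { return 42; } // pathological on purpose

    @Override public boolean equals(Object o) {
        return o instanceof FixedHashKey && ((FixedHashKey) o).name.equals(name);
    }
}
```

Putting new FixedHashKey("a"), "b" and "c" into java.util.HashMap works fine; inserting all three into a pure 3-hash cuckoo map triggers the endless rehash described above.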

I have no special talents. I am only passionately curious.--Albert Einstein

I'm just throwing out general comments about various possible implementations. Take them for what they're worth. Personally I have no problem with creating codebases with strict contracts that go "boom" if violated.

So the problem is that if 4 or more objects are added and they have the same hashCode() but are not ==, then the cuckoo map will rehash forever?

How about a system where the map uses hashCode() until the above is encountered, and then as a fallback uses System.identityHashCode() only for those problem objects? Because identityHashCode really kills performance massively: in my case the frame rate drops and it takes 13% of CPU time when profiling with the -Xprof VM option.

No. The problem is that == and equals() mean different things, just as Object.hashCode() and System.identityHashCode() do.

Line 7 will return false perhaps once every 2^32 iterations. Generally you want Object.equals(), but on the odd occasion you need identity semantics, for example when using reflection for a deep clone, or with serialization.

Falling back from one method to the other is even worse: it makes the semantics (the meaning) of the map random.

The infinite allocation problem is *not* solved by using a different hash. I have a single number that must deterministically produce 3 pseudo-random numbers. If the original number is the same, then all three hash functions will always produce the same 3 values. This is compounded by the fact that Java's Number wrapper classes use poor hash values. They should at least mix the type into the hash (i.e. Short.hashCode() should not give the same result as Long.hashCode() for the same value).
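A sketch of the point, with illustrative mix constants (not delt0r's actual functions):

```java
// All three probe indices are derived deterministically from the one hash
// code, so equal hash codes always collide in all three tables. Java's
// wrappers make this worse: Short.valueOf((short) 7).hashCode() and
// Long.valueOf(7L).hashCode() are both 7.
public class ThreeHashes {
    static int mix(int h, int seed) {
        h *= seed;            // multiply by a per-table constant...
        return h ^ (h >>> 16); // ...then xor-shift to spread the bits
    }

    // The three table indices for a key with hash code h; mask = tableSize - 1.
    static int[] indices(int h, int mask) {
        return new int[] {
            mix(h, 0x85ebca6b) & mask,
            mix(h, 0xc2b2ae35) & mask,
            mix(h, 0x27d4eb2f) & mask,
        };
    }
}
```

However clever the mixing, identical inputs give identical triples, so swapping hash functions cannot rescue an n-way wrapper-class collision.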

Nate said:

Quote

To avoid the infinite allocation, a stash could be used. This has a small performance hit for gets. I have an implementation I'll post in the "map performance" thread (soon) so we don't hijack d3ltor's thread.

If you control the objects, you could implement hashCode and cache the identityHashCode. But, why is your application so sensitive to map performance!?

Yes, this would, while giving worst-case performance of O(n), in which case perhaps chaining or linear probing is the better choice.

btw you are not hijacking anything.


Yes, this would, while giving worst-case performance of O(n), in which case perhaps chaining or linear probing is the better choice.

Did you mean this in response to using a stash? The required stash size is only log(n) (see the paper linked below). Even then it is empty almost all the time, holds only 1 to ~3 items when it isn't, and is only checked when the key was not found in the 3 hashed slots first. I'm pretty convinced a stash is the way to go: it has little overhead and handles pathological cases, making cuckoo robust.

Okie, I'll post my cuckoo stuff here then. This can be the cuckoo thread and that other thread can be the chart thread. Actually, the chart thread is long, messy, has a lot of OT, and is in the Android section... maybe I'll make a new fresh one.

I guess there's no fixing the hash functions; n-way collisions will always be an issue. I tried 4 hashes and it happened less often, but it still bothers me. Stash to the rescue!

I have implemented cuckoo maps (3 hash, random walk) with and without a stash. Implementing the stash was very little code. Benchmarks:

These benchmarks reuse the map instance so put can be more accurately measured, which means the benchmarks don't reflect the time it takes to rehash (and of course memory usage is not shown). I've run a few standalone tests with millions and millions of puts, and the stash prevents the map from ever rehashing due to 3-way hash collisions or loops. The required stash size is only log(n), according to this paper.
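A sketch of what the get path with a stash fallback might look like; all names, sizes and constants here are illustrative, not Nate's actual Cuckoo3Stash code:

```java
// Minimal sketch of a 3-hash cuckoo lookup with a stash fallback: the three
// hashed slots are checked first, and only on a miss is the small stash
// scanned linearly, which is why the stash barely hurts get performance.
class CuckooStashSketch<K, V> {
    static final int CAPACITY = 16;                       // main table, power of two
    static final int STASH_CAPACITY = 4;                  // tiny, rarely used
    final Object[] keyTable = new Object[CAPACITY + STASH_CAPACITY];
    final Object[] valueTable = new Object[CAPACITY + STASH_CAPACITY];
    int stashSize = 0;                                    // entries currently stashed

    int index1(int h) { return h & (CAPACITY - 1); }
    int index2(int h) { return (h * 0x85ebca6b >>> 4) & (CAPACITY - 1); }
    int index3(int h) { return (h * 0xc2b2ae35 >>> 4) & (CAPACITY - 1); }

    @SuppressWarnings("unchecked")
    V get(K key) {
        int h = key.hashCode();
        int i1 = index1(h), i2 = index2(h), i3 = index3(h);
        if (key.equals(keyTable[i1])) return (V) valueTable[i1];
        if (key.equals(keyTable[i2])) return (V) valueTable[i2];
        if (key.equals(keyTable[i3])) return (V) valueTable[i3];
        // Rare path: scan the stash only after all three hashed slots miss.
        for (int i = CAPACITY; i < CAPACITY + stashSize; i++)
            if (key.equals(keyTable[i])) return (V) valueTable[i];
        return null;
    }
}
```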

To answer my earlier question, I found that max(16, sqrt(n) / 4) for the max number of push iterations for a single put is a reasonable value to minimize the number of iterations while also minimizing the stash usage. This seems to scale well. Without the stash, it seems max(32, sqrt(n) / 2) is a good value to not cause a rehash most of the time.
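The caps above, written out as code (the constants are exactly the values quoted):

```java
// Push-iteration cap for a single put, as a function of current capacity n.
public class PushCap {
    static int maxPushIterations(int n, boolean hasStash) {
        return hasStash
            ? Math.max(16, (int) (Math.sqrt(n) / 4))  // stash version
            : Math.max(32, (int) (Math.sqrt(n) / 2)); // no-stash version
    }
}
```

For a million-entry table this caps a put at 250 pushes with a stash, or 500 without, before giving up (stashing or rehashing respectively).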

I think I'll go with the stash version as it is more robust without much penalty. Next I'll implement the rest of the map methods (delete, etc) and an object key version, and then I'll run new benchmarks with all the maps.

I am suspicious of claims of an O(log(n)) stash size. My guess is that the proof requires strong assumptions about the hash functions, with all the hash codes being "truly" random, in which case chaining and linear probing are also O(log(n)). At any rate, that is now a tree map, not a hash map. All my tests are based on 1 million entries or more.

For tables of size 16 or so, I doubt hash maps would be quicker than just iterating.


I am suspicious of claims of an O(log(n)) stash size. My guess is that the proof requires strong assumptions about the hash functions, with all the hash codes being "truly" random, in which case chaining and linear probing are also O(log(n)).

Well, the paper is there. There is some stuff about results from Braverman proving that "polylog(n)-wise independent hash functions are sufficient for" cuckoo hash maps using a stash/queue. I'm not really sure what that means, though.

Possibly better, play with my Cuckoo3Stash class. I'm using a log(n) stash size and I am not seeing it exceeded in my tests. The stash is often not even used, and when it is, it only has a few items in it. Here is my class for ease of browsing or copy-pasting into an IDE (yay JGO syntax highlighting!):

/** Returns an iterator for the entries in the map. Remove is supported. Note that the same iterator instance is reused
 * each time this method is called. */
public Entries<V> entries () {
	if (entries == null)
		entries = new Entries(this);
	else
		entries.reset();
	return entries;
}

/** Returns an iterator for the values in the map. Remove is supported. Note that the same iterator instance is reused
 * each time this method is called. */
public Values<V> values () {
	if (values == null)
		values = new Values(this);
	else
		values.reset();
	return values;
}

/** Returns an iterator for the keys in the map. Remove is supported. Note that the same iterator instance is reused
 * each time this method is called. */
public Keys keys () {
	if (keys == null)
		keys = new Keys(this);
	else
		keys.reset();
	return keys;
}

At any rate, that is now a tree map, not a hash map. All my tests are based on 1 million entries or more.

For what definition of "tree map"? E.g. Java's TreeMap is a red-black tree, which my Cuckoo3Stash certainly isn't. AFAIK, it is still a cuckoo hash map. There are lots of papers about using a stash with cuckoo.

Quote

For tables of size 16 or so, I doubt hash maps would be quicker than just iterating.

For your iterators: I don't think repeated calling of next() will in fact advance to the next item, i.e. you would always get the same item back. Callers are not required to call hasNext() in order to get the next item.
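A sketch of an iterator that satisfies this contract, advancing in next() rather than in hasNext() (names illustrative, over a plain array rather than the map):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// hasNext() must be a side-effect-free query; all advancing happens in next(),
// so next() can be called repeatedly without ever touching hasNext().
class ArrayIterator<T> implements Iterator<T> {
    private final T[] items;
    private int index = 0; // position of the next item to return

    ArrayIterator(T[] items) { this.items = items; }

    @Override public boolean hasNext() { return index < items.length; }

    @Override public T next() {
        if (index >= items.length) throw new NoSuchElementException();
        return items[index++]; // advance here, not in hasNext()
    }
}
```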


Just read a little bit of that paper. It should be noted that the extra xor makes the hash function I use nonlinear, i.e. it doesn't suffer from their warning. So it is at least better than a pure linear ax+c mod d. What makes it nonlinear is the combination of * and ^; just one or the other is equivalent to working in a Galois field (i.e. GF(2^n)).

You could add extra nonlinear structure, but from my tests I highly doubt it would make any real difference. The weak point is that Java doesn't use good hash values for its hashCode() methods. Shame really.
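A sketch of the multiply-plus-xor shape being described; the constants here are illustrative, not delt0r's actual function:

```java
// The multiply alone is linear over the integers mod 2^32, and the xor-shift
// alone is linear over GF(2); only the combination of the two is nonlinear.
public class HashMix {
    static int hash(int x) {
        x *= 0x9E3779B9;  // odd multiplier: a bijection on int
        x ^= x >>> 15;    // xor-shift: also a bijection
        x *= 0x85EBCA6B;
        x ^= x >>> 13;
        return x;         // composition of bijections, so no new collisions
    }
}
```

Because each step is invertible, the mix never introduces collisions of its own; it only redistributes whatever entropy the input hash codes already have, which is why poor wrapper hashCode() values remain the weak point.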

For my objects I often use longHashCode(). This works even when you have more than 1e9 entries.


For your iterators: I don't think repeated calling of next() will in fact advance to the next item, i.e. you would always get the same item back. Callers are not required to call hasNext() in order to get the next item.

My code takes almost twice as long, even when getStash is not actually called! I can replace "return getStash(key);" with "throw new RuntimeException("blah!");" and it doesn't change the speed. If I change "return getStash(key);" to "return null;" then it runs the same speed as delt0r's code. I guess it can't be optimized the same way? Seems crazy.

So the System.identityHashCode() method is called only once, and the result is cached when the object is created. I then replaced all CIdentityHashSet calls to System.identityHashCode(object) with object.hashCode(), and this gained some speed.
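A sketch of that caching trick, assuming you control the class:

```java
// System.identityHashCode is called exactly once, at construction; hashCode()
// afterwards is just a field read instead of the (slow) VM intrinsic.
class IdNode {
    private final int identityHash = System.identityHashCode(this); // cached once

    @Override public int hashCode() { return identityHash; }
    // equals() is left as Object's reference equality, so the
    // hashCode/equals contract still holds.
}
```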

Btw I use CIdentityHashSet mostly for its fast contains(object) method. I noticed that it doesn't do lazy calculation, which gains some speed, so here it is for what it's worth:

I know a few posts here got deleted due to the JGO outage, but I read most of them, I think. Nate raised the point that my program shouldn't depend so much on map/set performance. For some reason it does: A* path finding does lots of contains() checks on the open and closed lists, which I use the CIdentityHashSet for.

My code takes almost twice as long, even when getStash is not actually called! I can replace "return getStash(key);" with "throw new RuntimeException("blah!");" and it doesn't change the speed. If I change "return getStash(key);" to "return null;" then it runs the same speed as delt0r's code. I guess it can't be optimized the same way? Seems crazy.

Maybe try Java 7 and see if it makes any difference, since it looks like a weird HotSpot quirk that might be flattened out by Java 7's more aggressive optimisations.

Nate raised the point that my program shouldn't depend so much on map/set performance. For some reason it does: A* path finding does lots of contains() checks on the open and closed lists, which I use the CIdentityHashSet for.

I recently was using A* for a small map and used an int[] for the closed list and a Node[] and binary heap for the open list. To keep from allocating or clearing the int[] each time, I increment an int "id" and check "closed[y * width + x] != id". Each node also has an ID, and a similar check tells if the node is being encountered for the first time this run.
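A sketch of that "id stamp" closed list (names are illustrative):

```java
// Instead of clearing the int[] between searches, bump a run counter and
// treat any stale stamp as "not closed" -- an O(1) clear of the whole grid.
class ClosedList {
    private final int[] closed;
    private final int width;
    private int runId = 0; // call beginSearch() before each search

    ClosedList(int width, int height) {
        this.closed = new int[width * height];
        this.width = width;
    }

    void beginSearch() { runId++; }                          // O(1) "clear"
    boolean isClosed(int x, int y) { return closed[y * width + x] == runId; }
    void close(int x, int y) { closed[y * width + x] = runId; }
}
```

The same stamp idea applies per-node for the open list, telling in O(1) whether a node has been touched yet in the current run.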

Quote

Maybe try Java 7 and see if it makes any difference, since it looks like a weird HotSpot quirk that might be flattened out by Java 7's more aggressive optimisations.

I recently was using A* for a small map and used an int[] for the closed list and a Node[] and binary heap for the open list. To keep from allocating or clearing the int[] each time, I increment an int "id" and check "closed[y * width + x] != id". Each node also has an ID, and a similar check tells if the node is being encountered for the first time this run.

Ah wow, that is a good idea! No need for sets/maps in the A* algorithm by letting the node track its own open/closed/unprocessed status. I tried it out and it sped up the algorithm lots. Clever thinking. Thanks Nate

I'm still using the maps/sets for other things, so I'll keep track of your great work, guys. Cheers, keith
