Gap in Java

Collections in Java accept only reference types as their elements, not primitive data types. Trying to use a primitive type as the element type produces a compile-time error. To store primitive values in a collection, we need to use the wrapper classes.

All Java collection classes store references to (the memory locations of) the objects they collect. Primitive values do not fit that model.

To circumvent this problem, JDK 5 and later provide autoboxing, wherein primitives are converted to the appropriate wrapper objects and back when they are added to or read from collections.
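As a minimal sketch of what autoboxing does around a collection, the example below stores int values in a List<Integer> and reads them back. The class and method names (AutoboxingDemo, firstN, sum) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class AutoboxingDemo {
    // Builds the list 1..n. List<int> would not compile, because type
    // arguments must be reference types, so the wrapper Integer is used.
    static List<Integer> firstN(int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            values.add(i); // autoboxing: int -> Integer
        }
        return values;
    }

    // Sums the list; each element is unboxed to int as it is read.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) { // auto-unboxing: Integer -> int
            total += v;
        }
        return total;
    }
}
```

Every element here is still a separate Integer object behind the scenes, which is the overhead that primitive collections aim to avoid.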

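With java.util.HashMap, Java language arrays cannot be used meaningfully as keys: arrays inherit identity-based hashCode() and equals() from Object, so two arrays with the same contents are treated as different keys.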
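The original listing is not available here, so the following is only a small sketch of the problem using char[] keys. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ArrayKeyDemo {
    // Looks up an entry with a different array that has the same contents.
    static String lookupWithEqualContents() {
        Map<char[], String> map = new HashMap<>();
        char[] foo = {'a', 'b', 'c'};
        char[] bar = {'a', 'b', 'c'}; // same contents, different object
        map.put(foo, "value");
        // Arrays use Object's identity-based equals(), so bar never matches foo.
        return map.get(bar);
    }

    // Looks up an entry with the very same array instance that was stored.
    static String lookupWithSameInstance() {
        Map<char[], String> map = new HashMap<>();
        char[] foo = {'a', 'b', 'c'};
        map.put(foo, "value");
        return map.get(foo);
    }
}
```

The first lookup returns null and the second returns the stored value: the map only finds the entry when it is given the same array object that was used as the key.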

Trove Comes to the Rescue

The Trove maps/sets use open addressing instead of the chaining approach taken by the JDK hashtables.

What is Open Addressing?

Open addressing, or closed hashing, is a method of collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternate locations in the array (the probe sequence) until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table.
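The probing described above can be sketched with a tiny set of int keys that uses linear probing, the simplest probe sequence. This is an illustrative toy, not Trove's implementation: the class name is made up, the table has a fixed capacity, and removal is left out.

```java
class LinearProbingIntSet {
    private final int[] slots;    // the keys live in the table itself
    private final boolean[] used; // marks which slots hold a key
    private int size;

    LinearProbingIntSet(int capacity) {
        slots = new int[capacity];
        used = new boolean[capacity];
    }

    // Walks the probe sequence for key. Returns the slot that holds key,
    // or the first unused slot, or -1 if the table is full without key.
    private int probe(int key) {
        int start = Math.floorMod(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int idx = (start + i) % slots.length;
            if (!used[idx] || slots[idx] == key) {
                return idx;
            }
        }
        return -1;
    }

    // Adds key; returns false if it was already present.
    boolean add(int key) {
        int idx = probe(key);
        if (idx < 0) {
            throw new IllegalStateException("table is full");
        }
        if (used[idx]) {
            return false;
        }
        used[idx] = true;
        slots[idx] = key;
        size++;
        return true;
    }

    // An unused slot on the probe sequence means the key is absent.
    boolean contains(int key) {
        int idx = probe(key);
        return idx >= 0 && used[idx];
    }

    int size() {
        return size;
    }
}
```

With a capacity of 7, the keys 1, 8 and 15 all hash to slot 1, so the second and third land in slots 2 and 3. A lookup for 22 probes slots 1 to 3, reaches the unused slot 4, and reports that the key is not in the table.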

Open Addressing vs. Chaining

| Aspect | Chaining | Open addressing |
| --- | --- | --- |
| Collision resolution | Using external data structure | Using hash table itself |
| Memory waste | Pointer size overhead per entry (storing list heads in the table) | No overhead |
| Performance dependence on table's load factor | Directly proportional | Proportional to (loadFactor) / (1 - loadFactor) |
| Allow to store more items than hash table size | Yes | No. Moreover, it's recommended to keep table's load factor below 0.7 |
| Hash function requirements | Uniform distribution | Uniform distribution, should avoid clustering |
| Handle removals | Removals are ok | Removals clog the hash table with "DELETED" entries |
| Implementation | Simple | Correct implementation of open addressing based hash table is quite tricky |
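To get a feel for the open addressing ratio (loadFactor) / (1 - loadFactor) from the comparison above, it helps to evaluate it at a few load factors. The helper below is only a sketch of that arithmetic; the class and method names are hypothetical.

```java
class ProbeCostSketch {
    // Evaluates loadFactor / (1 - loadFactor) for a load factor in [0, 1).
    static double ratio(double loadFactor) {
        if (loadFactor < 0.0 || loadFactor >= 1.0) {
            throw new IllegalArgumentException("loadFactor must be in [0, 1)");
        }
        return loadFactor / (1.0 - loadFactor);
    }
}
```

The ratio is 1 at a load factor of 0.5, about 2.33 at 0.7, and about 9 at 0.9. It grows without bound as the table fills, which is one way to read the advice to keep the load factor below 0.7.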

The size of the tables used in Trove’s maps/sets is always a prime number, improving the probability of an optimal distribution of entries across the table, and so reducing the likelihood of performance-degrading collisions. Trove sets are not backed by maps, and so using a THashSet does not result in the allocation of an unused “values” array.
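One way to see why a prime table size helps is to count how many distinct slots a run of regularly spaced keys occupies. The sketch below assumes a slot is chosen with a plain key % tableSize; that reduction and the names used are illustrative assumptions, not a description of Trove's internals.

```java
import java.util.HashSet;
import java.util.Set;

class TableSizeSketch {
    // Counts the distinct slots hit by the keys 0, stride, 2*stride, ...
    // when each key is reduced to a slot with key % tableSize.
    static int distinctSlots(int stride, int keyCount, int tableSize) {
        Set<Integer> slots = new HashSet<>();
        for (int i = 0; i < keyCount; i++) {
            slots.add((i * stride) % tableSize);
        }
        return slots.size();
    }
}
```

Sixteen keys spaced 8 apart fall into only 2 distinct slots of a 16-slot table, but into 16 distinct slots of a 17-slot table, because the prime size shares no factor with the stride.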

With a gnu.trove.THashMap, in contrast to java.util.HashMap, you can implement a TObjectHashingStrategy that hashes and compares arrays by their contents, which makes arrays usable as keys.
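The original listing is missing here, so what follows is a hedged sketch of a content-based strategy for char[] keys. So that it compiles without Trove on the classpath, it uses a local stand-in interface, HashingStrategySketch, which is hypothetical. The stand-in mirrors the two methods that Trove 2.x's TObjectHashingStrategy is assumed to declare, computeHashCode and equals. Check the exact signatures against the Trove version you use.

```java
import java.util.Arrays;

// Hypothetical stand-in for gnu.trove.TObjectHashingStrategy (assumed shape),
// declared locally so this sketch compiles without the Trove library.
interface HashingStrategySketch<T> {
    int computeHashCode(T object);

    boolean equals(T o1, T o2);
}

// Hashes and compares char[] keys by their contents instead of by identity.
class CharArrayStrategy implements HashingStrategySketch<char[]> {
    @Override
    public int computeHashCode(char[] c) {
        return Arrays.hashCode(c); // same contents give the same hash code
    }

    @Override
    public boolean equals(char[] c1, char[] c2) {
        return Arrays.equals(c1, c2); // element-by-element comparison
    }
}
```

With Trove available, the class would implement TObjectHashingStrategy instead of the stand-in, and an instance would be handed to the THashMap when it is constructed. Two distinct arrays holding 'a', 'b', 'c' would then resolve to the same key.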