HashSet in Java

HashSet extends AbstractSet and implements the Set interface.
HashSet also implements the Serializable and Cloneable interfaces.
HashSet is backed by a hash table (actually a HashMap instance), i.e., it uses that HashMap to store the elements of the collection.
Just as HashMap permits a single null key, HashSet permits at most one null element.

Element order in HashSet

HashSet makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.
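The points above about the single null element and the unspecified iteration order can be illustrated with a short sketch. The class name HashSetBasicsDemo and the helper build are made up for this example; only standard java.util calls are used.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class HashSetBasicsDemo {
    // Builds a set from the given values; duplicates (including a repeated null) collapse.
    static Set<String> build(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        Set<String> a = build("pear", null, "apple", null, "fig");
        Set<String> b = build("fig", "apple", "pear", null);

        System.out.println(a.size());         // 4: three strings plus a single null
        System.out.println(a.contains(null)); // true
        // Set equality ignores order, so it is safe to rely on; the printed order is not.
        System.out.println(a.equals(b));      // true
        for (String s : a) {
            System.out.println(s);            // order is unspecified
        }
    }
}
```

Code that needs a particular order should not depend on what this loop happens to print on one machine or JDK version.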

Performance

This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets.

As with HashMap, two parameters affect the performance of a HashSet instance: initial capacity and load factor.

The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.

The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed, i.e., internal data structures are rebuilt, so that the hash table has approximately twice the number of buckets.

Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the capacity (the number of buckets) of the backing HashMap. Therefore, it is very important not to set the initial capacity too high (or the load factor too low) if iteration performance matters.

As a general rule of thumb, the default load factor (0.75) offers a good tradeoff between time and space costs.
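The relationship between capacity, load factor and rehashing can be made concrete with a little arithmetic. The sketch below is only an illustration: the helpers resizeThreshold and capacityFor are hypothetical, while the two-argument HashSet(int, float) constructor is the standard one. It assumes the documented default capacity of 16.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetSizingDemo {
    // Entry count above which a table of the given capacity is rehashed (capacity * loadFactor).
    static int resizeThreshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Hypothetical helper: smallest capacity that holds `expected` elements without a rehash.
    static int capacityFor(int expected, float loadFactor) {
        return (int) Math.ceil(expected / (double) loadFactor);
    }

    public static void main(String[] args) {
        // With the defaults, the table grows once it holds more than 12 entries.
        System.out.println(resizeThreshold(16, 0.75f)); // 12

        // Presizing for about 1000 elements avoids intermediate rehashes.
        int capacity = capacityFor(1000, 0.75f);
        System.out.println(capacity);                   // 1334

        Set<Integer> ids = new HashSet<>(capacity, 0.75f);
        for (int i = 0; i < 1000; i++) {
            ids.add(i);
        }
        System.out.println(ids.size());                 // 1000
    }
}
```

Presizing trades a little memory for fewer rehashes; as noted above, an oversized table also makes iteration slower, so the capacity should stay close to the expected element count.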

Accessing in multi-threaded environment

Note that the HashSet implementation is not synchronized. If multiple threads access a set concurrently, and at least one of the threads modifies the set structurally, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be “wrapped” using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:

Set s = Collections.synchronizedSet(new HashSet(...));
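A slightly fuller sketch of the wrapper in use follows. The class and method names (SynchronizedSetDemo, fillConcurrently) are hypothetical. Individual calls such as add are synchronized by the wrapper, but iteration is not atomic, so the Collections.synchronizedSet documentation asks callers to hold the wrapper's lock while traversing.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class SynchronizedSetDemo {
    // Fills one wrapped set from several threads and returns the number of elements seen.
    static int fillConcurrently(int threads, int perThread) {
        Set<Integer> shared = Collections.synchronizedSet(new HashSet<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.add(base + i); // each call locks the wrapper internally
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }

        // Iteration must be guarded manually by synchronizing on the wrapper itself.
        int count = 0;
        synchronized (shared) {
            for (Integer ignored : shared) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 1000)); // 4000
    }
}
```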

HashSet is Fail-fast

If the set is structurally modified at any time after the iterator is created, in any way except through the iterator’s own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

The fail-fast behavior of an iterator cannot be guaranteed and iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
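The two situations described above, a modification made behind the iterator's back and a removal made through the iterator itself, can be sketched as follows. The class and method names are invented for the example. As the text says, the exception is thrown on a best-effort basis, so the first method shows what is expected on OpenJDK-based runtimes rather than a guarantee.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class FailFastDemo {
    // Returns true if the iterator reported a modification made outside of it.
    static boolean modifyBehindIterator() {
        Set<Integer> set = new HashSet<>(Arrays.asList(1, 2, 3, 4));
        Iterator<Integer> it = set.iterator();
        set.add(99); // structural change that does not go through the iterator
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes even numbers through the iterator's own remove method, which is allowed.
    static Set<Integer> removeEvens() {
        Set<Integer> set = new HashSet<>(Arrays.asList(1, 2, 3, 4));
        for (Iterator<Integer> it = set.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(modifyBehindIterator());   // expected: true
        System.out.println(removeEvens().size());     // 2 (the elements 1 and 3 remain)
    }
}
```

The try/catch here exists only to make the behavior visible; production code should fix the offending modification instead of catching the exception.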

Internal working of HashSet

When we look into the source code of the HashSet.java class, we find that it keeps a private HashMap field and a shared dummy value object.

The source code makes it clear that the set achieves uniqueness through HashMap. We know that each key in a HashMap is unique. So when an instance of HashSet is created, it basically creates an instance of HashMap. When an element is added to the HashSet through the add(E e) method, it is actually added to the HashMap as a key. A value needs to be associated with that key, so a dummy value PRESENT (private static final Object PRESENT = new Object();) is associated with every key in the HashMap.
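The delegation described above can be approximated with a small model class. This is a simplified sketch, not the JDK source: the class name MiniHashSet is hypothetical, only a handful of methods are modeled, and details such as serialization, cloning and iteration are left out.

```java
import java.util.HashMap;

// Simplified, hypothetical model of a set that delegates to a HashMap.
class MiniHashSet<E> {
    // One shared dummy value stored against every key.
    private static final Object PRESENT = new Object();

    // The elements of the set are the keys of this map.
    private final HashMap<E, Object> map = new HashMap<>();

    public boolean add(E e) {
        return map.put(e, PRESENT) == null;  // null means the key was not there before
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;     // PRESENT comes back only if the key existed
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MiniHashSet<String> set = new MiniHashSet<>();
        System.out.println(set.add("a"));      // true
        System.out.println(set.add("a"));      // false
        System.out.println(set.size());        // 1
        System.out.println(set.remove("a"));   // true
        System.out.println(set.contains("a")); // false
    }
}
```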

Now look at the add(E e) method

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}

There are two possible cases:

map.put(e, PRESENT) returns null if the element is not yet present in the map. In that case map.put(e, PRESENT) == null evaluates to true, so the add method returns true and the element is added to the HashSet.

map.put(e, PRESENT) returns the old value if the element is already present in the map. In that case map.put(e, PRESENT) == null evaluates to false, so the add method returns false and no new element is added to the HashSet.

So what happens when a duplicate element is passed to the HashSet via set.add(e)? The add(e) method returns false when the element already exists in the HashSet; otherwise it returns true. The duplicate element is therefore not added to the HashSet.
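The boolean returned by add is useful in its own right, for example to detect duplicates in a single pass. The following sketch uses a hypothetical helper, firstDuplicate, to show the idea.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateDetectionDemo {
    // Returns the first value that is seen a second time, or null if all values are distinct.
    static String firstDuplicate(List<String> values) {
        Set<String> seen = new HashSet<>();
        for (String v : values) {
            if (!seen.add(v)) {
                return v; // add returned false: v was already in the set
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstDuplicate(Arrays.asList("a", "b", "c", "b", "a"))); // b
        System.out.println(firstDuplicate(Arrays.asList("x", "y")));                // null
    }
}
```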

Adding Custom objects to HashSet

It is really important to override equals() and hashCode() for any class whose objects you are going to store in a HashSet. Because each object is used as a key in the backing map, both methods must be overridden, and they must be consistent with each other.
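The effect of the overrides can be seen by comparing two otherwise identical classes. Point and RawPoint below are hypothetical examples: the first overrides both methods, the second inherits the identity-based versions from Object.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class CustomKeyDemo {
    // Overrides equals and hashCode, so equal coordinates mean equal set elements.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // must agree with equals
        }
    }

    // No overrides: two instances are equal only if they are the same object.
    static final class RawPoint {
        final int x;
        final int y;

        RawPoint(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static int sizeWithOverrides() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // recognized as a duplicate
        return set.size();
    }

    static int sizeWithoutOverrides() {
        Set<RawPoint> set = new HashSet<>();
        set.add(new RawPoint(1, 2));
        set.add(new RawPoint(1, 2)); // treated as a different element
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWithOverrides());    // 1
        System.out.println(sizeWithoutOverrides()); // 2
    }
}
```

The fields are final on purpose: changing a field that participates in hashCode while the object sits in a HashSet can leave it in the wrong bucket, so that contains no longer finds it.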