Tuesday, August 7, 2012

I assume readers have basic knowledge about data general purpose data structures and Java Collections framework to under stand this article. Java Collections framework was designed and implemented using general purpose data structures to cater to different kinds of users. We do not need to write our own hash table or linked list to handle huge data items.Basically, Java collections classes (Set/Map) falls into three core data structures such as Hash Table, Red black Tree and Linked List.

Why Hash Table ?. - It is based on arrays - Uses hashing technique which offers very fast insertion and searching. - If we don’t need to visit items in order, and we can predict in advance the size of our data, hash tables are unparalleled in speed and convenience.

Why Red Black Tree ? - It is a balanced tree, guarantees O(logN) performance whether the
input data is ordered or not. - Advantage in a red-black tree is that sorted
data doesn’t lead to slow O(N) performance.

Why LinkedList ? - Used whenever the amount of data to be stored cannot be predicted
in advance or when data will frequently be inserted and deleted.

HashMap/HashSet/LinkedHashMap/LinkedHashSet - all of them have constant time performance for retrieval,insertion and deletion operations. But LinkedHashMap/LinkedHashSet performance is bit lesser than HashTable since it needs to maintain additional linked list.

Except red-black tree (which is non-linear), all the data structures used here are linear type.

Operations

HashTable

Red black Tree

Linked List + HashTable

Retrieval (get)

O(1) - HashMap/HashSet

O(log N) -TreeMap/TreeSet

O(1) - LinkedHashMap.

Insertion (put)

O(1) - HashMap/HashSet

O(log N) - TreeMap/TreeSet

O(1) - LinkedHashMap

Deletion

O(1) - HashMap/HashSet

O(log N) - TreeMap/TreeSet

O(1) - LinkedHashMap

Parameters affecting performance

Initial Capacity and Load Factor

Initial Capacity and Load Factor

Iteration Order

Does not guarantee the iteration order

Sorted according to the natural order

This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order). Insertion order is not affected if a key is re-inserted into the map.

O(1) - constant time : Better performance.

O(n) - linear time : Worst performance for the large data.

O(log N) - logarithmic time : Average Performance.

2) Comparison ArrayList and LinkedList

Array list allows random access , so we can grab any element in constant time whereas linked list allows sequential access of elements. You can walk front/back but grabbing element in the middle is proportional to the size of the list.

ArrayList is useful when

- we need random access of elements.

- we don’t need to remove elements inbetween.

- adding elements at amortized time

LinkedList is useful when

- we prefer iteration over the the random access

- we need efficient removal in between the elements or at start.

Operations

ArrayList

LinkedList

Base Data Structure

Dynamically resizing array

Doubly linked list

Retrieval (get)

O(1)

O(n)

Insertion (add)

O(1) - on average

O(n) - on worst case

O(1)

Deletion

O(n)

O(1) - Iterator.remove

Mostly useful for

Efficient Retrieval - (by random access)

Efficient Removal

Is NULL permitted

Yes

Yes

Iterators

Iterator and ListIterator which are fail-fast*

Iterator and ListIterator which are fail fast*

* Fail-fast : If the list is structurally modified at any time after the iterator is created, in any way except through the Iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException.