Interface

Often, the data structure is equipped with a constructor that places every object into its own set.

Quick Find

Here is a very simple (but not very efficient) way to achieve what we want: We keep an array whose entry at position x records which set object x belongs to. The interface is implemented as follows:

find(x): Return the value at position x in the array. This is just O(1).

union(x,y): Scan through the array and change every value equal to find(x) into find(y). This is O(n).

    class UnionFind { // Quick Find
        int* sets; // sets[i] is the id of the set that contains i
        int N;
    public:
        UnionFind(int n) { // Set up a union-find data structure with n elements
            N = n;
            sets = new int[N];
            for (int iii = 0; iii < N; iii++)
                sets[iii] = iii;
        }
        int find(int x) {
            return sets[x];
        }
        void merge(int x, int y) { // We call this merge here. Apparently, union is a keyword in cpp
            int root_x = find(x);
            int root_y = find(y);
            for (int iii = 0; iii < N; iii++)
                if (sets[iii] == root_x)
                    sets[iii] = root_y; // Relabel everything in x's set with y's id
        }
    };
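To see the interface in action, here is a minimal sketch of the same quick find idea. It assumes `std::vector` for storage instead of a raw array, and the class name `QuickFind` and the `connected` helper are hypothetical additions for illustration, not part of the interface described above.

```cpp
#include <cassert>
#include <vector>

// Quick find sketch using std::vector instead of a raw array (an assumption
// made here so the example manages its own memory).
class QuickFind {
    std::vector<int> sets;  // sets[i] is the id of the set containing i
public:
    explicit QuickFind(int n) : sets(n) {
        for (int i = 0; i < n; i++) sets[i] = i;  // every object starts alone
    }
    int find(int x) const {
        assert(x >= 0 && x < static_cast<int>(sets.size()));
        return sets[x];
    }
    void merge(int x, int y) {
        int id_x = find(x), id_y = find(y);
        if (id_x == id_y) return;  // already in the same set
        for (int& id : sets)       // full scan: this is the O(n) part
            if (id == id_x) id = id_y;
    }
    // Hypothetical helper: two objects are in the same set iff their ids match.
    bool connected(int x, int y) const { return find(x) == find(y); }
};
```

With five objects, calling `merge(0, 1)`, `merge(3, 4)` and then `merge(1, 4)` leaves 0, 1, 3 and 4 sharing one id while 2 stays alone. Each of those calls scans the whole array, however few entries actually change.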

Actually, we can do better than that. Let's see how.

Quick Union

This time, we will still use an array for storage but we'll imagine it to be a forest.

We'll keep an array called parent that records the parent of each element. Each set forms a tree and is represented by its root node.

Here is an example of a forest where {1,2,5,6,7} form one set and {0,3,4} form another.
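As a sketch, the same two sets can be written down as a parent array. The particular tree shapes below are an assumption; only the grouping into {1,2,5,6,7} and {0,3,4} comes from the example. The names `example_parent` and `find_root` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// One possible parent array for the sets {1,2,5,6,7} and {0,3,4}.
// The exact shape of each tree is assumed; roots are 1 and 0 here.
//                               index: 0  1  2  3  4  5  6  7
const std::vector<int> example_parent = {0, 1, 1, 0, 3, 1, 5, 5};

// Walk parent pointers until reaching a node that is its own parent.
int find_root(const std::vector<int>& parent, int x) {
    assert(x >= 0 && x < static_cast<int>(parent.size()));
    while (parent[x] != x) x = parent[x];
    return x;
}
```

In this layout, element 7 reaches its root through 5 and then 1, while element 4 reaches its root through 3 and then 0. Two elements belong to the same set exactly when `find_root` returns the same value for both.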

find(x): Keep following parent links from x until an element which is the parent of itself (the root) is encountered. If the unions are random enough, the trees stay shallow and this should do better, but the worst case is \(O(N)\) if the tree is very tall.

union(x,y): Find the root of x and make it point towards the root of y.

    class UnionFind { // Quick Union
        int* parent; // parent[i] is the parent of i; a root is its own parent
        int N;
    public:
        UnionFind(int n) { // Set up a union-find data structure with n elements
            N = n;
            parent = new int[N];
            for (int iii = 0; iii < N; iii++)
                parent[iii] = iii;
        }
        int find(int x) {
            int root = x;
            while (parent[root] != root)
                root = parent[root];
            return root;
        }
        void merge(int x, int y) {
            int root_x = find(x);
            int root_y = find(y);
            parent[root_x] = root_y;
        }
    };
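The worst case is easy to trigger. The sketch below is an illustration under assumptions: it uses `std::vector` storage, and the names `QuickUnion`, `depth` and `worst_case_depth` are hypothetical. It merges 0 into 1, then 1 into 2, and so on, which produces a single chain.

```cpp
#include <cassert>
#include <vector>

class QuickUnion {
    std::vector<int> parent;  // parent[i] == i marks a root
public:
    explicit QuickUnion(int n) : parent(n) {
        for (int i = 0; i < n; i++) parent[i] = i;
    }
    int find(int x) const {
        assert(x >= 0 && x < static_cast<int>(parent.size()));
        while (parent[x] != x) x = parent[x];
        return x;
    }
    void merge(int x, int y) {
        int root_x = find(x), root_y = find(y);
        parent[root_x] = root_y;  // x's root now hangs under y's root
    }
    // Hypothetical helper: number of parent links between x and its root.
    int depth(int x) const {
        int hops = 0;
        while (parent[x] != x) { x = parent[x]; hops++; }
        return hops;
    }
};

// Builds the chain 0 -> 1 -> ... -> n-1 and returns the depth of element 0.
int worst_case_depth(int n) {
    QuickUnion uf(n);
    for (int i = 0; i + 1 < n; i++) uf.merge(i, i + 1);
    return uf.depth(0);
}
```

For n elements, element 0 ends up n-1 links away from the root, so a single find(0) walks the entire chain.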

Weighting

The problem with the above data structure is that the trees might become too tall. This problem can be fixed by deciding correctly which tree should go under which.

Tree 1

Tree 2

Would it be a better idea to put Tree 1 under Tree 2 or Tree 2 under Tree 1?

Tree 2 has a height of 4 whereas Tree 1 has a height of 3. If we put Tree 2 under the root of Tree 1, we get a larger tree of height 5. However, putting Tree 1 under the root of Tree 2 still makes a tree of height 4.

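In general, when we have two trees of heights \(m\) and \(n\) such that \(m < n\), we should put the tree of height \(m\) under the root of the tree of height \(n\); the result still has height \(n\). When \(m = n\), the result has height \(n + 1\) whichever way we merge.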
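The rule can be checked with a one-line formula: hanging a tree under another root pushes every node of the hung tree one level deeper. The function below is a small sketch of that arithmetic; the name `merged_height` and its parameter names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Height of the tree obtained by hanging a tree of height `child_height`
// under the root of a tree of height `parent_height`.
// The hung tree sinks by one level; the other tree keeps its height.
int merged_height(int child_height, int parent_height) {
    assert(child_height >= 0 && parent_height >= 0);
    return std::max(parent_height, child_height + 1);
}
```

For the two trees above, `merged_height(3, 4)` gives 4 (Tree 1 under Tree 2), while `merged_height(4, 3)` gives 5 (Tree 2 under Tree 1).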

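To implement this, we keep an array size[i] that counts the objects in the tree rooted at i. The code compares the number of objects rather than the height; as the argument below shows, always putting the smaller tree under the larger one is enough to keep the height logarithmic.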

    class UnionFind { // Quick Union with Weighting
        int* parent;
        int* size; // size[i] is the number of objects in the tree rooted at i
        int N;
    public:
        UnionFind(int n) { // Set up a union-find data structure with n elements
            N = n;
            parent = new int[N];
            size = new int[N];
            for (int iii = 0; iii < N; iii++) {
                parent[iii] = iii;
                size[iii] = 1;
            }
        }
        int find(int x) {
            int root = x;
            while (parent[root] != root)
                root = parent[root];
            return root;
        }
        void merge(int x, int y) {
            int root_x = find(x);
            int root_y = find(y);
            if (root_x == root_y)
                return; // Already in the same set; adding the size to itself would corrupt it
            if (size[root_y] > size[root_x]) { // Make sure that the smaller tree goes under the larger tree
                parent[root_x] = root_y;
                size[root_y] += size[root_x];
            } else {
                parent[root_y] = root_x;
                size[root_x] += size[root_y];
            }
        }
    };

Now, both find and union work in \(O (\lg n)\). Here is why:

The depth of a node increases by one only when the tree containing it is placed under the root of another tree.

Since the smaller tree always goes under the larger one, that other tree is at least as large, so the resultant tree must have at least double the number of elements of the node's old tree.

But there are only \(n\) elements, so the doubling can happen at most \(\lg n\) times for any node.

Thus, the maximum height of a tree is in \(O (\lg n)\), which bounds the number of steps we need to reach the root.
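The bound is tight. As a sketch under assumptions (`std::vector` storage, and the hypothetical names `WeightedQuickUnion`, `depth` and `height_after_pairing`), the code below merges \(2^k\) singletons in \(k\) rounds of equal-sized pairs, which is the case where every merge doubles the size and adds one level.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

class WeightedQuickUnion {
    std::vector<int> parent, size;  // size[r] counts objects under root r
public:
    explicit WeightedQuickUnion(int n) : parent(n), size(n, 1) {
        for (int i = 0; i < n; i++) parent[i] = i;
    }
    int find(int x) const {
        assert(x >= 0 && x < static_cast<int>(parent.size()));
        while (parent[x] != x) x = parent[x];
        return x;
    }
    void merge(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;  // same tree, nothing to do
        if (size[ry] > size[rx]) { parent[rx] = ry; size[ry] += size[rx]; }
        else                     { parent[ry] = rx; size[rx] += size[ry]; }
    }
    // Hypothetical helper: number of parent links between x and its root.
    int depth(int x) const {
        int hops = 0;
        while (parent[x] != x) { x = parent[x]; hops++; }
        return hops;
    }
};

// Merges 2^k singletons in k rounds of equal-sized pairs and returns the
// depth of the deepest element afterwards.
int height_after_pairing(int k) {
    int n = 1 << k;
    WeightedQuickUnion uf(n);
    for (int step = 1; step < n; step *= 2)
        for (int i = 0; i + step < n; i += 2 * step)
            uf.merge(i, i + step);
    int deepest = 0;
    for (int i = 0; i < n; i++) deepest = std::max(deepest, uf.depth(i));
    return deepest;
}
```

With \(n = 2^k\) elements the deepest node ends up exactly \(k = \lg n\) links below the root, so the logarithmic bound cannot be improved by weighting alone.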

Path Compression

Here is another idea: We're already touching all the nodes from x up to the root. Why don't we push them up the tree as we go?

That requires just one extra line of code in the find operation. Look for the line commented "Push up the node by one level" below.

    class UnionFind { // Quick Union with Weighting and Path Compression
        int* parent;
        int* size; // size[i] is the number of objects in the tree rooted at i
        int N;
    public:
        UnionFind(int n) { // Set up a union-find data structure with n elements
            N = n;
            parent = new int[N];
            size = new int[N];
            for (int iii = 0; iii < N; iii++) {
                parent[iii] = iii;
                size[iii] = 1;
            }
        }
        int find(int x) {
            int root = x;
            while (parent[root] != root) {
                parent[root] = parent[parent[root]]; // Push up the node by one level
                root = parent[root];
            }
            return root;
        }
        void merge(int x, int y) {
            int root_x = find(x);
            int root_y = find(y);
            if (root_x == root_y)
                return; // Already in the same set; adding the size to itself would corrupt it
            if (size[root_y] > size[root_x]) { // Make sure that the smaller tree goes under the larger tree
                parent[root_x] = root_y;
                size[root_y] += size[root_x];
            } else {
                parent[root_y] = root_x;
                size[root_x] += size[root_y];
            }
        }
    };

In practice, this keeps the trees almost flat. In fact, together with weighting, it makes the operations run in \(O (\log ^* n)\) amortized time each, as proved by Hopcroft and Ullman.
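The flattening can be watched directly. The sketch below is an illustration under assumptions: it uses `std::vector` storage, and `CompressedUnionFind`, `depth` and `build_depth_three_forest` are hypothetical names. It builds a tree in which element 7 sits three links below the root, so that repeated calls to find can be observed shortening the path.

```cpp
#include <cassert>
#include <vector>

// Weighted quick union whose find pushes each visited node up one level.
class CompressedUnionFind {
    std::vector<int> parent, size;
public:
    explicit CompressedUnionFind(int n) : parent(n), size(n, 1) {
        for (int i = 0; i < n; i++) parent[i] = i;
    }
    int find(int x) {
        assert(x >= 0 && x < static_cast<int>(parent.size()));
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // point x at its grandparent
            x = parent[x];
        }
        return x;
    }
    void merge(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;  // same tree, nothing to do
        if (size[ry] > size[rx]) { parent[rx] = ry; size[ry] += size[rx]; }
        else                     { parent[ry] = rx; size[rx] += size[ry]; }
    }
    // Hypothetical helper: counts links from x to its root without
    // modifying the tree.
    int depth(int x) const {
        int hops = 0;
        while (parent[x] != x) { x = parent[x]; hops++; }
        return hops;
    }
};

// Hypothetical helper: merges 8 singletons in equal-sized pairs so that
// element 7 ends up three links below the root (7 -> 6 -> 4 -> 0).
CompressedUnionFind build_depth_three_forest() {
    CompressedUnionFind uf(8);
    for (int step = 1; step < 8; step *= 2)
        for (int i = 0; i + step < 8; i += 2 * step)
            uf.merge(i, i + step);
    return uf;
}
```

Starting from that forest, the first find(7) shortens the path of element 7 from three links to two, and a second find(7) shortens it to one. Nodes that were not on the path, such as 6, keep their depth until they are visited themselves.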

\(\log ^* n\) is the number of times one needs to apply \(\log\) (base 2) to \(n\) to get a value less than or equal to 1. In practice, one can think of it as almost \(O(1)\), since it exceeds 5 only once \(n\) exceeds \(2^{65536}\).
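To get a feel for how slowly this function grows, here is a minimal sketch that computes it by direct iteration. The name `log_star` is hypothetical, base-2 logarithms are assumed, and a `double` argument is used, so values as large as \(2^{65536}\) cannot be represented.

```cpp
#include <cassert>
#include <cmath>

// Iterated logarithm (base 2): how many times log2 must be applied to n
// before the value drops to 1 or below.
int log_star(double n) {
    assert(n > 0);
    int count = 0;
    while (n > 1.0) {
        n = std::log2(n);
        count++;
    }
    return count;
}
```

The values follow the tower 2, 4, 16, 65536: `log_star(2)` is 1, `log_star(4)` is 2, `log_star(16)` is 3 and `log_star(65536)` is 4. One more step up the tower gives \(2^{65536}\), the last input for which the result is 5.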