August 26, 2012

Finding LCA of two nodes in a tree

A tree is given say T, we’ve to find out the lowest common ancestor of two nodes. LCAT(u,v) – given nodes u, v in T(tree), return the node furthest from the root that is an ancestor of both u and v. In below figure, node in green is LCA of u and v.

We’ll use RMQ(Range Minimum Query) to solve LCA problem as both problems could be reduced to each other and solving RMQ seems easy. In case you don't know RMQ please go through this link RMQ tutorial on TopCoder

If an algorithm has preprocessing time f(n) and query time g(n),

we will say that the algorithm has complexity {f(n),g(n)}. There’s an lemma which states that:

If there is {f(n),g(n)} solution to RMQ then there exists {O(f(2n-1))+O(n), O(g(n))+O(1)} time solution exists for LCA.

For RMQ we can use Segment Tree. Processing time f(n) for segment tree is O(N) and query time is O(lgN). Hence putting these {O(N),O(lgN)} in above lemma we get {O(N),O(lgN)} time for LCA also.

First of all we’ve to reduce LCA problem into an RMQ problem. Reduction process is as follows:

An Euler tour in an undirected graph is a tour that traverses each edge of the graph exactly once. An undirected graph has a closed Euler tour iff(if and only if) it is connected and each vertex has an even degree. Here we’ve tree and we’ll visit every edge twice. Well, both of the statements seems contradictory but for the understanding, take it as Euler tour of tree.

We’ll build three arrays, Euler tour array E[], Level of nodes in tree L[] and representative array R[] which will hold the first occurrence of node in Euler tour.

E[1,..., 2*N-1] - the nodes visited in an Euler Tour of T. E[i] is the label of i-th visited node in the tour. E[i] is the label of the i-th node visited in the Euler tour of i. The Euler Tour of T is the sequence of nodes we obtain if we write down the label of each node each time it is visited during a DFS(Depth First Search). The array of the Euler tour has length 2*N-1 because we start at the root and subsequently output a node each time we traverse an edge. We traverse each of the N-1 edges twice, once in each direction.

L[1,..., 2*N-1] - the levels of the nodes visited in the Euler Tour. L[i] is the level of node E[i]. Let the level of a node be its distance from the root. Compute the level array , L[1,…,2*N-1] where L[i] is the level of node E[i] in the Euler Tour.

R[1,...N] in which R[i] is representative of node is the index of the first occurrence of node i in E[].

In implementation things we’ve to do are:

1. Run DFS on tree and build E[], L[], R[]

2. Build segment Tree for L[]

3. For two query nodes, u and v, first of all we find values in R[], i.e. R[u] and R[v]. This will give us two index of E[]. Suppose X=R[u] and Y=R[v]. X and Y are indices in E[] which gives us range for RMQ. As L[] keeps the level of nodes, we’ll perform query on L[]. We’ve to find that index for which holds the minimum value between L[X] and L[Y]. The node corresponding to that index in E[] will be our answer.

The picture shows an example and is self-explanatory:

Implementation for this example is given below. There are three functions which are quite basic but still deserves some explanation

dfs(root,level): This is just dfs of tree with some additions. We store nodes in E[] two times, first when we start exploring that node and second time when we returns from it. In the 'traverse' part of dfs we've three exact lines which are outside of 'traverse' because we've to store the node again after the traversal is finished to the node.

segmentTree(int node, int start, int end): This function builds the segment tree. node is the index of root node. start is starting index of L[], i.e. 1 and end is the last index of L[], i.e. 2*N-2

query(int node, int start, int end, int left, int right): node, start, end has the same meaning as in the last function. left is the left node for query, similarly right is the right node for query. These left, right comes from R[] and the range of left to right is queried in L[]