RUN-TIMES FOR SOME TREE ALGORITHMS

(C) Samuel L. Marateck, 1999

The run-time for a tree procedure depends on the number of nodes
visited. In a procedure like Inorder that traverses the entire tree,
the procedure visits each node a constant number of times, so the
run-time is O(n) for a tree with n nodes.

Run-times for procedures that operate on already constructed binary
search trees are in general less than O(n) because only one path is
followed at each node. For instance, in the following tree, it takes
four comparisons to locate the value stored in node 9 (the path
followed is described by the nodes 1, 2, 4, and 9). To locate a value
stored in any other node of the tree requires that number of
comparisons or less.

1
/ \
2 3
/ \ / \
4 5 6 7
/ \
8 9

To get the run-time in general, it helps to consider a full tree.
Because each node has two descendants, the number of nodes on each level
form a geometric sequence: 1, 2, 4, 8,... The sum of the number of nodes
from level 0 to level L is given as the sum of the geometric series,
namely, (1) n = 2(L+1) - 1

where it is assumed that the root is at level 0 and its two children are
at level 1, and so forth. Thus, if the tree under consideration were
full, it would have 15 nodes. The number of nodes on a path from the
root to the last generation is L + 1, in our case 4. This is also the
maximum number of comparisons to find a value in a node of the tree.

To find the number of comparisons required as a function of the number
of nodes, solve equation (1) for L + 1. Thus 2(L + 1)
= n + 1, or

(2) L + 1 = log(n + 1)

where the log is taken to the base 2. Since O( log(n
+ 1)) is the same as O(log(n)), the run-time for finding
a value in a full BST is O(log(n)). If the data used
to form the BST is arranged in a random order, even if the
tree is not full the average run-time for finding an element is still
O(log n).

If the original data is sorted, however, the BST
degenerates into a tree that looks like a linear linked list (we'll
call it a linear tree) and the run-time is no longer
O(log(n)).

1
/
2
/
3
/
4

As was shown in class, the run-time required to find a value in the
list is O(n).

The Time Analysis for Constructing a BST

Another interesting calculation is the determination of the run-time
needed to construct a BST. First, we'll assume that the data to be
inserted in the tree are randomly ordered. If there are n nodes already
in the BST, it requires O(log n) steps to place the new node in the
tree. Since there are n nodes to be inserted, the run-time to construct
the tree is O(nlogn).

If the numbers are sorted, resulting in a linear tree, it takes
i comparisons to insert the ith number. The sum of
these comparisons from 1 to n is the sum of the arithmetic
series, namely, ½n(n+1). So the run-time to construct the
tree is O(n² ).

The Time Analysis for Determining the Height of a Tree

What is the run-time for determining the height of a tree? The function
is :

Since the program must visit all the nodes of the tree, the run-time
is O( n )

The Time Analysis for an Algorithm that Determines the Height at
each Node

How do we do the time analysis for an algorithm that calculates the
height of each node? There are two algorithms to do this. The first
visits each node but once and thus its asymptotic behavior is clearly
O(n). The second one, uses the height function just discussed. In this
section, we will analyse the second algorithm. At each node V visited,
the height is calculated. This means visiting each node in V's subtree.

First let's look at a complete tree of 15 nodes and manually count the number
of nodes visited. The first subtree with 1 as its root has 15 nodes.
The two subtrees in the first generation have 7 nodes each.

1
/ \
2 3
/ \ / \
4 5 6 7
/ \ / \ / \ / \
8 11 15

Thus aggregate number of nodes visited is the sum of 15 + 2*7 + 4*3 + 8*1, or
15 + 14 + 12 + 8, or approximately, 4*15. But log(16) is 4, so our sum
is
15log(16). For large N the asymptotic time analysis yields
Nlog(N) .

In effect this is what is happening: for a full tree with many nodes N, in
the first generation there are two subtrees with N/2 nodes each. The sum of
the nodes on this level is N/2 + N/2 = N. In the next generation there are four
subtrees with N/4 nodes each. The sum of the nodes on this generation is again
N. For N very large -- this is called the asymptotic limit-- at each level of
the tree the sum of the nodes in the subtrees is N. Since there are log(N)
levels, the asymptotic behavior is
O(NlogN).