Introduction

A binary search tree has the following properties:

The left subtree of a node contains only nodes with keys less than the node's key.

The right subtree of a node contains only nodes with keys greater than the node's key.

Both the left and right subtrees must also be binary search trees.

From the above properties, it naturally follows that each node (item in the tree) has a distinct key.

I have always been fascinated by the binary search tree data structure. It's nice to have a storage structure that supports O(log n) search time. I've decided to create my own binary search tree with a few added features, such as the ability to rebalance the tree whenever the user desires.

Background

First, let me explain what O(log n) means when it comes to binary search trees.

Let's start with O(n):

Let's use an array for this example. Suppose I have an array of 10 items. If I want to search for a specific element, I may have to examine every element in the array; this happens when the element I'm looking for sits in the last cell. Now let n be the length of the array (here n = 10). That means I may have to examine up to 10 items, so the worst-case search time for an array is O(n), where n is the length of the array.
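As a concrete illustration of that worst case, here is a minimal linear-search sketch (the array contents and the helper name are my own, not from the article's code):

```csharp
using System;

// Hypothetical illustration of the O(n) case: a linear scan may have to
// visit every cell of the array before finding the target.
int LinearSearch(int[] items, int target)
{
    for (int i = 0; i < items.Length; i++) // up to n iterations
        if (items[i] == target) return i;  // found at index i
    return -1;                             // not present at all
}

int[] data = { 4, 8, 15, 16, 23, 42, 7, 9, 1, 5 };
Console.WriteLine(LinearSearch(data, 5)); // 9: the target is in the last cell, the worst case
```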

Now O(log n):

Now let's say that I use a binary search tree to store the 10 items instead of an array. Because the tree keeps its elements in sorted order, I don't have to search the entire tree to find a specific item. I just compare the current node with the element I'm looking for: if the current node's key is greater, I search its left subtree; if it is less, I search its right subtree. The number of comparisons I make is at most the depth of the tree (the number of nodes from the root down to the deepest node). In a balanced tree that depth is about log2 n, which is why a search takes O(log n) comparisons.
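That comparison walk can be sketched as follows; the `Node` class and field names here are my assumptions for illustration, not the article's actual types:

```csharp
using System;

// Minimal sketch of the search walk described above: one comparison per
// level, so the cost is proportional to the tree's depth.
bool Contains(Node node, int key)
{
    while (node != null)
    {
        if (key == node.Key) return true;               // found it
        node = key < node.Key ? node.Left : node.Right; // go left or right
    }
    return false;
}

var root = new Node(8)
{
    Left  = new Node(4)  { Left = new Node(2),  Right = new Node(6)  },
    Right = new Node(12) { Left = new Node(10), Right = new Node(14) }
};

Console.WriteLine(Contains(root, 10)); // True, after walking just 3 levels
Console.WriteLine(Contains(root, 5));  // False

class Node
{
    public int Key;
    public Node Left, Right;
    public Node(int key) { Key = key; }
}
```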

Why is O(log n) better than O(n)?

Again, let's say that we have 8 items that we place into both an array and a binary search tree.

In the array, a search may require up to 8 comparisons. In the (balanced) binary tree, it requires about log2(8) = 3 comparisons. Since 3 < 8, the binary tree is the better choice.

Even as the number of elements grows to 64 (n = 64), O(log n) stays faster: log2(64) = 6, whereas a linear scan may need 64 comparisons. It's important to remember that in the binary tree, I only search down to the depth of the tree.
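The arithmetic above can be checked directly (`Math.Log2` is available in .NET Core 3.0 and later):

```csharp
using System;

// Checking the arithmetic above: a balanced tree over n items needs about
// log2(n) comparisons, versus up to n for a linear scan over an array.
Console.WriteLine(Math.Log2(8));  // 3
Console.WriteLine(Math.Log2(64)); // 6
```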

There are also situations where a binary tree becomes degenerate: when items are inserted in already-sorted order, every node ends up with only one child and the tree degrades into a linked list. In that case, the average search time for both the array and the binary tree is O(n). My Balance() function will take care of the degenerate tree and restore an O(log n) search time.

Implementation

I've created some simple procedures to visit the nodes of the binary tree in different orders: InOrder(), PreOrder(), PostOrder(), DepthFirst(), and BreadthFirst(). These terms should be familiar to most developers. They all use named iterators (C# iterator methods built on yield return) to return each node. Currently, I just print the value of each node to the console; you can modify this to your liking.
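The article's traversal code isn't reproduced here, but a named iterator for an in-order walk might look like the sketch below; the `Node` type and the free-standing function are my assumptions:

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of an in-order named iterator using yield return;
// the article's actual node/tree classes may differ.
IEnumerable<int> InOrder(Node node)
{
    if (node == null) yield break;
    foreach (var k in InOrder(node.Left)) yield return k;  // visit left subtree
    yield return node.Key;                                 // then the node itself
    foreach (var k in InOrder(node.Right)) yield return k; // then the right subtree
}

var root = new Node(5) { Left = new Node(3), Right = new Node(8) };
Console.WriteLine(string.Join(" ", InOrder(root))); // 3 5 8

class Node
{
    public int Key;
    public Node Left, Right;
    public Node(int key) { Key = key; }
}
```

Because `yield return` produces a lazy sequence, the caller can stop iterating early without visiting the rest of the tree.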

The Balance() Function Explanation

As items are added to the binary tree, its structure can degenerate and cause O(n) search time in the worst case. The Balance() function rebalances the tree. The method I used is a common one; I've seen a few articles on the internet explaining it. I first copy the node references of the binary tree into a list, visiting them InOrder, so that all items in the list are in sorted order from smallest to largest. I then find a pivot index like this:

int middleNode = (int)Math.Ceiling(((double)min + max) / 2);

The line above gives me the middle element of the list, which becomes the root node of the new balanced tree. I then recursively apply the same technique to the left half of the list and then to the right half. Each time I find a subtree's 'root' node, I add it to the new balanced tree. When the recursion finishes, I am left with a nicely balanced tree.
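The rebuild-from-a-sorted-list technique described above can be sketched like this; the names (`BuildBalanced`, `Depth`) and the `Node` type are my assumptions, though the pivot calculation is the article's own:

```csharp
using System;
using System.Collections.Generic;

// Sketch of the rebalancing technique: pick the middle element of the
// sorted list as each subtree's root, then recurse on the two halves.
Node BuildBalanced(List<int> sorted, int min, int max)
{
    if (min > max) return null;
    int middleNode = (int)Math.Ceiling(((double)min + max) / 2); // same pivot as the article
    var node = new Node(sorted[middleNode]);
    node.Left  = BuildBalanced(sorted, min, middleNode - 1); // left half of the list
    node.Right = BuildBalanced(sorted, middleNode + 1, max); // right half of the list
    return node;
}

int Depth(Node n) => n == null ? 0 : 1 + Math.Max(Depth(n.Left), Depth(n.Right));

var sorted = new List<int> { 1, 2, 3, 4, 5, 6, 7 };
var root = BuildBalanced(sorted, 0, sorted.Count - 1);
Console.WriteLine(root.Key);    // 4: the middle element becomes the root
Console.WriteLine(Depth(root)); // 3: the minimum possible depth for 7 nodes

class Node
{
    public int Key;
    public Node Left, Right;
    public Node(int key) { Key = key; }
}
```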

Comments

The way I implemented the Delete() method for the root node is interesting: not only do I remove the root, I also rebalance the tree. In examples I looked at where the root is removed without rebalancing, the result is a poor distribution of nodes and a suboptimal tree. That said, the binary tree (not the elements) is re-created in this case, so re-inserting the nodes into the tree causes a small performance hit. The Delete() method can also be called to delete the root node without rebalancing; there is virtually no performance hit here beyond the O(log n) cost of finding the left-most node of the right subtree.
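That "left-most node of the right subtree" is the root's in-order successor, and finding it is just a walk down the left spine. A minimal sketch (the names and tree shape are my assumptions):

```csharp
using System;

// Sketch of the successor step mentioned above: when the root is deleted
// without rebalancing, its replacement is the left-most node of the right
// subtree (the in-order successor).
Node LeftMost(Node node)
{
    while (node.Left != null) node = node.Left; // an O(depth) walk, nothing more
    return node;
}

var root = new Node(8)
{
    Left  = new Node(4),
    Right = new Node(12) { Left = new Node(10) { Left = new Node(9) } }
};

Console.WriteLine(LeftMost(root.Right).Key); // 9 would replace the deleted root 8

class Node
{
    public int Key;
    public Node Left, Right;
    public Node(int key) { Key = key; }
}
```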

Very Important Points

Remember that this binary tree uses recursion to insert and retrieve items (InOrder, PreOrder, PostOrder). That means that if the tree has many levels, and therefore many nodes, your execution stack may be exhausted. A thread's execution stack has a limit of roughly 1 MB, so too many recursive calls will result in a StackOverflowException, which cannot be caught and handled. If you plan to store a huge number of items in the tree, please convert the recursive methods to iterative ones. For example, I converted the recursive insert to an iterative version.
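The article's converted code is not reproduced here; the following is my own sketch of what an iterative insert might look like. A while loop walks down the tree instead of recursing, so the call stack stays at constant depth no matter how many nodes the tree holds:

```csharp
using System;

// Hedged sketch of an iterative insert: the loop replaces the recursion,
// so even a degenerate (linked-list-shaped) tree cannot overflow the stack.
Node root = null;

void Insert(int key)
{
    if (root == null) { root = new Node(key); return; }
    var current = root;
    while (true)
    {
        if (key < current.Key)
        {
            if (current.Left == null) { current.Left = new Node(key); return; }
            current = current.Left;
        }
        else if (key > current.Key)
        {
            if (current.Right == null) { current.Right = new Node(key); return; }
            current = current.Right;
        }
        else return; // duplicate key: ignore, since each node's key is distinct
    }
}

foreach (var k in new[] { 5, 3, 8, 1 }) Insert(k);
Console.WriteLine(root.Key);       // 5
Console.WriteLine(root.Left.Key);  // 3
Console.WriteLine(root.Right.Key); // 8

class Node
{
    public int Key;
    public Node Left, Right;
    public Node(int key) { Key = key; }
}
```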

Comments and Discussions

Thanks for the reply. You shouldn't give an article a vote of 2 just because I left out one method that I plan to finish soon. I'm not building a production API here, just a demonstration of how to create data structures from scratch. The concept is what is really important. There are many gems in my code that show how to do many things, for example the Order methods and the Balance method. I wanted to share this code so that developers like yourself can understand data structures better. My intentions are good.

Good for you. You shouldn't be afraid to publish something before it's perfect. What counts is that the article is something one can learn from. No one, myself included, publishes articles with the source code prepped as it should be for production, with full unit tests and exception handling.
Keep up the good work.
Ken