Treaps

June 26, 2009

Dictionaries are a common data type, which we have used in several exercises (Mark V. Shaney, Word Frequencies, Dodgson’s Doublets, Anagrams). Hash tables are often used as the underlying implementation for dictionaries, and in a previous exercise we developed ternary search tries as another implementation for dictionaries, which are useful when the keys are strings. Today’s exercise examines treaps, which are yet another method for implementing dictionaries.

A treap is essentially a binary search tree, represented as either an empty node, called nil, or a node with a key and a value and pointers to two child nodes, the left child and the right child. Binary search trees obey the rule that all nodes in the left child have keys that are less than the key of the current node, and all nodes in the right child have keys that are greater than the key of the current node. One node is distinguished as the root of the tree; it is the only node visible outside the tree. Obviously, a single set of keys could be represented by many different binary search trees, depending on which key is chosen as the root and the choices made at each level under the root.

The speed of a search depends on the depth at which the key is located in the tree. Trees that have more balance (fewer nodes with nil children) are faster to search than trees that have less balance (more nodes with nil children) because the overall depth of the tree is less. The two extreme cases occur when a tree is perfectly balanced (no internal nil nodes), when a search takes time logarithmic in the size of the tree, and when a tree is perfectly imbalanced (a long chain of nodes each with one nil child), when a search takes time linear in the size of the tree.

Treaps use an ingenious method to assure that the tree is approximately balanced at all times. Each node, in addition to the key, value, and two child nodes, contains a numeric priority, which is randomly assigned at the time the node is created and never changed. Of the many possible trees that could represent a particular set of keys, the one that is chosen obeys the heap-order property, with the priority of each node being greater than the priorities of all of its child nodes. Assuming all keys and priorities are distinct, there will be exactly one tree that satisfies both the tree-order and heap-order properties; it is the same tree that would be created if keys were dynmically inserted into a normal binary search tree in order of the priority field. The genius of this method is that the priorities are independent of the keys, making it extremely unlikely that highly-imbalanced trees could be created, thus keeping search times approximately logarithmic.

As nodes are dynamically inserted into and deleted from the tree, it is sometimes necessary to restructure the tree to maintain both the tree-order and the heap-order properties. The restructurings, known as rotations, come in two varieties, left and right:

[ This image, by Ramasamy, is used under the terms of the GNU Free Documentation License or the Creative Commons Attribution Share-Alike License. ]

In both trees shown above, all the nodes in sub-tree A have keys less than p, all the nodes in sub-tree B have keys between p and q, and all the nodes in sub-tree C have keys greater than q. Transforming the tree so that the root moves from p to q is called a left rotation, and transforming the tree so that the root moves from q to p is called a right rotation; in both cases the root node moves down the tree toward the leaves. Note that both rotations preserve the tree-order property. It is the goal of the insert and delete algorithms to use rotations to maintain the heap-order property.

Your task is to implement the lookup, insert, update, delete and enlist operations on treaps. When you are finished, you are welcome to read or run a suggested solution, or post your solution or discuss the exercise in the comments below.

Finally finished. It took me a while to get everything working since I went all out and implemented the full IDictionary interface. It was fun to get all of the code to work properly, but there’s clearly some optimization work needed to make the thing useful. My implementation’s performance compares unfavorably to the .Net framework’s built in System.Collections.Generic.Dictionary in all of my tests.