C/C++

Skip Lists in C++

By Bill Whitney, November 01, 1998

Skip lists are an efficient alternative to balanced trees, and rather easier to implement correctly.

If you're like me, you're always looking for an alternative data structure that not only performs admirably, but is easy to implement and understand as well. The skip list, described by William Pugh in "Skip Lists: A Probabilistic Alternative to Balanced Trees" [PDF] fits this description.

Skip lists are ordered key/object-based containers offering excellent performance with minimal processing overhead. Although skip lists resemble linked lists, they actually have more in common with balanced trees when it comes to performance. The most important difference is that skip lists require none of the periodic reorganization required with balanced trees  the searching, insertion, and removal algorithms are simpler to code, understand, and extend.

How Skip Lists Work

Figure 1 shows a skip list containing a product inventory.
All data is contained in a node and identified by a key. Probably the most outstanding
feature of the nodes is the variations in their heights. By filling a list with
nodes of varying heights, the search algorithm can "see" past subsequent nodes
when searching for a particular value. For example, given a node list in which
50% of the nodes are twice as high as the rest, search time is cut in half.
The algorithm needs to look at only half of the values.

Figure 1: A sample skip list

As the height of the nodes increase, the visibility between nodes has the potential to be much greater, resulting in even shorter searches. In the ideal skip list, nodes of designated heights would be distributed throughout the list at pre-determined intervals. This would guarantee the maximum benefit from the nodes' look-ahead capability. To sustain performance, however, skip lists employ a probabilistic approach to assigning node height. This approach also removes the requirement for constant reorganization to maintain pre-determined intervals. I'll show the algorithm for probabilistic node height a little later. Consult the reference at the end of this article for a detailed mathematical analysis.

Looking back at Figure 1, the head and tail are actual
nodes in the skip list, but they serve only as markers for the beginning and
end; they contain no useful data. The head points to the first visible node
at each height (only A and C in Figure 1). Any node for
which there exists no subsequent node at its height (C, D, I and J) points to
the "tail" node. Suppose a search is made for product code H2345 in the product
inventory. Each search starts at the head of the skip list, comparing the sought-after
key with the first node pointed to by the root node at the list's current height.

Before I go any further, I should explain the concept of "maximum height." The maximum height of a skip list defines the number of vertical pointers the skip list can maintain. I use the term "current height" to refer to the tallest node that has been inserted into the skip list so far. Because node height is probabilistic, current height reaches maximum height as a skip list matures. This will become clearer when I present the code.

Referring to Figure 1, the product sought (H2345) is
located in node E. This sample skip list has a current height of 4, a maximum
height of 4, and contains ten nodes labeled A through J. All searches start
at the root node, from the level indicated by current height (4). Root node
level 4 points to node C at level 4. Comparing the sought key (H2345) with node
C's key (B2000), shows that the search is still below the value needed. Next,
the search algorithm follows node C's level 4 pointer, which brings it to the
list's tail. That's obviously too far, so the search drops down to node C's
level 3 pointer and looks ahead to node D. Node D's key value is G7009, which
is too low. Node D's level 3 pointer leads to the list's tail again so the search
drops down to level 2 and looks ahead. The next node is E, and checking its
key produces a match. The search algorithm had to traverse three nodes to locate
the key value H2345 in node E. If it were looking for the value P3629 in node
I (9th node in the list), the algorithm would have had to look at only four
nodes.

The C++ Implementation

I've implemented the skip list in two template classes, SkipList and
SkipNode. I also use a helper class called RandomHeight, which
is based on Pugh's algorithm for generating probabilistically random node heights.
RandomHeight is called upon by the SkipList insert method when
new nodes are generated. It appears in Listing One (RandomHeight.h)
and Listing Two (RandomHeight.cpp).

The RandomHeight constructor lays the ground rules for the scope of the random numbers to be generated. These rules consist of the maximum node height (level), and the percentage of nodes that should appear at level 1. When new nodes are created, a random node height is obtained from RandomHeight's newLevel method. Given a height of 4 and a percentage of 0.5, for example, 50% of the generated values would produce level 1 nodes, 25% of the nodes would be level 2, 12.5% of the nodes would be level 3 nodes, and so on. The random(2) call produces a number between 0 and 1 using my compiler.

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!