Architecture, Functional Languages, Style

F# Data Structures: Set

Apr 10th, 2016

As I’ve been going through Okasaki’s book on functional data structures, it occurred to
me that it’d also be a lot of fun to run through the F# implementation of it’s core data
structures and compare them with the book Purely Functional Data Structures.

It’s a lot of fun seeing how fundamental and still powerful the data structures used in
Purely Functional Data Structures are and has made reading through this book even more
satisfying.

The first significant data structure in Okasaki is the Set, which is implemented using
a binary search tree. It’s simple enough to find the F# Core implementation of set in
the GitHub repository.

So the Set type is definitely built on top of a binary search tree. Now the question is,
what type of binary search tree and that can be determined by looking at the functions
used on the SetTree type. There’s the expected add and remove and within those
functions are calls to a rebalance function. Meaning that F# is using a self balancing
BST for its Set implementation. Here’s the code for the rebalance function:

123456789101112131415161718192021222324252627282930

letrebalancet1kt2=lett1h=heightt1lett2h=heightt2ift2h>t1h+tolerancethen// right is heavier than left matcht2with|SetNode(t2k,t2l,t2r,_)->// one of the nodes must have height > height t1 + 1 ifheightt2l>t1h+1then// balance left: combination matcht2lwith|SetNode(t2lk,t2ll,t2lr,_)->mk(mkt1kt2ll)t2lk(mkt2lrt2kt2r)|_->failwith"rebalance"else// rotate left mk(mkt1kt2l)t2kt2r|_->failwith"rebalance"elseift1h>t2h+tolerancethen// left is heavier than right matcht1with|SetNode(t1k,t1l,t1r,_)->// one of the nodes must have height > height t2 + 1 ifheightt1r>t2h+1then// balance right: combination matcht1rwith|SetNode(t1rk,t1rl,t1rr,_)->mk(mkt1lt1kt1rl)t1rk(mkt1rrkt2)|_->failwith"rebalance"elsemkt1lt1k(mkt1rkt2)|_->failwith"rebalance"elsemkt1kt2

So F# is using what looks like a slightly modified AVL tree, which uses a definable
tolerance for how unbalanced the tree can get before rebalancing happens. The
strict AVL tree rebalances if the difference in height between the left and right
branches is greater than 1.

There’s another interesting function nestled within the SetTree module: split.
This function interested me because it wasn’t referenced in the balance or the
add/remove functions.

This function has type SetTree, boolean, SetTree and takes a SetTree t and a pivot
value pivot. The first SetTree in the return type is the set of all values in t
which are less than pivot, the boolean is true if pivot is in t, and the right
SetTree contains all the values in t which are greater than pivot.

This function takes two trees t1 and t2 and merges them together and returns a
single TreeSet with all the elements from t1 and t2. Here’s how split is used
in this function:

Assuming t1 is taller than t2 (it’s basically the same if t2 is taller than
t1)

split all the elements in t2 by if they are smaller than the root value of t1.
Call the set of nodes smaller than the t1 root lo and the set of larger nodes hi.

Use union to merge the left subtree of t1 with lo and to merge the right subtree
of t1 with hi.

Balance the to newly generated subtrees. This leaves a set of all the values smaller
than the t1 value and a set of all the values larger than t1. These can then be used
to create a new TreeSet representing the fully balanced merging of t1 and t2.