Architecture, Functional Languages, Style

Purely Functional Data Structures: Chapter 2 - Binary Search Trees

Mar 21st, 2016

The second section of Chapter 2 deals focuses on binary search trees (BST) and uses BSTs to
implement a set data structure. Like the section
on lists, this gives a nice little explanation on how to deal with inserts to a tree data
structure when the data is immutable in as efficient a way as possible. The most brute
force solution to this would be to create a complete duplicate of the tree save for the
newly added value, this section shows that, while copying is necessary, only the search
path from the root to the new element needs to be copied. The new tree will then reference
as much of the old tree as possible; minimizing both copies and memory used.

What this post will cover: how to represent a binary search tree (BST) in F#, how to efficiently
update the BST while using immutable data structures, and some discussion on design and style.

First thing: how to represent the binary search tree in F#. A BST has two possible values:
it’s either Empty or it’s a some value with 2 children. So a discriminated union captures
this set and a tuple can store the value and 2 children. Leading, as with the list, to
a recursive type:

1

typeTree<'a>=Empty|TreeofTree<'a>*'a*Tree<'a>

The Tree on the left of Tree<'a> * 'a * Tree<'a> stores the left branch and the right
Tree stores the right branch.

A requirement for a BST is that 'a must have a total ordering: you must be able to take
any two values of type 'a and check if one is less than the other. The above definition
does not capture this requirement and so we can make a Tree of any type. In Purely
Functional Data Structure this is solved by using ML’s functor feature, which F# doesn’t
have. In F# this can be captured using a constraint on the type:

'a when 'a: comparison tells the F# compiler that 'a must implement the IComparable
interface which guarantees that the < operator can be used on two values of type 'a.

I changed the name of the type from Tree to BinarySearchTree to capture the more focused
nature of the type when a constraint is added. The Tree can be use to create any type of
binary tree structure, with the constraint the type becomes focused specifically on ordering
and searching. The updated name helps to reflect this focus. This type could also be put
into a module named BinarySearchTree to capture that information and then keep the type
named simply Tree. However, I am not yet sure which is superior, though I feel that
capturing the nature of the type, intrinsically, in the name so that it can’t be seperated
is the better path to follow.

As a note, there is another option with F# for capturing the comparability requirement. We
don’t have to put the constraint in the type definition. When we write our insert and
search functions, the F# compiler with infer that, for those functions, 'a must
have the comparison constraint. This decouples the use of the tree from the type definition.
If we want the type to capture only the structure of a binary tree, then this approach makes
sense. However, if we want the type to capture both the structure and the purpose (searching)
then this approach seems like it will be fragile and confusing.

I initially thought that there was a third option: add the search and insert functions as
member methods to the definition of Tree. It seemed likely that the F# compiler would
infer that 'a must have the comparison constraint. Testing proved that this is not the
case and, in fact, the compiler throws an error because Tree must explicitly have the
comparison constraint in order for the member methods to work.

IsMember

Checking membership in the tree is the first of the two absolutely necessary functions
for our BST. The logic is rather simple, but here I’ve started trying out the function
structure for when I’m writing a “polymorphic” function and leaving out explicity
parameters. I rarely used the function expression in the past so this is helping
me to get a feel for this as a part of my toolbox.

Leaving out the parameters in the
function definition is the point free
style of programming. Since point free style cuts out one source of self documentation
in code, I’m not sure I really like it. However, it fits well with the function expression
since the focus is the patterns that are being matched and that documents the parameters
well enough.

123456

letrecisMember=function|(x,Empty)->false|(x,Tree(l,n,r))->ifx<nthenisMember(x,l)// Experimenting with a new format for if/then. I like having the condition and it's direct conclusion on the same lineelifx>nthenisMember(x,r)elsetrue

Insert

The insert function is a little more complex than isMember: it needs to handle inserting
a new value into an immutable tree data structure:

insert travels from the root node, down the tree, visiting only the nodes on the way to
where the new node will get inserted. This is called the search path. As it moves,
insert makes a copy of each node on the search path. The nodes are copied so that they
can store the path to the new value, the nodes on the search path are the minimum number
of nodes that need to be copied. And of course, because data is immutable, we have to
copy some nodes in order to make our insert.

This diagram shows what’s happening in a nicely visual form. The brown nodes are the
copied nodes on the search path, the dashed lines represent the new links in those nodes.
Note that most of the original tree remains used in the new ‘post insert’ tree. A copy of
7 was made and it’s left child is the new 5 node, then a copy of 4 is made so that
4’s right points to the new 7. 4’s left still points to the old tree.

Problem 2.2

In this problem, we have to update isMember to reduce the number of conditions in the
execution path. This is done by propagating the last checked value through the traversal
and removing the > check:

isMember2 was my first attempt. While the last pattern matching expression has only
one conditional rather than two, it also has an additional pattern match check and so
it seems unlikely that the number of condition checks is actually reduced. I took a look
at the IL for this code and verified that it does indeed have just as many conditions
as `isMember. The worst part is, that pattern match check must get executed every time.

My second attempt became less idiomatic but it successfully reduces the number of
required conditionals:

While working on this code, I noticed that in F# I’m much more comfortable writing
if expressions on a single line. In other languages, I almost never do this (not
counting the ternary operator in the C family).

Problem 2.3: improving insert

If we insert a value that’s already in the tree, then the resulting tree will be
identical to the input tree. However, the current insert function will still
copy all the nodes on the search path. This problem Okasaki has us fix that by
throwing a fault if the value is already in the tree. This will bubble a break
up through the recursive path and break the execution path before any copies
are made: