This recipe from Clojure Cookbook is one of my all-time favorites. Contributor
(and all-around awesome dude) Leonardo
Borges submitted a two-part
series on building custom data structures in Clojure, using
core.match no less. I’d never really
dug into the Clojure sequence abstraction before, and this recipe prompted me
to go further than I ever had.

While Custom Data Structures is a two-part series, what I really want to
showcase is Clojure’s sequence abstractions, which are featured in Part II. To
catch up on the details of implementing a Red-black tree, visit Part
I. Or hey, buy the book.

You’ll probably notice this recipe is credited to both Leonardo and myself.
The reason for this is that I substantially changed the implementation and
explanation based on feedback from my colleagues at Cognitect.

This isn’t to diminish Leonardo’s accomplishments; we just don’t want to
put words in other people’s mouths.

The most important part of participating in the sequence abstraction
is the ability to expose values of a data structure sequentially. The
built-in tree-seq is well suited for this task. One extra step is
needed, however; tree-seq returns a sequence of nodes, not values.

Here’s the final rb-tree->seq function:

(defn- rb-tree->tree-seq
  "Return a seq of all nodes in a red-black tree."
  [rb-tree]
  (tree-seq sequential?
            (fn [[_ left _ right]]
              (remove nil? [left right]))
            rb-tree))

(defn rb-tree->seq
  "Convert a red-black tree to a seq of its values."
  [rb-tree]
  (map (fn [[_ _ val _]] val)
       (rb-tree->tree-seq rb-tree)))

(rb-tree->seq (-> nil
                  (insert-val 5)
                  (insert-val 2)))
;; -> (5 2)

Since RBTs most closely resemble sets, they should adhere well to the
IPersistentSet interface. Implement the IPersistentSet and IFn
interfaces on a new RedBlackTree type, providing all of the
necessary functions. It’s also wise to implement the multimethod
print-method for RedBlackTree, as the default implementation will
fail for RedBlackTree as implemented:
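A minimal sketch of what such a type could look like. It assumes the [color left value right] node vectors used by rb-tree->seq above. The insert-val and find-val functions here are simplified, unbalanced stand-ins for the Part I functions so that the example runs on its own, and disjoin is deliberately left unimplemented.

```clojure
;; Stand-in for Part I's insert-val (assumption): a plain, UNBALANCED
;; binary-tree insert over [color left value right] nodes, with no
;; recoloring or rotation.
(defn insert-val [tree x]
  (if (nil? tree)
    [:black nil x nil]
    (let [[color left v right] tree
          c (compare x v)]
      (cond
        (neg? c) [color (insert-val left x) v right]
        (pos? c) [color left v (insert-val right x)]
        :else    tree))))

;; Stand-in lookup (assumption): returns the stored value, or nil if absent.
(defn find-val [tree x]
  (when-let [[_ left v right] tree]
    (let [c (compare x v)]
      (cond
        (neg? c) (recur left x)
        (pos? c) (recur right x)
        :else    v))))

(defn rb-tree->seq [tree]
  (map (fn [[_ _ v _]] v)
       (tree-seq sequential?
                 (fn [[_ left _ right]] (remove nil? [left right]))
                 tree)))

(deftype RedBlackTree [tree]
  clojure.lang.IPersistentSet
  (cons [_ x] (RedBlackTree. (insert-val tree x)))
  (empty [_] (RedBlackTree. nil))
  ;; Value equality between RedBlackTree instances only.
  (equiv [this other]
    (and (instance? RedBlackTree other)
         (= (sort (seq this)) (sort (seq other)))))
  ;; seq must return nil, not an empty seq, for an empty tree.
  (seq [_] (when tree (rb-tree->seq tree)))
  (count [this] (count (seq this)))
  (get [_ x] (find-val tree x))
  (contains [_ x] (some? (find-val tree x)))
  (disjoin [_ x]
    (throw (UnsupportedOperationException. "disjoin is not implemented")))

  clojure.lang.IFn
  (invoke [_ x] (find-val tree x))

  Object
  (toString [this] (pr-str this)))

;; Print in a tagged-literal style so the output is unambiguous.
(defmethod print-method RedBlackTree [^RedBlackTree o ^java.io.Writer w]
  (.write w (str "#rbt " (pr-str (.tree o)))))

(def t (into (RedBlackTree. nil) [5 2 8]))
(seq t)     ;; -> (5 2 8)
(map inc t) ;; -> (6 3 9)
(t 2)       ;; -> 2
```

Because into is built on conj, and conj calls the type's cons method, the new type works with into without any further code.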

Discussion

In the end, it doesn’t take a lot to participate in the sequence
abstraction. By implementing a small handful of interface functions, the
red-black tree implementation from Implementing Custom Data Structures: Red-black Trees Part I
can participate in an array of sequence-oriented functions: map, filter,
reduce, you name it.

At its essence, clojure.lang.IPersistentSet is an abstraction of what it
means to represent a mathematical set structure; this matches a tree data
structure well. A set isn’t a list or sequence, though. So how is RedBlackTree
then said to be participating in the sequence abstraction?

In Clojure, types extending the clojure.lang.ISeq interface are true
sequences, represented as a logical list of head and tail. While
IPersistentSet does not inherit from ISeq, it does share a common
ancestry with it. Both interfaces extend
clojure.lang.IPersistentCollection and its parent
clojure.lang.Seqable. As luck would have it (actually, as
design would have it), sequence functions rely on collections being
Seqable, not ISeq. Since RedBlackTree can be read as a
sequence, it is Seqable and can be operated on by all of the
sequence functions you know and love.
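The relationship between these interfaces can be checked at the REPL. This short sketch uses only built-in collections, so nothing in it depends on RedBlackTree itself:

```clojure
;; Both interfaces descend from Seqable...
(isa? clojure.lang.ISeq clojure.lang.Seqable)           ;; -> true
(isa? clojure.lang.IPersistentSet clojure.lang.Seqable) ;; -> true

;; ...but a set is not itself a sequence.
(isa? clojure.lang.IPersistentSet clojure.lang.ISeq)    ;; -> false
(instance? clojure.lang.ISeq #{1 2 3})                  ;; -> false

;; Calling seq on a Seqable produces a real ISeq over its elements.
(instance? clojure.lang.ISeq (seq #{1 2 3}))            ;; -> true

;; Sequence functions call seq on their argument for you.
;; (sort is used because set iteration order is unspecified.)
(sort (map inc #{1 2 3}))                               ;; -> (2 3 4)
```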

Most of the functions in the IPersistentSet interface are self-explanatory, but some deserve further explanation. The function cons
is a historical name for constructing a new list by prepending a value
to an existing list; here it is the method that conj calls to add a value. seq is intended to produce a sequence from a
collection, or nil if empty:
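As an illustration with built-in collections only (not the recipe's own listing), this is the behavior a custom cons and seq are expected to match:

```clojure
;; seq returns nil, not an empty sequence, for empty collections.
(seq [])        ;; -> nil
(seq #{})       ;; -> nil
(seq [1 2 3])   ;; -> (1 2 3)

;; conj delegates to the collection's cons method, so each collection
;; decides where a new value goes.
(conj '(2 3) 1) ;; -> (1 2 3)
(conj [1 2] 3)  ;; -> [1 2 3]
(conj #{1 2} 2) ;; -> #{1 2}

;; The interface method can also be called directly.
(let [^clojure.lang.IPersistentCollection v [1 2]]
  (.cons v 3))  ;; -> [1 2 3]
```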

The most challenging part of any Seqable implementation is actually
making a sequence out of the underlying data structure. This would be
particularly challenging if you needed to write your own lazy
tree-traversal algorithms, but luckily Clojure has a built-in function,
tree-seq, that does precisely this. By leveraging tree-seq to
produce a sequence of nodes, it is trivial to write an rb-tree->seq conversion function that lazily traverses a RedBlackTree, yielding
node values as it goes.

tree-seq accepts three arguments:

branch?: A conditional that returns true if a node is a branch (not
a leaf node). For RedBlackTree, sequential? is an adequate check, as
every node is a vector.

children: A function that returns all of the children for a given node.

root: The node to begin traversal on.

Note

tree-seq performs a depth-first traversal of trees. Given how
red-black trees are represented, this will not be an ordered
traversal.
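To make the three arguments and the traversal order concrete, here is a small sketch over a hand-built tree. The [color left value right] node shape is taken from the destructuring in rb-tree->seq; the tree literal and the names example-tree and node-values are made up for illustration.

```clojure
;; A hand-written tree:      5
;;                          / \
;;                         2   8
(def example-tree
  [:black
   [:red nil 2 nil]
   5
   [:red nil 8 nil]])

(def node-values
  (map (fn [[_ _ v _]] v)
       (tree-seq
        ;; branch?: every node is a vector, so every node may have children
        sequential?
        ;; children: the left and right subtrees, minus empty (nil) slots
        (fn [[_ left _ right]] (remove nil? [left right]))
        ;; root: where traversal starts
        example-tree)))

node-values        ;; -> (5 2 8)  root first, then left, then right
(sort node-values) ;; -> (2 5 8)  sort explicitly if order matters
```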

With a sequence conversion function in hand, it is easy enough to
write the seq function. Similarly, cons and empty are a breeze: simply use the existing tree functions. Equality testing can be a
bit more difficult, however.

For the sake of simplicity, we chose to implement equality (equiv)
between only RedBlackTree instances. Further, the implementation
compares a sorted sequence of their elements. In this case, equiv is
answering the question, “Do these trees have the same values?” and not
the question, “Are these the same trees?” It’s an important
distinction, one you’ll need to consider carefully when implementing
your own data structures.
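The difference between the two questions shows up as soon as two trees hold the same values in different shapes. A small sketch over raw node vectors, where tree-values and same-values? are hypothetical helper names and the two trees are written by hand:

```clojure
(defn tree-values [tree]
  (map (fn [[_ _ v _]] v)
       (tree-seq sequential?
                 (fn [[_ left _ right]] (remove nil? [left right]))
                 tree)))

;; "Do these trees have the same values?"
(defn same-values? [a b]
  (= (sort (tree-values a)) (sort (tree-values b))))

;; Same values, different shapes:
;;     2          1
;;    /            \
;;   1              2
(def tree-a [:black [:red nil 1 nil] 2 nil])
(def tree-b [:black nil 1 [:red nil 2 nil]])

(= tree-a tree-b)            ;; -> false  structural comparison
(same-values? tree-a tree-b) ;; -> true   value comparison
```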

As discussed in Determining if a Collection Holds One of Several Values,
one of the big bonuses of sets is their ability to be invoked just like any
other function. It’s easy enough to provide this ability to RedBlackTrees
too. By implementing the single-arity invoke function of the
clojure.lang.IFn interface, RedBlackTrees can be invoked like any other
function (or set, for that matter):

(some (rbt [2 3 5 7]) [6]) ;; -> nil
((rbt (range 10)) 3)       ;; -> 3
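The same mechanism can be seen in isolation. In this sketch, Membership is a hypothetical type that wraps a built-in set and implements only the single-argument invoke:

```clojure
;; Built-in sets already behave this way.
(some #{2 3 5 7} [6]) ;; -> nil
(#{0 1 2} 2)          ;; -> 2

;; A hypothetical wrapper type that becomes callable via IFn.
(deftype Membership [items]
  clojure.lang.IFn
  (invoke [_ x] (get items x)))

(def small-primes (Membership. #{2 3 5 7}))

(small-primes 3)                 ;; -> 3
(small-primes 4)                 ;; -> nil
(filter small-primes (range 10)) ;; -> (2 3 5 7)
```

Only the one-argument arity is implemented here, so calling the object with any other number of arguments, or passing it to apply, will fail.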

Even with the full IPersistentSet interface implemented, there are
still a number of conveniences RedBlackTree is lacking. For one, you
need to use the kludgy ->RedBlackTree or RedBlackTree. functions
to create a new RedBlackTree and add values to it manually. By
convention, many built-in collections provide convenience functions
for populating them (aside from literal tags like [] or {}, of
course). It’s easy enough to mirror vec and vector for RedBlackTrees:
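A sketch of what such constructors might look like. The name rbt matches the usage shown earlier, where it takes a collection; rbt-of is a hypothetical name for the variadic version. To keep the example runnable on its own, insert-val is an unbalanced stand-in for the Part I function and the type implements only cons and seq.

```clojure
;; Stand-in for Part I's insert-val (assumption): unbalanced insert over
;; [color left value right] nodes.
(defn insert-val [tree x]
  (if (nil? tree)
    [:black nil x nil]
    (let [[color left v right] tree
          c (compare x v)]
      (cond
        (neg? c) [color (insert-val left x) v right]
        (pos? c) [color left v (insert-val right x)]
        :else    tree))))

;; Pared-down type: just enough for conj/into and seq to work.
(deftype RedBlackTree [tree]
  clojure.lang.IPersistentCollection
  (cons [_ x] (RedBlackTree. (insert-val tree x)))
  (seq [_]
    (when tree
      (map (fn [[_ _ v _]] v)
           (tree-seq sequential?
                     (fn [[_ left _ right]] (remove nil? [left right]))
                     tree)))))

(defn rbt
  "Like vec: build a RedBlackTree from an existing collection."
  [coll]
  (into (RedBlackTree. nil) coll))

(defn rbt-of
  "Like vector: build a RedBlackTree from the arguments (hypothetical name)."
  [& xs]
  (rbt xs))

(sort (seq (rbt [5 2 8])))  ;; -> (2 5 8)
(sort (seq (rbt-of 5 2 8))) ;; -> (2 5 8)
```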

You may also have noticed printing is not a concern of the sequence
abstraction, although it is certainly an important consideration
when building developer- and machine-friendly data structures.
There are two types of printing in Clojure: toString and pr-based
printing. The toString function is intended for printing
human-readable values at the REPL, while the pr family of functions
are meant (more or less) to be readable by the Clojure reader.

To provide our own readable representation of RBT, we must implement
print-method (the heart of pr) for the RedBlackTree type. By
writing in a “tagged literal” format (e.g., #rbt), it is possible to
configure the reader to ingest and hydrate written values as
first-class objects:
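A minimal sketch of the round trip, assuming a bare RedBlackTree type that simply wraps a tree of node vectors. Reading goes through clojure.edn/read-string with a :readers map, which is one way (not necessarily the recipe's) to attach a reader function to the rbt tag.

```clojure
(require '[clojure.edn :as edn])

;; Bare-bones type for the example: just a wrapper around the node vectors.
(deftype RedBlackTree [tree])

;; pr, prn, and pr-str all dispatch through print-method.
(defmethod print-method RedBlackTree [^RedBlackTree o ^java.io.Writer w]
  (.write w (str "#rbt " (pr-str (.tree o)))))

(def written (pr-str (RedBlackTree. [:black nil 5 nil])))
written ;; -> "#rbt [:black nil 5 nil]"

;; Map the rbt tag to the positional factory function that deftype generates.
(def hydrated
  (edn/read-string {:readers {'rbt ->RedBlackTree}} written))

(instance? RedBlackTree hydrated) ;; -> true
(.tree ^RedBlackTree hydrated)    ;; -> [:black nil 5 nil]
```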