programmer things

Immutable Persistent Data Structures in Common Lisp

Jul 21st, 2012 12:00 am

0. The Rationale

Clojure, Scala, and Haskell (and other languages) have recently brought the
idea of immutable (and persistent) data structures into some amount of
popularity. This is not a new idea, by any means, as Lisp has always
supported this with its list structure, which is essentially a linked
list. In Common Lisp and Scheme, lists are actually mutable but in
practice, this is rarely a problem because it is generally considered
bad form to go around mutating lists except under certain circumstances.
Most Lisp books, tutorials and references encourage using lists in a
functional style (that is, assuming and treating them as immutable
structures).

In Clojure, the big three data structures are the hashmap, vector, and set,
which each come with their own literal syntax. Clojure also has normal lispy
lists, with language-enforced immutability. These immutable data structures
can be approximated with normal lists in Common Lisp, with the caveat that they
don’t retain the more efficient performance characteristics of Clojure’s data
structures. There are a few libraries for Common Lisp which provide these
structures with time and space complexity similar to Clojure’s implementations.
The one that I recommend is FSet, which, according to the Wayback Machine, has
been around since at least 2007.

What’s the point of learning to use these data structures in Common Lisp? Isn’t
Clojure better? Well, in a lot of ways, Clojure is a better language and
environment, but in a lot of ways, Common Lisp is better, too. I enjoy using
both languages. I’m not quite lucky enough to have a day job that allows me to
write much code in a Lisp, so I use it for fun. Therefore, my criteria for
languages that are fun tend to push me toward Common Lisp. I generally use SBCL
or ClozureCL as implementations, which are native (not bound to the JVM),
have a much faster startup time, and interact more easily with native
libraries. I don’t care much for the JVM, and I prefer native libraries to JVM
libraries, which are themselves often just JNI wrappers on native code.
Tracebacks in SBCL and CCL are much easier to read than what you get with
Clojure. I also prefer non-lazy to lazy.

An aside on laziness: I like the idea of lazy collections, or iterators, or
generators, or streams (from SICP). They can be useful for certain
constructions, but they don’t really enable anything amazing that you can’t do
otherwise with a slightly different algorithm. Laziness is also slower, in my
experience. Every implementation of Standard ML (not lazy) I’ve tried has been
anywhere from somewhat faster to several times faster than Haskell, where all
computation is lazy. I think setting up all that delayed computation is
expensive, and the only way I found to make Haskell perform in the same
ballpark as Standard ML was to add hints here and there to get computations to
go ahead and happen instead of building a stack of thunks. I don’t have to
encourage Standard ML to do a computation; it just does it. Same with Common
Lisp. Life is much simpler and faster when computations just happen and nothing
is lazy by default.

So, Clojure has its advantages. One of those is a slight reduction in the
number of parens in certain standard Lisp constructions like LET and COND.
Also, the collection literals are nice. Clojure is a Lisp-1, and a single
shared namespace is generally nicer to work with than a Lisp-2 like Common
Lisp. Even so, there are nice things about a split function namespace, namely
that you don’t have to worry about shadowing built-in functions. This is a
problem in languages with a single namespace. I don’t know how many programs in
Python I’ve seen that shadow the built-in “id” function. It turns out that’s a
really popular name for a database column, and hence a variable name. In Common
Lisp, you can name your variable “list” even though there is a standard
function called “list”, none of this “lst” crap.
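
To make the namespace point concrete, here is a minimal sketch. The function name WRAP-TWICE is made up for illustration; the point is only that a parameter named LIST does not hide the standard LIST function.

```lisp
;; WRAP-TWICE is a hypothetical name used only for this illustration.
;; The parameter LIST lives in the variable namespace, so the call to
;; the standard function LIST in the body still works.
(defun wrap-twice (list)
  (list list list))

(wrap-twice '(1 2))
;;--> ((1 2) (1 2))
```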

TL;DR: It’s all a matter of taste, but for me Common Lisp is just a little more
fun to hack in than Clojure.

1. The List

I’ll try to keep this short. A Common Lisp list is a linked list constructed of
CONS cells. Each CONS has two parts: the CAR and the CDR. The CAR generally
contains a list element and the CDR contains the next CONS cell, or NIL,
signifying the end of the list. An empty list is the same as NIL.

    ;; construct a list with a single element, the keyword symbol :x
    (cons :x nil)
    ;;--> (:x)

    ;; alternately:
    (list :x)
    ;;--> (:x)

    ;; or with quoting (bypass evaluation):
    (quote (:x))
    ;;--> (:x)

    ;; shorthand for quoting is a single quote mark
    '(:x)
    ;;--> (:x)

    ;; CAR and CDR extract the cells of the CONS
    (car '(:x))
    ;;--> :x
    (cdr '(:x))
    ;;--> NIL

    ;; A list of three elements:
    (cons :x (cons :y (cons :z nil)))
    ;;--> (:x :y :z)
    ;; Also: (list :x :y :z) or '(:x :y :z)

So the way to use lists as an immutable, persistent data structure is that you
can construct a new list with an element added to the front of the list without
affecting the original, and the “tail” of the list is shared. Also, you can get
the list minus the front element, just by taking the CDR of the list.
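
A minimal sketch of that idea follows. The variable name *ORIGINAL* and the three-element starting list are assumptions chosen for illustration; only standard functions are used.

```lisp
;; *ORIGINAL* is a hypothetical name; any list would do.
(defparameter *original* (list 1 2 3))

;; New list with 0 on the front; *ORIGINAL* becomes its tail.
(cons 0 *original*)
;;--> (0 1 2 3)

;; Another new list with 1 on the front, sharing the same tail.
(cons 1 *original*)
;;--> (1 1 2 3)

;; The list minus its first element is just the CDR.
(cdr *original*)
;;--> (2 3)

;; The original is untouched.
*original*
;;--> (1 2 3)

;; The tail of the new list is the very same object, not a copy.
(eq (cdr (cons 0 *original*)) *original*)
;;--> T
```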

In that example I took a list and first constructed a new list by adding a 0 to
the front, then another new list by adding a 1 to the front, then I got a list
without the first element, and then the original list, unscathed. No list was
mutated and no space was wasted, as all these lists share the same last 2 or 3
CONS cells.

2. Vector / SEQ

Of course, you might want to do something other than have an ordered sequence
with the ability to add or remove elements at the front. In Clojure, you’ve got
the vector, which is an ordered sequence that allows you to add or remove
elements at the end, update elements anywhere in the sequence, and look up
arbitrary elements in O(log n) time, all with shared structure. You can do the
same kinds of things in Common Lisp with normal lists, but some of the
operations will take O(n) time and some operations will return a brand new list
with no shared structure. Let’s start with that and then show how to get
Clojure-style complexity characteristics with the SEQ collection provided by
the FSet library. For small enough sequences you might just want to use normal
lists, and avoid the FSet dependency.

In none of the following examples is any collection mutated; instead, the value
of the expression must be captured and used in place of the original. So when I
say “drop an element”, I really mean “return a new collection without an
element”, etc.

You can remove an element from the end of a list with BUTLAST, a funny sounding
function which basically creates a whole new list minus the last element. You
can also specify a number of elements to remove from the end of the list.

    (butlast (list 1 2 3))
    ;;--> (1 2)

    (butlast (list 1 2 3 4 5) 2)
    ;;--> (1 2 3)

You can get the NTH element of a list:

    (nth 0 (list 1 2 3 4 5))
    ;;--> 1

You can take a slice of a list with SUBSEQ:

    (subseq (list :a :b :c :d) 1 3)
    ;;--> (:B :C)

Or drop the first n elements with NTHCDR:

    (nthcdr 4 (list :a :b :c :d :e :f))
    ;;--> (:E :F)

If you want to drop or add or change an element from the middle of the list you
might have to write or use one of these simple functions to do that:
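
Here is one possible sketch of such helpers. The names INSERT-AT, REMOVE-AT, and REPLACE-AT are my own, not standard functions, and they assume INDEX is within the bounds of the list. Each one copies the cells before INDEX and shares everything after it with the original.

```lisp
;; Hypothetical helper names; none of these are part of the standard.
;; SUBSEQ returns a fresh copy of the prefix, and APPEND does not copy
;; its last argument, so the tail is shared with the original list.

(defun insert-at (list index item)
  "Return a new list with ITEM inserted before position INDEX."
  (append (subseq list 0 index)
          (cons item (nthcdr index list))))

(defun remove-at (list index)
  "Return a new list without the element at position INDEX."
  (append (subseq list 0 index)
          (nthcdr (1+ index) list)))

(defun replace-at (list index item)
  "Return a new list with the element at position INDEX replaced by ITEM."
  (append (subseq list 0 index)
          (cons item (nthcdr (1+ index) list))))

(insert-at (list :a :b :d) 2 :c)
;;--> (:A :B :C :D)

(remove-at (list :a :b :c :d) 1)
;;--> (:A :C :D)

(replace-at (list :a :b :c) 1 :x)
;;--> (:A :X :C)
```

All three are O(n) in the index, which is fine for short lists and is exactly the cost a tree-based sequence avoids.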

2. Set

In Common Lisp lists are also used to emulate sets. Not all operations
will have the optimum time complexity, but it’s generally adequate for
most purposes.

Common Lisp has various equality functions: EQ, EQL, EQUAL, and EQUALP. I will
not summarize them here, but for the sake of consistency with Clojure, we’ll
probably want to consider two elements to be equal with EQUALP. Many Common
Lisp functions take an optional :TEST parameter to specify which equality test
function to use. All of the set theory related functions do.

To test if a value is present in a set (really, just a list that we’re treating
as a set), there is the MEMBER function. It will return NIL if the value is not
present (NIL being the only “false” value in Common Lisp), or the tail of the
list starting with that value if it is present (which will be treated as “true”
in conditional statements, etc.).
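
A short sketch of MEMBER and a few of the other standard list-as-set functions. The variable name *LETTERS* is made up for illustration. UNION does not promise any particular element order, so no exact result is shown for it.

```lisp
;; *LETTERS* is a hypothetical name used only for this illustration.
(defparameter *letters* '(:a :b :c :d))

(member :c *letters*)
;;--> (:C :D)

(member :z *letters*)
;;--> NIL

;; The default test is EQL, which will not match strings by content,
;; so pass :TEST when the elements are strings.
(member "b" '("a" "b" "c") :test #'equalp)
;;--> ("b" "c")

;; ADJOIN adds an element only if it is not already present.
(adjoin :a *letters*)
;;--> (:A :B :C :D)
(adjoin :z *letters*)
;;--> (:Z :A :B :C :D)

;; REMOVE returns a new list without the element.
(remove :b *letters*)
;;--> (:A :C :D)

(intersection *letters* '(:d :e))
;;--> (:D)

;; Contains :A :B :C :D :E, in an implementation-dependent order.
(union *letters* '(:d :e))
```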

It’s clear that Common Lisp was designed to use lists as sets when necessary.
When you want a more efficient implementation (say, if you wanted to do a lot
of membership tests in a larger set, and would benefit from an O(log n)
membership test instead of O(n)), you can use FSet’s SET. You can construct an
empty set with EMPTY-SET or pre-populate one with the SET macro (which shadows
the archaic and generally unused SET function built into Common Lisp).

    (empty-set)
    ;;--> #{ }

    (set :a :b :c)
    ;;--> #{ :A :B :C }

The FSet library comes with an equality function that is used automatically:
EQUAL?. It is slightly better than EQUALP in that it will find two sets equal
when they are equivalent sets (the same goes for other FSet collections):
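
A minimal sketch of EQUAL? and a few basic set operations. It assumes Quicklisp is installed and loaded so that FSet can be fetched, and it uses the FSET: package prefix instead of the unqualified names shown above. The variable name *S* is made up for illustration.

```lisp
;; Assumption: Quicklisp is available in this image. FSet is a
;; third-party library, not part of standard Common Lisp.
(ql:quickload "fset" :silent t)

;; *S* is a hypothetical name used only for this illustration.
(defparameter *s* (fset:set :a :b :c))

;; Two sets built in a different order are still equal.
(fset:equal? *s* (fset:set :c :b :a))
;;--> true

(fset:contains? *s* :b)
;;--> true

;; LESS returns a new set without the element; *S* is unchanged.
(fset:contains? (fset:less *s* :b) :b)
;;--> NIL

;; WITH returns a new set including the element; *S* is unchanged.
(fset:size (fset:with *s* :d))
;;--> 4
(fset:size *s*)
;;--> 3
```

If this is loaded as a file, the QUICKLOAD form has to be evaluated before the reader sees the first FSET: symbol, which is what happens when the forms are loaded one at a time.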

3. Map

There are two ways lists are used in Common Lisp to create mapping collections.

The plist is just a list in which every two elements represent a key and a
value. It has one useful function, GETF, which uses EQ for key equality and
offers no way to use any other comparison, making it useful only when the keys
are symbols:
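
A small sketch of GETF in use. The variable name *PERSON* and its contents are made up for illustration. Since GETF returns the first match, consing a new key and value onto the front acts as a non-destructive update.

```lisp
;; *PERSON* is a hypothetical name used only for this illustration.
(defparameter *person* (list :name "Ada" :language "Lisp"))

(getf *person* :name)
;;--> "Ada"

;; A missing key yields NIL, or an optional default if one is given.
(getf *person* :age)
;;--> NIL
(getf *person* :age :unknown)
;;--> :UNKNOWN

;; "Update" without mutation: the new pair goes on the front and the
;; original plist is shared as the tail.
(defparameter *renamed* (list* :name "Grace" *person*))

(getf *renamed* :name)
;;--> "Grace"
(getf *person* :name)
;;--> "Ada"
```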

You could easily build a repertoire of functions to do all the common
operations you might want with plists, but there is another form called the
alist which comes with a better set of operations built in. An alist is a list
of cons cells with the key in the CAR and value in the CDR. When you have a
CONS cell with a non-list in the CDR, it is called a dotted pair and is printed
like this: (car . cdr). Alists can be constructed in a variety of ways:
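
A sketch of a few ways to build and query alists using only standard functions. The variable name *ALIST* is made up for illustration.

```lisp
;; *ALIST* is a hypothetical name used only for this illustration.
;; Quoted literal with dotted pairs:
(defparameter *alist* '((:a . 1) (:b . 2)))

;; The same structure built with CONS:
(list (cons :a 1) (cons :b 2))
;;--> ((:A . 1) (:B . 2))

;; ACONS returns a new alist with a pair added to the front.
(acons :c 3 *alist*)
;;--> ((:C . 3) (:A . 1) (:B . 2))

;; ASSOC returns the whole pair, so take the CDR for the value.
(assoc :b *alist*)
;;--> (:B . 2)
(cdr (assoc :b *alist*))
;;--> 2

;; RASSOC looks up by value instead of key.
(rassoc 2 *alist*)
;;--> (:B . 2)

;; Unlike GETF, ASSOC accepts :TEST, so string keys work.
(assoc "b" '(("a" . 1) ("b" . 2)) :test #'equal)
;;--> ("b" . 2)

;; Dropping a key returns a new alist.
(remove :a *alist* :key #'car)
;;--> ((:B . 2))
```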

Alists are probably an 80% solution, which is plenty for most situations, but
FSet’s maps are quite nice and give you that O(log n) lookup and update time
complexity with the structural sharing that we’ve grown to love. I’ve found
them as useful as Clojure’s hashmaps. Like the other collections, there’s a
self-explanatory function EMPTY-MAP and a macro MAP for constructing larger
maps:
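
A minimal sketch of FSet maps. As with the set example, it assumes Quicklisp is installed and loaded, and it uses the FSET: package prefix. The variable name *M* is made up for illustration.

```lisp
;; Assumption: Quicklisp is available in this image.
(ql:quickload "fset" :silent t)

;; An empty map, and a map built with the MAP macro, where each
;; clause is a (key value) pair.
(fset:empty-map)
;; *M* is a hypothetical name used only for this illustration.
(defparameter *m* (fset:map (:a 1) (:b 2)))

(fset:lookup *m* :a)
;;--> 1

;; WITH returns a new map with the key added or replaced.
(fset:lookup (fset:with *m* :c 3) :c)
;;--> 3

;; LESS returns a new map without the key.
(fset:size (fset:less *m* :a))
;;--> 1

;; The original map is unchanged by either operation.
(fset:size *m*)
;;--> 2
```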

4. Conclusion w/bonus Feature

I only scratched the surface with the FSet library. There are more functions
than the ones I listed and some of the ones I describe actually return multiple
values, for example to signify if a key is present in case you are storing NIL
as values, etc. I recommend the FSet Tutorial for more details.

Lists in Common Lisp are really quite versatile and can approximate most other
structures for limited sizes. The nice thing about the FSet library is that it
always puts the collection as the first argument, so it is amenable to
Clojure’s threading macro, ->, which I will translate here for your benefit
into Common Lisp:
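
One way such a macro might be written is sketched below. This is a simplified version, not necessarily identical to Clojure’s: each form receives the previous result as its first argument, and a bare symbol is treated as a one-argument function call. The usage example sticks to standard list functions so it runs without FSet.

```lisp
;; A simplified sketch of a thread-first macro.
(defmacro -> (x &rest forms)
  "Thread X through FORMS, inserting it as the first argument of each."
  (reduce (lambda (acc form)
            (if (consp form)
                ;; (f a b) becomes (f acc a b)
                (list* (first form) acc (rest form))
                ;; a bare symbol f becomes (f acc)
                (list form acc)))
          forms
          :initial-value x))

(-> (list 1 2 3 4)
    (butlast 2)
    (append (list 9))
    reverse)
;;--> (9 2 1)

;; With FSet loaded, the same shape would work on its collections,
;; for example:
;;   (-> (fset:empty-map) (fset:with :a 1) (fset:with :b 2))
```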