Main menu

Post navigation

An interesting list structure, part 1

As long-time readers of my blog know, I’m a big fan of functional persistent immutable list data structures. I recently learned about a somewhat bizarre but very practical implementation of such lists, and naturally I wanted to implement it myself to try it out. The list data structure I’m going to discuss today is called a “Hughes list”, after its creator John Muir Hughes. The Haskell implementation is called “DList”, which stands for “difference list”, and this data structure is also commonly used in Prolog and Scala.

Let’s start by revisiting the standard immutable persistent linked list, which takes the form of a stack. As you probably recall, by “immutable” we mean that the operations like Push and Pop on the list do not change the list; it is immutable. Rather, they return a new list that has the desired content. Since lists are immutable, their contents can be re-used in new lists without worrying that a mutation will wreck things for the code that is using the new version; this is what is meant by “persistent” in this context.

In this implementation I’m going to implement the usual push, pop, peek and “is empty?” operations, but I’m going to add two more: “append”, which adds a single item to the list at the “wrong” end, and “concatenate”, which concatenates two lists together.

That is, if we have a list 1, 2, 3 and we push 4, the result is 4, 1, 2, 3. If we have a list 1, 2, 3 and we append 4, we have the list 1, 2, 3, 4. And if we have list 1, 2, 3, and we concatenate list 4, 5, 6, the result is 1, 2, 3, 4, 5, 6. All clear?

Essentially I’m implementing catenable deques here, but as we’ll see, they are very inefficient in this implementation.

I’m also going to implement two static list constructors: one that makes an empty list, and one that makes a single-item list.

Oh, and before we begin: I’m going to use the standard simple recursive implementations of these operations. In practice, these would blow the stack in C# if the lists got big, because C# is not consistently tail-recursive. Were I designing these types for industrial use, I’d write iterative algorithms instead of recursive algorithms, and I’d also follow good practices like implementing IEnumerable and so on. These types are not intended for use in production; they are for pedagogic purposes only.

All right, hopefully that was pretty straightforward. Empty, Single, IsEmpty, Peek, Push and Pop are all O(1). Concatenate is O(n) in the size of “this”, except for the special case of concatenating an empty list, which is O(1) regardless of the size of “this”. Append is implemented as Concatenate, so it is O(n).

The only algorithm which is even slightly tricky is Concatenate, but hopefully you see how it works. Basically, we’re using recursion to reverse “this” and then push each element of the reversed list onto s. As I noted above, a more efficient and safe implementation would be to actually reverse the list in a loop rather than using recursion to put that information onto the call stack, but whatever, this is for pedagogy — however, we will return to a related issue in a future episode, so keep it in mind!

The problem here is that we are often in a situation where Append and Concatenate are frequent operations, but Peek and Pop are not. For example, we might be in a “list building” scenario where we are composing a large list out of many smaller lists, and wish to be able to efficiently concatenate them.

I already did a series on using finger trees to implement an efficient catenable deque, but finger trees are a quite complex and difficult data structure to understand; you may recall that I did not even implement the efficient “break up/concatenate” operations because they were just too much work. The great thing about Hughes lists is that they are a seemingly very simple data structure that efficiently implements list concatenation. In fact, here it is:

And… that’s it. A Hughes list is a struct that wraps a single delegate; the delegate takes a simple list and returns a simple list. Of course I have left out all the methods and properties, but the delegate is the only data that is stored in a Hughes list.

I know what you’re thinking, because it’s the same thing that I thought when I first heard about this: that sounds completely bonkers. How can a function from lists to lists be a representation of a list? Where’s the head and the tail and all that stuff?

Next time on FAIC: We’ll find out how it works and build an implementation in less than two dozen lines of code.

They share the same name, they kind of look the same a bit (the Prolog version has a hole you fill with unification; the Hughes version you have a hole you fill with a suffix), but they’re pretty different.

Thanks for that helpful comment; I did not do any research into how these work in Prolog, and so I had no idea that we have yet another example of two things with the same name and different implementations!

I’m interested to see the next post to find out the details, so I’m avoiding “spoilers” rather than googling it, but my first thought is that it’s odd to base a data structure on a delegate because the delegate could be capturing arbitrary amounts of information. Isn’t it sort of equivalent to saying that the implementation is an abstract class where “f” is a virtual method?

Yes, that’s exactly what we’ll see; the closures that are generated in practice have fields that logically form a binary tree, and that’s where the catenable list structure is. But the implementation is really elegant; all that gear gets hidden.

I have a long-term plan where, if it all goes right, a whole bunch of these topics will get tied together; there are ways you can use the techniques of Hughes lists to optimize automatic differentiation that I introduced in the previous series, and that then has implications for some of the techniques of managing probability distributions I’m going to discuss when this series is done. But at this point I’m not sure that I have the time to really pull it all together in a satisfying way.