Wednesday, August 13, 2008

Using Difference Lists

Difference lists are a (very) useful data structure in Prolog, particularly when it comes to parsing with definite clause grammars (DCGs) and when it comes to list construction (again, using DCGs). Why? Well, to answer that question, let's examine the structure of difference lists. The Prolog (or relational) representation of a difference list is of the form:

X = [1,2,3|Y] - Y

What that relation describes is that X is the difference of the pair of lists of (first) [1,2,3] appended with the list Yminus (second) the list Y.

Well, what is the value of the list Y? To a Prologian, this question is not important (for Y, being a logic variable, can be any list or, even, not a list at all because the variable is still uninstantiated (a "free" variable)). Its importance is that it gives us an "instant" index into the list after the elements we've already processed. The significance of that index for parsing is that we now have a stream over the parsed (list) data, which gives us the ability to translate BNF grammars directly into working parsers. The significance for list construction is that we "simply" walk "backward" through the DCG to construct a list from a (parsed) internal representation, without paying the linear cost of either (a) traversing the list each time to add a new element (which results in exponential time list construction) or (b) building the list in reverse (adding new elements to the head of the list) and then applying reverse to the end result.

"Big deal!" you scoff, "zippers [also called 'finger trees'] already give us indices into data structures. Who needs a funny-looking specialized zipper just for lists?"

"Good point!" is my response. Zippers are powerful and flexible data structures that not only provide that index over any data structure, but, even better than difference list's index that only goes forward (with any dignity) in constant time, the zipper's index can go both forward and backward through that (generic) structure in constant time. So, if you are not familiar with zippers, I request that you run, don't walk, to a good overview to learn more about their utility.

With all their flexibility and power, in certain situations, a zipper is not the correct choice for some list processing tasks; the difference list's simplicity shines in these situations. First, for the unidirectional index of the difference list (it moves forward through the list well, but backwards ... not so much), this, for list construction purposes, is hardly ever an issue, and if it becomes one, then backtracking is easily installed with some MonadPlus magic. Next, you, the coder, are responsible for constructing a zipper list data type each time you need one, but Don Stewart has provided Data.DList; no construction necessary! Last, an example structure of a zipper list, for the list a++b is (reverse a,b), so if we wish to reconstruct the original list, for an a of large magnitude, we still pay the linear (reverse) cost in reconstruction as well as the linear cost of appending b to that result.

... snoc appends an element to the end of a list (so it's the reverse of cons).

Prologians are always smug with their ability to create an infinite stream of 1s with X = [1|X]. And, no doubt about it, Prolog is a programming language that makes list construction easy ... *ahem* almost as easy as list construction in Haskell ("Your list comprehensions outshine even Prolog for sure..."):

A very simply example, I grant you, for if we take a list whose elements are all in the set of the alphabet, the above code is a very complicated way of reinventing the id function. Or, put another way, who's going to teach me about scanl? Hm, would I be required to use the continuation Monad to escape the scan, I wonder? But you get the general idea for list construction, right? Instead of building the list by adding to the head ("cons"ing) and then applying reverse to the result, the different list allows one to "snoc" each element and return the correctly ordered result. For a single list construction, the benefit may not be all that obvious, but when one builds, e.g., an XML generator for a complex internal representation, reversing the reverse of the reversed reversed child element becomes an odious labour — difference lists eliminate all the internal juggling, replacing that complication with a straightforward constant-time list and element in-place append.

So, next time you need to do some list construction, instead of falling back on the reverse standby, flex your new difference list muscles; you'll be pleasantly surprised.

12 comments:

Um. No they're not. A finger tree is a totally different data structure to a zipper. A zipper is a view into another data structure with a "special point" in that structure. A finger tree is an tree designed for sequences with fast concat, append and prepend.

Yes, finger trees aren't the same as zippers, but zippers are really just fingers. Finger trees could have been called zipper trees if Kaplan & Tarjan hadn't gotten there first. Their fingers are actually special points in trees, along with a path from those points to the root, just like a zipper!