After working with 2-3 finger trees for a while, I have been impressed by their speed in most operations. However, the one issue I have run into is the large overhead of initially creating a large finger tree. Because building is defined as a sequence of concatenation operations, you end up constructing a large number of intermediate finger tree structures that are never needed again.

Due to the complex nature of 2-3 finger trees I see no intuitive method for bootstrapping them, and all my searches have come up empty. So the question is, how could you go about bootstrapping a 2-3 finger tree with minimal overhead?

To be explicit: given a sequence $S$ of known length $n$ generate the finger tree representation of $S$ with minimal operations.

The naive way to accomplish this is by successive calls to the cons operation (the '$\triangleleft$' operator in the literature). However, this creates $n$ distinct finger tree structures, one for each prefix $S[1..i]$ of $S$.
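For concreteness, the naive construction can be sketched with Data.Sequence's cons operator (`naiveFromList` is an illustrative name, not a library function):

```haskell
import Data.Sequence (Seq, (<|), empty)

-- Naive construction: each cons builds a new front of the tree, so
-- constructing the whole sequence allocates many short-lived
-- intermediate structures along the way.
naiveFromList :: [a] -> Seq a
naiveFromList = foldr (<|) empty
```

Each `(<|)` is amortized O(1), but the transient allocation is what the question is trying to avoid.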

3 Answers

GHC's Data.Sequence provides a replicate function that builds a finger tree in $O(\lg n)$ time and space, but this is enabled by knowing from the outset which elements go on the right spine of the finger tree. This library was written by the authors of the original paper on 2-3 finger trees.
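As a small illustration of that API (assuming the containers library's Data.Sequence; `skeleton` is just an illustrative name):

```haskell
import qualified Data.Sequence as Seq

-- Seq.replicate shares subtrees across the whole structure, so only
-- O(log n) distinct nodes are allocated for n logical copies.
skeleton :: Seq.Seq ()
skeleton = Seq.replicate 1000000 ()
```

The logical size is a million, but the physical representation is logarithmic because every copy of `()` points at the same shared nodes.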

If you want to build a finger tree by repeated concatenation, you might be able to reduce the transient space usage during construction by changing the representation of the spines. The spines of 2-3 finger trees are cleverly stored as synchronized singly-linked lists. If, instead, you store the spines as deques, it may be possible to save space when concatenating trees: concatenating two trees of the same height would then take $O(1)$ space by reusing the spines of the input trees. When concatenating 2-3 finger trees as originally described, the spines internal to the new tree can no longer be used as-is.

A very interesting idea. I'll have to look into this and see what the trade-offs would be to the overall data structure.
– jbondeson Mar 9 '11 at 18:26

I meant for there to be two ideas in this answer: (1) the replicate idea, and (2) faster concatenation for nearly-equally-sized trees. I think the replicate idea can build finger trees in very little extra space if the input is an array.
– jbapple Mar 10 '11 at 3:55

Yes, I saw both. Sorry I didn't comment on both of them. I'm looking into the replicate code first -- though I'm definitely stretching my Haskell knowledge as far as it will go. At first blush it does look like it could solve most of the issues I'm having, provided you have fast random access. The fast concat could be a little more generic solution in the case of no random access.
– jbondeson Mar 10 '11 at 16:06

myFromList (in a slightly more efficient version) is already defined and used internally in Data.Sequence for constructing finger trees that are the results of sorting operations.

In general, the intuition for replicateA is simple. replicateA is built on top of the applicativeTree function. applicativeTree takes an action producing a tree piece of size m, and produces a well-balanced tree containing n copies of it. The cases for n up to 8 (a single Deep finger) are hard-coded; for anything larger, it invokes itself recursively. The "applicative" element is simply that it interleaves construction of the tree with threading effects through it, such as, in the case of myFromList, state.

The go function, which is replicated, is simply an action that reads the current state, pops an element off the top, and stores the remainder back. Each invocation thus steps further down the list provided as input.
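A minimal reconstruction of this idea (hedge: `myFromList`, `Supply`, and `pop` are names chosen here, not Data.Sequence internals; a tiny hand-rolled state applicative stands in for the state monad so the sketch needs only base and containers):

```haskell
import Data.Sequence (Seq, replicateA)

-- A minimal state applicative over a list supply. The list of
-- remaining inputs is threaded through each replicated action.
newtype Supply s a = Supply { runSupply :: [s] -> (a, [s]) }

instance Functor (Supply s) where
  fmap f (Supply g) = Supply (\xs -> let (a, xs') = g xs in (f a, xs'))

instance Applicative (Supply s) where
  pure a = Supply (\xs -> (a, xs))
  Supply f <*> Supply g = Supply (\xs ->
    let (h, xs')  = f xs
        (a, xs'') = g xs'
    in (h a, xs''))

-- The "go" action from the text: pop one element off the supply.
-- Partial: replicateA runs it exactly (length xs) times, so the
-- pattern match never fails when called via myFromList.
pop :: Supply s s
pop = Supply (\(y:ys) -> (y, ys))

myFromList :: [a] -> Seq a
myFromList xs = fst (runSupply (replicateA (length xs) pop) xs)
```

Because replicateA sequences its actions left to right, `myFromList` preserves the order of the input list while applicativeTree keeps the resulting tree well-balanced.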

Some more concrete notes

import qualified Data.Sequence as Seq
main = print (length (show (Seq.fromList [1..10000000::Int])))

On some simple tests, this yielded an interesting performance tradeoff. The main function above ran roughly a third slower with myFromList than with fromList. On the other hand, myFromList used a constant heap of 2MB, while the standard fromList used up to 926MB. That 926MB arises from needing to hold the entire list in memory at once; the myFromList solution, meanwhile, can consume the structure in a lazy, streaming fashion. The speed deficit comes from the fact that myFromList must perform roughly twice as many allocations as fromList (a result of the pair construction/destruction of the state monad). We can eliminate those allocations by moving to a CPS-transformed state monad, but that results in holding on to far more memory at any given time, because the loss of laziness forces traversing the list in a non-streaming manner.

On the other hand, if rather than forcing the entire sequence with a show, I just extract the head or last element, myFromList immediately presents a bigger win: extracting the head element is nearly instant, and extracting the last element takes 0.8s. Meanwhile, with the standard fromList, extracting either the head or the last element costs ~2.3 seconds.

These are all implementation details, and a consequence of purity and laziness. In a setting with mutation and random access, I would imagine the replicate solution is strictly better.

However, it does raise the question of whether there is a way to rewrite applicativeTree so that myFromList is strictly more efficient. The issue, I think, is that the applicative actions are executed in a different order than the tree is naturally traversed, but I haven't fully worked through why this happens, or whether there is a way to resolve it.

(1) Interesting. This looks like the correct way to do this task. I am surprised to hear that it is slower than fromList when the entire sequence is forced. (2) Maybe this answer is too code-heavy and language-dependent for cstheory.stackexchange.com. It would be great if you could add a language-independent explanation of how replicateA works.
– Tsuyoshi Ito Mar 10 '11 at 19:31

While you wind up with a large number of intermediate finger tree structures, they share the vast majority of their structure with one another. In the end you allocate at most about twice as much memory as in the idealized case, and the excess is freed at the first collection. The asymptotics are as good as they can get, since you need a finger tree filled with n values in the end.

You can build the finger tree by using Data.FingerTree.replicate and then using FingerTree.fmapWithPos to look up your values in an array that plays the role of your finite sequence, or by using traverseWithPos to peel them out of a list or other known-size container.
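The same replicate-then-remap idea can be sketched with Data.Sequence's analogous API (hedge: this uses `Seq.replicate` and `Seq.mapWithIndex` from containers rather than the fingertree package's position-aware functions, and `fromArray` is an illustrative name):

```haskell
import qualified Data.Sequence as Seq
import Data.Array (Array, listArray, bounds, (!))
import Data.Ix (rangeSize)

-- Replicate a unit-valued skeleton (only O(log n) distinct nodes),
-- then overwrite each placeholder with a constant-time array lookup.
fromArray :: Array Int a -> Seq.Seq a
fromArray arr = Seq.mapWithIndex (\i _ -> arr ! i) (Seq.replicate n ())
  where n = rangeSize (bounds arr)
```

The array here plays the role of the known-length sequence $S$ with fast random access; a list or other Foldable would instead be consumed positionally via a traversal.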

This will allocate O(log n) nodes for the initial replicated skeleton, and then replace them with the O(n) nodes needed to populate it, 'wasting' at most O(log n) memory until garbage collection cleans things up. So instead of the optimal ~1000 nodes you'd pay for ~1010, rather than the ~2000 from cons'd construction.

Finally, you can avoid using replicate and generating that interim O(log n)-memory replicated tree by using replicateA, but as sclv noted, the tupling needed to manipulate the state, whether via mapAccumL or by traversing with a state monad yourself, introduces overhead roughly proportional to just paying for all the extra product cells.