
A Unified Collection of Purely Functional List-like Data Structures

The FSharpx.Collections namespace now provides a collection of linear data structures deriving from the List signature. To emphasize the unity of the collection, I implemented a standardized nomenclature that expands on the List value names. This is not without controversy, since structures like Queue are well known in other (mostly imperative) languages. I believe, however, that together these structures exhibit more similarities than differences, and bringing them together in one F# collection is an opportunity to emphasize that logical unity.

My intent was to expand the List signature nomenclature with the naming standard favored by Okasaki, but “init” as the name for the inverse of “tail” would not do as this conflicts with a List module value. So this value is named “initial”. And I made one other change from Okasaki. In recognition of Steffen Forkmann’s F# implementation of the Vector structure from Clojure being the basis of two structures in this collection (Vector and RandomAccessList), I have opted to name the end-insertion function/member “conj” instead of “snoc”.
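To make the six names concrete, here is a minimal sketch that defines them over the built-in F# list only. The helper functions are illustrative assumptions written for this example; they are not the FSharpx.Collections implementations, and their argument order is a guess at a natural convention.

```fsharp
// Illustration only: the six names defined over the built-in F# list.
// These helpers are assumptions for this sketch, not FSharpx.Collections code.

// List signature: work at the front (left) of the structure.
let head (xs: 'a list) : 'a = List.head xs
let tail (xs: 'a list) : 'a list = List.tail xs
let cons (x: 'a) (xs: 'a list) : 'a list = x :: xs

// Mirror-image signature: work at the end (right) of the structure.
// On a plain list each of these is O(n).
let last (xs: 'a list) : 'a = List.last xs
let initial (xs: 'a list) : 'a list = List.take (List.length xs - 1) xs
let conj (x: 'a) (xs: 'a list) : 'a list = xs @ [x]

let xs = [1; 2; 3]
printfn "%A" (head xs, tail xs, cons 0 xs)
printfn "%A" (last xs, initial xs, conj 4 xs)
```

The point of the sketch is the symmetry: "initial" is to "last" what "tail" is to "head", and "conj" is to the right end what "cons" is to the left.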

The List-like Immutable Data Structures

The following structures provide features that may be available from List and Array, but that are either not implemented efficiently there or not offered in the right combination for a particular task. They also provide the full composability and immutability you expect from purely functional data structures.

Deque (Double-ended queue) is an ordered linear structure implementing the signature of List (head, tail, cons) as well as the mirror-image Vector signature (last, initial, conj). “Head” inspects the first or left-most element in the structure, while “last” inspects the last or right-most element. Ordering is by insertion history.

DList is an ordered linear structure implementing the List signature (head, tail, cons), end-insertion (conj), and O(1) append. Ordering is by insertion history. DList is an implementation of John Hughes’ append list.
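The idea behind an O(1) append can be sketched with the classic function-based form of Hughes' list, where a list is represented by a function that prepends its elements to any list it is given. The type and module names below (DiffList, DL) are hypothetical, and this is not necessarily how the FSharpx DList is represented internally.

```fsharp
// Sketch of a Hughes-style append list (names are hypothetical).
// A value wraps a function that prepends its elements to a given list.
type DiffList<'a> = DL of ('a list -> 'a list)

module DiffList =
    let empty<'a> : DiffList<'a> = DL id
    let singleton x = DL (fun rest -> x :: rest)
    // O(1): appending only composes two functions; no elements are copied.
    let append (DL f) (DL g) = DL (fun rest -> f (g rest))
    let cons x dl = append (singleton x) dl
    let conj x dl = append dl (singleton x)
    // The elements are materialized only here, in O(n).
    let toList (DL f) = f []

let a = DiffList.empty |> DiffList.conj 1 |> DiffList.conj 2
let b = DiffList.empty |> DiffList.conj 3 |> DiffList.cons 0
printfn "%A" (DiffList.toList (DiffList.append a b))
```

In this representation cons, conj, and append are all constant time, and the cost is deferred to the point where the elements are actually read.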

Heap is an ordered linear structure where the ordering is either ascending or descending. “Head” inspects the first element in the ordering, “tail” takes the remaining structure after head, and “insert” places elements within the ordering. PriorityQueue is available as an alternate interface.
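As a rough illustration of how head, tail, and insert relate in such a structure, here is a minimal ascending pairing heap. The choice of a pairing heap and all names in the block are assumptions made for this sketch; the FSharpx Heap may differ in representation and API.

```fsharp
// Sketch of an ascending pairing heap (an assumption for illustration).
type Heap<'a when 'a : comparison> =
    | Empty
    | Node of 'a * Heap<'a> list

// Merge is O(1): the heap with the larger root becomes a child of the other.
let merge h1 h2 =
    match h1, h2 with
    | Empty, h | h, Empty -> h
    | Node (x, hs1), Node (y, hs2) ->
        if x <= y then Node (x, h2 :: hs1) else Node (y, h1 :: hs2)

// Insert is a merge with a one-element heap.
let insert x h = merge (Node (x, [])) h

// Head inspects the first element in the ordering.
let head h =
    match h with
    | Empty -> invalidOp "empty heap"
    | Node (x, _) -> x

// Merges the children pairwise, then merges the results together.
let rec mergePairs hs =
    match hs with
    | [] -> Empty
    | [h] -> h
    | h1 :: h2 :: rest -> merge (merge h1 h2) (mergePairs rest)

// Tail is the remaining heap once the head has been removed.
let tail h =
    match h with
    | Empty -> invalidOp "empty heap"
    | Node (_, hs) -> mergePairs hs

let heap = List.fold (fun h x -> insert x h) Empty [5; 1; 4; 2]

let rec toSortedList h =
    match h with
    | Empty -> []
    | _ -> head h :: toSortedList (tail h)

printfn "%A" (toSortedList heap)
```

Repeatedly taking head and tail yields the elements in ascending order, regardless of the order in which they were inserted.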

LazyList is an ordered linear structure implementing the List signature (head, tail, cons), but unlike the other linear structures computation of elements is delayed, executed once on demand, and thereafter cached. Adapted from the PowerPack implementation with the List signature values available from within the type class.
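The "delayed, executed once, thereafter cached" behaviour can be sketched with the standard Lazy type. The cell type, the helper functions, and the evaluation counter below are hypothetical names for this example and are not the PowerPack or FSharpx definitions.

```fsharp
// Sketch of a lazily evaluated list built on Lazy<'T> (names are hypothetical).
type LazyCell<'a> =
    | Nil
    | Cons of 'a * Lazy<LazyCell<'a>>

// Counts how many cells have actually been computed.
let evaluations = ref 0

// The integers from n up to limit; each cell is computed only when forced.
let rec countTo (limit: int) (n: int) : Lazy<LazyCell<int>> =
    lazy (
        evaluations.Value <- evaluations.Value + 1
        if n > limit then Nil else Cons (n, countTo limit (n + 1)))

let head (l: Lazy<LazyCell<'a>>) : 'a =
    match l.Force() with
    | Cons (x, _) -> x
    | Nil -> invalidOp "empty list"

let tail (l: Lazy<LazyCell<'a>>) : Lazy<LazyCell<'a>> =
    match l.Force() with
    | Cons (_, rest) -> rest
    | Nil -> invalidOp "empty list"

let numbers = countTo 3 1
let first = head numbers          // forces the first cell
let firstAgain = head numbers     // answered from the cached cell
let second = head (tail numbers)  // forces the second cell only
printfn "%d %d %d evaluations=%d" first firstAgain second evaluations.Value
```

Only the cells that are actually inspected are computed, and inspecting the same cell twice does not recompute it.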

Queue is an ordered linear data structure where elements are added at the end (right) and inspected and removed at the beginning (left). Ordering is by insertion history, so elements are first in, first out (FIFO). “Head” inspects the first or left-most element in the structure, while “conj” inserts an element at the end, or right of the structure.
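One common purely functional way to get this behaviour is a pair of lists, with the back list kept in reverse. The sketch below assumes that design and uses hypothetical names (BatchedQueue, Front, RevBack); it is not taken from the FSharpx source.

```fsharp
// Sketch of a two-list functional queue (names are hypothetical).
// Front holds elements in order; RevBack holds the newest elements reversed.
type BatchedQueue<'a> = { Front: 'a list; RevBack: 'a list }

module BatchedQueue =
    let empty<'a> : BatchedQueue<'a> = { Front = []; RevBack = [] }

    // Invariant: Front is empty only when the whole queue is empty.
    let check q =
        match q.Front with
        | [] -> { Front = List.rev q.RevBack; RevBack = [] }
        | _ -> q

    // conj adds at the end (right).
    let conj x q = check { q with RevBack = x :: q.RevBack }

    // head inspects the first (left-most) element.
    let head q =
        match q.Front with
        | x :: _ -> x
        | [] -> invalidOp "empty queue"

    // tail removes the first element and returns the remaining queue.
    let tail q =
        match q.Front with
        | _ :: rest -> check { q with Front = rest }
        | [] -> invalidOp "empty queue"

    let isEmpty q = List.isEmpty q.Front

let q = List.fold (fun acc x -> BatchedQueue.conj x acc) BatchedQueue.empty [1; 2; 3]
printfn "%d then %d" (BatchedQueue.head q) (BatchedQueue.head (BatchedQueue.tail q))
```

Every operation returns a new queue value and leaves the original untouched, which is what makes the structure safe to share.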

RandomAccessList is an ordered linear structure implementing the List signature (head, tail, cons), as well as inspection (lookup) and update (returning a new immutable instance) of any element in the structure by index. Ordering is by insertion history.

Vector is an ordered linear structure implementing the inverse of the List signature, (last, initial, conj) in place of (head, tail, cons). Indexed lookup or update (returning a new immutable instance of Vector) of any element is O(log32 n), where the logarithm is base 32, which in practice is close to O(1). Length is O(1). Ordering is by insertion history.
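A little arithmetic shows why a base-32 logarithm behaves almost like a constant. The function below is a rough estimate that assumes a plain tree with 32 children per node; it ignores implementation details of the actual Vector.

```fsharp
// Rough estimate: levels of a 32-way tree needed to hold n elements,
// computed as ceil(log n / log 32). Not a model of the actual Vector internals.
let levels (n: int) : int =
    int (ceil (log (float n) / log 32.0))

for n in [100; 1000; 10000; 100000; 1000000] do
    printfn "%8d elements -> %d levels" n (levels n)
```

Under this assumption, going from one hundred elements to one million only grows the path from 2 levels to 4.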

Times are in milliseconds, measured on a 2.2GHz, 4GB, dual-core, 64-bit Windows 7 machine. Orders of magnitude represent either the beginning or the resulting number of elements in the structure. Milliseconds are derived by dividing ticks by 10,000. More on the benchmarking methodology can be found here. The data structure benchmark application can be found here.
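The tick conversion can be sketched as follows. This assumes the ticks are .NET TimeSpan ticks of 100 nanoseconds each, read from Stopwatch.Elapsed; the helper name timeMs is hypothetical and this is not the benchmark application's own code.

```fsharp
open System
open System.Diagnostics

// Times one run of `work` and converts elapsed ticks to milliseconds.
// Assumption: TimeSpan ticks (100 ns each), so 10,000 ticks = 1 ms.
let timeMs (work: unit -> 'a) : 'a * float =
    let sw = Stopwatch.StartNew()
    let result = work ()
    sw.Stop()
    result, float sw.Elapsed.Ticks / float TimeSpan.TicksPerMillisecond

let (count, ms) =
    timeMs (fun () -> List.fold (fun acc x -> x :: acc) [] [1 .. 100000] |> List.length)
printfn "built %d elements in %.1f ms" count ms
```

A single timed run like this is noisy; it only illustrates the unit conversion, not a benchmarking method.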

Add elements to empty structure

| Structure (ms) | 10^2 | 10^3 | 10^4 | 10^5 | 10^6 |
|---|---|---|---|---|---|
| ms.f#.array | 0.8 | 1.8 | 100.9 | 11771.4 | n/a |
| ms.f#.array - list | 0.3 | 1.0 | 69.5 | n/a | n/a |
| ms.f#.list | 0.4 | 0.4 | 0.4 | 1.0 | 13.8 |
| ms.f#.list - list | 0.7 | 0.7 | 0.9 | 2.3 | 45.3 |
| fsharpx.deque - conj | 0.3 | 0.3 | 0.5 | 4.7 | * |
| fsharpx.deque - cons | 0.3 | 0.3 | 0.5 | 4.7 | * |
| fsharpx.dlist - conj | 0.7 | 0.7 | 1.0 | 7.7 | 153.0 |
| fsharpx.dlist - cons | 0.7 | 0.7 | 1.0 | 6.4 | 118.4 |
| fsharpx.heap | 3.2 | 3.3 | 5.0 | 22.5 | 254.7 |
| fsharpx.lazylist | 0.9 | 0.9 | 1.0 | 2.6 | 108.3 |
| fsharpx.queue | 1.0 | 1.1 | 1.4 | 7.6 | 106.6 |
| fsharpx.randomaccesslist | 0.8 | 0.9 | 3.3 | 19.6 | 189.8 |
| fsharpx.vector | 0.8 | 0.9 | 3.3 | 19.7 | 189.1 |

Comments

1) Elements are added by invoking cons or conj, depending on the structure’s signature, using Seq.fold.

3) Note that repeatedly adding an element to an existing array does not scale.

4) (*) I had trouble getting any Deque benchmarks at scale 1M to complete in reasonable time and have yet to establish whether this is a problem with my benchmark infrastructure or the Deque implementation or a combination thereof.
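The fold-based loading described in these comments, and the reason the array row stops scaling, can be sketched with core F# types only. The snippet uses the built-in list and array rather than any FSharpx structure, and the variable names are hypothetical.

```fsharp
let data = seq { 1 .. 10000 }

// Front insertion (cons) via Seq.fold: each step is O(1) on an F# list.
let viaCons = Seq.fold (fun acc x -> x :: acc) [] data

// Growing an array one element at a time: Array.append allocates a new array
// and copies every existing element on each step, so n additions cost O(n^2).
let viaArrayAppend = Seq.fold (fun acc x -> Array.append acc [| x |]) [||] data

printfn "%d %d" (List.length viaCons) (Array.length viaArrayAppend)
```

Both folds end up with the same number of elements, but the array version does quadratically more copying as the input grows.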

Initialize structure

| Structure (ms) | 10^2 | 10^3 | 10^4 | 10^5 | 10^6 |
|---|---|---|---|---|---|
| ms.f#.array | 0.1 | 0.1 | 0.1 | 0.2 | 1.3 |
| ms.f#.array - ofList | 0.2 | 0.2 | 0.3 | 0.5 | 2.5 |
| ms.f#.list - ofArray | 0.2 | 0.2 | 0.2 | 0.7 | 12.7 |
| ms.f#.list | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| fsharpx.deque | 0.6 | 0.6 | 0.6 | 1.0 | * |
| fsharpx.dlist | 1.5 | 1.5 | 1.7 | 3.5 | 49.8 |
| fsharpx.heap | 4.1 | 4.2 | 5.7 | 20.9 | 235.4 |
| fsharpx.lazylist - ofArray | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |
| fsharpx.queue | 1.0 | 1.0 | 1.1 | 1.6 | 13.5 |
| fsharpx.randomaccesslist | 4.4 | 4.5 | 5.2 | 11.5 | 156.5 |
| fsharpx.vector | 3.0 | 3.1 | 3.6 | 8.1 | 69.3 |

Comments

1) Using the respective module’s ofSeq, or different function where indicated.

3) Queue and Deque both support O(1) ofList which would load from a list in a fraction of a millisecond.

Peek and Dequeue until the structure is empty

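| Structure (ms) | 10^2 | 10^3 | 10^4 | 10^5 | 10^6 |
|---|---|---|---|---|---|
| ms.f#.list | 0.1 | 0.1 | 0.1 | 0.2 | 1.0 |
| fsharpx.deque - tail | 1.9 | 2.0 | 2.2 | 5.2 | * |
| fsharpx.deque - initial | 2.9 | 2.9 | 3.3 | 8.2 | * |
| fsharpx.dlist | 0.6 | 0.6 | 1.0 | 6.4 | 105.8 |
| fsharpx.heap | 0.5 | 0.6 | 0.7 | 1.9 | 13.5 |
| fsharpx.lazylist | 0.9 | 1.0 | 2.2 | 21.3 | 254.1 |
| fsharpx.queue | 0.5 | 0.5 | 0.9 | 1.8 | 48.2 |
| fsharpx.randomaccesslist | 0.9 | 1.0 | 2.1 | 13.6 | 108.9 |
| fsharpx.vector | 0.9 | 1.0 | 2.1 | 13.6 | 114.7 |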

Comments

1) Inspects element with either head or last and recursively takes tail or initial, depending on structure signature.
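The peek-and-dequeue loop has roughly the following shape, shown here on the built-in F# list with head and tail. The function name drain and the visit callback are hypothetical; a structure with the mirror-image signature would use last and initial in the same positions.

```fsharp
// Visits every element by inspecting the head and recursing on the tail
// until the structure is empty.
let rec drain (visit: 'a -> unit) (xs: 'a list) : unit =
    if not (List.isEmpty xs) then
        visit (List.head xs)         // peek at the first element
        drain visit (List.tail xs)   // continue with the remaining structure

let seen = System.Collections.Generic.List<int>()
drain (fun x -> seen.Add x) [1; 2; 3]
printfn "%A" (List.ofSeq seen)
```

The recursive call is in tail position, so the loop runs in constant stack space even for large inputs.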

Use IEnumerable to iterate through each element

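| Structure (ms) | 10^2 | 10^3 | 10^4 | 10^5 | 10^6 |
|---|---|---|---|---|---|
| ms.f#.array | 0.3 | 0.3 | 0.4 | 1.1 | 8.4 |
| ms.f#.list | 0.7 | 0.7 | 0.8 | 2.0 | 14.0 |
| fsharpx.deque | 2.2 | 2.3 | 2.6 | 5.5 | * |
| fsharpx.dlist | 1.7 | 1.8 | 3.3 | 22.1 | 214.1 |
| fsharpx.heap | 5.3 | 5.6 | 6.6 | 28.8 | 450.5 |
| fsharpx.lazylist | 3.1 | 3.2 | 4.4 | 23.0 | 278.3 |
| fsharpx.queue | 2.0 | 2.0 | 2.4 | 5.3 | 50.2 |
| fsharpx.randomaccesslist | 1.6 | 1.7 | 1.8 | 3.9 | 24.8 |
| fsharpx.vector | 1.7 | 1.7 | 1.9 | 3.9 | 26.2 |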

Reverse

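| Structure (ms) | 10^2 | 10^3 | 10^4 | 10^5 | 10^6 |
|---|---|---|---|---|---|
| ms.f#.array | 0.1 | 0.1 | 0.1 | 0.2 | 1.1 |
| ms.f#.list | 0.2 | 0.2 | 0.2 | 0.4 | 1.8 |
| fsharpx.deque | 0.0 | 0.0 | 0.0 | 0.0 | * |
| fsharpx.heap | 5.2 | 5.7 | 8.4 | 64.8 | 1097.1 |
| fsharpx.queue | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| fsharpx.randomaccesslist | 1.5 | 1.5 | 2.1 | 10.2 | 100.0 |
| fsharpx.vector | 1.4 | 1.4 | 2.0 | 7.7 | 97.4 |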

Append

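| Structure (ms) | 10^2 | 10^3 | 10^4 | 10^5 | 10^6 |
|---|---|---|---|---|---|
| ms.f#.array | 0.1 | 0.1 | 0.1 | 0.2 | 1.4 |
| ms.f#.list | 0.2 | 0.2 | 0.3 | 0.7 | 46.0 |
| fsharpx.dlist | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| fsharpx.heap | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 |
| fsharpx.lazylist | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |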

Comments

1) Using merge for the Heap structure.

Iterate by index

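| Structure (ms) | 10^2 | 10^3 | 10^4 | 10^5 | 10^6 |
|---|---|---|---|---|---|
| ms.f#.array | 0.4 | 0.4 | 0.4 | 0.5 | 1.4 |
| fsharpx.randomaccesslist | 0.4 | 0.4 | 0.5 | 2.2 | 18.5 |
| fsharpx.vector | 0.4 | 0.4 | 0.5 | 2.0 | 19.1 |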

Random lookup (10,000)

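| Structure (ms) | 10^2 | 10^3 | 10^4 | 10^5 | 10^6 |
|---|---|---|---|---|---|
| ms.f#.array | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| fsharpx.randomaccesslist | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| fsharpx.vector | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |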

Random update (10,000)

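| Structure (ms) | 10^2 | 10^3 | 10^4 | 10^5 | 10^6 |
|---|---|---|---|---|---|
| ms.f#.array | 0.1 | 0.1 | 0.1 | 0.1 | 0.2 |
| fsharpx.randomaccesslist | 2.1 | 2.7 | 4.2 | 10.1 | 17.0 |
| fsharpx.vector | 2.2 | 2.7 | 3.4 | 6.9 | 17.0 |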

Implementation Notes

1) I borrowed the structural equality implementation from Vector for the other structures. Heap perhaps does not need to use Unchecked.equals, but I have not profiled that option to see whether it would actually improve performance. Equality checks that take advantage of each structure’s internal representation may prove somewhat more efficient.

2) The structural equality implementation puts an internal mutable reference value in each structure that gets updated at most once per lifetime. I don’t think this will impede multi-threaded use of the structures, but I don’t know for sure either.

3) As noted above there may be issues with Deque at scales >>100K elements. Another Deque in the “experimental” DataStructures namespace may meet the needs of your application better.