A Unified Collection of Purely Functional List-like Data Structures

The FSharpx.Collections namespace now provides a collection of linear data structures deriving from the List signature. To emphasize the unity of the collection I implemented a standardized nomenclature expanding on the List value names. This is not without controversy. Structures like Queue are well-known in other (mostly imperative) languages, but I believe together these structures exhibit more similarities than differences, and bringing them all together in one F# collection is an opportunity to emphasize that logical unity.

My intent was to expand the List signature nomenclature with the naming standard favored by Okasaki, but “init” as the name for the inverse of “tail” would not do as this conflicts with a List module value. So this value is named “initial”. And I made one other change from Okasaki. In recognition of Steffen Forkmann’s F# implementation of the Vector structure from Clojure being the basis of two structures in this collection (Vector and RandomAccessList), I have opted to name the end-insertion function/member “conj” instead of “snoc”.

The List-like Immutable Data Structures

The following structures provide features that List and Array may offer, but not efficiently or not in the right combination for a particular task, together with the full composability and immutability you expect from purely functional data structures.

Deque (Double-ended queue) is an ordered linear structure implementing the signature of List (head, tail, cons) as well as the mirror-image Vector signature (last, initial, conj). “Head” inspects the first or left-most element in the structure, while “last” inspects the last or right-most element. Ordering is by insertion history.

DList is an ordered linear structure implementing the List signature (head, tail, cons), end-insertion (conj), and O(1) append. Ordering is by insertion history. DList is an implementation of John Hughes’ append list.
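To make the O(1) append concrete, here is a minimal sketch of a Hughes-style append list. It is an illustration only, not the FSharpx DList implementation: the type name Hughes and the module name HughesList are hypothetical, chosen for this example. The list is represented as a function that prepends its elements to whatever list it is given, so appending two lists is just function composition.

```fsharp
// Hypothetical sketch of a Hughes-style append list; not the FSharpx DList type.
// A list is represented as a function that prepends its elements to another list.
type Hughes<'T> = Hughes of ('T list -> 'T list)

module HughesList =
    let empty<'T> : Hughes<'T> = Hughes id

    let singleton x = Hughes (fun rest -> x :: rest)

    let ofList (xs: 'T list) = Hughes (fun rest -> xs @ rest)

    // O(1): appending composes the two functions; no elements are copied yet.
    let append (Hughes f) (Hughes g) = Hughes (f << g)

    // Front and end insertion are both expressed through append.
    let cons x xs = append (singleton x) xs
    let conj x xs = append xs (singleton x)

    // The real work happens once, when the list is finally materialized.
    let toList (Hughes f) = f []

let a = HughesList.ofList [1; 2; 3]
let b = HughesList.conj 5 (HughesList.cons 4 HughesList.empty)
let joined = HughesList.toList (HughesList.append a b)
```

The trade-off in this representation is that head and tail are no longer cheap, since inspecting an element requires running the composed function.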

Heap is an ordered linear structure where the ordering is either ascending or descending. “Head” inspects the first element in the ordering, “tail” takes the remaining structure after head, and “insert” places elements within the ordering. PriorityQueue is available as an alternate interface.

LazyList is an ordered linear structure implementing the List signature (head, tail, cons), but unlike the other linear structures computation of elements is delayed, executed once on demand, and thereafter cached. Adapted from the PowerPack implementation, with the List signature values also available as members of the type.
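The "delayed, executed once, then cached" behavior can be sketched with the standard Lazy type. This is not the FSharpx or PowerPack LazyList; the Stream type, the range function, and the evaluations counter are hypothetical names used only to make the caching observable.

```fsharp
// Hypothetical lazy stream for illustration; not the FSharpx LazyList type.
type Stream<'T> =
    | Nil
    | Cons of 'T * Lazy<Stream<'T>>

// Counts how many suspended tails have actually been computed.
let evaluations = ref 0

// Builds the integers from lo to hi; every tail is a suspended computation.
let rec range lo hi =
    if lo > hi then Nil
    else Cons (lo, lazy (countedRange lo hi))
and countedRange lo hi =
    evaluations.Value <- evaluations.Value + 1
    range (lo + 1) hi

let head s =
    match s with
    | Cons (x, _) -> x
    | Nil -> invalidArg "s" "empty stream"

// Forcing the same suspension a second time returns the cached result.
let tail s =
    match s with
    | Cons (_, rest) -> rest.Value
    | Nil -> invalidArg "s" "empty stream"

let s = range 1 3
let t1 = tail s   // runs the suspended computation
let t2 = tail s   // reuses the cached value; the counter does not move
```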

Queue is an ordered linear data structure where elements are added at the end (right) and inspected and removed at the beginning (left). Ordering is by insertion history. The qualities of the Queue structure make elements first in, first out (FIFO). “Head” inspects the first or left-most element in the structure, while “conj” inserts an element at the end, or right, of the structure.
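One common way to get this behavior from immutable lists is a two-list queue. The sketch below is an assumption-laden illustration rather than the FSharpx Queue source: TwoListQueue and SketchQueue are hypothetical names. New elements are pushed onto a reversed back list, and the back list is reversed into the front only when the front runs out.

```fsharp
// Hypothetical two-list queue; not the FSharpx Queue type.
// Front holds elements in order; Back holds later elements in reverse order.
type TwoListQueue<'T> = { Front: 'T list; Back: 'T list }

module SketchQueue =
    let empty<'T> : TwoListQueue<'T> = { Front = []; Back = [] }

    // Restore the invariant: Front is empty only when the whole queue is empty.
    let private check (q: TwoListQueue<'T>) =
        match q.Front with
        | [] -> { Front = List.rev q.Back; Back = [] }
        | _ -> q

    // Insert at the end (right) of the queue.
    let conj x (q: TwoListQueue<'T>) = check { q with Back = x :: q.Back }

    // Inspect the first (left-most) element.
    let head (q: TwoListQueue<'T>) =
        match q.Front with
        | x :: _ -> x
        | [] -> invalidOp "empty queue"

    // Remove the first element, returning a new queue.
    let tail (q: TwoListQueue<'T>) =
        match q.Front with
        | _ :: rest -> check { q with Front = rest }
        | [] -> invalidOp "empty queue"

    let isEmpty (q: TwoListQueue<'T>) = List.isEmpty q.Front

let q = List.fold (fun acc x -> SketchQueue.conj x acc) SketchQueue.empty [1; 2; 3]
let first = SketchQueue.head q
let second = SketchQueue.head (SketchQueue.tail q)
```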

RandomAccessList is an ordered linear structure implementing the List signature (head, tail, cons), as well as inspection (lookup) and update (returning a new immutable instance) of any element in the structure by index. Ordering is by insertion history.

Vector is an ordered linear structure implementing the inverse of the List signature, (last, initial, conj) in place of (head, tail, cons). Indexed lookup or update (returning a new immutable instance of Vector) of any element is O(log₃₂ n), which is effectively O(1) in practice. Length is O(1). Ordering is by insertion history.
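To see why O(log₃₂ n) behaves like a constant, the following sketch counts how many levels a 32-way branching tree needs to address n elements. It assumes a simplified model in which every level multiplies capacity by 32; the levels function is a hypothetical helper and ignores details of the actual Vector layout.

```fsharp
// Number of 32-way levels needed to address n elements (at least one level).
// Simplified model: capacity grows by a factor of 32 per level.
let levels (n: int) =
    let rec loop (capacity: int64) depth =
        if capacity >= int64 n then depth
        else loop (capacity * 32L) (depth + 1)
    loop 32L 1

[100; 1000; 10000; 100000; 1000000]
|> List.iter (fun n -> printfn "%d elements -> %d levels" n (levels n))
```

Under this model a million elements need only four levels, because 32^4 = 1,048,576.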

Comparing Performance

I recently posted a performance preview of the Queue data structure. Here are the performance benchmarks across the list-like structures, including List and Array from the Microsoft.FSharp.Collections namespace for comparison.

Times are in milliseconds on a 2.2GHz 4GB dual core 64-bit Windows 7 machine. Orders of magnitude represent either the beginning or resulting number of elements in the structure. Milliseconds are derived by dividing ticks by 10,000. More on the benchmarking methodology can be found here. The data structure benchmark application can be found here.
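The tick conversion works because a .NET TimeSpan tick is 100 nanoseconds, so 10,000 ticks make one millisecond. The sketch below shows that arithmetic together with one plausible way to take a measurement. It assumes System.Diagnostics.Stopwatch is the timer; ticksToMs and timeMs are hypothetical helpers, not part of the benchmark application.

```fsharp
open System.Diagnostics

// TimeSpan ticks are 100 ns each, so 10,000 ticks equal one millisecond.
let ticksToMs (ticks: int64) = float ticks / 10000.0

// Hypothetical helper: run f once and return its result with elapsed milliseconds.
// Elapsed.Ticks is used (TimeSpan ticks), not ElapsedTicks (timer-frequency ticks).
let timeMs (f: unit -> 'a) =
    let sw = Stopwatch.StartNew()
    let result = f ()
    sw.Stop()
    result, ticksToMs sw.Elapsed.Ticks

let total, elapsedMs = timeMs (fun () -> Array.sum [| 1 .. 1000 |])
printfn "sum = %d in %.4f ms" total elapsedMs
```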

Add elements to empty structure

Structure 10^2 10^3 10^4 10^5 10^6
ms.f#.array 0.8 1.8 100.9 11771.4 n/a
ms.f#.array — list 0.3 1.0 69.5 n/a n/a
ms.f#.list 0.4 0.4 0.4 1.0 13.8
ms.f#.list — list 0.7 0.7 0.9 2.3 45.3
fsharpx.deque — conj 0.3 0.3 0.5 4.7 *
fsharpx.deque — cons 0.3 0.3 0.5 4.7 *
fsharpx.dlist — conj 0.7 0.7 1.0 7.7 153.0
fsharpx.dlist — cons 0.7 0.7 1.0 6.4 118.4
fsharpx.heap 3.2 3.3 5.0 22.5 254.7
fsharpx.lazylist 0.9 0.9 1.0 2.6 108.3
fsharpx.queue 1.0 1.1 1.4 7.6 106.6
fsharpx.randomaccesslist 0.8 0.9 3.3 19.6 189.8
fsharpx.vector 0.8 0.9 3.3 19.7 189.1

Comments

1) Elements are added by invoking cons or conj, depending on the structure’s signature, using Seq.fold.

2) Source data is an ascending ordered integer array, except where noted.

3) Note that repeatedly adding an element to an existing array does not scale.

4) (*) I had trouble getting any Deque benchmarks at scale 1M to complete in reasonable time and have yet to establish whether this is a problem with my benchmark infrastructure or the Deque implementation or a combination thereof.
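Comments 1 and 3 can be illustrated with the core List and Array modules alone. This is a sketch of the pattern, not the benchmark code itself; in particular, building the array with Array.append inside the fold is an assumption about what "repeatedly adding an element to an existing array" means here.

```fsharp
let data = [| 1 .. 1000 |]

// cons-style build with Seq.fold: each step prepends in O(1),
// so the resulting list holds the source elements in reverse order.
let viaCons = Seq.fold (fun acc x -> x :: acc) [] data

// Array "add": each step allocates a new array and copies every existing
// element, so building n elements costs O(n^2) work overall.
let viaArrayAppend = Seq.fold (fun acc x -> Array.append acc [| x |]) [||] data
```

The quadratic copying in the second fold is consistent with the array row growing by roughly two orders of magnitude for each tenfold increase in elements.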

Initialize structure

Structure 10^2 10^3 10^4 10^5 10^6
ms.f#.array 0.1 0.1 0.1 0.2 1.3
ms.f#.array — ofList 0.2 0.2 0.3 0.5 2.5
ms.f#.list — ofArray 0.2 0.2 0.2 0.7 12.7
ms.f#.list 0.0 0.0 0.0 0.0 0.0
fsharpx.deque 0.6 0.6 0.6 1.0 *
fsharpx.dlist 1.5 1.5 1.7 3.5 49.8
fsharpx.heap 4.1 4.2 5.7 20.9 235.4
fsharpx.lazylist — ofArray 0.3 0.3 0.3 0.3 0.3
fsharpx.queue 1.0 1.0 1.1 1.6 13.5
fsharpx.randomaccesslist 4.4 4.5 5.2 11.5 156.5
fsharpx.vector 3.0 3.1 3.6 8.1 69.3

Comments

1) Using the respective module’s ofSeq, or different function where indicated.

2) Source data is an ascending ordered integer array, except where noted.

3) Queue and Deque both support O(1) ofList, which would load from a list in a fraction of a millisecond.

Peek and Dequeue until the structure is empty

Structure 10^2 10^3 10^4 10^5 10^6
ms.f#.list 0.1 0.1 0.1 0.2 1.0
fsharpx.deque — tail 1.9 2.0 2.2 5.2 *
fsharpx.deque — initial 2.9 2.9 3.3 8.2 *
fsharpx.dlist 0.6 0.6 1.0 6.4 105.8
fsharpx.heap 0.5 0.6 0.7 1.9 13.5
fsharpx.lazylist 0.9 1.0 2.2 21.3 254.1
fsharpx.queue 0.5 0.5 0.9 1.8 48.2
fsharpx.randomaccesslist 0.9 1.0 2.1 13.6 108.9
fsharpx.vector 0.9 1.0 2.1 13.6 114.7

Comments

1) Inspects element with either head or last and recursively takes tail or initial, depending on structure signature.

Use IEnumerable to iterate through each element

Structure 10^2 10^3 10^4 10^5 10^6
ms.f#.array 0.3 0.3 0.4 1.1 8.4
ms.f#.list 0.7 0.7 0.8 2.0 14.0
fsharpx.deque 2.2 2.3 2.6 5.5 *
fsharpx.dlist 1.7 1.8 3.3 22.1 214.1
fsharpx.heap 5.3 5.6 6.6 28.8 450.5
fsharpx.lazylist 3.1 3.2 4.4 23.0 278.3
fsharpx.queue 2.0 2.0 2.4 5.3 50.2
fsharpx.randomaccesslist 1.6 1.7 1.8 3.9 24.8
fsharpx.vector 1.7 1.7 1.9 3.9 26.2

Reverse

Structure 10^2 10^3 10^4 10^5 10^6
ms.f#.array 0.1 0.1 0.1 0.2 1.1
ms.f#.list 0.2 0.2 0.2 0.4 1.8
fsharpx.deque 0.0 0.0 0.0 0.0 *
fsharpx.heap 5.2 5.7 8.4 64.8 1097.1
fsharpx.queue 0.1 0.1 0.1 0.1 0.1
fsharpx.randomaccesslist 1.5 1.5 2.1 10.2 100.0
fsharpx.vector 1.4 1.4 2.0 7.7 97.4

Append

Structure 10^2 10^3 10^4 10^5 10^6
ms.f#.array 0.1 0.1 0.1 0.2 1.4
ms.f#.list 0.2 0.2 0.3 0.7 46.0
fsharpx.dlist 0.2 0.2 0.2 0.2 0.2
fsharpx.heap 0.4 0.4 0.4 0.4 0.4
fsharpx.lazylist 0.2 0.2 0.2 0.2 0.2

Comments

1) Using merge for the Heap structure.

Iterate by index

Structure 10^2 10^3 10^4 10^5 10^6
ms.f#.array 0.4 0.4 0.4 0.5 1.4
fsharpx.randomaccesslist 0.4 0.4 0.5 2.2 18.5
fsharpx.vector 0.4 0.4 0.5 2.0 19.1

Random lookup (10,000)

Structure 10^2 10^3 10^4 10^5 10^6
ms.f#.array 0.1 0.1 0.1 0.1 0.1
fsharpx.randomaccesslist 0.1 0.1 0.1 0.1 0.1
fsharpx.vector 0.1 0.1 0.1 0.1 0.1

Random update (10,000)

Structure 10^2 10^3 10^4 10^5 10^6
ms.f#.array 0.1 0.1 0.1 0.1 0.2
fsharpx.randomaccesslist 2.1 2.7 4.2 10.1 17.0
fsharpx.vector 2.2 2.7 3.4 6.9 17.0

Implementation Notes

1) I borrowed the structural equality implementation from Vector for the other structures. Heap perhaps does not need to use Unchecked.equals, but I have not profiled that option to see whether it would actually improve performance. Equality checks that take advantage of each structure’s internal representation may prove somewhat more efficient.

2) The structural equality implementation puts an internal mutable reference value in each structure that gets updated at most once per lifetime. I don’t think this will impede multithreaded use of the structures, but I don’t know for sure either.

3) As noted above there may be issues with Deque at scales >>100K elements. Another Deque in the “experimental” DataStructures namespace may meet the needs of your application better.
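For note 2, one plausible reading is a hash code that is computed on first request and then cached in a mutable field. The sketch below illustrates that pattern under that assumption; CachedHashList is a hypothetical type and is not taken from the FSharpx source.

```fsharp
// Hypothetical wrapper type; illustrates a hash code computed at most once
// in the single-threaded case. Not taken from the FSharpx source.
type CachedHashList<'T>(items: 'T list) =
    // The only mutable state: None until the hash is first requested.
    let mutable hashCode : int option = None

    member this.Items = items

    override this.GetHashCode() =
        match hashCode with
        | Some h -> h
        | None ->
            // Combine element hashes; integer overflow simply wraps around.
            let h = items |> List.fold (fun acc x -> 31 * acc + Unchecked.hash x) 1
            hashCode <- Some h
            h

    override this.Equals(other: obj) =
        match other with
        | :? CachedHashList<'T> as o -> Unchecked.equals items o.Items
        | _ -> false

let a = CachedHashList([1; 2; 3])
let b = CachedHashList([1; 2; 3])
printfn "equal: %b, same hash: %b" (a.Equals(b)) (a.GetHashCode() = b.GetHashCode())
```

In this sketch the underlying items never change, so two threads racing on the first call would at worst compute and store the same value twice. Whether that reasoning carries over to the real structures is exactly the open question in note 2.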
