A Unified Collection of Purely Functional List-like Data Structures

The FSharpx.Collections namespace now provides a collection of linear data structures deriving from the List signature. To emphasize the unity of the collection I implemented a standardized nomenclature expanding on the List value names. This is not without controversy. Structures like Queue are well-known in other (mostly imperative) languages, but I believe together these structures exhibit more similarities than differences, and bringing them all together in one F# collection is an opportunity to emphasize that logical unity.

My intent was to expand the List signature nomenclature with the naming standard favored by Okasaki, but “init” as the name for the inverse of “tail” would not do as this conflicts with a List module value. So this value is named “initial”. And I made one other change from Okasaki. In recognition of Steffen Forkmann’s F# implementation of the Vector structure from Clojure being the basis of two structures in this collection (Vector and RandomAccessList), I have opted to name the end-insertion function/member “conj” instead of “snoc”.

The List-like Immutable Data Structures

The following structures provide features perhaps available from List and Array, but not efficiently implemented and/or not in the right combination for a particular task, and the full composability and immutability you expect from purely functional data structures.

Deque (Double-ended queue) is an ordered linear structure implementing the signature of List (head, tail, cons) as well as the mirror-image Vector signature (last, initial, conj). “Head” inspects the first or left-most element in the structure, while “last” inspects the last or right-most element. Ordering is by insertion history.

DList is an ordered linear structure implementing the List signature (head, tail, cons), end-insertion (conj), and O(1) append. Ordering is by insertion history. DList is an implementation of John Hughes’ append list.

Heap is an ordered linear structure where the ordering is either ascending or descending. “Head” inspects the first element in the ordering, “tail” takes the remaining structure after head, and “insert” places elements within the ordering. PriorityQueue is available as an alternate interface.

LazyList is an ordered linear structure implementing the List signature (head, tail, cons), but unlike the other linear structures computation of elements is delayed, executed once on demand, and thereafter cached. Adapted from the PowerPack implementation with the List signature values available from within the type class.

Queue is an ordered linear data structure where elements are added at the end (right) and inspected and removed at the beginning (left). Ordering is by insertion history. The qualities of the Queue structure make elements first in, first out (fifo). “Head” inspects the first or left-most element in the structure, while “conj” inserts an element at the end, or right of the structure.

RandomAccessList is an ordered linear structure implementing the List signature (head, tail, cons), as well as inspection (lookup) and update (returning a new immutable instance) of any element in the structure by index. Ordering is by insertion history.

Vector is an ordered linear structure implementing the inverse of the List signature, (last, initial, conj) in place of (head, tail, cons). Indexed lookup or update (returning a new immutable instance of Vector) of any element is O(log32n) — just about O(1). Length is O(1). Ordering is by insertion history.

Comparing Performance

I recently posted a performance preview of the Queue data structure. Here are the performance benchmarks across the list-like structures, including List and Array from the Microsoft.FSharp.Collections namespace for comparison.

Times are milliseconds on a 2.2GHz 4GB dual core 64-bit Windows 7 machine. Orders of magnitude represent either the beginning or resulting number of elements in the structure. Milliseconds is derived by dividing ticks by 10,000. More on the benchmarking methodology can be found here. The data structure benchmark application can be found here.

Add elements to empty structure

  102 103 104 105 106
ms.f#.array 0.8 1.8 100.9 11771.4 n/a
ms.f#.array — list 0.3 1.0 69.5 n/a n/a
ms.f#.list 0.4 0.4 0.4 1.0 13.8
ms.f#.list — list 0.7 0.7 0.9 2.3 45.3
fsharpx.deque — conj 0.3 0.3 0.5 4.7 *
fsharpx.deque — cons 0.3 0.3 0.5 4.7 *
fsharpx.dlist — conj 0.7 0.7 1.0 7.7 153.0
fsharpx.dlist — cons 0.7 0.7 1.0 6.4 118.4
fsharpx.heap 3.2 3.3 5.0 22.5 254.7
fsharpx.lazylist 0.9 0.9 1.0 2.6 108.3
fsharpx.queue 1.0 1.1 1.4 7.6 106.6
fsharpx.randomaccesslist 0.8 0.9 3.3 19.6 189.8
fsharpx.vector 0.8 0.9 3.3 19.7 189.1

Comments

1) Depending on the structure’s signature by invoking cons or conj using seq.fold.

2) Source data is an ascending ordered integer array, except where noted.

3) Note that repeatedly adding an element to an existing array does not scale.

4) (*) I had trouble getting any Deque benchmarks at scale 1M to complete in reasonable time and have yet to establish whether this is a problem with my benchmark infrastructure or the Deque implementation or a combination thereof.

Initialize structure

  102 103 104 105 106
ms.f#.array 0.1 0.1 0.1 0.2 1.3
ms.f#.array — ofList 0.2 0.2 0.3 0.5 2.5
ms.f#.list — ofArray 0.2 0.2 0.2 0.7 12.7
ms.f#.list 0.0 0.0 0.0 0.0 0.0
fsharpx.deque 0.6 0.6 0.6 1.0 *
fsharpx.dlist 1.5 1.5 1.7 3.5 49.8
fsharpx.heap 4.1 4.2 5.7 20.9 235.4
fsharpx.lazylist — ofArray 0.3 0.3 0.3 0.3 0.3
fsharpx.queue 1.0 1.0 1.1 1.6 13.5
fsharpx.randomaccesslist 4.4 4.5 5.2 11.5 156.5
fsharpx.vector 3.0 3.1 3.6 8.1 69.3

Comments

1) Using the respective module’s ofSeq, or different function where indicated.

2) Source data is an ascending ordered integer array, except where noted.

3) Queue and Deque both support O(1) ofList which would load from a list in a fraction of a millisecond.

Peek and Dequeue until the structure is empty

  102 103 104 105 106
ms.f#.list 0.1 0.1 0.1 0.2 1.0
fsharpx.deque — tail 1.9 2.0 2.2 5.2 *
fsharpx.deque — initial 2.9 2.9 3.3 8.2 *
fsharpx.dlist 0.6 0.6 1.0 6.4 105.8
fsharpx.heap 0.5 0.6 0.7 1.9 13.5
fsharpx.lazylist 0.9 1.0 2.2 21.3 254.1
fsharpx.queue 0.5 0.5 0.9 1.8 48.2
fsharpx.randomaccesslist 0.9 1.0 2.1 13.6 108.9
fsharpx.vector 0.9 1.0 2.1 13.6 114.7

Comments

1) Inspects element with either head or last and recursively takes tail or initial, depending on structure signature.

Use IEnumerable to iterate through each element

  102 103 104 105 106
ms.f#.array 0.3 0.3 0.4 1.1 8.4
ms.f#.list 0.7 0.7 0.8 2.0 14.0
fsharpx.deque 2.2 2.3 2.6 5.5 *
fsharpx.dlist 1.7 1.8 3.3 22.1 214.1
fsharpx.heap 5.3 5.6 6.6 28.8 450.5
fsharpx.lazylist 3.1 3.2 4.4 23.0 278.3
fsharpx.queue 2.0 2.0 2.4 5.3 50.2
fsharpx.randomaccesslist 1.6 1.7 1.8 3.9 24.8
fsharpx.vector 1.7 1.7 1.9 3.9 26.2

Reverse

  102 103 104 105 106
ms.f#.array 0.1 0.1 0.1 0.2 1.1
ms.f#.list 0.2 0.2 0.2 0.4 1.8
fsharpx.deque 0.0 0.0 0.0 0.0 *
fsharpx.heap 5.2 5.7 8.4 64.8 1097.1
fsharpx.queue 0.1 0.1 0.1 0.1 0.1
fsharpx.randomaccesslist 1.5 1.5 2.1 10.2 100.0
fsharpx.vector 1.4 1.4 2.0 7.7 97.4

Append

  102 103 104 105 106
ms.f#.array 0.1 0.1 0.1 0.2 1.4
ms.f#.list 0.2 0.2 0.3 0.7 46.0
fsharpx.dlist 0.2 0.2 0.2 0.2 0.2
fsharpx.heap 0.4 0.4 0.4 0.4 0.4
fsharpx.lazylist 0.2 0.2 0.2 0.2 0.2

Comments

1) Using merge for the Heap structure.

Iterate by index

  102 103 104 105 106
ms.f#.array 0.4 0.4 0.4 0.5 1.4
fsharpx.randomaccesslist 0.4 0.4 0.5 2.2 18.5
fsharpx.vector 0.4 0.4 0.5 2.0 19.1

Random lookup (10,000)

  102 103 104 105 106
ms.f#.array 0.1 0.1 0.1 0.1 0.1
fsharpx.randomaccesslist 0.1 0.1 0.1 0.1 0.1
fsharpx.vector 0.1 0.1 0.1 0.1 0.1

Random update (10,000)

  102 103 104 105 106
ms.f#.array 0.1 0.1 0.1 0.1 0.2
fsharpx.randomaccesslist 2.1 2.7 4.2 10.1 17.0
fsharpx.vector 2.2 2.7 3.4 6.9 17.0

Implementation Notes

1) I borrowed the structural equality implementation from Vector for the other structures. Heap perhaps does not need to used Unckecked.equals, but I have not profiled that option to see whether it would actually improve performance. More attention to equality checks taking advantage of internal structure may prove to be somewhat more efficient.

2) The structural equality implementation puts an internal mutable reference value in each structure that gets updated at most once per lifetime. I don’t think this will impede multi-threading use of the structures, but I don’t know for sure either.

3) As noted above there may be issues with Deque at scales >>100K elements. Another Deque in the “experimental” DataStructures namespace may meet the needs of your application better.

Comparing F# Queues

There are some useful additions to the F# tool-belt coming up. Microsoft recently released a preview edition of immutable collections, and if all goes as planned I will release a suite of 6 purely functional linear data structures to FSharpx.Collections in just a couple of days.

Queue is a data structure that needs better representation in the functional world, and both collections make contributions, so this is an opportunity to do some performance comparisons of the queues available to F# programmers. The contenders are two queues from the .NET Framework, System.Collections.Generic.Queue and System.Collections.Concurrent.Queue, the upcoming FSharpx.Collections.Queue, and FSharpx.Datastructures.BatchedQueue, from which FSharpx.Collections.Queue was developed.

The first benchmark exercises something like “real world” usage of the queues with the following steps:

1) enqueue the number of elements indicated

2) peek and dequeue half that number

3) enqueue the same number to the resulting queue

4) peek and dequeue half the resulting size of that queue

5) enqueue one more time the original number

(Milliseconds on a 2.2GHz 4GB dual core 64-bit Windows 7 machine)

  102 103 104 105 106
sys.queue 0.8 0.8 1.2 5.2 45.7
sys.concurrentqueue 1.9 2.1 2.8 10.9 90.3
sys.immutablequeue 1.4 1.5 2.7 21.2 401.4
fsharpx.queue 1.5 1.6 2.7 20.8 404.3
fsharpx.batchedqueue 2.0 2.2 4.0 32.8 n/a

And then the obligatory “not so real world” benchmarks

Enqueue the indicated number of elements

  102 103 104 105 106
sys.queue 0.3 0.3 0.4 1.1 8.8
sys.concurrentqueue 0.4 0.5 0.6 1.7 12.8
sys.immutablequeue 0.5 0.5 0.8 4.0 91.6
fsharpx.queue 1.1 1.2 1.4 3.7 102.4
fsharpx.batchedqueue 1.2 1.2 1.5 7.8 n/a

Peek and Dequeue until the queue is empty

  102 103 104 105 106
sys.queue 0.2 0.2 0.2 0.7 6.2
sys.concurrentqueue 0.7 0.7 0.8 2.4 18.9
sys.immutablequeue 0.5 0.5 0.7 3.3 74.7
fsharpx.queue 0.6 0.6 0.7 1.3 26.8
fsharpx.batchedqueue 0.7 0.7 0.9 1.8 n/a

Initialize from a sequential source (an array unless otherwise indicated)

(The preview Immutable.Queue does not yet implement this capability.)

  102 103 104 105 106
sys.queue 0.3 0.3 0.4 1.2 9.6
sys.concurrentqueue 0.3 0.3 0.3 0.9 6.6
fsharpx.queue ofList 0.1 0.1 0.1 0.1 0.1
fsharpx.queue 0.7 0.7 0.7 1.2 13.1
fsharpx.batchedqueue 0.8 0.8 0.8 1.3 n/a

Use IEnumerable to iterate through each element

(queue of indicated size)

  102 103 104 105 106
sys.queue 0.1 0.1 0.2 0.7 6.9
sys.concurrentqueue 0.7 0.7 0.8 1.3 7.0
sys.immutablequeue 0.7 0.7 0.9 2.4 55.8
fsharpx.queue 2.1 2.1 2.4 4.7 27.6
fsharpx.batchedqueue 2.1 2.2 2.4 4.7 n/a

So which Queue is right for You?

That of course depends on your application. For raw speed the system generic queue is hard to beat. The concurrent queue compares well against generic in all categories in a single-threaded environment, so for raw speed it should be a contender in multi-threaded environments, unless composability and immutability is preferred.

Preview Immutable.Queue seems primarily intended for the C# world, but has adequate composability to make it suitable for F#. FSharpx.Collections.Queue performs just as well (sometimes better), and has a broader range of supporting functions including fold, foldback, reverse (usually O(1), worst case O(n)), ofSeq, and ofList.

Semantics and List-like Data Structures

List as a Semantic Element

The singly-linked list is the fundamental data structure of functional programming. Suspending concern with how it is implemented (forward-linked elements) and dealing with it strictly as an abstract construct of its host language, List is an extension of the programming language, which in turn brings semantic structure to data. Growing and shrinking linear data representations is so powerful List alone suffices for solving a vast number of problems.

Eventually we do need to concern ourselves with implementation. There is a class of purely functional List-like data structures that extend List, either substituting more efficient versions of existing functions or supplementing additional functions for working with linear data representations.

The supplemental List-like data structures derive from familiar structures in imperative programming, but once constructed as purely functional (immutable) data structures, thinking of them as algebraic mutations of List reveals semantic cohesion.

List’s core functionality consists of three functions. (These are the names used in F#, and for clarity I will adhere mostly to the naming standard favored by Okasaki for extension functions, rather than the more familiar function names from imperative programming.)

cons — returns List with new element added to left

head — returns left-most element

tail — returns List excluding left-most element

F# List also provides non-core functionality (again, disregarding implementation you could do without these).

rev — returns reversed List

append — returns List of first List elements followed by second List elements

length — returns count of elements in the List

Extending Serial Representation

The extensions we will consider for our List-like vocabulary:

conj — returns List with new element added to right

initial — returns List excluding right-most element

last — returns right-most element

iLookup — returns element at index

iUpdate — returns List with element at index updated

It is perhaps surprising that List is so useful given that only the most recently added element is accessible! Significantly, you can readily see the extension functions either invert one of the core functions or provide short-cut access to an interior element. RandomAccessList provides indexed lookup and update.

RandomAccessList = List + iLookup + iUpdate

LazyList is one List-like data structure that most probably comes from a functional programming heritage. It is the only List-like structure reviewed here incorporating functionality the core List functions alone cannot replicate, namely lazy evaluation. Significantly in F# it already uses the List-like functional naming standard.

LazyList = ListLazy

Heap is frequently overlooked as being a List-like data structure. Even Okasaki reverted to using function names from imperative programming that obscure the close relationship to List. Heap is always sorted and provides a more efficient, O(log n), append (merge).

Heap = List + sorted + append

Difference List facilitates adding elements to the other end of the List, whether singly or another entire Difference List in O(1) time.

DList = List + conj + append

After introducing adding elements to both ends, now consider the fully symmetrical linear representation provided by a Double-ended Queue. Deque is the union of its tail and tail’s compliment, initial. This symmetry finds additional expression in an O(1) rev.

Deque = List + conj + last + initial + rev = initial U tail

Queue, a workhorse in imperative programming, is simply a List that can only add elements on the right, but still extracts them on the left with head.

Queue = List - cons + conj

While the algebra deriving Vector from List is the longest, it is simply an inverted RandomAccessList.

Vector = List - cons - head - tail + conj + last + initial + iLookup + iUpdate
Vector = RandomAccessList-1

Why the Cacophony of Names?

Perhaps perfect computer language implementations are an unrealizable ideal, but consistent nomenclature reveals semantics otherwise easily overlooked. Data structures from imperative programming take on a new character when made purely functional, and it is appropriate for them to give up part of their former identity (function names) in the functional realm.

All of the these F# Data Structures and more are available in the FSharpx.Collections and FSharpx.Datastructures namespaces here or from Nuget in FSharpx.Core.

UPDATE: Updated to naming standard adopted in FSharpx.Collections.