The Great New York F# Expedition of 2013

Halfway through the first day of our walking tour of New York, Paul tweets me while Sue and I are riding the Statten Island Ferry, can I extend my lightning talk to the N.Y. F# meetup on Tuesday evening? OK, it ended up making for a much better presentation. I spent the evening cutting down my Lambda Jam presentation to its essence, which I pretentiously titled “Semantically Coherent Functional Linear Data Structures”. The video is here This is the latest incarnation of the thoughts I’ve been trying to express since this blog post about the unity of linear purely functional data structures providing alternate composeable linear views and their capabilities.

I was followed by Rachel Reese’s excellent presentation on F# Agents.

NY Progressive F# Tutorials

Wendy and the Skills Matter crew put on a really nice conference For the NY Progressive F# Tutorials. This was my first experience with a Skills Matter conference, and it won’t be my last. Their reputation preceeds them, and now I can vouch they put on no-nonsense hands-on learning conferences for real practitioners. It helps that they get the best speakers. Don Syme’s keynote is captured on video, and the rest of the sessions on audio.

Don Syme:Keynote: F# in the Open Source World, Keynote: F# in the Open Source World
Don Syme : 18th Sep 2013
Don Syme:Calling and extending the F# compiler, Calling and extending the F# compiler
Don Syme and Tomas Petricek : 18th Sep 2013
Tomas Petricek:FCell: Numerical Development in Excel w/ F#, FCell: Numerical Development in Excel
Tomas Petricek and Adam Mlocek : 18th Sep 2013
Rachel  Reese:Try F# from Zero to Data Science, Try F# from Zero to Data Science
Rachel Reese and Phil Trelford : 18th Sep 2013
Chris Marinos:The F# Koans: An Interactive Way to Learn F# Through Testing, F# Koans: An Interactive Way to Learn
Chris Marinos : 18th Sep 2013
Miguel de Icaza:Keynote: F# Beyond Windows, Keynote: F# Beyond Windows
Miguel de Icaza : 18th Sep 2013
Richard Minerich:Graph Man Contest, Graph Man Contest
Richard Minerich and Paulmichael Blasucci : 18th Sep 2013
Phil Trelford:Machine Learning with F#, Machine Learning with F#
Phil Trelford and Mathias Brandewinder : 18th Sep 2013
Dmitry Mozorov:Code Quotations: Code-as-Data for F#, Code Quotations: Code-as-Data for F#
Dmitry Mozorov and Jack Pappas : 18th Sep 2013
Phil Trelford:Pac-Man Kata, Pac-Man Kata
Phil Trelford and Mathias Brandewinder : 18th Sep 2013

The finale of the conference was Tyler Smith mopping the floor with me in the Pac Man AI programming competition, a group photo,

NYC Progressive F# Tutorials 2013 group

and dinner.

NYC Progressive F# Tutorials 2013 dinner

The F# conversations lasted until late in the night, I met some more of the international F# set, and once again the F# community was friendly, inclusive, and interested in progressing everyone’s skills and the state of the art.

After the F#

I was kindly invited to see the keynote of the next day’s conference, “Agile Testing & BDD Exchange”, by Gojko Adzic. If you have any interest at all in testing or user experience, I highly recommend this keynote.

Then I was off to the simultaneous Global GameCraft event in the Dumbo Spot annex to Dumbo Loft hosted by Andrea and Phil, where Phil Trelford gave me a personal tutorial on TickSpec, and I could take the time to work up a draft of this article.

Great Lakes Functional Programming Conference and more

I want to give a plug for the upcoming Great Lakes Functional Programming Conference in Detroit, MI on November 9. If you don’t already know, there is a vibrant functional programming community in Detroit, and kudos to Onorio Catenacci for organizing this.

My next talk is coming up on October 3 in San Francisco to the Bay Area Clojure User Group, “F# for Functional Enthusiasts”. Check it out if you are in the area.

Finally, I just have to pass on this little gem which spilled out of this year’s Strange Loop (darn! missed it again this year).

Lambda Jam F# Bibliography

Bibliography for my talk “Functional Linear Data Structures in F#” at Lambda Jam 2013.

Slides

Sandboxed demo code and examples from talk

Keynote: The Value of Values

FSharpx.Collections

FSharpx.Collections.Experimental

The Fastest Functional Data Structures Come to F#

Introduction to Microsoft’s IL

Busting 4 Modern Hardware Myths – Are Memory, HDDs, And SSDs Really Random Access?

Telerik’s JustDecompile recommended for IL exploration. Comes with intellisense and plugins extending functionality.

Porting Clojure’s Persistent Data Structures to .NET, part 1

Porting Clojure’s Persistent Data Structures to .NET, part 2

Hash array mapped trie

Solid Collections documentation discussion

A Unified Collection of Purely Functional List-like Data Structures

Purely Functional Stepchildren

Semantics and List-like Data Structures

Comparing F# Queues

Double-ended Queues for F#

Markov chain

Simulating a Markov Chain with F# Sequences

F# data frame library

Breadth-First Numbering: Lessons from a Small Exercise in Algorithm Design

Chicago Lambda Jam: The Must-attend Functional Programming Event

The full Lambda Jam Chicago schedule is now available!

July 8-10, 2013

I’m exited to be a part of this great opportunity for idea sharing across functional language communities. In addition to keynote speakers (Joe Armstrong, Gerald Sussman, and David Nolen), every morning will have nine sessions in three concurrent session paths, but what distinguishes this conference is fully half of each day is devoted to active participation in practical application of a half dozen of today’s most influential functional languages. And I have it on good authority there are going to be ground-breaking hands-on F# sessions you do not want to miss.

Look at the great line-up:

Clojure

The Joy of Flying Robots with Clojure – Carin Meier

Monads and Macros – Chris Houser and Jonathan Claggett

Functional composition – Chris Ford

Lisp and Cancer – Ola Bini

Data, Visibility, and Abstraction – Stuart Sierra

Scala

Functional Async Without the Pain – Jim Powers

Journey to the Heart of the For-Yield – Kelsey Innis

Enabling Microservice Architectures with Scala – Kevin Scaldeferri

Functional I/O in Scala – Nilanjan Raychaudhuri

Erlang

Distributed Programming with Riak Core and Pipe – John Daily

Finite State Machines – Why the fear? – Mahesh Paolini-Subramanya

Addressing Network Congestion in Riak Clusters – Steve Vinoski

Let it Crash: Erlang Fault Tolerance – Tristan Sloughter

F#

Functional Mobile Applications in F# – Adam Granicz

Functional Linear Data Structures in F# – Jack Fox

Clarity of Intent: Three Features of F# Which Lead to Better Code – Paulmichael Blasucci

Haskell

Domain Specific Languages and Towers of Abstraction in Haskell – Gershom Bazerman

QuickCheck: A Silver Bullet for testing? – Joseph Wayne Norton

Simile-Free Monad Recipes – Aditya Siram

Others

Functional Reactive Programming in the Netflix API – Ben Christensen

Protocols, Functors and Type Classes – Creighton Kirkendall

Living in a Post-Functional World – Daniel Spiewak

Copious Data, the “Killer App” for Functional Programming – Dean Wampler

Semantics, clarity, and notation: the benefits of expressions over statements – Tracy Harms

Living in Big Data with Vector Functional Programming – Dave Thomas

Functional Coffeescript for Web UIs – Richard Feldman

Redex: Program Your Semantics – Robby Findler

If that’s not enough, every afternoon we roll up our sleeves with your
choice from 5 incredible workshops ($50/each) or an open jam.

Monday workshops

Try F# from Zero to Data Science – Rachel Reese

The Art of Several Interpreters, Quickly – Dan Friedman, Jason Hemann

Hands-on Intro to Haskell – Bartosz Milewski

Top-down TDD in Clojure – Brian Marick

The Seductions of Scala – Dean Wampler

Tuesday workshops

F# on the Web – Ryan Riley and Daniel Mohl

Program Transformations – William Byrd, Nada Amin

Uses Lenses, Folds and Traversals – Edward Kmett

Functional Web Development with Clojure – Clinton N. Driesbach

Building Applications in Erlang – Garrett Smith

Wednesday workshops

Installed to Productive in Julia – Leah Hanson

Macros! – Drew Colthorp

Compilers from Scratch – Daniel Feltey

Functional Web Applications with
Webmachine
– Sean Cribbs, Chris Meiklejohn

Introduction to Summingbird – Sam Ritchie

Come Join the Functional Programming Event

Registration for Lambda Jam Chicago is
now open. Tickets are $400 for regular admission and $50 per workshop.
Register now!

Sponsorships available!

If your company is interested in hiring functional programmers, please
consider sponsoring Lambda Jam – the
sponsorship prospectus is available.

The Fastest Functional Data Structures Come to F#

I wasn’t expecting it, but there have been some breakthroughs in available functional data structures for F#. I’m talking about speed and versatility, and as always, composability. Now available in FSharpx.Collections.Experimental (and on NuGet) is by far the fastest linear functional data structure (for what it does), FlatList. It’s been available as an internal type for some time as part of the OS F# compiler code. Now for the first time it’s available as a stand-alone type in a component library with intellisense documentation.

It’s implemented as a struct and uses an array as internal storage, so it is just blazing fast (as fast as an array) at what it does best:

Item get

ofSeq

IEnumerable

rev

…and almost all of the array module members (plus some).

What it’s not so fast at (like array) is append. So it’s best loaded from an IEnumerable object rather than built up one element at a time.

Try it out, and post any feedback to the FSharpx GitHub issues. After sufficient beta testing we will graduate it to the FSharpx.Collections namespace.

More to come

I don’t want to steal his thunder, but there is someone quietly working on some more functional data structures that I hope to see soon in FSharspx.

First there’s a performance improvement to the already fast FSharpx.Collections.Vector (actually a new implementation). The speed increases I’m talking about are in the range of 2:1 to 5:1. And a wider range of module function bindings. Very cool.

Next there’s a unique and very versatile functional structure that does about anything you would ever want from a linear structure. I already have some plans for this one.

I’ve already had my hands on these two. They are real. He’s also working on at least one non-linear functional structure I have not evaluated yet, but based on past performance I have high expectations.

S.F. F# User Group: Working with Functional Data Structures, Bibliography

Bibliography for my presentation, Working with Functional Data Structures, Practical F# Application, February 4 at San Francisco F# User Group.

Slides

Purely Functional Data Structures (1996)
Purely Functional Data Structures (1998)

Purely Functional Stepchildren
F# Data Structure Libraries in FSharpx
A Unified Collection of Purely Functional List-like Data Structures
Comparing F# Queues
Semantics and List-like Data Structures
FSharpx.Collections
FSharpx.Collections.Experimental
Double-ended Queues for F#

DS_Benchmark
Benchmarking F# Data Structures — Introduction
Benchmarking F# Data Structures — Loading Data

Eric Lippert’s Comma Quibbling
Nick Palladinos Comma Quibbling
GC Performance Counters
LazyList
Are functional languages inherently slow?
The benefits of purely functional data structures
Functional Pearls Red-Black Trees in a Functional Setting
F# vs Mathematica: red-black trees
Red-Black Tree Remove
Implimenting IComparable in F#
Catamorphism

F# Data Structure Libraries in FSharpx

Last week I blogged about supplementing the FSharpx library with a collection of purely function sequential data structures. This week the news is a restructuring of FSharpx providing for the first time a separate project for experimental data structures. I really like this project set-up as opposed to an experimental branch for accessing experimental structures at the same time as the structures in Core.Collections.

Submit your Data Structures to FSharpx

So let the experimenting begin! The experimental project is actually a copy of the old FSharpx.DataStructures namespace. That namespace has been marked “obsolete” going forward, so if you are using it, switch to FSharpx.Collections.Experimental, which is a separate NuGet package from FSharpx.Core.

Experimental is intended primarily for purely functional (immutable) structures, but feel free to add mutable structures of interest. Good candidates include F# ports of structures available in other functional languages and implementations from academic papers. The main criteria is the structure will be useful and interesting to others. Include some documenting comments and pointers to sources, if appropriate.

Do include tests with your submissions. Not only is that all the QA there is, they also serve as functionality samples.

Staging for the FSharpx.Core library

Collections.Experimental also serves as the staging area for promotion to the FSharpx.Collections and FSharpx.Collections.Mutable namespaces. There is higher acceptance criteria for pull requests here. See the core collections readme. Structures should generally have the look and feel of belonging to a professional library.

There are already several candidates in experimental (in my opinion) worthy of polishing for core collections. I have not taken the time to look closely at most of these, so please excuse me if some of my comments are “uninformed”:

BKTree

In addition to reviewing for core standards, this structure would benefit from review by someone experienced with practical application of distance measures, perhaps adding more distance measures. A quick search of recent F# blogs finds

Revisiting String Distances

F# on Algorithms – Wagner–Fischer algorithm

Edit Distance (EDIST) with F#

Levenshtein Distance and the Triangle Inequality

Metrics on Words

BottomUpMergeSort

Maybe. Needs polishing.

Zippers

There are a couple of zippers in experimental. I’ll leave it to someone who uses these more to determine what it takes to promote them (or not) to core.

IntMap

Needs review for function naming. (Should match naming standard set by MS.FSharp.Collections.Map where appropriate.) Remove some unnecessary module level functions. (E.g. ofArray, which only composes List.ofArray. Better not to hide this from users.) And hide behind an fsi file or entirely remove the “unsafe” functions.

RoseTree

Looks useful. Needs polishing.

TimeSeries

I don’t know about this one.

An exercise for the reader

I recently uploaded EagerRoseTree to experimental. (It’s “eager” because the first RoseTree in experimental implements LazyList for the children forest.) EagerRoseTree is unfinished in many ways.

For one thing it exposes an unsafe internal structure. A user could implement a dysfunctional tree. The children forest is a Vector. This allows for indexed lookups in the EagerRoseTree right out of the box, but updates to forest nodes will only return a new vector, not propagate up to returning a new tree.

Here’s a couple more possibilities:

1) Adapt a zipper to navigate in the tree.

2) Adapt the recently released FsCoreSerializer to serialize this, and other structures.

Upcoming talk on F# Data Structures

If you are in the area, come to my data structures talk at SF F# User Group Monday night, February 4. It’s a great and knowledgeable user group.

A Unified Collection of Purely Functional List-like Data Structures

The FSharpx.Collections namespace now provides a collection of linear data structures deriving from the List signature. To emphasize the unity of the collection I implemented a standardized nomenclature expanding on the List value names. This is not without controversy. Structures like Queue are well-known in other (mostly imperative) languages, but I believe together these structures exhibit more similarities than differences, and bringing them all together in one F# collection is an opportunity to emphasize that logical unity.

My intent was to expand the List signature nomenclature with the naming standard favored by Okasaki, but “init” as the name for the inverse of “tail” would not do as this conflicts with a List module value. So this value is named “initial”. And I made one other change from Okasaki. In recognition of Steffen Forkmann’s F# implementation of the Vector structure from Clojure being the basis of two structures in this collection (Vector and RandomAccessList), I have opted to name the end-insertion function/member “conj” instead of “snoc”.

The List-like Immutable Data Structures

The following structures provide features perhaps available from List and Array, but not efficiently implemented and/or not in the right combination for a particular task, and the full composability and immutability you expect from purely functional data structures.

Deque (Double-ended queue) is an ordered linear structure implementing the signature of List (head, tail, cons) as well as the mirror-image Vector signature (last, initial, conj). “Head” inspects the first or left-most element in the structure, while “last” inspects the last or right-most element. Ordering is by insertion history.

DList is an ordered linear structure implementing the List signature (head, tail, cons), end-insertion (conj), and O(1) append. Ordering is by insertion history. DList is an implementation of John Hughes’ append list.

Heap is an ordered linear structure where the ordering is either ascending or descending. “Head” inspects the first element in the ordering, “tail” takes the remaining structure after head, and “insert” places elements within the ordering. PriorityQueue is available as an alternate interface.

LazyList is an ordered linear structure implementing the List signature (head, tail, cons), but unlike the other linear structures computation of elements is delayed, executed once on demand, and thereafter cached. Adapted from the PowerPack implementation with the List signature values available from within the type class.

Queue is an ordered linear data structure where elements are added at the end (right) and inspected and removed at the beginning (left). Ordering is by insertion history. The qualities of the Queue structure make elements first in, first out (fifo). “Head” inspects the first or left-most element in the structure, while “conj” inserts an element at the end, or right of the structure.

RandomAccessList is an ordered linear structure implementing the List signature (head, tail, cons), as well as inspection (lookup) and update (returning a new immutable instance) of any element in the structure by index. Ordering is by insertion history.

Vector is an ordered linear structure implementing the inverse of the List signature, (last, initial, conj) in place of (head, tail, cons). Indexed lookup or update (returning a new immutable instance of Vector) of any element is O(log32n) — just about O(1). Length is O(1). Ordering is by insertion history.

Comparing Performance

I recently posted a performance preview of the Queue data structure. Here are the performance benchmarks across the list-like structures, including List and Array from the Microsoft.FSharp.Collections namespace for comparison.

Times are milliseconds on a 2.2GHz 4GB dual core 64-bit Windows 7 machine. Orders of magnitude represent either the beginning or resulting number of elements in the structure. Milliseconds is derived by dividing ticks by 10,000. More on the benchmarking methodology can be found here. The data structure benchmark application can be found here.

Add elements to empty structure

  102 103 104 105 106
ms.f#.array 0.8 1.8 100.9 11771.4 n/a
ms.f#.array — list 0.3 1.0 69.5 n/a n/a
ms.f#.list 0.4 0.4 0.4 1.0 13.8
ms.f#.list — list 0.7 0.7 0.9 2.3 45.3
fsharpx.deque — conj 0.3 0.3 0.5 4.7 *
fsharpx.deque — cons 0.3 0.3 0.5 4.7 *
fsharpx.dlist — conj 0.7 0.7 1.0 7.7 153.0
fsharpx.dlist — cons 0.7 0.7 1.0 6.4 118.4
fsharpx.heap 3.2 3.3 5.0 22.5 254.7
fsharpx.lazylist 0.9 0.9 1.0 2.6 108.3
fsharpx.queue 1.0 1.1 1.4 7.6 106.6
fsharpx.randomaccesslist 0.8 0.9 3.3 19.6 189.8
fsharpx.vector 0.8 0.9 3.3 19.7 189.1

Comments

1) Depending on the structure’s signature by invoking cons or conj using seq.fold.

2) Source data is an ascending ordered integer array, except where noted.

3) Note that repeatedly adding an element to an existing array does not scale.

4) (*) I had trouble getting any Deque benchmarks at scale 1M to complete in reasonable time and have yet to establish whether this is a problem with my benchmark infrastructure or the Deque implementation or a combination thereof.

Initialize structure

  102 103 104 105 106
ms.f#.array 0.1 0.1 0.1 0.2 1.3
ms.f#.array — ofList 0.2 0.2 0.3 0.5 2.5
ms.f#.list — ofArray 0.2 0.2 0.2 0.7 12.7
ms.f#.list 0.0 0.0 0.0 0.0 0.0
fsharpx.deque 0.6 0.6 0.6 1.0 *
fsharpx.dlist 1.5 1.5 1.7 3.5 49.8
fsharpx.heap 4.1 4.2 5.7 20.9 235.4
fsharpx.lazylist — ofArray 0.3 0.3 0.3 0.3 0.3
fsharpx.queue 1.0 1.0 1.1 1.6 13.5
fsharpx.randomaccesslist 4.4 4.5 5.2 11.5 156.5
fsharpx.vector 3.0 3.1 3.6 8.1 69.3

Comments

1) Using the respective module’s ofSeq, or different function where indicated.

2) Source data is an ascending ordered integer array, except where noted.

3) Queue and Deque both support O(1) ofList which would load from a list in a fraction of a millisecond.

Peek and Dequeue until the structure is empty

  102 103 104 105 106
ms.f#.list 0.1 0.1 0.1 0.2 1.0
fsharpx.deque — tail 1.9 2.0 2.2 5.2 *
fsharpx.deque — initial 2.9 2.9 3.3 8.2 *
fsharpx.dlist 0.6 0.6 1.0 6.4 105.8
fsharpx.heap 0.5 0.6 0.7 1.9 13.5
fsharpx.lazylist 0.9 1.0 2.2 21.3 254.1
fsharpx.queue 0.5 0.5 0.9 1.8 48.2
fsharpx.randomaccesslist 0.9 1.0 2.1 13.6 108.9
fsharpx.vector 0.9 1.0 2.1 13.6 114.7

Comments

1) Inspects element with either head or last and recursively takes tail or initial, depending on structure signature.

Use IEnumerable to iterate through each element

  102 103 104 105 106
ms.f#.array 0.3 0.3 0.4 1.1 8.4
ms.f#.list 0.7 0.7 0.8 2.0 14.0
fsharpx.deque 2.2 2.3 2.6 5.5 *
fsharpx.dlist 1.7 1.8 3.3 22.1 214.1
fsharpx.heap 5.3 5.6 6.6 28.8 450.5
fsharpx.lazylist 3.1 3.2 4.4 23.0 278.3
fsharpx.queue 2.0 2.0 2.4 5.3 50.2
fsharpx.randomaccesslist 1.6 1.7 1.8 3.9 24.8
fsharpx.vector 1.7 1.7 1.9 3.9 26.2

Reverse

  102 103 104 105 106
ms.f#.array 0.1 0.1 0.1 0.2 1.1
ms.f#.list 0.2 0.2 0.2 0.4 1.8
fsharpx.deque 0.0 0.0 0.0 0.0 *
fsharpx.heap 5.2 5.7 8.4 64.8 1097.1
fsharpx.queue 0.1 0.1 0.1 0.1 0.1
fsharpx.randomaccesslist 1.5 1.5 2.1 10.2 100.0
fsharpx.vector 1.4 1.4 2.0 7.7 97.4

Append

  102 103 104 105 106
ms.f#.array 0.1 0.1 0.1 0.2 1.4
ms.f#.list 0.2 0.2 0.3 0.7 46.0
fsharpx.dlist 0.2 0.2 0.2 0.2 0.2
fsharpx.heap 0.4 0.4 0.4 0.4 0.4
fsharpx.lazylist 0.2 0.2 0.2 0.2 0.2

Comments

1) Using merge for the Heap structure.

Iterate by index

  102 103 104 105 106
ms.f#.array 0.4 0.4 0.4 0.5 1.4
fsharpx.randomaccesslist 0.4 0.4 0.5 2.2 18.5
fsharpx.vector 0.4 0.4 0.5 2.0 19.1

Random lookup (10,000)

  102 103 104 105 106
ms.f#.array 0.1 0.1 0.1 0.1 0.1
fsharpx.randomaccesslist 0.1 0.1 0.1 0.1 0.1
fsharpx.vector 0.1 0.1 0.1 0.1 0.1

Random update (10,000)

  102 103 104 105 106
ms.f#.array 0.1 0.1 0.1 0.1 0.2
fsharpx.randomaccesslist 2.1 2.7 4.2 10.1 17.0
fsharpx.vector 2.2 2.7 3.4 6.9 17.0

Implementation Notes

1) I borrowed the structural equality implementation from Vector for the other structures. Heap perhaps does not need to used Unckecked.equals, but I have not profiled that option to see whether it would actually improve performance. More attention to equality checks taking advantage of internal structure may prove to be somewhat more efficient.

2) The structural equality implementation puts an internal mutable reference value in each structure that gets updated at most once per lifetime. I don’t think this will impede multi-threading use of the structures, but I don’t know for sure either.

3) As noted above there may be issues with Deque at scales >>100K elements. Another Deque in the “experimental” DataStructures namespace may meet the needs of your application better.

Comparing F# Queues

There are some useful additions to the F# tool-belt coming up. Microsoft recently released a preview edition of immutable collections, and if all goes as planned I will release a suite of 6 purely functional linear data structures to FSharpx.Collections in just a couple of days.

Queue is a data structure that needs better representation in the functional world, and both collections make contributions, so this is an opportunity to do some performance comparisons of the queues available to F# programmers. The contenders are two queues from the .NET Framework, System.Collections.Generic.Queue and System.Collections.Concurrent.Queue, the upcoming FSharpx.Collections.Queue, and FSharpx.Datastructures.BatchedQueue, from which FSharpx.Collections.Queue was developed.

The first benchmark exercises something like “real world” usage of the queues with the following steps:

1) enqueue the number of elements indicated

2) peek and dequeue half that number

3) enqueue the same number to the resulting queue

4) peek and dequeue half the resulting size of that queue

5) enqueue one more time the original number

(Milliseconds on a 2.2GHz 4GB dual core 64-bit Windows 7 machine)

  102 103 104 105 106
sys.queue 0.8 0.8 1.2 5.2 45.7
sys.concurrentqueue 1.9 2.1 2.8 10.9 90.3
sys.immutablequeue 1.4 1.5 2.7 21.2 401.4
fsharpx.queue 1.5 1.6 2.7 20.8 404.3
fsharpx.batchedqueue 2.0 2.2 4.0 32.8 n/a

And then the obligatory “not so real world” benchmarks

Enqueue the indicated number of elements

  102 103 104 105 106
sys.queue 0.3 0.3 0.4 1.1 8.8
sys.concurrentqueue 0.4 0.5 0.6 1.7 12.8
sys.immutablequeue 0.5 0.5 0.8 4.0 91.6
fsharpx.queue 1.1 1.2 1.4 3.7 102.4
fsharpx.batchedqueue 1.2 1.2 1.5 7.8 n/a

Peek and Dequeue until the queue is empty

  102 103 104 105 106
sys.queue 0.2 0.2 0.2 0.7 6.2
sys.concurrentqueue 0.7 0.7 0.8 2.4 18.9
sys.immutablequeue 0.5 0.5 0.7 3.3 74.7
fsharpx.queue 0.6 0.6 0.7 1.3 26.8
fsharpx.batchedqueue 0.7 0.7 0.9 1.8 n/a

Initialize from a sequential source (an array unless otherwise indicated)

(The preview Immutable.Queue does not yet implement this capability.)

  102 103 104 105 106
sys.queue 0.3 0.3 0.4 1.2 9.6
sys.concurrentqueue 0.3 0.3 0.3 0.9 6.6
fsharpx.queue ofList 0.1 0.1 0.1 0.1 0.1
fsharpx.queue 0.7 0.7 0.7 1.2 13.1
fsharpx.batchedqueue 0.8 0.8 0.8 1.3 n/a

Use IEnumerable to iterate through each element

(queue of indicated size)

  102 103 104 105 106
sys.queue 0.1 0.1 0.2 0.7 6.9
sys.concurrentqueue 0.7 0.7 0.8 1.3 7.0
sys.immutablequeue 0.7 0.7 0.9 2.4 55.8
fsharpx.queue 2.1 2.1 2.4 4.7 27.6
fsharpx.batchedqueue 2.1 2.2 2.4 4.7 n/a

So which Queue is right for You?

That of course depends on your application. For raw speed the system generic queue is hard to beat. The concurrent queue compares well against generic in all categories in a single-threaded environment, so for raw speed it should be a contender in multi-threaded environments, unless composability and immutability is preferred.

Preview Immutable.Queue seems primarily intended for the C# world, but has adequate composability to make it suitable for F#. FSharpx.Collections.Queue performs just as well (sometimes better), and has a broader range of supporting functions including fold, foldback, reverse (usually O(1), worst case O(n)), ofSeq, and ofList.

Semantics and List-like Data Structures

List as a Semantic Element

The singly-linked list is the fundamental data structure of functional programming. Suspending concern with how it is implemented (forward-linked elements) and dealing with it strictly as an abstract construct of its host language, List is an extension of the programming language, which in turn brings semantic structure to data. Growing and shrinking linear data representations is so powerful List alone suffices for solving a vast number of problems.

Eventually we do need to concern ourselves with implementation. There is a class of purely functional List-like data structures that extend List, either substituting more efficient versions of existing functions or supplementing additional functions for working with linear data representations.

The supplemental List-like data structures derive from familiar structures in imperative programming, but once constructed as purely functional (immutable) data structures, thinking of them as algebraic mutations of List reveals semantic cohesion.

List’s core functionality consists of three functions. (These are the names used in F#, and for clarity I will adhere mostly to the naming standard favored by Okasaki for extension functions, rather than the more familiar function names from imperative programming.)

cons — returns List with new element added to left

head — returns left-most element

tail — returns List excluding left-most element

F# List also provides non-core functionality (again, disregarding implementation you could do without these).

rev — returns reversed List

append — returns List of first List elements followed by second List elements

length — returns count of elements in the List

Extending Serial Representation

The extensions we will consider for our List-like vocabulary:

conj — returns List with new element added to right

initial — returns List excluding right-most element

last — returns right-most element

iLookup — returns element at index

iUpdate — returns List with element at index updated

It is perhaps surprising that List is so useful given that only the most recently added element is accessible! Significantly, you can readily see the extension functions either invert one of the core functions or provide short-cut access to an interior element. RandomAccessList provides indexed lookup and update.

RandomAccessList = List + iLookup + iUpdate

LazyList is one List-like data structure that most probably comes from a functional programming heritage. It is the only List-like structure reviewed here incorporating functionality the core List functions alone cannot replicate, namely lazy evaluation. Significantly in F# it already uses the List-like functional naming standard.

LazyList = ListLazy

Heap is frequently overlooked as being a List-like data structure. Even Okasaki reverted to using function names from imperative programming that obscure the close relationship to List. Heap is always sorted and provides a more efficient, O(log n), append (merge).

Heap = List + sorted + append

Difference List facilitates adding elements to the other end of the List, whether singly or another entire Difference List in O(1) time.

DList = List + conj + append

After introducing adding elements to both ends, now consider the fully symmetrical linear representation provided by a Double-ended Queue. Deque is the union of its tail and tail’s compliment, initial. This symmetry finds additional expression in an O(1) rev.

Deque = List + conj + last + initial + rev = initial U tail

Queue, a workhorse in imperative programming, is simply a List that can only add elements on the right, but still extracts them on the left with head.

Queue = List - cons + conj

While the algebra deriving Vector from List is the longest, it is simply an inverted RandomAccessList.

Vector = List - cons - head - tail + conj + last + initial + iLookup + iUpdate
Vector = RandomAccessList-1

Why the Cacophony of Names?

Perhaps perfect computer language implementations are an unrealizable ideal, but consistent nomenclature reveals semantics otherwise easily overlooked. Data structures from imperative programming take on a new character when made purely functional, and it is appropriate for them to give up part of their former identity (function names) in the functional realm.

All of the these F# Data Structures and more are available in the FSharpx.Collections and FSharpx.Datastructures namespaces here or from Nuget in FSharpx.Core.

UPDATE: Updated to naming standard adopted in FSharpx.Collections.

FSharpx Data Structures Discussion

In the ten weeks since I began submitting pull requests contributing Functional Data Structures to the FSharpx open source project I’ve learned a lot, not just expanding my knowledge of the F# language, .NET Framework, and Functional Data Structures, but Git, GitHub, and perhaps most interesting of all, contributing to an open source project.

Steffen Forkmann and Mauricio Scheffer, two of the core FSharpx team, deserve special call-outs for helping me come up to speed. Steffen ported some of my work before I began doing Git pull requests, and they’ve both been very patient with my Git blunders and other mistakes. It took some time, but I’m starting to understand the somewhat anarchic governance structure to open source. With my background leading medium-sized production projects, I try to develop a big picture of the entire stack. I’m learning OS contributors tend to be much more focused on immediate issues. I think this is for good reason as commit history is crucial to the health and long-term benefit of the project. It works like a self-organizing system with big picture benefits as emergent phenomena.

I still can’t resist formulating a bigger picture framework for the FSharpx Data Structures sub-project. What follows is purely my opinion, and I hope FSharpx contributors and others will correct me where I err, and encourage me where I am right with their feedback.

FSharpx.DataStructures Namespace is Purely Functional

This namespace is for Purely Functional Data Structures. There is already one exception, TransientVector which shares the IVector interface with the purely functional PersistentVector.

Mutable structures should have their own namespace, which currently doesn’t appear to exist. There is a “Collections” folder, but its components are in the FSharpx namespace. It includes extension modules to functional and mutable structures as well as one functional structure, NonEmptyList.

Structure Type and Module

The structure’s type should exist directly in the DataStructures namespace, as well as a module with the same name as the type, but not the type under the module. There are a few structures in the latter situation with no non-breaking remedy, except perhaps duplicating the entire type at the namespace level. The type should have all the members that operate on an instantiated type. The module should have let bindings to all the the type members as well as let bindings for all other “utility” functions on the type, like ofSeq and empty.

Understand the difference between a property and a method

Property Usage Guidelines One rule of thumb, if it is not O(1), it is not a property. (The inverse does not necessarily hold.)

Pattern Discriminator

Include any relevant pattern discriminators in the module. Hint: if the type implements TryUncons or TryUnsnoc you can implement pattern discriminators.

Try to Avoid Exceptions

If a structure’s method can throw an exception, include a TryMethodName method that returns option.

Necessary Members

Every data structure should have a length member (even if length is O(n)) and an IsEmpty member in the type and ofSeq in the module.

Additional Members

Of course implement all the members central to the structure, but there can be some nice O(1) or O(log n) values just waiting to be implemented that perhaps were not directly addressed in the research paper because they were not interesting. A couple of examples: the internal structure of the type allows O(1) rev or ofList.

Necessary Interface

Implement IEnumerable in the type. It should sequence elements in the inherent order of the data structure. For instance if a structure implements the head/tail paradigm, IEnumerable should sequence elements in the order of unconsing the structure down to empty. Most every structure has some logical, if not physical, order to its elements when flattened. (I can’t think of an exception right now.)

Implementing ofSeq and IEnumerable makes practically every Data Structure composable into every other, a very useful feature!

Type Interface

In theory data structures implementing the same paradigm, e.g. queue, should implement an interface that both restricts upcasting the interface to the original type and is a usable interface. Meeting both the restrictive and usable requirements has proven difficult. I haven’t given up on this project yet, just deferred it.

Hide unsafe underlying structures

When I first went to work on data structures in the FSharpx project, I noticed some of the structures exposing unsafe (in the sense that arbitrary use could lead to a dysfunctional structure) discriminated unions. I didn’t fully appreciate at the time how an OS project like FSharpx incrementally works toward ideal practices, and considered this an intentional practice, which I emulated.

As a result of Mauricio’s initiative to test for monoid law adherence, this [mal]practice has been exposed here and here.

This is a fair amount of clean-up work. Still, the exposed union does provide one more handle for the determined hacker to be inventive with, so perhaps the correct solution is to expose the union in the type and module in some manner that is clearly identified as unsafe. Or even collecting all of the unsafe unions in a single unsafe module. “Unsafe” or “unchecked” would be ideal names, except they are already C# key words.

Still thinking this over, and hope to get feedback on this.

Intellisense

Every public member and let binding should have intellisense documentation to make it user friendly and not require a literature search or inspection of the code to determine functionality. At the type level there should be intellisense documenting the purpose and use of the structure. Include URLs to additional literature where appropriate.

Data structures should give developers useful tools, not research projects.

Time Complexity

Adopting a practice from Haskell documentation, I put the time complexity for members and let bindings at the beginning of their intellisense documentation. I even went so far as to put O(1) on empty and singleton to accustom users to seeing the time complexity in the intellisense, even though strictly speaking time complexity for these bindings has no meaning.

Take time to evaluate the naming standard

While it’s a good practice to keep the name a structure is best known as, consider renaming properties and methods if it makes sense to fit in with F# paradigms. Use the FSharp.Collections and existing FSharpx.DataStructures namepaces as models, don’t just unthinkingly adopt the names from a research paper or the implementation in another language. Use ofList and ofArray not fromList or fromArray. I intentionally renamed the Heap members findMin and deleteMin to head and tail because I think that accurately conveys the head/tail paradigm like several other data structures (and I implemented Heaps as ascending or descending, so “min” is inappropriate). Stick with cons, snoc, head, tail, init, last wherever possible.

IReadOnlyList and IReadOnlyDictionary

IReadOnlyCollection and IReadOnlyDictionary, new to .NET 4.5, would appear to be useful to implement, however backwards compatibility of FSharpx raises a problem. A discussion of this issue is here.

Should FSharpx take over the PowerPack Data Structures?

Discussion: here

There are related threads for migrating other functions from PowerPack, for instance here and here

I don’t know what the right answer is.

Conclusion

Data Structures make F# a richer multi-technology-stack language. I keep a running collection of the structures from Okasaki’s 1998 book not yet ported to FSharpx here. (There’s at least one or two covered in the book not in this collection.) Feel free to request expediting porting any or all, or work on the port yourself (I’m happy to provide any assistance I can). There’s lots of other useful structures still “in the wild”.