In the ten weeks since I began submitting pull requests contributing Functional Data Structures to the FSharpx open source project I’ve learned a lot, not just expanding my knowledge of the F# language, .NET Framework, and Functional Data Structures, but Git, GitHub, and perhaps most interesting of all, contributing to an open source project.
Steffen Forkmann and Mauricio Scheffer, two of the core FSharpx team, deserve special call-outs for helping me come up to speed. Steffen ported some of my work before I began doing Git pull requests, and they’ve both been very patient with my Git blunders and other mistakes. It took some time, but I’m starting to understand the somewhat anarchic governance structure to open source. With my background leading medium-sized production projects, I try to develop a big picture of the entire stack. I’m learning OS contributors tend to be much more focused on immediate issues. I think this is for good reason as commit history is crucial to the health and long-term benefit of the project. It works like a self-organizing system with big picture benefits as emergent phenomena.
I still can’t resist formulating a bigger picture framework for the FSharpx Data Structures sub-project. What follows is purely my opinion, and I hope FSharpx contributors and others will correct me where I err, and encourage me where I am right with their feedback.
FSharpx.DataStructures Namespace is Purely Functional
This namespace is for Purely Functional Data Structures. There is already one exception, TransientVector which shares the IVector interface with the purely functional PersistentVector.
Mutable structures should have their own namespace, which currently doesn’t appear to exist. There is a “Collections” folder, but its components are in the FSharpx namespace. It includes extension modules to functional and mutable structures as well as one functional structure, NonEmptyList.
Structure Type and Module
The structure’s type should exist directly in the DataStructures namespace, as well as a module with the same name as the type, but not the type under the module. There are a few structures in the latter situation with no non-breaking remedy, except perhaps duplicating the entire type at the namespace level. The type should have all the members that operate on an instantiated type. The module should have let
bindings to all the the type members as well as let bindings for all other “utility” functions on the type, like ofSeq
and empty
.
Understand the difference between a property and a method
Property Usage Guidelines One rule of thumb, if it is not O(1), it is not a property. (The inverse does not necessarily hold.)
Pattern Discriminator
Include any relevant pattern discriminators in the module. Hint: if the type implements TryUncons or TryUnsnoc you can implement pattern discriminators.
Try to Avoid Exceptions
If a structure’s method can throw an exception, include a TryMethodName method that returns option.
Necessary Members
Every data structure should have a length
member (even if length is O(n)) and an IsEmpty
member in the type and ofSeq
in the module.
Additional Members
Of course implement all the members central to the structure, but there can be some nice O(1) or O(log n) values just waiting to be implemented that perhaps were not directly addressed in the research paper because they were not interesting. A couple of examples: the internal structure of the type allows O(1) rev
or ofList
.
Necessary Interface
Implement IEnumerable in the type. It should sequence elements in the inherent order of the data structure. For instance if a structure implements the head/tail paradigm, IEnumerable should sequence elements in the order of unconsing the structure down to empty. Most every structure has some logical, if not physical, order to its elements when flattened. (I can’t think of an exception right now.)
Implementing ofSeq
and IEnumerable makes practically every Data Structure composable into every other, a very useful feature!
Type Interface
In theory data structures implementing the same paradigm, e.g. queue, should implement an interface that both restricts upcasting the interface to the original type and is a usable interface. Meeting both the restrictive and usable requirements has proven difficult. I haven’t given up on this project yet, just deferred it.
Hide unsafe underlying structures
When I first went to work on data structures in the FSharpx project, I noticed some of the structures exposing unsafe (in the sense that arbitrary use could lead to a dysfunctional structure) discriminated unions. I didn’t fully appreciate at the time how an OS project like FSharpx incrementally works toward ideal practices, and considered this an intentional practice, which I emulated.
As a result of Mauricio’s initiative to test for monoid law adherence, this [mal]practice has been exposed here and here.
This is a fair amount of clean-up work. Still, the exposed union does provide one more handle for the determined hacker to be inventive with, so perhaps the correct solution is to expose the union in the type and module in some manner that is clearly identified as unsafe. Or even collecting all of the unsafe unions in a single unsafe module. “Unsafe” or “unchecked” would be ideal names, except they are already C# key words.
Still thinking this over, and hope to get feedback on this.
Intellisense
Every public member and let binding should have intellisense documentation to make it user friendly and not require a literature search or inspection of the code to determine functionality. At the type level there should be intellisense documenting the purpose and use of the structure. Include URLs to additional literature where appropriate.
Data structures should give developers useful tools, not research projects.
Time Complexity
Adopting a practice from Haskell documentation, I put the time complexity for members and let bindings at the beginning of their intellisense documentation. I even went so far as to put O(1) on empty
and singleton
to accustom users to seeing the time complexity in the intellisense, even though strictly speaking time complexity for these bindings has no meaning.
Take time to evaluate the naming standard
While it’s a good practice to keep the name a structure is best known as, consider renaming properties and methods if it makes sense to fit in with F# paradigms. Use the FSharp.Collections and existing FSharpx.DataStructures namepaces as models, don’t just unthinkingly adopt the names from a research paper or the implementation in another language. Use ofList
and ofArray
not fromList
or fromArray
. I intentionally renamed the Heap members findMin
and deleteMin
to head
and tail
because I think that accurately conveys the head/tail paradigm like several other data structures (and I implemented Heaps as ascending or descending, so “min” is inappropriate). Stick with cons, snoc, head, tail, init, last
wherever possible.
IReadOnlyList and IReadOnlyDictionary
IReadOnlyCollection and IReadOnlyDictionary, new to .NET 4.5, would appear to be useful to implement, however backwards compatibility of FSharpx raises a problem. A discussion of this issue is here.
Should FSharpx take over the PowerPack Data Structures?
Discussion: here
There are related threads for migrating other functions from PowerPack, for instance here and here
I don’t know what the right answer is.
Conclusion
Data Structures make F# a richer multi-technology-stack language. I keep a running collection of the structures from Okasaki’s 1998 book not yet ported to FSharpx here. (There’s at least one or two covered in the book not in this collection.) Feel free to request expediting porting any or all, or work on the port yourself (I’m happy to provide any assistance I can). There’s lots of other useful structures still “in the wild”.