August | 2012 | craftyThoughts

Lately I’ve taken a closer look at Julien’s F# implementations of Chris Okasaki’s Purely Functional Data Structures¹. AltBinaryRandomAccessList² caught my eye because judging from its existing update function it looked pretty simple to implement a remove function. As you may have noticed in all the major collections of F# data structures –FSharp.Core.Collections, FSharpx.core.DataStructures, and PowerPack — remove is pretty rare. And both update and remove seem to be neglected stepchildren in the purely functional data structure world. (By update and remove I am referring to functions updating and removing any element by index.)

Let’s look at Okasaki’s “pseudo-canonical”³ update for list, implemented in F#.

let rec loop i y (l:'a list) = 
    match (i,y,l) with
    | i',y,[] -> raise (System.Exception("subscript"))
    | 0,y',x::xs -> y::xs
    | i',y,x::xs -> x::(loop (i'-1) y xs)

Do you see the problem? I’ll get back to it below.

How else might we implement update? We could just punt by converting to Array, updating, and back to list.

let punt i y (l:’a list) =
    let a = List.toArray l 
    a.[i] <- y
    List.ofArray a

And our last candidate is a hybrid, in which we progressively take the tail, collecting each head in an Array, until we pass the element for update. Then cons the new element and the elements from the Array.

let hybrid i y (l:'a list) =
    if (i = 0) then List.Cons (y, (List.tail l))
    else 
        let rec loop i' (front:'a array) back =
            match i' with
            | x when x < 0 -> front, (List.tail back)
            | x ->  
                Array.set front x (List.head back)
                loop (x-1) front (List.tail back)

        let front', back' = loop (i - 1) (Array.create i y) l

        let rec loop2 i' frontLen (front'':'a array) back'' =
            match i' with
            | x when x > frontLen -> back''
            | x -> loop2 (x + 1) frontLen front'' (front''.[x]::back'')

        loop2 0 ((Seq.length front') - 1) front' (y::back')

AltBinaryRandomAccessList is a structure with the same basic functions as List, but optimized for update. Here’s the type structure:

type AltBinRndAccList<'a> =
    | Nil
    | Zero of AltBinRndAccList<'a * 'a>
    | One of 'a * AltBinRndAccList<'a * 'a>

The element grouping represents the binary number of the count of elements, with least significant digit to the left⁴. This allows for a member update function which runs in O(log n) time. (The best we have seen so far is O(i), where i is the index of the target element, and punt runs in O(n)).

static member fupdate : ('a -> 'a) * int * AltBinRndAccList<'a> -> AltBinRndAccList<'a> = function
    | f, i, Nil -> raise Subscript
    | f, 0, One(x, ps) -> One(f x, ps)
    | f, i, One (x, ps) -> AltBinRndAccList.cons x (AltBinRndAccList.fupdate (f, i-1, Zero ps))
    | f, i, Zero ps ->
        let f' (x, y) = if i % 2= 0 then f x, y else x, f y
        Zero(AltBinRndAccList.fupdate(f', i/2, ps))

Now lets look at actual performance for our update alternatives⁵.

Size	10k Random Updates					One-time Worst Case
10²	PC	-		2.9	ms	Punt	-		0.2	ms
	Hybrid	1.4	X	4.0		PC	1.1	X	0.2
	Punt	1.5		4.5		Hybrid	4.1		0.8
	RndAcc	1.8		5.3		RndAcc	7.7		1.5

10³	RndAcc	-		8.2		Punt	-		0.2
	Hybrid	3.6		29.6		PC	1.1		0.2
	Punt	5.8		47.6		Hybrid	4.1		0.8
	PC	6.2		50.3		RndAcc	8.0		1.6

10⁴	RndAcc	-		11.5		Punt	-		0.3
	Hybrid	27.8		320.3		PC	1.3		0.4
	Punt	46.4		534.9		Hybrid	3.2		0.9
	PC	79.8		920.2		RndAcc	6.5		1.8

10⁵	RndAcc	-		14.8		Punt	-		1.0
	Hybrid	315.5		4.67	sec	Hybrid	1.5		1.5
	Punt	630.7		9.34		RndAcc	2.0		2.0
	PC	stack overflow

By way of comparison Array, which is not a purely functional structure, performs 10,000 updates in 0.1 millisecond at all scales, and really has no “worst case” single update. And notice because its function is not tail recursive Pseudo-Canonical eventually took a stack overflow. Also of interest, starting at the scale of 1,000 elements Punt outperforms Pseudo-Canonical in random updates, even though it is O(n) and the latter O(i). Time complexity does not tell the whole performance story.

So where does this leave remove? As it turns out my original idea to modify the update of AltBinaryRandomAccessList resulted with a function that was not tail recursive, but applying the hybrid technique to it resulted in a satisfactory algorithm⁶ and marginally improved performance, but like most every remove implemented in a purely functional list-like structure, the best time complexity you are going to get is O(i). (With a double-ended queue you can achieve ~O(i/2) .)

Here’s the comparative summary of remove performance on list-like structures. This time I include Array, where I copy the retained elements into a new structure one element shorter.

Size	10k Random Deletes					One-time Worst Case
10²	Array	-		1.4	ms	Array	-		0.1	ms
	PC	1.9	X	2.6		PC	2.7	X	0.2
	Hybrid	2.7		3.7		Punt	4.4		0.3
	Punt	3.9		5.3		Hybrid	10.5		0.7
	RndAcc	36.5		49.7		RndAcc	40.4		2.7

10³	Array	-		11.2		Array	-		0.1
	Hybrid	2.6		28.7		PC	2.7		0.2
	PC	4.3		48.3		Punt	4.3		0.3
	Punt	4.4		49.3		Hybrid	10.4		0.7
	RndAcc	40.1		448.5		RndAcc	41.1		2.9

10⁴	Array	-		107.9		Array	-		0.1
	Hybrid	3.0		321.5		PC	3.2		0.3
	Punt	5.5		589.9		Punt	4.5		0.4
	PC	8.3		898.8		Hybrid	10.0		0.8
	RndAcc	48.8		5.27	sec	RndAcc	45.8		3.8

10⁵	Array	-		1.43	sec	Array	-		0.2
	Hybrid	3.2		4.51		Punt	6.0		1.1
	Punt	7.7		11.04		Hybrid	8.1		1.5
	RndAcc	42.9		61.39		RndAcc	89.8		16.1
	PC	stack overflow

As you can see, remove is a viable function for moderate-sized data structures, but suffers from the limitation of its time complexity as you scale up. You could easily modify remove functions to process an ordered list or range of indexes and still have O(max i) performance.

One remaining question for purists: are the stepchildren still Purely Functional when they use mutation inside a function?

Notes

1 Okasaki, Purely Functional Data Structures, 1998

2 AltBinaryRandomAccessList

3 Okasaki, 1998.

4 See Okasaki, 1998 or Okasaki, Purely Functional Data Structures, 1996 for in depth discussion of numerical representations of data structures.

5 Times were taken on a not particularly fast dual core 64-bit system. Time multiples were calculated at a finer granularity, and so may not perfectly match the timings in milliseconds. AltBinaryRandomAccessList is abbreviated to “RndAcc”.

6 I hope to release it soon in an open source library.

Loading Data Structures

The Initial Data Performance tables rank data structures, originating data source, and data loading action from fastest to slowest for data sizes ranging from 1 to 10⁶. The column “TIME REL TO BEST” reports how many times longer a given combination takes compared to the top ranked.

Naturally you will be interested in specific comparisons. At the scale 10⁵ it is not surprising loading a List from another List of ascending integers is ranked the fastest for the “Init” action (List.ofList), but say your interest is in comparing performance of FSharpx DList to FSharpx PersistentVector when loading from an integer Array? Read down the chart until you find

FSharpx-DList

ArrayIntAsc

Init

1.0

96.1

and

FSharpx-PersistentVector

ArrayIntAsc

Init

1.0

199.4

The initial load performance of DList is 96.1 times as long as the top ranked, so divide Persistent Vector’s number by this to learn that it’s performance is 2.1 times slower than DList.

While the tables comparing performance across data structures will probably be your go to tables, there are also tables comparing performance within a data structure, one for comparing across different methods of loading data (there are some surprising results) as well as actual performance when scaling up data sizes.

Pathologies and other observations

Reasoning about any kind of software performance is notoriously hard. Modern CPU pipelining architectures and instruction and memory caches make it darn near impossible. And there are many considerations to take into account regarding the choice of language. You need to do your own profiling. These performance tables are no substitute, but meant to be a guide. Here are some interesting things that come to light with the performance tables:

Structures you would expect to display a more or less O(n) data loading performance seldom do. It’s much more common to display better performance up to some threshold (often 10⁴) and then perform worse than O(n).

The Power Pack library’s HashMultiMap structure loads significantly faster using the AddOne action (a For...do loop) than the new() constructor up to scale 10⁵ where performance becomes roughly the same.

FSharpx’s RealTimeQueue will stack overflow at scale 10⁵ on both the 32 and 64-bit systems I tested on.

FSharpx’s BootstrappedQueue scales remarkably well when loading from a List. (Of course if you looked at the code for this structure you would understand why.) And so the AddOne action scales badly in comparison to Init at scale 10⁵.

Not only is Array generally the fastest structure to initialize at all scales. It is often the fastest structure as the source for loading another structure.

Caveats

It’s important to remember the data load performance charts are the result of the performance interplay of the source and target data structures. If you are loading from I/O or some other “neutral” source consider comparing performance characteristics of Array loads, since Array seems to be consistently fast, and therefore somewhat neutral.

You will also see some strange blips at certain scales and for certain data and ordering characteristics. While I can’t vouch for every one, the oddities I looked at were consistently repeatable. I was still doing some development work while compiling these statistics. There is the possibility of bad data, and I eventually intend to re-compile all these stats. I deemed it more important to publish what I have and move forward with the next phases of my data structure performance survey.

Still to come

Still a lot of data structures to add to the data structure pages, and I still have to compile performance charactistics on common actions, like lookup, update, append, iterate. So I’ll have my head down working on this for a while. Please send comments, suggestions, and requests.

Here are some more .NET related performance benchmarks, also using the ratio comparison method.

craftyThoughts

Composability -> Expressiveness -> Correctness -> Testability -> Intention

Monthly Archives: August 2012

Purely Functional Stepchildren

Notes

Benchmarking F# Data Structures — Loading Data

Loading Data Structures

Pathologies and other observations

Caveats

Still to come