One thing I consistently include in preparing functional data structures for the FSharpx.DataStructures library is the IEnumerable interface, which facilitates function composition by seamlessly flattening structures. Using Brian McNamara’s indispensable series on catamorphisms as a guide I set about to flatten the BinomialHeap.
Ignoring for now isMaximalist : bool
we see a recursive structure of
list -> tuple -> list -> tuple -> etc.
and we want to flatten the elements 'a
to 'a list
. A classic case for a continuation passing style (CPS) fold.
This code shows a first attempt at hard-coding the flatten we want to accomplish. There are only four patterns to account for, so reasoning about the steps to take is simple. The fifth “can’t get here” pattern is purely to prevent a warning message. (The compiler doesn’t see there are only four paths.)
This code does what we want, producing an 'a list
of all the heap’s elements, but the continuation on the last pattern is appending lists, an inefficient procedure we want to avoid…but how?
Let’s see if we gain anything by abstracting out the flatten logic to make this a true generic fold.
So far I have been treating this problem as a classic example from part three of Brian’s series. In abstracting a CPS fold over a discriminated union, you end-up with one continuation function for each case in the union. Our data type is not a four-case discriminated union, but I have been treating the number of match patterns as a proxy for a discriminated union. (The second and third patterns take the same function. That should be a hint there is something wrong about this approach.)
Ultimately abstracting over (a proxy for) a discriminated union has been a dead-end. We do have an abstracted fold method using CPS, but there seems to be no way to get rid of that annoying list append.
The correct approach is through the tree CPS method Brian teaches us in part two. Except for the name “BinomialTree” it is not immediately obvious how a recursive structure like
list -> tuple -> list -> tuple -> etc.
is any kind of tree, let alone a binary tree. Only when we write out the pattern
Node(_, a, h')::tl
do we see the binary tree:
(‘a element to flatten) | ||||
h’ | tl | |||
(list within the element’s tuple) | (tail of list of which element is head) |
So the tree structure has been apparent from the time we wrote out the patterns for our match algorithm, and we know those patterns represent the complete set when iterating through our heap, but to accomplish a binary tree CPS fold you need a “null” pattern (a pattern which does not contribute data to the fold operation) at the bottom of each continuation in order to thread through the identity function. This is why you see so many tree examples in functional languages ending in “leaf”, a seemingly useless end of branch marker carrying no data.
There is no “leaf” in our data structure, but lets go through the motions of implementing the classic tree CPS fold, and maybe that will help us figure out something.
And then a miracle occurs
Don’t believe in software miracles, but all of a sudden our newest generation of the fold starts failing with “can’t get here”. We are now passing something new through our loop: empty list. It turns out there was a “null” pattern all along. Change
| _ -> failwith "can't get here"
to
| _ -> cont leafV
and we have a working CPS fold on the BinomialHeap. (I’m using Brian’s naming standard for the fold parameters.)
Let me break apart the inOrder call to foldHeap to make the fold mechanism more explicit:
A heap sidebar
The code is now perfectly suited for exploring CPS execution. The inOrder
member we have been evolving does an in order tree traversal of the heap’s internal data representation, which in the case of heaps is entirely meaningless. We only care that a complete traversal take place. We will sort the resulting list to get it into proper head/tail order.
You don’t believe the internal order is meaningless? Here’s the internal structure we will walk through:
Node (0, "b", [])
Node (1, "a", [Node (0, "c", [])])
In this case the isMaximalist : bool
I told you to ignore above is set to false, so the resulting list should be
["a"; "b"; "c"]
but not as result of the traversal, only after we sort the output list of the fold ourselves. The in order traversal results in
["b"; "c"; "a"]
Now back to our story
The step four code makes it easier to follow the logic and clearly shows how the empty lists satisfy the null pattern. Let’s walk through it.
current Node | executing code | ||||||
---|---|---|---|---|---|---|---|
cont = | |||||||
(fun x -> x) | |||||||
(0, “b”, []) | Node(_, a, [])::tl -> | loop [] (fun lacc -> | |||||
(head of top list) | |||||||
cont = | |||||||
(fun lacc -> | |||||||
loop [] (fun racc -> | |||||||
(nodeF a lacc racc) |> (fun x -> x)) | |||||||
cont leafV | |||||||
The remaining in order tree traversal builds-out increasingly nested continuations. | |||||||
Node(_, a, [])::tl -> | loop [] (fun lacc -> | ||||||
loop [] (fun racc -> | |||||||
(1, “a”, …) | Node(_, a, h’)::[] -> | loop h’ (fun lacc -> | |||||
(head of top right list) | |||||||
(0, “c”, []) | Node(_, a, [])::[] -> | loop [] (fun lacc -> | |||||
cont leafV | |||||||
Node(_, a, [])::tl -> | loop [] (fun lacc -> | ||||||
loop [] (fun racc -> | |||||||
cont leafV | |||||||
(1, “a”, …) | Node(_, a, h’)::[] -> | loop h’ (fun lacc -> | |||||
loop [] (fun racc -> | |||||||
cont leafV |
Swapping in the nodeF
parameter and substituting its parameter names and the element parameter name for “x” we get the crucial code fragment. I am also using the result passing symbol for readability.
(fun a lacc racc acc -> lacc (a :: (racc acc)))|> (fun x -> x))
Another substitution, using strikethroughs to show what I changed. Coincidentally the symbol “a” from the code gets replaced by the value “a”.
(fun
a lacc racc acc -> ("a" :: (racc acc)))|> (fun acc -> acc) |> (fun x -> x))
One more result passing symbol for clarity.
(fun
a lacc racc acc -> ("a" :: (acc |> racc)))|> (fun acc -> acc) |> (fun x -> x))
At the bottom of all the right trees we eventually hit an empty list, so we make one more substitution.
(fun acc -> ("a" :: (acc |> (fun acc -> acc)))) |> (fun acc -> acc) |> (fun x -> x))
(fun x -> x)
starts the whole continuation unwinding (executing). Notice the anonymous function from nodeF
only has three of its four parameters filled in. This is where the code we broke-out in inOrder
comes into play. I teased apart inOrder
to explicitly show how the CPS pattern wraps the call to foldHeap in a function taking the implicit parameter acc : 'a list
.
x []
We accomplished building a partial function with one parameter. []
provides the seed empty list passed down the nested acc
parameters.
Here is a reasonable facsimile of the generated function and execution:
And here is the finished product:
If not for adding line breaks for better readability, this is seven lines of code. Six to create a generic fold structure, and one to perform an in-order traversal. However this is a case where F#’s famous expressiveness suffers from the succinctness. If you did not know beforehand how this works, good luck figuring it out!
The complete source code for the BinomialHeap can be found here.
Pingback: F# Weekly #45, 2012 | Sergey Tihon's Blog