A Tip on Debugging F# Type Signatures

I recently had a problem with someone else’s code that raised a debugging issue not found in most languages. Somehow a complex let binding that called several other let bindings returned the wrong type signature: I was expecting a collection of generic type ‘a, but the compiler was inferring a collection of type int. It’s not unusual to have to explicitly annotate the parameter and return types in let bindings, and that was my first course of action, but the let binding still returned the wrong signature despite the annotations. The compiler’s type inference works up and down the chain of functions, so how do you isolate a problem like this?

Here’s a technique you may find useful for problems like this. The key concept is isolation: comment out all the code inside the let bindings in question and stub out a return of the correct type until the whole complex of functions returns the correct type. Then methodically start uncommenting code that returns a value, checking the let binding’s return type at each step. When the return type changes to the incorrect one, you have isolated the problem spot (or perhaps one of several).
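As a minimal sketch of the stubbing step (the function and names here are made up for illustration, not the original code): comment out the body and return a value that is unambiguously of the expected generic type, such as List.empty.

```fsharp
// Hypothetical pipeline whose inferred signature has gone wrong.
// Stub the body out and return a value of the expected generic type;
// List.empty : 'a list keeps the signature fully generic.
let processItems (items : 'a list) : 'a list =
    // let cleaned = clean items        // original calls commented out
    // let scored  = score cleaned
    List.empty
```

With the stub in place the binding’s signature reads 'a list -> 'a list, and you can uncomment the original calls one at a time until the signature breaks.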

In my case the author of the code in question had assigned the critical value, which should have been of generic type ‘a, from the wrong source, which happened to be of type int. His only unit test exercised type int, and no other types for the generic binding, so the problem slipped through.
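To illustrate the kind of bug involved (with made-up names, not the original code), a single int-typed value flowing into an otherwise generic binding is enough to pin the whole signature:

```fsharp
// A value intended as a generic default, but inferred as int:
let badDefault = 0

// The int default forces 'a = int, so the inferred signature is
// int list -> int rather than the intended 'a list -> 'a.
let firstOrDefault xs =
    match xs with
    | x :: _ -> x
    | [] -> badDefault
```

A unit test that only ever passes int lists will never notice that the binding has quietly lost its generality.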

Signature mismatches can be very difficult to work through, so first try to isolate the problem by simplifying the chain of signature matches as much as you can.

Another lesson here is that although functional programming eliminates whole classes of potential bugs, it is still possible to write incorrect logic. Unit testing only catches a problem if you wrote a (correct) unit test for that particular case.

More About Signatures

Type signatures are an interesting and important topic in their own right. Here are some miscellaneous signature topics to brush up on.

How to read F# type signatures?

F# Generics

Automatic Generalization

F# Constraints

Equality and Comparison Constraints in F#

Flexible Types

A Stackoverflow thread on flexible types

The Git Article I Hoped not to Write

As I explained in my introductory article, Git is a core technology I am exploring. I had halfway expected, and hoped, that I would have nothing to say on this topic. Of course hope is a plan doomed to failure, and in this case the go-to-hell plan is to write an article.

Cutting to the Chase

The only resources you need for Git are

The Git Reference

Pro Git

The rest of this article is another data point in the seemingly endless Git literature all over the web. Continue at your own peril.

A Little Background

Back in the dark ages of mainframe batch programming, source control was via delta decks and their text file descendants. This apparently evolved into CVS, the first modern source control system I was exposed to. I was PM of one quarter of a 40-member project, which in turn was part of a larger 300-person project. The 40-person team hired a contractor to set up and administer CVS for the team (he eventually had other duties as well). In this role I never got any closer than that to CVS, since I wasn’t writing code. I wasn’t sure how to interpret having a full-time person who did mostly source control. The project was doing several very inefficient things I had no control over.

SourceSafe is a ticking time bomb if you don’t take serious backup precautions. Nuff said. The first source control system I had the responsibility to set up from scratch was TFS. Ultimately all source control includes manual policies and procedures. The trick is understanding enough about a new source control system, from reading the documentation and playing with it, to develop your own complete system. (Back in the day there was less how-to literature available on this.) It’s been a long time now, and I’ve lost touch with whatever changes Microsoft has made, but we were happy with TFS. It fit in well with the team’s rapid-fire development style of constantly changing anything and everything. But corporate, in its wisdom, decreed that all development teams must get on the company’s Subversion “Enterprise-class centralized version control for the masses” system. So began another iteration of figuring out how a system really works and how to fit it to our work style. For as long as I was with that team, we were never happy with Subversion.

Back on my own again, I’m once more learning a new version control system, this time with no one to bounce ideas off or ask questions. I even embarrassed myself posting a stupid question yesterday (sorry Phil and GitHub!). Phil Haack graciously responded. I had read the Pro Git book over a year ago, but without putting Git into practice it all just leaked out of my mind.

At the critical level of functionality, Git is just another (though arguably superior) choice in managing the merging and auditing of source. It still takes gatekeeping policies and procedures to manage pull requests.

Why is Every GUI a Two-edged Sword Compared to the Command Line?

Coincident with my starting to use Git, GitHub came out with a Windows interface. Now this was both good and bad. The good was that I didn’t even glance at the Git Reference or the Pro Git book to develop and publish my first open source project. The bad was that I didn’t even glance at the Git Reference or the Pro Git book. Using the GUI worked well as long as I was developing in the master branch. My confusion started when I wanted to create a branch. Like all good GUIs, it just works. Unlike all good GUIs, there is no help feature to clue you in to what is going on. So, having forgotten all I had read about Git over a year ago, I created a branch and started looking for how to hook up the branch to its workspace. Doh! It’s all the same workspace with Git.

So it was off to the command line, which involves learning what’s really going on, and setting up a sandbox Git project, because I have a deep aversion to doing anything with production source code when I am not 100% sure of the consequences.

Here’s what I learned in a very short time:

The Windows interface saves some drudgery in setting up a repo. You can initialize a repo from the command line with

>git init

(which I did in my sandbox), but the Windows interface sets up helpful parameters in .gitignore and .gitattributes files specifically for Visual Studio development. I’m happy not to have to dive too deep into the ins and outs of these files just yet.

The situation with commits and branches is more nuanced, so for now I’ll switch to the command line, because that’s the only way I’ll learn the full range of useful things I can do with Git. I was thinking about installing a Visual Studio Git extension, and Git Source Control Provider appears to be the only option, but I think I will put that off for now.

In the meantime,

>git status

is my friend.
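For what it’s worth, here is the kind of sandbox session that cleared up the branch confusion, as a hypothetical walkthrough rather than the exact commands I ran:

```shell
# Create a throwaway repo to experiment in.
git init sandbox
cd sandbox
git config user.email "you@example.com"   # placeholder identity for the sandbox
git config user.name "Sandbox User"

echo "hello" > readme.txt
git add readme.txt
git commit -m "initial commit"

# Branching does not create a separate workspace: the same working
# directory simply switches between branches.
git branch experiment
git checkout experiment
git status
```

Notice there is no step that “hooks up” the branch to a workspace; checkout swaps the branch in place.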

And Finally, the Punch Line

--everything-is-local

That’s the tagline on the git-scm.com site, and now I understand. You can use Git for simple source control, like any other SCM. And if I had been a little braver (or more foolhardy) about using an undocumented GUI feature, I might still be there. Now, however, I see my way forward learning Git. It’s actually personal productivity software. Once you acquire a certain level of adeptness you can do things you would not have considered doing, either because they were too tedious or because they seemed impossible. Things like exerting more control over the commit story your branch tells, or more frequent branching to allow experimentation that otherwise risks the integrity of another branch, or multiple versions, or things I haven’t thought of.

To update your local repository with work posted to the remote

>git fetch origin

Benchmarking F# Data Structures — Introduction

Overview

DS_Benchmark is a new open source project for benchmarking F# data structures. I developed it to provide a platform for comparing performance across data structures, data types, common actions on structures, as well as scale-up performance. It currently supports almost all the structures in the FSharp Collections, FSharpX Data Structures, and Power Pack libraries. The project’s architecture allows incorporating additional structures and functions to be timed with minimal effort.

The choice of data structure in a functional language project profoundly influences architecture and performance. You may not have the time or inclination to read up on purely functional data structures, and even if you are familiar with many data structures and their time complexities, how do you choose among several options that all offer O(1) complexity in a crucial function? You should always do your own profiling, and especially profile performance-critical code.

Some of the benchmarks possible with this project include:

Compare similar functions across data structures.

Investigate initializing structures from different collection types.

Look for pathologies caused by certain initial data.

Determine at what scale a data structure becomes more or less performant than other choices.

Methodology

I intend the project to measure what I call “most common best case performance” in an “apples to apples” manner, so it does not readily tell you how many milliseconds a process took (although you can retrieve this information without drilling very deep into the project). I took the precautions I could think of to best isolate the event being measured: the project forces garbage collection prior to the event, and the timed event executes in its own exe. (It’s still important to minimize all other activity on the machine during timings.)

I found through experiment that typically fewer than 15% of timing executions are outliers on the long side, so DS_Benchmark times 100 executions and uses the fastest 85 to calculate a median time (in ticks) and the deviation from the median.
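A minimal sketch of this methodology (not the project’s actual code; the function name is made up) might look like this:

```fsharp
open System
open System.Diagnostics

// Sketch of the methodology described above: force garbage collection,
// time the action, repeat 100 times, then take the median of the
// fastest 85 runs (in Stopwatch ticks).
let medianTicks (action : unit -> unit) =
    let runOnce () =
        GC.Collect ()
        GC.WaitForPendingFinalizers ()
        GC.Collect ()
        let sw = Stopwatch.StartNew ()
        action ()
        sw.Stop ()
        sw.ElapsedTicks
    [ for _ in 1 .. 100 -> runOnce () ]
    |> List.sort
    |> List.take 85
    |> List.item 42   // middle element of 85 sorted values
```

The real project goes further, running the timed event in a separate exe, but the GC-then-time-then-trim shape is the core idea.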

The output format is designed to enable further analysis within and across data structures by computing comparisons in terms of ratios (percents) rather than simply reporting milliseconds (which only apply to the host environment). Ratio comparisons are remarkably environment-agnostic, as long as all original measurements are taken in the same environment. For instance, I ran analyses based on measurements from two different machines, and even though the measured timings (in ticks) were wildly different, all of the comparative ratios I have analysed so far are very similar.

Simple Measurement Set-up

The console1 program in the solution provides documented examples of the most useful cases. Timings of individual scenarios print to the console, or you can generate batches of timings with a simple template model. The batched timings output to a tab-delimited file for easy import to Excel. (My next phase for this project is to persist the timing results and automate the analysis. The idea is to compare actions and inputs within and across data structures.)

For more in-depth usage and architecture information, see the project readme file.

Coming up Next

In upcoming articles I will report on some interesting comparisons of data structures using DS_Benchmark. I am devoting a permanent section of this blog to F# Data Structures. Keep an eye on this space for benchmark comparisons, as well as a single point of reference for many data structures and their documentation, including time complexities, which is otherwise scattered all over the internet.

If you are interested in adding structures or otherwise contributing to the project, get in touch with me.