12 June 2010

Pre-announce for the new FGL

This post is serving as a heads-up for those people that do not read the Haskell mailing lists, as well as an overall summary/history.

Edit: In the comments below, Edward Kmett points out that in some cases having a class where you know that the kind of the graphs is * -> * -> * may be beneficial (and demonstrated to me in #haskell that my counter-example doesn’t quite work). As such, I’ve sent a new email discussing a possible alternative class layout to keep using these kinds of graphs which may make the upgrade process a little easier.

It’s FGL, but not as you know it

Thomas and I didn’t want to merely do bug-fixes however. There were a few changes we wanted to make to the FGL API, the major points of which are:

Provide greater scope for customisation/optimisation of instances

Remove the restriction that graphs are of kind * -> * -> *

Allow instance writers to restrict the types of the node and edge labels

Allow instance writers to choose a custom (i.e. not just Int) type for the node index type

Remove the (in our opinion) over-usage of 3-tuples and 4-tuples and define explicit data types for Context, etc. (with record functions) to make it easier to deal with them without requiring explicit pattern matching, etc.

Proper Eq, Show and Read instances for graphs.

Whilst doing all this, however, we’ve also kept to the “spirit” of what makes FGL unique: the notion of inductive graphs. As much as possible, where it makes sense we’ve also kept the names and terminology of the current version to make the transition less seamless.

It’s not that simple, however

Primarily because of reasons 2 – 4 above, the new version of FGL we’re working on is not backwards compatible. For starters, if we don’t require kind * -> * -> * then any code that is currently written using the old classes that required that won’t work.

However, even if we kept that restriction and just wanted the ability to restrict the label types, then we’d have to have some way of making those label types part of the overall type class[es]. As such, we would need some extensions, and the two possible candidates are:

Now, MPTCs + fundeps have the advantage of being older, having more infrastructure built around them, etc. whereas there are still many limitations for Type Families (the inability to have superclass constraints, no automatic deriving for types or aliases using Type Families, etc.). However, we feel that code-wise, using Type Families result in much nicer and cleaner type signatures.

Let us consider a few examples: what does the instance definition of a graph like the one currently found in Data.Graph.Inductive.Tree look like, and what would the type signature be for a function that takes a graph and returns a list of all of the edge labels?

First of all if we defined with MPTCs + fundeps:

// Note the explicit double mention of the two label types here,
// this is needed because if we want to remove the kind restrictions
// on graphs, the first parameter must be the _entire_ graph type.
instance Graph (Gr a b) a b where
....
edgeLabels :: (Graph g a b) => g -> [b]
edgeLabels = ....

Now, whilst we need to explicitly double-up the two label types in the instance definition for the MPTC-based solution, this isn’t that bad since most users won’t see this. However, it isn’t immediately obvious just from the type signature of edgeLabels what it actually does.

As for the Type Family solution, it is more verbose but arguably the types are easier to read. Furthermore, why should the two label types be first-class members of the type definition? “g a b” isn’t the graph type, just “g” on its own is.

The situation becomes a bit more tiresome when we consider the fourth improvement we wanted to make: the ability to specify a custom node index type. We’ll now use some graph type that has kind * -> * -> * -> * where the first type parameter is the vertex type (i.e. some generic graph type that people can use with any index type they wish, preferably by newtype-ing it).

Notice that the type signature for edgeLabels when using Type Family solution remains unchanged, and why shouldn’t it? We feel that with Type Families we can better express what a function does with a minimum of clutter/boiler-plate.

Now, you may feel that still, MPTCs are preferably if only because they’re less experimental. Either way, we would still need extensions and hence there would be API breakage.

What does that mean for us?

At the moment, very little. We plan to release a “technology preview” release of the new version of FGL within the next few weeks (i.e. whenever we get around to cleaning our current code up and adding documentation). However, we don’t expect (and in fact highly discourage) anyone using the new version when we do so: we plan on releasing these solely to obtain comments and opinions from the community. We realise that FGL is a respected and honoured member of the Haskell library community, and don’t want to make drastic changes without obtaining the community’s input on what would be the best way to proceed.

But there is one thing that should be done by anyone who maintains Haskell software that uses FGL: Restrict the upper bound of the Haskell version used (either by explicitly using fgl == 5.4.2.2 or in case you think we’ll release a bug-fix version then either fgl == 5.4.* or fgl >= 5.4.2.2 && < 5.5 if your code uses Data.Graph.Inductive.PatriciaTree). This is something you should be doing anyway as FGL follows (or at least will be following from now on if it hasn’t been doing so fully up until now) the Package Versioning Policy and as such, you should specify the version bounds to those that you know contain the API that you need.

The naming controversy

When I stated on the Haskell mailing lists that people should ensure they have proper upper bounds on the version of FGL their packages use, some people stated their opinion that we should not in fact be changing the API of FGL and if we wish to do so then we should fork it and give it a new name.

Now, I’m not going to repeatallofmyargumentsagain (not forgetting this one) here. However, this entire issue has sparked off a discussion on when large-scale API breakage in a major library is appropriate; this resulted in this wiki-page which uses the case of FGL to try and determine guidelines on when this would be appropriate.

Now, if it’s the overall community’s opinion that it should be renamed, we of course will. However, to truly solve this problem we would also need to think of new module names as well, and I for one can’t really think of anything more appropriate than what FGL currently uses of Data.Graph.Inductive :s.

What is the bottom line?

The “tl;dr” version of this is that we’re working on a new version/successor to FGL; we plan to add lots of cool features at the expense of requiring some extensions; you should check the dependencies of any software you have that depends upon FGL now (since it’s good practice to do so anyway) and that there’s controversy over whether or not we should re-write the API and still call it FGL.

Like this:

Related

The problems that are currently arising for people who have both mtl and transformers-0.1.x installed: if you try to load a file that imports Control.Monad.Trans, ghci doesn’t know from which package it should get it from.

I’d like to add my voice to the din of those asking that if you change the API of FGL considerably in a non-backwards compatible way that you move it to another namespace and package name.

Otherwise you’ll wind up with the mtl/transformers parsec-2/parsec-3 etc probably all over again.

That and there are some programs that can be typed with the existing FGL that will not be able to be typed under the version you’ve described. (Those that use the graph in a polymorphically recursive manner relying on the ability to be parametric in the node or label type.)

Care to suggest a new module namespace? I’ve been trying ever since this issue arose and I can’t think of anything better than Data.Graph.Inductive, which is currently taken by current version of FGL.

As I’ve said, we’d be doing our best to ensure that all users of FGL will migrate to the new version to avoid the problems that plagued Parsec, QuickCheck, etc. (though I repeat that I still think this is going to be a much lesser problem than those due mainly to the fewer number of users of FGL).

Finally, I’m not sure what you mean by those programs that won’t be able to be used under the new version of FGL. Care to provide some examples?

I know that there are problems doing large-scale API changes for any library. However, my main goal here is to try and prevent stagnation in the graph-related Haskell libraries, of which FGL is a major part. As it stands, I feel that there’s not much more that can be done with the current layout of FGL that will not result in major API breakage, in which case there’s no point in having anyone replace Martin as maintainer (except for fixing dependency problems when a library such as base changes in a non-backwards compatible fashion). I also think it’s a bit of a minus for FGL if people like Cale Gibbard (who I think is the graph-theoretic guy in the Haskell community, or at least in #haskell) refuse to use it and prefer to roll their own every time they need a graph data structure.

Well, if we change the module name to that, there would be even less reason to change the package name ;-)

And in the example, you would just have this:

foo :: (InductiveGraph (g a b)) => g a b -> Iterated (g a) b

you might need some explicit type-family constraints in there as well to force/state that NodeLabel (g a b) ~ a and EdgeLabel (g a b) ~ b though. As an example of this, see the (really kludgy due to the afore-mentioned current limitations with Type Families), see the class currently called “MappableGraph” at http://code.haskell.org/FGL/fgl/Data/Graph/Inductive/Graph.hs (once code.haskell.org is back up).

[…] has become a rather hot topic: should a rewrite of FGL still be called FGL? I’ve covered this earlier, but I have now created a survey to try and find out what the community thinks it should be called […]

Welcome!

Greetings and salutations be upon thee to my little corner of what we like to charmingly call "the internet". This is where I put forth cunning arguments and repartee on such topics as Haskell, Gentoo, university... just don't ask me if I have a social life :s