Getting into blogging before it's too late

Graph labels redux and overall plan

This is a continuation of my previous post on my thoughts and plans for writing generic graph classes.

Overall Idea

The overall thing I want to do with these generic graph classes is to be able to deal with the vast majority of graph-like data structures in as common a way as possible. Note that I say graph-like: I’m distinguishing here between data structures that match the mathematic definition of a graph (that is, a collection of distinguished objects, where pairs of these objects may be connected in some fashion) from what is usually considered as a graph data structure: the difference mainly arises in that we have notions of expected operations on graph data structures that may not be applicable on our graph-like data types. These operations can either be ones that are forbidden (e.g. adding a node to a static type) or partially forbidden (e.g. adding a cycle to a tree).

As such, the classes as they currently stand are mainly informational: what can we determine from this graph-like type? Do we know specific properties about it (e.g. is it an instance of the class that specifies whether or not the graph is meant to be directed)? There will, of course, be classes for graph manipulation, but I see those as secondary components: for example, it doesn’t make sense to consider using standard graph manipulation functions to add or delete values from a PackageIndex as we can’t arbitrarily add values to it.

Such a collection of classes will by necessity be subject to compromise: it is not possible to have a fully-featured set of classes that comprehensively covers every single possible type of graph-like data structure whilst also being small and easy enough to use. After all, there’s no point in writing such classes if no-one uses them because they’re too difficult!

More on graph labels

In my previous post, I said that the best way of dealing with labels is similar to the way that FGL currently does: force all graph-like types to have both node and edge labels (but not require types to have kind * -> * -> * like FGL does). A few people objected, notably Sjoerd Visscher said that labels should be optional for both nodes and edges, and ideally be part of the overall node and edge types.

In theory, this solution is great (and I actually worked for a while trying to get something like it to work). However, as I stated in the comments, it fails one notable requirement: we now have to specialise functions on graphs to whether or not the graph has labels or not, and if so which ones. Specifically, if the behaviour of a function may change depending upon whether or not labels are present, such a solution may require four implementations:

No labels;

Node labels only;

Edge labels only;

Node and edge labels.

Probably the best example I can think of for this is from my graphviz library: consider the preview function as it is currently defined for FGL graphs:

This is a relatively simple function, that just sets some defaults for the main functions in graphviz. To change this to my proposed layout of compulsory labels mainly requires changes to the type signature (the only implementation change would be the way edges are defined). But with optional labels, then either four variants of this function will be required or else the user will have to specify how to distinguish the node/edge identifiers from the labels (if they exist); this latter solution is not satisfactory as the whole point of this function is to provide defaults to quickly visualise a graph, and as such should not take any other parameters apart from the graph itself.

If an “isInstanceOf” function was available (to determine whether or not the graph type is an instance of the appropriate label classes without needing to specify them as explicit type constraints), then this wouldn’t be a problem: implementers of functions would just need to take into account the four possible label groupings in their code. But as it stands, the implementation of having optional labels breaks the simplicity requirement that I’m taking into account when writing these classes.

Note that I would actually prefer to have distinct/abstract node and edge types that optionally contain labels: for the planar graph library that I’m working on all operations on edges are done via unique identifiers rather than a data-type isomorphic to a tuple of node identifiers (so as to avoid problems with multiple edges). However, for most graph types such explicit differentiation between edges won’t be required, and in general it will be simpler to both instantiate and use classes when a more simple edge type is used rather than requiring in effect a new data type for each graph (as required when using data families).

Naming and terminology

One thing I’m still not sure about: how shall I deal with the naming of functions when I have both labelled and unlabelled variants of them? Should I take the FGL route of prepending “lab” to them (e.g. nodes vs labNodes)? I’m not sure I like this solution, as I want to try and shift focus to making the labelled versions the defaults (or at least not as clumsy to use): does it make sense to adopt a policy of priming functions to distinguish between labelled and unlabelled (e.g. nodes vs nodes')? Or should some other naming policy be used?