Joachim Breitner's Homepage

For a while I have been pondering over a problem that arises when your functionally written program has some state with cross references – for example a list of users, each of which uses a number of computers, and a list of computers, each having an owner.

Implicit referencing

For doing queries on such data, it would be convenient if every reference is just the referenced object itself. Although we would visualize this as a graph, semantically, it is more like an infinite tree. This is possible in Haskell, due to laziness, and if you create the data structure cleverly, it even uses constant memory, no matter how “deep” you enter this infinite tree (a recent post of mine talks about this). A possible data definition would be:

Explicit referencing

But such a representation is very unsuitable for updates (at least I can’t think if a nice way of updating such a graph without breaking the internal cross-references) and serialization, which is a must for a HAppS based application. So what one would probably start with is this data structure:

I think the semantics of this are clear. Note that the referencing is currently not type-safe, but this can be provided by phantom types. Maybe I’ll write more about that later.

Now imaging you want to display the information about the first computer with your web application. You extract the information with
let State _ cs = testState in head cs and pass that to your templating engine. But what if your template wants to display the name of the owner? It only has access to his userId. You would either need to know what information the template will ever need, extract that from the state beforehand and pass it along, or give the template access to the whole state. In that case, though, there has to be lookup-logic in your output code, which is also not nice.

Woudln’t it be nice if you could, in your application logic, work with the explicit references, which are easy to modify and store, but somehow turn that into the implicit referencing?

Duplicated representation

One way would be to have two unrelated sets of data structures, ExplicitState, ExplicitUser, ExplicitComputer, which use explicit identifiers to reference each other, and ImplicitState,... which are defined as the first representation of our state. It is then mostly trivial to write a function that converts ExplicitState to ImplicitState.

The big disadvantage of this is that you have to maintain these two different hierarchies. It also means that every function on the state has to be defined twice, which often almost identical code. Clearly, this is not desirable.

Annotated representation

It would be more elegant if the state is stored in one data type that, controlled by a type parameter, comes in the one or the other representation. To do that, we need two types: One that contains a value, and one that contains just a reference:

Here we introduce a type parameter “ref”, which will later be either Id or Ref. Note that now a reference also states the object it is a reference for, which greatly increases type safety. Functions on these data types that don’t work with the references will be polymorphic in the “ref” type parameter, so only need to be written once. A User Id is a complete user with all related data, while User Ref is a user with only references. And a Ref (User Ref) is reference to a user, which contains references...

Not so kind kinds

Did you notice the lack of a deriving clause? Our data structures have the relatively peculiar kind ((* -> *) -> *), which makes it hard for the compiler to derive instances for things like Show. But we already know that we will only use Id or Ref for the type variable, so we can use ghc’s StandaloneDeriving language extension and have these instances created:

Note how we “tie the knot” in the let expression. This is the trick that ensures constant memory consumption, because every reference points back to the same place in memory.

Satisfied already?

So what do we have? We have no duplication of data types, we can write general functions, and we can resolve the explicit referencing. We can also easily write functions like unrefUser :: State Ref -> User Ref -> User Id, which transform just a part of the state.

But writing unrefState is very tedious when the State becomes more complex. Each of the other unrefSomething functions are very similar, but need to be written anyways. This is unsatisfactory. What we want, is a generic function, something like

gunref :: State Ref -> a Ref -> a Id

which, given a state with explicit references, replaces all explicit references in the first argument (which could be State Ref or User Ref or anything like that) with implicit ones. This function can not exist, because we would not know anything about a and b. But maybe we can do this:

gunref :: (Data (a Id), Data (a Ref)) => State Ref -> a Ref -> a Id

Typeable and Data

Before being able to do so, we need to derive Data for our types. We can start with

but that will complain about missing Typeable instances. Unfortunately, ghc’s deriver for Typeable (even the stand-alone-one), does not handle our peculiar kind, so we need to do it by hand. With some help from quicksilver on #haskell, I got it to work:

everywhere is not enough

Turning to the documentation of Data.Generics, I notice with some disappointment that there is no function that is able to change a type – they all seem to replace a value by another value of the same type. But the functions gfoldl and gunfold sounded like they could be used for this.

Warning: What comes now is a very non-haskellish hack that subverts the type system, just to get the job done. Please read it with a healthy portion of distrust. If you know of a cleaner way of doing that, please tell me!

Wrapped Data

I want to do some heavy type hackery, so I need to disable haskell’s type system. There is Data.Dynamic, but not even that is enough, as we need to carry a type’s Data instance around as well. So let’s wrap that up:

There is also a function that converts an AData back to a normal type, if possible. If it’s not possible, then there is a bug in our code, so we give an informative error message.

AData transformers

Similar to everywhere, we want to have transformers that combinable. They need to have the change to convert an arbitrary value, but also signal that they could not convert something (and this something has to be recursed into). I came up with this:

I hope this code does not inflict too much pain on any Haskell-loving reader. I know I violated the language, but I didn’t know how to do it better (at least not without using Template Haskell). I also know that this is not very good performance wide: Every single value in the input will be deconstructed, type-compared several times and re-assembled. If that is an issue, then this function should only be used for prototyping.

Almost done

To apply this to our state, we only need to glue it to our lookup functions from above:

Now we have a generic unreferencer. I set the type a bit more specific than necessary, to make it safe to use (under the assumption that the list of lookup functions is complete and will not leave any Ref in the output).

Comments

i just hope folks like you who have the ability to think and work through such details will be brave enough to continue to do so, and that eventually it will be understood enough to be boiled down to something the rest of us can understand and use :-)

Well, that’s a price to pay for referential transparency (which is probably worth a high price). Note that you can have other ways of representing references (e.g. IORefs) that are closer to what a imperative programmer is used to – I was just exploring some other, more elegant, techniques. This is by no means meant to be the best way.

Object orientation does not cost you referential transparency. Nor does OO programming imply imperative. Even in haskell does instancing a type class associate methods with a data type which is at heart OO. The complexities are due to Haskell's limitations but its pros out weigh this con.