2010-01-28

Record Selector Punning with Type-Level Strings

Haskell users occasionally complain about the language’s lack of a powerful record system. Programmers want exstensibilty, a different approach to the scope of field names, and a better update syntax than Haskell’s algebraic data types, enhanced with field labels, provide.

There are some proposals, such as this one, for new Haskell record systems. There are alternative record system implementations, though none in Haskell’s most popular compiler. There are elaborate exploitations of Haskell’s powerful type system to create record systems (such as HList) or even object oriented libraries (such as OO Haskell). And there are hacks and kludges to work around one or more limitations of the existing record system. This post falls into this last category.

The referenced complaint points out that the scoping of Haskell record selectors can be annoying. If these were real records (based on expectations of records from other languages), something like this would work:

but it doesn’t because of scoping problems: each of the record selectors is a function, and I now have two name functions of different types. One solution (valid Haskell98) is to stick these definitions in separate modules, and import them qualified (e.g. import qualified Host.Host as H). Every reference to Host’s name would become (something like) H.name, and every reference to User’s name would become U.name. This solution is unpalatable enough to some that there exists a Haskell language proposal for type directed name resolution which would solve specifically this problem, introducing some new syntax along the way. Another (much maligned) workaround for the problem is to use Haskell type classes to achieve record selector overloading. In the example above, the simplest approach would look something like:

To remove the monomorphic-ness of the record selector result type, I could use Haskell’s bright, shiny, relatively new associated types extension, or its hoary-but-still-not-standard multi-parameter type classes with functional dependencies extension. Here’s how I might enhance the above with functional dependencies:

But still, for every record selector name, I need a type class. Further, even though I’d like to use records from different modules and packages together, seamlessly, I won’t be able to; I’ll end up with multiple versions of equivalent type classes — each author that wants to name a selector name will have to define a HasName type class, and the selector defined in that type class won’t be useful for pulling name values out of records defined by another author in another package. If this convention were followed universally, there’d be thousands of pitiful little type classes in different namespaces scattered across Haskell’s package space. Unpleasant.

So I’d like to contribute my (as far as I know) own variation on the type-class- driven approach, which, be it hacky, kludgy, or otherwise strange, at least doesn’t have the problem of creating thousands of incompatible type classes. Indeed, central to the solution is just one type class:

classSelectionabwheretypeSelectionTypeabselect::b->a->SelectionTypeab

This class definition requires two exstensions to Haskell: MultiParamTypeClasses and TypeFamilies. Also central to the solution is the creation of a type level alphabet of characters. To represent a few letters I might define:

dataCa=CadataCb=CbdataCc=Cc-- etc.

To represent all the letters of the alphabet (that can actually be used in Haskell identifiers), I can do something like this:

The above expression requires TemplateHaskell extension (supported only by GHC) and at least GHC 6.12.x. The above snippet also uses the unicode-properties package from hackage, and generates on the order of 90,000 new types. Ideally (for some values of ‘ideal’) I would generate a type for every possible character, and not filter by isIDStart and isIDContinue, but doing so generates a few million types, which seemed to make ghci unhappy; consequently, I limited my alphabet to characters relevant to mimicking haskell identifiers. (Note also, ‘a’ becomes C97 rather than Ca).

A type level string is some a product of type level characters. The type-level string for "hello" would be C150 :* C101 :* C108 :* C108 :* C111 :* (), where :* is a right-associative infix type constructor (requiring the TypeOperators extension) and () is used to mean an empty product/list.

Now I can define record selectors using type level strings and my single Selection type class. Defining one ‘by hand’ for the "name" selector would look like:

Of course, I wouldn’t want to define the above instance by hand; I’d use Template Haskell once again to create the instance. Template Haskell is capable, as of GHC 6.12.1, of generating a class instance with an associated type. If I go back to my original definition of User:

dataUser=User{name::String,host::Host}

I can write a Template Haskell function that takes the name of my type, determines the names of the desired selectors, and generates the necessary instances. I won’t present the (somewhat tedious) TH code, but a function can be written such as:

mkSelectionInstances::Name->-- the name of the record typeQ[Dec]-- the Haskell declarations

that will generate the required instances. It makes use of a selToType function which creates a selector name type given a selector name string:

I’m still saddled with a few problems, perhaps the least of which is that I still cannot define two records in the same scope with the same selector names. But, because I am discarding (or at least not planning to use) the ‘real’ selectors, I can adopt a convention for naming selectors that the mkSelectionInstances function will understand, and thus generate the selectors I really want.

For example, the convention could be "when creating type-level selectors, ignore everything in the name from the last tick (‘) mark to the end of the identifier". I would then define my records like this:

name can now be used to select the name component of a User value or a Host value, or any other value of a type t with a Selection t C110 :* C97 :* C109 :* C101 :* ()) instance, no matter where it has been defined:

Still, there is a problem: where should the the name, host, and address selectors be defined? I certainly don’t want to have to actually define them by hand, but if I generate them in the module where I generate the selector instances, I run into problems.

Assume I define name/host/address as above and export them, from module A. Another module, B defines similar selectors (name and address, say, and some other unrelated selectors). A third module C wants to use the records defined in modules A and B. It doesn’t matter whether it imports name and address from A or from B, but it must only import one definition. This restriction places an annoying burden on the author of C.

The approach is, nevertheless, one possible answer to the question of where to define identifiers representing my selectors. I’ve come up with two other answers, which I’ll explore below.

Alternative 1: ‘Import’ identifiers using a TH splice

Instead of defining the selector identifiers in the module that defines the record type(s) and the Selector instances for that type, use a TH function to ‘import’ the selectors in the module where they are used. For example, if I have a module C that imports records from modules A and B, I’d do something like:

I have one function, importSelectors, that, given a few record types, will generate the necessary selector function definitions, taking care not to generate a selector definition more than once if more than one of the specified records has a particular named selector. Even though User and Host, and perhaps Student each have a ‘name’ selector, importSelectors must generate only one name function (name = select (C110 :* C97 :* C109 :* C101 :* ())).

HasSelectors has an instance for each record type, and allows my ‘importer’ to get a list of selector names for each record. The mkSelectionInstances function would also generate the instance for this class, for a particular record. The record member of HasSelectors exists just to make the importSelectors call at the top of a module look nicer… I could just as easily write:

I’m not sure if there’s a name for this particular type class trick, enabling the definition of a function that takes an variable number of arguments, but I lifted it from the HaXR package. The type classes allow, finally, the definition of importSelectors:

importSelectors::Importera=>aimportSelectors=importGen[]

Some minor issues remain — how, for example, should you define the selector functions if you want to use them in the same module where you define your records? Because of Template Haskell stage restrictions, I wouldn’t be able to make use of the generated HasSelectors instance in another Template Haskell splice. I could have mkSelectionInstances generate them, but not export them, but I could run into a problem if I also wanted to import other records, with overlapping selector names, into a module where I’m also defining records. Nevertheless, with a few tweaks for corner cases like these, this approach has potential.

Alernative 2: Don’t create identifiers at all!

Instead, use a ‘literal’ representing a selector. The Haskell QuasiQuotes extension allows us to create custom literals, and I can create a Quasi Quoter to represent my record selectors:

Now, I realize that [$π|name|] is a bit more unwieldy than name. I have (in the case of name) four character of meaningful identifier and six characters of overhead, which is not convenient. But this overhead is purely syntactic, and suitable desugarer could (in the event that this idea becomes wildly popular) be created to make type level literals palatable. Perhaps ‘raw’ type level strings could look like «name», and my literal selectors could look like ℓname (a script ‘l’ symbol prefix indicating that some sort of label is being introduced), such that the following properties always hold (forall strings s):

ℓs==select«s»«s»==$(selToExps)::$(selToTypes)

Given this (unlikely to be implemented!) syntax sugar, my example would look like:

By defining records this way, I’ve thrown away some of the utility of Haskell record selectors. As implemented above, the selectors are only useful as record destructors, whereas normal Haskell record selectors are useful also as (part of) record construction as well as record update. That’s not a particular problem for this approach: it could easily be combined with, say, first-class labels. mkSelectionInstances would need to be redefined a bit to produce instances that provide the right SelectionType (e.g. User :-> String instead of String for the name selector, where :-> is a first-class-label type constructor).

Pushing the idea of type level strings as labels a little further, it would be relatively easy to reimplement some HList features — specifically, the labels of its extensible record system — using type level strings. HList depends on type level naturals and a namespace type to define labels; type level strings could be used as easily. Algorithms written in terms of labels as defined above could (I think) be agnostic as to whether they operate on a record whose underlying representation is simple Haskell ADT or a (suitably reimplemted) HList.

Yes, that would work — one reason I chose not to do this is that the modules defining User, Host, and Student would *have* to export the User, Host, and Student data constructors, otherwise they wouldn’t be fully reifiable in TH. You’d lose the option of hiding the data constructors and providing ‘smart’ constructors in their stead. The verbose approach, with the HasSelectors class, allows hiding the data constructors.

The other reason I chose not to do it is that I thought hiding the actual ‘raw’ record selectors, which you definitely want to be able to do, would hide them from reify… which it doesn’t, I discovered! (I wonder if it ought to, though… seems like being able to discover selector names with reify that are otherwise hidden by the module system could break some sort abstraction that the module author wants to achieve.)

[…] there, such as records, has, and the well known hlist. Wreckage’s novelty is in its use of type level strings. It is implemented with generous support from type families (to implement things like type/value […]