Overloaded record fields: a design proposal

This page summarises a possible design that would allow
different records to share a single field label. Although it's a simple enough
idea there are numerous ramifications. Records are a swamp!

See also a similar ​2003 proposal by Simon PJ and Greg Morrisset. It is essentially the same as the proposal below, but (a) has less detail and (b) adds anonymous reccord types. Anonymous type could be an add-on feature to the design described here.

The base design

The base design has the folllowing distinct components:

A library class

class Has (r :: *) (f :: String) (t :: *) where
get :: r -> t

A record declaration generates an instance declaration for each field. For example

Dot-notation would typecheck (and desugar to) as a call of the get method. So r.x would typecheck just as if you had written (get @ T _ @ "x" @ _ r), where the @ means type application, and _ means "some type". (GHC doesn't support explicit type appplication yet, but you get the idea.)

Each of these components embodies choices which are described, with their variants, below.

Note that the proposal naturally supports record-polymorphic functions, such as

Design choices and variations

The String type parameter to Has

In the base design, the field is a String type parameter to the Has class. Another possiblity would be to invent a new fake class for each field, thus

class Has_x (r :: *) (t :: *) where
get_x :: r -> t

This seems clunky to me, and I don't really want to deal with this infinite family of classes (eg how are the brought into scope). I see no disadvantages to a String parameter, which will form part of GHC's new kind system.

Scope control by generalising the String type in Has

The proposal above doesn't allow label names to be scoped: if one module internally uses "field" as a label name then another module can break the abstraction by using the same string "field".

We can fix this by instead of having

class Has (r :: *) (f :: String) (t :: *) where

having something like

class Has (r :: *) (ft :: *) (f :: ft) (t :: *) where

(where ft stands for field type).

The expression

foo.field

(starting with a lowercase letter) would behave as described above, with ft ~ String.

But

foo.Field

(starting with an uppercase letter) would use the Field constructor that is in scope, e.g. if we had

data FieldT = Field

then ft ~ FieldT. Then we can choose whether or not to export FieldT(Field).

Should get have a proxy argument?

The type of get is very odd:

get :: Has r f t => r -> t

It's odd because "f" is not nentioned in the type after the "=>". That in turn means you could not write an unambiguous call of get, at least not without support for explicit type application (which is another story).

But I think it's OK not to be able to write a call to get, because that's what the dot-notation generates. The dot-notation produces a suitably-instantiated call to get.

Selectors as functions

Does that still happen under the new design? The simple answer is "yes,for backard compatibility",
but note that if there were more than one record
with the same field name, then there would be two top level functions both
called x, so you could not call either of them directly,

Representation hiding

Currently if you say

module M( T( x ) )
data T = MkT { x,y :: Int }

then the selector x is exported, but not y. This is quite important; we need
a way to hide the representation of T, include the existence or otherwise of a 'y' field.

What happens under the new system? If we generate instances for `(Has
T "x" Int) and (Has T "y" Int)`, then presumably they'd both be
exported (since we have no way to restrict instances). But that means
we cannot ever hide the existence of a field, which seems Plain Wrong.

I can think of three solutions, all unsatisfactory in one way or another:

An instance (Has T "x" ty) is exported only if the record selector for
T is exported. Again, this doesn't work for virtual record selectors;
and it would require a new implementation mechansim to record exactly
which instances were exported from a module.

An instance (Has T "x" ty) is ignored unless the record selector
x of type T is in scope. This would require an ad hoc hack to
the instance lookup mechanism; and it doesn't work at all for
virtual record selectors.

Require that every instance of Has looks like

instance blah => Has T "f" ty where
get = the_getter

where the_getter is a named function.
An instance (Has T "x" ty) is ignored unless the function
witnessing the instance (the_getter) is in scope. This has the
merit of working for virtual selectors, but it means that instances
of Has must be restricted to a special form (more to specify).
However it comes with a reasonable story: the instances are simply
a way to choose among the many in-scope "x"'s.

Solving this issue is important.

Syntax

The dot notation

It is critical to support dot-notation. There are quite a few things to think about:

Dot notation must work in cascades (left-associatively), and with an expression to the left:

r.x
r.x.y
(foo v).y

There is an overlap with the function composition operator, but that is already true with qualified names. It comes down this:

Function composition will only work when surrounded by spaces, thus (f . g).

Should it be possible to partly apply dot-notation, in a section, like this: map (.x) ts? (Note that the dual, map (r.) fs makes no sense.). I'm inclined to take the simple approach and say "no", you have to write map (\r -> r.x) ts.

Syntactic sugar for Has

The type of fullName above is rather verbose, and it might be nice to have
some syntactic sugar for Has, perhaps something like this:

We replace "Has r f t" by "r { f :: t }", and allow multiple such constaints
with the same r to be combined. Similarly, multiple constraints with the
same r and t could be combined further, just as they are in record declarations:

fullName :: r { firstName, lastName :: String }
=> r -> String

Record updates

At first it seems that record update could be handled readily, by adding a 'set' field to Has:

Now, similar to dot-notation, record update syntax (r { x = e1, y = e2 }) would typecheck and
desugar as if you had written (set @ "x" e1 (set @ "y" e2 r)). (I've left out a couple of
type parameters to save clutter.)

There are three problems:

There is no natural set for "pseudo" fields like area or fullName above. Maybe this could be solved by stratifying the classes:

will be rejected because rv is not polymorphic. Again, this is not easy to fix.

This problem seems to be a killer: if record-update syntax is interprested as a call to set, we cannot, ever, use record-update syntax to update a record with a polymorphic field. (We could use alternative notation; instead of r { rev = reverse } we could say

case r of { HR { .. } -> HR { rev = reverse, .. } }

where I'm using GHC's dot-dot notation to fill in the missing fields. But it's clumsy and would get worse if the data type had several constructors.) And even if we were willing to give up on record-update syntax for polymorphic fields, we'd need a rule to say which fields are or are not made an instance of Has and Upd.

I'm not very happy with any of this. Faced with these difficulties,
another alternative is to stick with the status quo for record
updates; that is, not to support any sort of overloading. But even
that is problemantic: what does e { x = True } mean if there are lots of "x" fields
in scope (which is precisely what we want to allow). Haskell's current record-update
syntax really relies on learning which type is involved, from the record selector; but if
there are many x's, it can't figure it out. Insisting that only one "x" selector must
be in scope puts us back to square one. I think the only decent solution is to
invent a new syntax for monomorphic (non-overloaded) record update, something like this:

HR { r | rev = reverse }

In general

<type constructor> { expression | field = expression, ... }

Of course this isn't very satisfactory either

New syntax required

Not overloaded, so you can't abstract over it.

Relationship to Type Directed Name Resolution

This proposal is quite closely related to the ​Type Directed Name Resolution idea, becuase TDNR
would internally generate Has constraints exactly as described above. The difference is
that TDNR wasn't intended to support abstraction over the constraint, and was explained
rather differently.