Records in Haskell

The narrow issue: namespacing for record field names. Currently in Haskell two records in the same module can't share a field name. This is sometimes extremely painful. This page is about the narrow issue.

The broad issue: first class record types. In Haskell there is no "record type" per se. Rather, you can simply give names to the fields of a constructor. Records are not extensible and there is no polymorphism on records.

This page focuses exclusively on the first, narrow issue of disambiguating record field names. We have a separate Wiki page, ExtensibleRecords, on the broad issue of first class record types.

On this page I'd like to summarise the problem, and specify alternative designs. So far it is mostly a skeleton: please fill it out. The idea is to hold a discussion by email (ghc-users?) but to collect results (alternative designs, trade-offs, pros and cons) here, because email threads quickly get lost. Simon PJ.

Yesod's Persistent data store library works around the issue by adopting the convention of prefixing every record field with the record name (recordA and recordClashA). Besides being extremely verbose, this also keeps us from experimenting with more advanced features such as partial record projection or separate unsaved and saved record types.
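A minimal sketch of the prefixing convention, for illustration only. The types Record and RecordClash and their fields are hypothetical, chosen to match the names used on this page; they are not taken from Persistent itself.

```haskell
-- Hypothetical record types illustrating the prefixing convention.
-- If both types simply declared a field named `a`, the two selectors
-- would clash and the module would be rejected, so each field carries
-- its record's name as a prefix.
data Record = Record
  { recordA :: Int
  , recordB :: String
  } deriving (Show)

data RecordClash = RecordClash
  { recordClashA :: String
  , recordClashB :: Bool
  } deriving (Show)

main :: IO ()
main = do
  let r = Record { recordA = 1, recordB = "one" }
      c = RecordClash { recordClashA = "uno", recordClashB = True }
  -- Every use site has to spell out the prefixed selector.
  print (recordA r)
  putStrLn (recordClashA c)
```

The prefixes carry no information that the types do not already carry, which is the verbosity complained about above.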

The verbose name-spacing required is an in-your-face, glaring weakness telling you there is something wrong with Haskell. This issue has been solved in almost every modern programming language, and there are plenty of possible solutions available to Haskell.

Never mind experimental/advanced features, it gets in the way of doing utterly dull things like:

look up an entity by name (such as a customer or dictionary entry)

get its identifier xxx_id

link that same field name xxx_id in other record types to create/read/update/delete,

without having to copy the darn value to xyz_id, pqr_id, ...

It also inhibits relatively low-level generic/polymorphic work, such as standard print-formatting for any record with lastName and firstName fields.
-- added by AntC 21-Feb-2012
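The dull case above might look like the following sketch. The Customer and Order types, their field names, and the helper functions are all hypothetical; the point is only that the same logical identifier has to live under two different field names.

```haskell
import Data.List (find)

-- Hypothetical entity with an identifier and a name.
data Customer = Customer
  { customerId   :: Int
  , customerName :: String
  } deriving (Show)

-- Hypothetical record that links back to a customer. The link is the
-- same logical field as customerId, but it needs a different name.
data Order = Order
  { orderId         :: Int
  , orderCustomerId :: Int
  } deriving (Show)

-- Look up an entity by name.
lookupCustomer :: String -> [Customer] -> Maybe Customer
lookupCustomer name = find ((== name) . customerName)

-- Follow the identifier into another record type; the value moves
-- from customerId to orderCustomerId.
ordersFor :: Customer -> [Order] -> [Order]
ordersFor c = filter ((== customerId c) . orderCustomerId)

main :: IO ()
main = do
  let customers = [Customer 1 "Ada", Customer 2 "Alan"]
      orders    = [Order 10 1, Order 11 2, Order 12 1]
  case lookupCustomer "Ada" customers of
    Nothing -> putStrLn "no such customer"
    Just c  -> print (map orderId (ordersFor c orders))  -- prints [10,12]
```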

Solutions

So we have decided to avoid the extensible record debate, but how can we have multiple record field selectors in scope and correctly resolve the type of the record? There are two main mechanisms on offer:

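Plan A: Name spacing. This uses qualified names to disambiguate record field names.

Plan B: Overloading. This uses the type of the record to resolve which field is meant, and can go further by abstracting over every record type that has a given field.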

Comparisons

The benefit of abstracting over field names in Overloading is being able to write code that works against any Record with a given field. So I can have a function:

getA r = r.a

and that can work for both Record and RecordClash if they are defined in the same module because they both have a field a.
With other approaches (including TDNR) this fails to type check unless the compiler can determine that the type of r is either Record or RecordClash. Note that we can already accomplish this on an opt-in basis with type classes: making it automatic is not required and could leave the unwary user with weakly typed code.
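The opt-in type class route can be sketched in today's Haskell as follows. The class HasA and the function describeA are hypothetical names, and the sketch assumes the field a has the same type (Int) in both records, which the proposals on this page do not require.

```haskell
-- Two records that would both like a field called `a`; the prefixes
-- are only there because the selectors must not clash.
data Record = Record { recordA :: Int } deriving (Show)

data RecordClash = RecordClash { recordClashA :: Int } deriving (Show)

-- Hypothetical class meaning "a record with an `a` field of type Int".
class HasA r where
  getA :: r -> Int

-- Each record type opts in explicitly.
instance HasA Record where
  getA = recordA

instance HasA RecordClash where
  getA = recordClashA

-- Code written against the class works for every record that opted in.
describeA :: HasA r => r -> String
describeA r = "a = " ++ show (getA r)

main :: IO ()
main = do
  putStrLn (describeA (Record 1))
  putStrLn (describeA (RecordClash 2))
```

Because every instance is written by hand, nothing is overloaded unless the programmer asks for it, which is the opt-in property noted above.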

The advantage of Namespacing is that the implementation is clear, straightforward, and has already been done in Agda and Frege. We can either stop with name-spacing (Agda) or continue on with automatically resolving the field when the dot operator is used. Overloading has seen downsides in practice. In the words of the Frege author, who abandoned Overloading with abstraction over field names:

only very inefficient code could be generated, if you have to access or update a field of some unknown record. In the end, every record type was basically a map.

it turned out that errors stemming from mistyping a field name often could not be diagnosed at the point where they were committed, but led to inferred types with crazy signatures and an incomprehensible type error at the use side of the function that contained the error.

the extra constraints complicated the type checker and did not play well with higher kinded type variables (at least in the code I had then, I do not claim that this is necessarily so).

Overloading without abstraction over fields may be able to avoid some of these potential downsides, and a judicious (no virtual fields, etc) implementation of either could look very similar to the programmer.

Type directed name resolution

One particular way of integrating this idea into Haskell is called Type Directed Name Resolution (TDNR). All of the name-space mechanisms require some level of user-supplied disambiguation: if there are two fields a in scope, you must use a qualified name to disambiguate them. What is tantalising about this is that the type of the argument immediately specifies which one you mean. There is really no ambiguity at all, so it is frustrating to have to type qualified names to redundantly specify that information. Object-oriented languages take this form of type-directed disambiguation for granted. TDNR was proposed a couple of years ago, and the Haskell community didn't like it much. (But I still do; SLPJ.)

The discussion has many similarities with the original Type directed name resolution proposal: the question seems to be largely about nailing down a concrete implementation. The original TDNR proposal had internal Overloading in mind, but Namespacing ends up having similarities.

Haskell already has a (tried and tested) mechanism to disambiguate where "the type of the argument immediately specifies which one you mean" -- namely class/method/instance resolution. The DORF proposal uses this mechanism (and this mechanism alone: no funny-hand-shake syntax) -- AntC 21-Feb-2012
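A rough sketch of how class/instance resolution can pick a field from the type of the argument. This is written loosely in the spirit of the point above and is not the DORF proposal's actual definitions: the class Has, the tag type A, and the selector a are assumptions made for illustration.

```haskell
{-# LANGUAGE MultiParamTypeClasses  #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances      #-}

-- Hypothetical tag type standing for the field name `a`.
data A = A

-- Hypothetical class: record type r has a field tagged fld of type t.
-- The functional dependency says the record and the tag fix the type.
class Has r fld t | r fld -> t where
  get :: r -> fld -> t

data Record = Record { recordA :: Int }

data RecordClash = RecordClash { recordClashA :: String }

instance Has Record A Int where
  get r A = recordA r

instance Has RecordClash A String where
  get r A = recordClashA r

-- One selector for both records; instance resolution on the argument
-- type decides which underlying field is read.
a :: Has r A t => r -> t
a r = get r A

main :: IO ()
main = do
  print (a (Record 1))
  putStrLn (a (RecordClash "uno"))
```

Unlike a class with a fixed result type, this form lets the field have a different type in each record, at the cost of needing language extensions beyond Haskell 2010.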

Other (FP) languages

If you know of other relevant language implementations, please add them!

The DDC language (very similar to Haskell) puts forth a similar solution to Frege. See the thesis, section 2.7 - 2.7.4, pages 115 - 119.

The Opa language (functional, focused on web development) states that its modules are a special case of records.

Other FP languages where I looked for a record implementation but found no solution for records with the same fields (my information could be wrong or out of date): OCaml, Oz. However, the O in OCaml is for objects, and objects have structural typing that supports abstraction over fields.

I couldn't find great specific information on record implementation in ML variants. As best I can tell, SML does not allow records in the same module with the same field. Records from other modules require name-spacing or must be opened up, similar to Agda. SML# supports abstraction over fields as per the overloaded records implementation.

Roy, a functional language that targets only JavaScript, also has structural typing which prevents clashes and allows abstraction over fields.