Multi-Column Primary Key support

Relational database designs use a set of columns as the primary key for a table. When this set includes more than one column, it is known as a “composite” or “compound” primary key. (For more on the terminology, here is an article discussing database keys.)

Django models currently support only single-column primary keys, which rules out many designs where the natural primary key of a table spans multiple columns. Django can't work with these schemas as they stand; they must instead introduce a redundant single-column key (a “surrogate” key), forcing applications to make arbitrary and otherwise-unnecessary choices about which key to use for the table in any given instance.

This page discusses how to have Django support these composite primary keys. There are a lot of details to get right here, but done right, it would allow for more flexibility and potential simplicity in data modeling.

Current Status

The issue is currently accepted and assigned, and work is in progress. A partial implementation is available at http://github.com/dcramer/django-compositepks. It allows models to have composite primary keys, but ForeignKey and RelatedManager do not yet support them. As a consequence, it isn't possible to navigate relationships from models that have a composite primary key.

Major Issues

A number of APIs use "obj._meta.pk" to access the primary key field (for example, to do "pk=whatever" lookups). A composite PK implementation would need to emulate this in some way to avoid breaking everything.

A number of things use (content_type_id, object_pk) tuples to refer to some object -- look at the comment framework, or the admin log API. Again, a composite PK system would need to somehow not break this.

Admin URLs are of the form "/app_label/module_name/pk/"; there would need to be a way to map URLs to objects in the absence of a single-column primary key.

mjm: I'm not very familiar with Django's internals, but I suspect the first point's answer lies somewhere near the observation that a CK needs a compound data structure. In Python the obvious choices are a tuple (or list, probably interchangeably) or a dictionary. I've actually used sequences; the obvious drawback is that they require an agreed-upon convention as to the order of the components. If I were doing it over again I would give very serious thought to using mappings, at least for things that hold key values.

Hmmm, that handwaving seems to be much of what I have to say about the second point as well. Not sure just how that can avoid breaking some code, though. In my work using sequences I chose - not without some pain - to make all keys be sequences (and all keys' names were sequences, too). This is at least consistent, and the changes were straightforward. The fact that strings are themselves sequences actually made it more difficult, since overlooked cases would (often) misbehave rather than tossing an exception immediately - a point in favor of using mappings, maybe.
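To make the sequence-versus-mapping trade-off concrete, here is a minimal sketch in plain Python. The helper name normalize_key and the field_names argument are hypothetical and not part of Django; the point is only to show how a key given as a mapping, a sequence, or a bare scalar could be brought into one agreed-upon form, and how a string passed by mistake can be made to fail loudly instead of being split into characters.

```python
from collections.abc import Mapping, Sequence


def normalize_key(value, field_names):
    """Return the key as a tuple ordered by field_names (hypothetical helper)."""
    if isinstance(value, Mapping):
        # Mappings carry their own field names, so caller order does not matter.
        missing = [name for name in field_names if name not in value]
        if missing:
            raise KeyError(f"missing key components: {missing}")
        return tuple(value[name] for name in field_names)
    if isinstance(value, (str, bytes)):
        # A string is itself a sequence; treat it as one scalar component
        # instead of silently splitting it into characters.
        value = (value,)
    elif not isinstance(value, Sequence):
        # Any other non-sequence value is a single-component key.
        value = (value,)
    if len(value) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} key components, got {len(value)}"
        )
    return tuple(value)
```

With field_names of ("namespace", "name"), both {"name": "red", "namespace": "color"} and ["color", "red"] normalize to ("color", "red"), while the bare string "ab" raises ValueError rather than quietly becoming ("a", "b").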

As for the last item, obviously a choice has to be made, and some existing cases will change. I wouldn't call the result "broken", since only hand-generated admin URLs would have a problem (well, bookmarked ones, it occurs to me, if anyone actually does that). Is anyone really concerned about making admin URLs slightly less easily hand-generatable?

Proposed Solutions

Proposed solutions for the admin URL issue:
"/app_label/module/pk1,pk2/" -- does not support text primary keys which include a comma. URL encoding to the rescue!

"/app_label/module/pk1/pk2/" -- With support for a generic view on just "/pk1/" (like the date based generic view)
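As a rough illustration of the comma-separated option, the sketch below percent-encodes each key component before joining, so a comma or slash inside a text key cannot be mistaken for a separator. The function names encode_pk and decode_pk are hypothetical; only urllib.parse.quote and unquote from the standard library are used. This is a round-trip sketch, not how the admin does or should build URLs.

```python
from urllib.parse import quote, unquote


def encode_pk(values):
    """Join key components into one URL segment (hypothetical helper)."""
    # safe="" makes quote() escape every reserved character,
    # including "," and "/", inside each component.
    return ",".join(quote(str(value), safe="") for value in values)


def decode_pk(segment):
    """Split a URL segment back into its key components, as strings."""
    # Split on the literal commas first, then undo the escaping per component.
    return tuple(unquote(part) for part in segment.split(","))
```

Note that every component comes back as a string, so converting to the field's Python type would still be needed. Percent-escapes may also be decoded by the server before the URL is matched, in which case a different escape character would be required.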

mjm: From Adrian's remark about how greedy the admin URL matching is, I take it that the entire tail is taken to be the key matter. This wouldn't change: the admin code decides how it will recognize key part boundaries, and it generates URLs that conform to that. The discussion about allowing a partial key spec to select a set (rather than a single record) seems to me a diversion at this level. It does point out a [some?] non-unique selection criteria that might be especially useful to have supported automatically. I would expect a selection based on matching one (or some) of a CK's columns to look the same as if that selection were configured for non-PK column(s), I think.

Alternative methods

notnotpeter: Currently, you can "fake" it by declaring one of the keys to be primary in Django and adding a unique constraint to the model. (needs more info...examples?)

mjm: This only works when there is a non-compound unique key to use, if I understand what's being proposed here. As such, it may be workable as a way to squeeze a design that naturally has CKs into Django, but it's of no use for working with an existing schema that has only the CK.

djansoft: It can be done using unique-together.

Tobu: You can't use just one key. A use case for illustration: a NamespacedTag model with a CK made of a namespace and a name. The primary key can't be just the namespace or just the name since that would be ambiguous. The only solution (and I dislike it since it is bad modelling and forces database accesses to be serialized on a counter) is to use an AutoField.
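The workaround discussed in this thread can be sketched at the schema level with the standard-library sqlite3 module: an auto-incrementing surrogate id serves as the primary key, and a UNIQUE constraint over (namespace, name) enforces the natural composite key. In Django terms this roughly corresponds to keeping the default AutoField and listing both fields in the model's unique_together option. The table and column names are assumptions based on the NamespacedTag example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE namespaced_tag (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        namespace TEXT NOT NULL,
        name TEXT NOT NULL,
        UNIQUE (namespace, name)               -- the natural composite key
    )
    """
)

# The same name is allowed in two different namespaces.
conn.execute("INSERT INTO namespaced_tag (namespace, name) VALUES ('color', 'red')")
conn.execute("INSERT INTO namespaced_tag (namespace, name) VALUES ('mood', 'red')")

# Repeating an existing (namespace, name) pair violates the constraint.
try:
    conn.execute("INSERT INTO namespaced_tag (namespace, name) VALUES ('color', 'red')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM namespaced_tag").fetchone()[0]
```

The uniqueness guarantee is preserved, but the table now has two candidate keys, which is exactly the redundancy the objections above point at, and it does not help with an existing schema that has only the composite key.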

toszter: Call me nutty, but why not declare any number of columns as primary key in your model, which Django then uses to create a "hidden" pk_composite column that's just a hash of the pk values in the row? When you do a lookup on a pk, you assume the ENTIRE combination of values is the primary key, nothing more or less. So when you look up according to the values needed to create the full pk_composite, the hash always computes and if the hash algorithm is public, can be used off the URL, the querystring in db lookups, and just about anywhere. Seems pretty simple in my mind but then again I am not going into great detail here.
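A minimal sketch of the hashing idea, assuming SHA-256 and a JSON encoding of the key components (neither is specified in the proposal, and composite_pk_hash is a hypothetical name). The components are serialized as a JSON list before hashing so that their boundaries stay unambiguous.

```python
import hashlib
import json


def composite_pk_hash(values):
    """Return a hex digest identifying the full combination of key values."""
    # JSON-encode the list so component boundaries are unambiguous:
    # ("ab", "c") and ("a", "bc") must not produce the same digest.
    payload = json.dumps(list(values), separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two caveats follow from this sketch. The digest depends on the component types, so 1 and "1" hash differently, and the hash is one-way, so the original key values cannot be recovered from a URL that carries only the digest. A lookup would have to match against the stored pk_composite column.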