magic-removal: Model Inheritance

This is a proposal for how subclassing should work in Django. There are a lot of details to get right, so this proposal should be very specific and detailed. Most of the ideas here come from the thread linked below:

2. Modeling joins in SQL

When we want a list of ItalianRestaurants, we obviously need all the fields from myapp_restaurant and myapp_place as well. This could be accomplished by inner joins. It would look something like this:

SELECT ... FROM myapp_italianrestaurant AS ir
    INNER JOIN myapp_restaurant AS r
        ON ir.restaurant_id = r.id
    INNER JOIN myapp_place AS p
        ON r.place_id = p.id

But what if we want a list of Places, what should we do? We can either just get the places:

SELECT ... FROM myapp_place

Or we can get everything with left joins (this allows the iterator to return objects of the appropriate type, rather than just a bunch of Places):

SELECT ... FROM myapp_place AS p
    LEFT JOIN myapp_restaurant AS r
        ON r.place_id = p.id
    LEFT JOIN myapp_italianrestaurant AS ir
        ON ir.restaurant_id = r.id
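
To make the two query shapes concrete, here is a minimal sketch that runs them against an in-memory SQLite database. The table names and the restaurant_id / place_id columns come from the queries above; the name, serves_pizza and serves_gnocchi columns and the sample rows are assumptions made up for illustration, and most_specific_type() is a hypothetical helper, not anything Django provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myapp_place (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE myapp_restaurant (
        id INTEGER PRIMARY KEY,
        place_id INTEGER REFERENCES myapp_place(id),
        serves_pizza INTEGER
    );
    CREATE TABLE myapp_italianrestaurant (
        id INTEGER PRIMARY KEY,
        restaurant_id INTEGER REFERENCES myapp_restaurant(id),
        serves_gnocchi INTEGER
    );
    -- Sample data (assumed): one plain place, one restaurant, one Italian restaurant.
    INSERT INTO myapp_place VALUES (1, 'City Park'), (2, 'Burger Hut'), (3, 'Luigi''s');
    INSERT INTO myapp_restaurant VALUES (1, 2, 0), (2, 3, 1);
    INSERT INTO myapp_italianrestaurant VALUES (1, 2, 1);
""")

# Inner joins: every ItalianRestaurant row pulls in its Restaurant and Place fields.
italian = conn.execute("""
    SELECT p.name, r.serves_pizza, ir.serves_gnocchi
    FROM myapp_italianrestaurant AS ir
    INNER JOIN myapp_restaurant AS r ON ir.restaurant_id = r.id
    INNER JOIN myapp_place AS p ON r.place_id = p.id
""").fetchall()

# Left joins: every Place comes back, with NULLs where no subclass row exists.
rows = conn.execute("""
    SELECT p.name, r.id, ir.id
    FROM myapp_place AS p
    LEFT JOIN myapp_restaurant AS r ON r.place_id = p.id
    LEFT JOIN myapp_italianrestaurant AS ir ON ir.restaurant_id = r.id
    ORDER BY p.id
""").fetchall()


def most_specific_type(restaurant_id, italian_id):
    """Pick the deepest class for which the joined row has a non-NULL id."""
    if italian_id is not None:
        return "ItalianRestaurant"
    if restaurant_id is not None:
        return "Restaurant"
    return "Place"


typed = [(name, most_specific_type(r_id, ir_id)) for name, r_id, ir_id in rows]
print(italian)
print(typed)
```

The NULL check on the joined ids is what would let an iterator decide which class to instantiate for each row.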

Imagine we have more than one subclass of Place though. The join clause and the column list would get pretty hefty. This could obviously get unmanageable pretty quickly.
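
A rough way to see the growth is to generate the join clauses from a description of the class hierarchy. In this sketch the HIERARCHY mapping, the myapp_museum and myapp_sushibar tables, and both helper functions are hypothetical; only the Place / Restaurant / ItalianRestaurant chain comes from the text above.

```python
# Hypothetical hierarchy: parent table -> list of (child table, foreign key column).
HIERARCHY = {
    "myapp_place": [
        ("myapp_restaurant", "place_id"),
        ("myapp_museum", "place_id"),
    ],
    "myapp_restaurant": [
        ("myapp_italianrestaurant", "restaurant_id"),
        ("myapp_sushibar", "restaurant_id"),
    ],
}


def left_join_clauses(table, hierarchy):
    """Return one LEFT JOIN clause per descendant of `table`, depth first."""
    clauses = []
    for child, fk_column in hierarchy.get(table, []):
        clauses.append(f"LEFT JOIN {child} ON {child}.{fk_column} = {table}.id")
        clauses.extend(left_join_clauses(child, hierarchy))
    return clauses


def select_with_subclasses(table, hierarchy):
    """Build the 'fetch everything' query for `table` and all of its subclasses."""
    return "\n".join([f"SELECT * FROM {table}"] + left_join_clauses(table, hierarchy))


place_joins = left_join_clauses("myapp_place", HIERARCHY)
place_query = select_with_subclasses("myapp_place", HIERARCHY)
print(place_query)
print(len(place_joins), "joins")
```

Every subclass anywhere below Place adds one join (and its columns) to a query on Place, so the query grows with the size of the whole subtree rather than its depth.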

I think some dbs have a maximum number of joins (something like 16), and even within the maximum, the query optimizer will either spend a while deciding which way to best join the tables or it will give up and choose the wrong way quickly. This wording is FUD-- I'll try to find specifics. --jdunck

MySQL-4.1 and newer can handle up to 61 tables in a JOIN or VIEW (5.0 and newer). Unclear what the limit is for 4.0 and older. -- Andy Dustman

There must be major performance problems with performing that many joins in a query. What's wrong with making the default behaviour to grab only the base fields, and documenting that? (except for the fact that subclass-specific methods might break .. hm.) --harmless

In MySQL-5.0, most of the theoretically-updatable VIEWs are updatable, but currently, "You cannot use UPDATE to update more than one underlying table of a view that is defined as a join." so unfortunately this doesn't work (yet):

Unfortunately, if you try to update name and description at the same time, this fails. The VIEW scheme would still be useful for SELECT, but I think the base tables will have to be updated directly. At least VIEWs buy you an easier SELECT statement without JOINs. It's possible to handle some of the VIEW updating with INSTEAD OF triggers in Oracle, IBM DB2, and MS SQL (PostgreSQL uses CREATE RULE ... AS ON (INSERT|UPDATE) ... DO INSTEAD), but that kind of trigger is not in MySQL yet. I'm not sure how much effort it's worth. -- Andy Dustman
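
As an illustration of the INSTEAD OF trigger idea, here is a small sketch using SQLite, which also supports INSTEAD OF triggers on views. It is not MySQL, so it says nothing about MySQL's actual behaviour. The restaurant_view view, the trigger name, the description column and the sample row are all assumptions invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myapp_place (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE myapp_restaurant (
        id INTEGER PRIMARY KEY,
        place_id INTEGER REFERENCES myapp_place(id),
        description TEXT
    );
    INSERT INTO myapp_place VALUES (1, 'Luigi''s');
    INSERT INTO myapp_restaurant VALUES (1, 1, 'Pasta');

    -- The view hides the join from anyone who only needs to SELECT.
    CREATE VIEW restaurant_view AS
        SELECT r.id AS id, r.place_id AS place_id,
               p.name AS name, r.description AS description
        FROM myapp_restaurant AS r
        INNER JOIN myapp_place AS p ON r.place_id = p.id;
""")

before = conn.execute("SELECT name, description FROM restaurant_view").fetchall()

update_sql = (
    "UPDATE restaurant_view SET name = 'Mario''s', description = 'Pizza' WHERE id = 1"
)

# Without a trigger, SQLite refuses to write through the view at all.
try:
    conn.execute(update_sql)
    view_is_writable = True
except sqlite3.OperationalError:
    view_is_writable = False

# The trigger splits one UPDATE on the view into one UPDATE per base table.
conn.executescript("""
    CREATE TRIGGER restaurant_view_update INSTEAD OF UPDATE ON restaurant_view
    BEGIN
        UPDATE myapp_place SET name = NEW.name WHERE id = OLD.place_id;
        UPDATE myapp_restaurant SET description = NEW.description WHERE id = OLD.id;
    END;
""")
conn.execute(update_sql)
after = conn.execute("SELECT name, description FROM restaurant_view").fetchall()
print(before, view_is_writable, after)
```

The trigger body is just the "update the base tables directly" approach moved into the database, so the amount of work is similar either way.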

Another option is to lazily load objects like Restaurant and ItalianRestaurant while we're iterating over Place.objects.all(), but that requires a lot of database queries. Either way, doing this will be expensive, and the API should reflect that. You're much better off just using Place's fields if you are going to iterate over Place.objects.all().
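
The cost of lazy loading can be sketched by counting queries. The example below uses an in-memory SQLite database with only the Place and Restaurant tables; the run() helper, the query counter, the serves_pizza column and the sample rows are assumptions for illustration, not part of any Django API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myapp_place (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE myapp_restaurant (
        id INTEGER PRIMARY KEY,
        place_id INTEGER REFERENCES myapp_place(id),
        serves_pizza INTEGER
    );
    INSERT INTO myapp_place VALUES (1, 'City Park'), (2, 'Burger Hut'), (3, 'Luigi''s');
    INSERT INTO myapp_restaurant VALUES (1, 2, 0), (2, 3, 1);
""")

counter = {"queries": 0}


def run(sql, params=()):
    """Execute a query and count it, so the two strategies can be compared."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


# Lazy loading: one query for the places, then one more per place.
lazy = []
for place_id, name in run("SELECT id, name FROM myapp_place ORDER BY id"):
    match = run(
        "SELECT serves_pizza FROM myapp_restaurant WHERE place_id = ?", (place_id,)
    )
    lazy.append((name, "Restaurant" if match else "Place"))
lazy_queries = counter["queries"]

# Left join: the same information in a single query.
counter["queries"] = 0
joined = run("""
    SELECT p.name, r.id
    FROM myapp_place AS p
    LEFT JOIN myapp_restaurant AS r ON r.place_id = p.id
    ORDER BY p.id
""")
eager = [(name, "Restaurant" if r_id is not None else "Place") for name, r_id in joined]
eager_queries = counter["queries"]

print(lazy_queries, eager_queries)
```

With N places and one subclass table, the lazy version issues N + 1 queries. Each additional subclass level would add up to another N.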

Ramblings on Magic Removal Subclassing from the Pycon Sprint

If Restaurant were to inherit from this, it would not automatically have a 'name' CharField. This is because Django uses a metaclass to modify the default class creation behavior. The ModelBase metaclass creates a new class from scratch, and then selectively pulls items from the Place class as defined above and adds them to this new class, which allows it to handle Field objects specially. For each of the class's attributes, add_to_class() is called. If add_to_class finds a 'contribute_to_class' attribute, ModelBase knows it is dealing with a Field object, and calls contribute_to_class. Otherwise, it just adds it to the new class via setattr().
Thus, by the time the Restaurant class is created, the Place class which it inherits from actually looks more like this:

So there is simply no 'name' field for it to inherit. We therefore need to have ModelBase walk through the parent classes and call contribute_to_class on each of the fields found in _meta.fields. As we walk the inheritance tree, we look for a '_meta' attribute to determine if our current node is a Model. Otherwise, it is either a mixin class or the Model class itself.

We can keep track of the parent hierarchy by creating _meta.parents, and having each ancestor of Model add to it in a recursive fashion, by adding the following to ModelBase.__new__():

# Build complete list of parents
for base in bases:
    if '_meta' in dir(base):
        new_class._meta.parents.append(base)
        new_class._meta.parents.extend(base._meta.parents)

We can then add all of the parent fields to the new class like so:

# Add Fields inherited from parents
for parent in new_class._meta.parents:
    for field in parent._meta.fields:
        field.contribute_to_class(new_class, field.name)
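
To check that the two snippets fit together, here is a self-contained toy version in modern Python syntax. It is not Django's real ModelBase: Options, Field, and the model classes are simplified stand-ins. Two details are my own assumptions rather than part of the proposal: inherited fields are copied instead of shared, and a field name already seen is skipped, since a grandparent's fields would otherwise arrive twice (once via the parent's _meta.fields and once via the grandparent's).

```python
import copy


class Options:
    """Simplified stand-in for the _meta object."""

    def __init__(self):
        self.fields = []
        self.parents = []


class Field:
    """Toy field: anything with contribute_to_class() is treated as a field."""

    def __init__(self):
        self.name = None

    def contribute_to_class(self, cls, name):
        self.name = name
        cls._meta.fields.append(self)


class ModelBase(type):
    def __new__(mcs, name, bases, attrs):
        # The root Model class is created normally and gets no _meta.
        if not any(isinstance(base, ModelBase) for base in bases):
            return super().__new__(mcs, name, bases, attrs)

        # Start from an almost empty class; fields never become plain attributes.
        keep = ("__module__", "__qualname__", "__classcell__")
        base_attrs = {key: attrs[key] for key in keep if key in attrs}
        new_class = super().__new__(mcs, name, bases, base_attrs)
        new_class._meta = Options()

        # Build complete list of parents.
        for base in bases:
            if hasattr(base, "_meta"):
                new_class._meta.parents.append(base)
                new_class._meta.parents.extend(base._meta.parents)

        # Add fields inherited from parents (copy and de-duplication are assumptions).
        seen = set()
        for parent in new_class._meta.parents:
            for field in parent._meta.fields:
                if field.name not in seen:
                    seen.add(field.name)
                    copy.copy(field).contribute_to_class(new_class, field.name)

        # Add the attributes declared on the class itself.
        for attr_name, value in attrs.items():
            if attr_name not in base_attrs:
                new_class.add_to_class(attr_name, value)
        return new_class

    def add_to_class(cls, name, value):
        if hasattr(value, "contribute_to_class"):
            value.contribute_to_class(cls, name)
        else:
            setattr(cls, name, value)


class Model(metaclass=ModelBase):
    pass


class Place(Model):
    name = Field()


class Restaurant(Place):
    serves_pizza = Field()


class ItalianRestaurant(Restaurant):
    serves_gnocchi = Field()


print([f.name for f in ItalianRestaurant._meta.fields])
print([p.__name__ for p in ItalianRestaurant._meta.parents])
```

Without the de-duplication step, ItalianRestaurant would receive 'name' twice in this toy, so the real implementation would probably need some equivalent guard.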

That should pretty much be it for the object side of things; what's left is the database side of things (i.e., the hard part).

Mixins

Here are some scenarios where mixins would be useful (to me).

auditing. I would like to have certain models have a created by/time and last-updated by/time on the record, and possibly an XXX_history table which shows the previous version of that entry when it gets changed.

tagging. I would like to 'mark' a model as being taggable, and let the mixin worry about the rest. This I could do now by overriding the many2many field type, but I think a mixin would be nicer.

row-level-security. I would like to be able to specify the permissions of an item either in a tagging-like fashion or as a function of the values in that item. For instance, if an item has a field "approver", and the value is set to "John", then I want John to have read/write permissions on that item. As soon as the approver field is no longer set to "John", John would lose his privilege unless it was granted by something else in the item or some default.