Abstract

This proposal adds simple drop-in caching to Django's ORM, compatible with
nearly all existing QuerySet methods. It emphasizes performance and
compatibility, and provides configuration options with sane defaults. All
that is required for basic functionality is a suitable CACHE_BACKEND setting
and the addition of .cache() to the appropriate QuerySet chains. It also
speeds up the lookup of related objects, including those reached through
generic relations.

Proposed Design

The QuerySet class grows new methods to add object caching:

.cache()

cache(timeout=None, prefix='qscache:', smart=False)

This method causes model instances found in the returned
QuerySet to be cached individually; the cache key is
calculated using the contrib.contenttypes model id and the
instance's pk value. (This is all done lazily, and the position
of cache() in the chain does not matter, consistent with other methods.)

timeout defaults to the amount specified in CACHE_BACKEND.
prefix is in addition to CACHE_MIDDLEWARE_KEY_PREFIX.
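
The key scheme described for cache() can be illustrated with a small sketch. The exact key layout is not fixed by this proposal, so the helper name make_cache_key, its global_prefix parameter (standing in for CACHE_MIDDLEWARE_KEY_PREFIX), and the separator are assumptions made for illustration only.

```python
# Hypothetical helper: the proposal does not specify the exact key layout.
def make_cache_key(content_type_id, pk, prefix='qscache:', global_prefix=''):
    """Build a per-instance key from the content type id and the primary key."""
    # global_prefix stands in for CACHE_MIDDLEWARE_KEY_PREFIX from settings;
    # prefix is the per-call value passed to cache().
    return '%s%s%s:%s' % (global_prefix, prefix, content_type_id, pk)


key = make_cache_key(content_type_id=12, pk=345, global_prefix='mysite:')
print(key)
```

Because the key depends only on the model and the pk, any query that returns the same instance would resolve to the same cache entry.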

Internally, QuerySet grows some new attributes that affect how SQL is
generated. Use of cache() causes the query to retrieve only primary
keys of selected objects. in_bulk() uses the cache directly, although
cache misses will still require database hits, as usual. Methods such as
delete() and count() are largely unaffected by cache(), but
methods such as distinct() are a more difficult case and will require
some design decisions. Using extra(select=...) may turn out to be an
unsolvable case.
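
The lookup flow described above (fetch only primary keys, consult the cache, then hit the database for the misses) might look roughly like the following. This is a sketch under stated assumptions, not the proposed implementation: the function fetch_cached, the load_from_db callback, and the plain dict used as a cache are all hypothetical stand-ins.

```python
def fetch_cached(pks, cache, load_from_db, key_for):
    """Return objects for pks in order, querying the database only for misses."""
    keys = {pk: key_for(pk) for pk in pks}
    found = {pk: cache[k] for pk, k in keys.items() if k in cache}
    missing = [pk for pk in pks if pk not in found]
    if missing:
        # One bulk query for all misses, comparable to in_bulk().
        loaded = load_from_db(missing)
        for pk, obj in loaded.items():
            cache[keys[pk]] = obj
        found.update(loaded)
    # Primary keys that exist nowhere are silently skipped.
    return [found[pk] for pk in pks if pk in found]


# Hypothetical stand-ins for the database table and the cache backend.
database = {1: 'alice', 2: 'bob', 3: 'carol'}
cache = {'qscache:1': 'alice'}
queries = []


def load_from_db(pks):
    queries.append(list(pks))
    return {pk: database[pk] for pk in pks if pk in database}


def key_for(pk):
    return 'qscache:%s' % pk


result = fetch_cached([1, 2, 3], cache, load_from_db, key_for)
print(result)   # objects in the requested order
print(queries)  # only the cache misses reached the database
```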

If values() has been used in the query, cache() takes precedence
and creates the values dictionary from cache. If a list of fields is
specified in values(), cache() will still perform the equivalent of a
SELECT *.
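
As a rough illustration of building the values dictionary from a cached instance, the sketch below uses a plain Python class in place of a model. The names Article and values_from_cached are hypothetical, and a real model instance would carry extra internal attributes that would need filtering.

```python
class Article:
    """Plain stand-in for a cached model instance."""

    def __init__(self, pk, title, body):
        self.pk = pk
        self.title = title
        self.body = body


def values_from_cached(instance, fields=None):
    """Build a values()-style dict from a fully cached instance."""
    row = dict(vars(instance))
    if fields:
        # The whole object was cached; only the output is narrowed.
        return {name: row[name] for name in fields}
    return row


cached = Article(pk=1, title='Hello', body='Some text')
print(values_from_cached(cached, fields=['pk', 'title']))
```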

select_related() is supported by the caching mechanism. The appropriate
joins are still performed by the database; if joins were calculated with
cached foreign key values, cache misses could become very costly.

.cache_related()

cache_related(fields, timeout=None, prefix='qscache:', smart=False)

fields is the name, or a list of names, of foreign keys, many-to-many or
one-to-one fields, reverse-lookup fields, or generic foreign keys on the
model. Model instances pointed to by the given relations will be cached in
the same way as with cache().

I'm not sold on the signature of this method... *args would be nice
but then the other defaulted arguments would be replaced by kwargs.

Also, the special string '*' could be accepted to cache all relations.
Either that, or it is implied by the lack of a fields argument?

Aside

Without database-specific trickery it is non-trivial to perform SQL JOINs
with generic relations. Currently, a database query is required for each
generic foreign key relationship. The cache framework, while unable to
reduce the initial number of database hits, greatly alleviates load when
lists of generic objects are required. Using this method still loads
generic foreign keys lazily, but more quickly, and also uses objects cached
with cache().

.cache_set()

cache_set(cache_key, timeout=None, smart=False, depth=1)

Stores the whole result set under cache_key, similar to taking the resulting
QuerySet and storing it directly in the cache. Overrides cache(), but does
not cache relations.

If select_related() is used in the same QuerySet, cache_set() will
also cache relations as far as the select_related()'s joins reach.

If cache_related() is used in the same QuerySet, it overrides use of
select_related().

Background logic

The implementation class contains a registry of models for which caching has
been requested (directly or via a relation).

To achieve as much transparency as possible, the QuerySet methods quietly
establish post_save and post_delete signal listeners the first time a
model is cached. Object deletion is handled trivially. On object creation or
modification, the preferred behavior is to create or update the cached key
rather than simply deleting the key and letting the cache regenerate it;
the rationale is that the object is most likely to be viewed immediately
afterwards, and caching it at post_save is cheap. However, this may not be
desirable in certain cases.
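
A minimal sketch of the two listeners follows, written as plain functions so it runs without Django. The receiver signature (sender, instance, **kwargs) matches what Django signal receivers accept; the cached_models registry, the key_for layout, and the dict standing in for the cache backend are assumptions for illustration.

```python
cache = {}             # stand-in for the configured cache backend
cached_models = set()  # registry of model classes that requested caching


def key_for(instance):
    # Hypothetical key layout; the proposal uses the content type id and pk.
    return 'qscache:%s:%s' % (type(instance).__name__, instance.pk)


def on_post_save(sender, instance, **kwargs):
    """Write the fresh instance to the cache instead of deleting its key."""
    if sender in cached_models:
        cache[key_for(instance)] = instance


def on_post_delete(sender, instance, **kwargs):
    """Drop the cached copy when the object is deleted."""
    if sender in cached_models:
        cache.pop(key_for(instance), None)


class Article:
    """Plain stand-in for a model class."""

    def __init__(self, pk, title):
        self.pk = pk
        self.title = title


# In Django the handlers would be registered once per process, e.g. with
# post_save.connect(on_post_save); here they are called directly.
cached_models.add(Article)
article = Article(1, 'Draft')
on_post_save(Article, article, created=True)
article.title = 'Final'
on_post_save(Article, article, created=False)
print(cache['qscache:Article:1'].title)
```

Updating on save trades a small write cost for avoiding a cache miss on the next read, which is the behavior the paragraph above prefers.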

To reduce the number of cache misses, additional "smart" logic can be added.
For example, the first time a model is registered to the cache signal listener,
its model instances are expected to be uncached. In this case, rather than
fetching only primary keys, the objects are retrieved as normal (and cached).

By storing the expiration time, the same logic can also apply whenever the
cached objects have likely timed out. All "smart" functionality is enabled
using the smart keyword argument.
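
One way the "smart" decision could be made is sketched below: record when a model's cached objects are expected to expire, and fetch full rows whenever the model has never been cached or that time has passed. The SmartRegistry class and its method names are hypothetical; the injectable clock exists only to make the sketch easy to test.

```python
import time


class SmartRegistry:
    """Tracks when each model's cached objects are expected to expire."""

    def __init__(self, clock=time.time):
        self._expires_at = {}
        self._clock = clock

    def should_fetch_full_rows(self, model_name):
        # Never cached, or the last batch has likely timed out:
        # fetch whole rows instead of primary keys only.
        expires = self._expires_at.get(model_name)
        return expires is None or self._clock() >= expires

    def mark_cached(self, model_name, timeout):
        self._expires_at[model_name] = self._clock() + timeout


now = [100.0]  # fake clock value, mutable so the demo can advance time
registry = SmartRegistry(clock=lambda: now[0])

first = registry.should_fetch_full_rows('Article')   # nothing cached yet
registry.mark_cached('Article', timeout=30)
second = registry.should_fetch_full_rows('Article')  # cache still warm
now[0] = 130.0
third = registry.should_fetch_full_rows('Article')   # timeout reached
print(first, second, third)
```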

Notes

Code layout

All caching code lives in a separate app at first. A custom QuerySet class
derives from the official class, overriding where appropriate. A Manager
class with an overridden get_query_set() is used for testing, and
additional middleware, etc. are located in the same folder. Perhaps
eventually the new code can be merged into trunk as part of Django proper.
Hopefully the code will not be too invasive, but quite a few QuerySet methods
will have to be hijacked. QuerySet refactoring would be an ideal merge time.

If the transaction middleware is enabled, it is desirable to have the cache
only update when the transaction succeeds. This is simple in implementation
but will couple the transaction middleware to the cache if not designed
properly. An additional middleware class can be created to handle this
case; however, it will have to stipulate placement immediately after the
TransactionMiddleware in settings.py, and might be confused with the
existing CacheMiddleware.
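
The idea of updating the cache only when the transaction succeeds can be sketched as a buffer of pending writes that is flushed on commit and discarded on rollback. The DeferredCacheWriter class below is a hypothetical illustration, independent of any middleware, and uses a dict as the cache.

```python
class DeferredCacheWriter:
    """Buffers cache writes until the surrounding transaction commits."""

    def __init__(self, cache):
        self.cache = cache
        self._pending = []

    def set(self, key, value):
        self._pending.append(('set', key, value))

    def delete(self, key):
        self._pending.append(('delete', key, None))

    def commit(self):
        # Apply buffered operations in the order they were requested.
        for op, key, value in self._pending:
            if op == 'set':
                self.cache[key] = value
            else:
                self.cache.pop(key, None)
        self._pending = []

    def rollback(self):
        # The transaction failed, so the cache must stay untouched.
        self._pending = []


cache = {'qscache:1': 'old'}
writer = DeferredCacheWriter(cache)

writer.set('qscache:1', 'new')
writer.rollback()
after_rollback = dict(cache)

writer.set('qscache:1', 'new')
writer.delete('qscache:2')
writer.commit()
after_commit = dict(cache)
print(after_rollback, after_commit)
```

A middleware placed after TransactionMiddleware could call commit() or rollback() on such a buffer at the end of the request, which keeps the transaction code itself unaware of the cache.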

I've been thinking quite a lot about the multitude of combinations of
methods I've got here... I'm going to implement the simplest things I
had in the original proposal first and branch out from there. I'll
likely post some sort of map of the combinations later once I get it
down on paper.

Interface changes

I'm considering just making "smart" behaviour standard, or at least default.

Perhaps the default cache key prefix should be specifiable in settings?

Should cache_related() lose the depth argument and merely steal it
from select_related() instead, if given?

When cache() is used with values(), perhaps another option could be
added to allow retrieval of only the specified fields; however, this would
break any regular cached lookup for that object.