Cacheback is an extensible caching library that refreshes stale cache items
asynchronously using a Celery task. The key idea is that it’s better to serve
a stale item (and repopulate the cache asynchronously) than to block the
user’s request while the cache is repopulated synchronously.

Using this library, you can rework your views so that all reads are from
cache - which can be a significant performance boost.

A corollary of this technique is that cache stampedes are easily avoided,
preventing sudden surges of expensive reads when a cached item becomes stale.

Cacheback provides a decorator for simple usage, a subclassable base
class for more fine-grained control and helper classes for working with
querysets.
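The walkthrough below starts from a simple synchronous caching implementation
that stores a Twitter user’s tweets under their username for 15 minutes. Here
is a minimal self-contained sketch of that starting point (an in-memory dict
stands in for a real cache backend such as Django’s, and fetch_tweets is a
stub in place of an actual Twitter API call):

```python
import time

# Stand-in for a cache backend such as Django's: key -> (data, expiry time).
CACHE = {}
LIFETIME = 60 * 15  # cache tweets for 15 minutes

def fetch_tweets(username):
    # Placeholder for an expensive call to the Twitter API.
    return ["tweet about %s" % username]

def get_tweets(username):
    key = 'tweets:%s' % username
    cached = CACHE.get(key)
    if cached is not None and cached[1] > time.time():
        # Cache hit with fresh data: serve from cache.
        return cached[0]
    # Cache miss (or expired): fetch synchronously, blocking the request.
    data = fetch_tweets(username)
    CACHE[key] = (data, time.time() + LIFETIME)
    return data
```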

Now tweets are cached for 15 minutes after they are first fetched, using the
twitter username as a key. This is obviously a performance improvement but the
shortcomings of this approach are:

For a cache miss, the tweets are fetched synchronously, blocking code execution
and leading to a slow response time.

This in turn exposes the view to a ‘cache stampede’, where multiple
expensive reads run simultaneously when the cached item expires.
Under heavy load, this can bring your site down and make you sad.

Now, consider an alternative implementation that uses a Celery task to repopulate the
cache asynchronously instead of during the request/response cycle:
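The pattern can be sketched as follows. For a self-contained illustration, an
in-memory dict stands in for Memcache, and refresh_delay and refresh are stubs
in place of triggering and running a real Celery task:

```python
import time

MEMCACHE_MAX_EXPIRATION = 2592000  # Memcache's maximum expiry (seconds)
LIFETIME = 60 * 15

CACHE = {}       # stand-in for Memcache: key -> (data, expiry_timestamp)
PENDING = set()  # stand-in for Celery's job queue

def fetch_tweets(username):
    # Placeholder for an expensive call to the Twitter API.
    return ["tweet about %s" % username]

def refresh_delay(username):
    # Stand-in for queueing an asynchronous Celery task.
    PENDING.add(username)

def refresh(username):
    # What the Celery worker would run: repopulate the cache with a fresh
    # (data, expiry) tuple. With real Memcache, the item itself would be
    # stored using MEMCACHE_MAX_EXPIRATION so that expiry is under our
    # own control rather than Memcache's.
    CACHE['tweets:%s' % username] = (fetch_tweets(username),
                                     time.time() + LIFETIME)

def get_tweets(username):
    key = 'tweets:%s' % username
    item = CACHE.get(key)
    if item is None:
        # Scenario 1 - cache miss: trigger an asynchronous refresh and
        # return an empty result set rather than blocking.
        refresh_delay(username)
        return []
    data, expiry = item
    if expiry < time.time():
        # Scenario 2 - cache hit with stale data: return the stale data
        # but trigger an asynchronous refresh of the cached item.
        refresh_delay(username)
    return data
```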

Items are stored in the cache as tuples (data, expiry_timestamp) using
Memcache’s maximum expiry setting (2592000 seconds). By using this value, we
are effectively bypassing Memcache’s replacement policy in favour of our own.

As the comments indicate, there are two scenarios to consider:

Cache miss. In this case, we don’t have any data (stale or otherwise) to
return. In the example above, we trigger an asynchronous refresh and
return an empty result set. In other scenarios, it may make sense to
perform a synchronous refresh.

Cache hit but with stale data. Here we return the stale data but trigger
a Celery task to refresh the cached item.

This pattern of re-populating the cache asynchronously works well. Indeed, it
is the basis for the cacheback library.

Here’s the same functionality implemented using a django-cacheback decorator:
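The decorated version looks something like this (a sketch: the import path and
keyword names follow cacheback’s documented decorator, but verify them against
the version you install):

```python
from cacheback.decorators import cacheback

@cacheback(lifetime=60 * 15, fetch_on_miss=False)
def fetch_tweets(username):
    # The expensive call to the Twitter API goes here; Cacheback's
    # Celery task invokes this function to (re)populate the cache.
    ...
```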

Here the decorator simply wraps the fetch_tweets function - nothing else is
needed. Cacheback ships with a flexible Celery task that can run any function
asynchronously.

To be clear, the behaviour of this implementation is as follows:

The first request for a particular user’s tweets will be a cache miss. The
default behaviour of Cacheback is to fetch the data synchronously in this
situation, but by passing fetch_on_miss=False, we indicate that it’s ok
to return None in this situation and to trigger an asynchronous refresh.

A Celery worker will pick up the job to refresh the cache for this user’s
tweets. It will import the fetch_tweets function and execute it with the
correct username. The resulting data will be added to the cache with a
lifetime of 15 minutes.

Any requests for this user’s tweets during the period that Celery is
refreshing the cache will also return None. However, Cacheback is aware of
cache stampedes and does not trigger any additional jobs for refreshing the
cached item.

Once the cached item is refreshed, any subsequent requests within the next 15
minutes will be served from cache.

The first request after 15 minutes has elapsed will serve the (now-stale)
cache result but will trigger a Celery task to fetch the user’s tweets and
repopulate the cache.

Much of this behaviour can be configured by using a subclass of
cacheback.Job. The decorator is only intended for simple use-cases. See
the Sample usage and API documentation for more information.
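As a hedged sketch of the subclass approach (the class attributes mirror the
lifetime and fetch_on_miss options described above; get_latest_tweets is a
hypothetical helper wrapping the Twitter API call):

```python
from cacheback import Job

class UserTweets(Job):
    lifetime = 60 * 15     # serve cached tweets for 15 minutes
    fetch_on_miss = False  # return None on a miss; refresh asynchronously

    def fetch(self, username):
        # Runs in a Celery worker when the item is missing or stale.
        return get_latest_tweets(username)

# In a view:
# tweets = UserTweets().get(username)
```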