Django tip: Caching and two-phased template rendering

We've launched user accounts at EveryBlock, and we faced the interesting problem of needing to cache entire pages except for the "You're logged in as [username]" bit at the top of the page. For example, the Chicago homepage takes a nontrivial amount of time to generate and doesn't change often -- which means we want to cache it -- but at the same time, we need to display the dynamic bit in the upper right of the page.

One solution would be to pull in the username info dynamically via Ajax. This way, you could cache the entire page and rely on the client to pull in the username bits. The downsides are that it relies on JavaScript and it requires two hits to the application for each page view.

Another solution would be to use Django's low-level cache API to cache the results of the queries directly in our view function. The downsides are that it's kind of messy to manage all of that caching, plus each page view still incurs the overhead of template rendering (which isn't horrible, but it's unnecessary overhead).

The solution we ended up using is two-phased template rendering. Credit for this concept goes to my friend Honza Kral, who suggested the idea to me during PyCon earlier this year.

The way it works is to split the page rendering into two steps:

1. At cache reset time, render everything except the "You're logged in as" bit, which should remain unrendered Django template code. Cache the result as a Django template. (This is the clever part!)

2. At page view time, render that cached template by passing it the current user. This is super fast because, at this point, the template only has two or three template tags. (The rest of the page is already rendered.)

It's a clever solution because you end up defining what doesn't get cached instead of what does get cached. It's a sideways way of looking at the problem -- sort of like how Django's template inheritance system defines which parts of the page change instead of defining server-side includes of the common bits.

To make this work, we had to write two pieces of infrastructure: a template tag and a middleware class that does the cache-checking and rendering. The template tag looks like this:

# Copyright 2009, EveryBlock
# This code is released under the GPL.
from django import template

register = template.Library()

def raw(parser, token):
    # Whatever is between {% raw %} and {% endraw %} will be preserved as
    # raw, unrendered template code.
    text = []
    parse_until = 'endraw'
    tag_mapping = {
        template.TOKEN_TEXT: ('', ''),
        template.TOKEN_VAR: ('{{', '}}'),
        template.TOKEN_BLOCK: ('{%', '%}'),
        template.TOKEN_COMMENT: ('{#', '#}'),
    }
    # By the time this template tag is called, the template system has already
    # lexed the template into tokens. Here, we loop over the tokens until
    # {% endraw %} and parse them to TextNodes. We have to add the start and
    # end bits (e.g. "{{" for variables) because those have already been
    # stripped off in a previous part of the template-parsing process.
    while parser.tokens:
        token = parser.next_token()
        if token.token_type == template.TOKEN_BLOCK and token.contents == parse_until:
            return template.TextNode(u''.join(text))
        start, end = tag_mapping[token.token_type]
        text.append(u'%s%s%s' % (start, token.contents, end))
    # We ran out of tokens without seeing {% endraw %}. The parser expects a
    # list of tag names here, not a bare string.
    parser.unclosed_block_tag([parse_until])
raw = register.tag(raw)

One thing to note about the middleware is that it has a backdoor that lets an external process (say, our script that resets the cache) retrieve the halfway-rendered template code for any page: magicflag in the query string. (We actually use something different on EveryBlock; I've changed this example.) That means the only thing the cache-resetting script has to do is request the appropriate page with that query string and save the result in the cache. Pretty slick.

There's also a potential gotcha/limitation here: anything within {% raw %} and {% endraw %} will only have access to a template context with the default RequestContext stuff -- which, in our case, will be user-specific stuff.

Thanks again to Honza for telling me about this concept. It's a great idea, and it's serving us well.

Posted by Mark on May 18, 2009, at 5:27 p.m.:

Any reason for this code being released as GPL?

Posted by Patrick on May 18, 2009, at 7:55 p.m.:

I'm sure Adrian has his reasons, but if you can't use Adrian's GPLed code, you might want to take a look at the code in Honza's project (links in his comment on @4:15pm). The license for the CMS which uses it (Ella) appears to be standard 3 clause BSD.

This kind of optimisation is one of the reasons I'm so keen on signed cookies as an alternative to sessions. If all you need to customise is the "logged in as..." box on a page, having the username stored in a signed cookie means you don't have to hit the database (or an external session store) /at all/ for the duration of the request - just pull out the cached copy, check the signature on the cookie, extract the username and render it out on to the page. And since the computation is done entirely by the app server it scales horizontally.

If "hello {username}" is the only dynamic part, then you don't use user accounts in first place. Registration and login just to see my name at website is awful.

P.S.: this comment form is bad too. It thinks that anything inside angle brackets is HTML. BUT, when i use proper HTML < it doesn't show angle bracket! What was on your mind - don't accept angle bracket AND escape ampersand?

Posted by Davide Della Casa on May 19, 2009, at 8:57 a.m.:

Why not just use a cookie with the username and let the browser fetch it and render it in the page?

Mark: This is licensed as GPL because we're required to release EveryBlock's source code as GPL. The project is funded by a grant, and that's the license that we were asked to use.

Sergey: If I'm logged into Google and I view the Google homepage, I see my e-mail address at the top right, but it doesn't customize the page. I would argue that if a user is logged in, the developer has an obligation to let the user know that -- regardless of whether the particular page actually changes based on the user. (And in EveryBlock's case, *of course* we're customizing pages for users -- just not the homepage, at this time.)

Simon and Davide: With a cookie, you'd either have to parse it in JavaScript (which is non-ideal because it requires JavaScript) or do it in the application, in which case this two-phased template rendering would still help you, because you've still got to figure out a way to cache the heavy stuff and let the application do the username bit dynamically. The question of whether to store the username in a cookie vs. a session is tangential to this caching approach, isn't it?

Posted by Tom W. Most on May 19, 2009, at 12:45 p.m.:

Perhaps I'm missing something, but doesn't this leave you vulnerable to injection of Django template code? I don't see any method being used to escape the content outside of the {% raw %} tag from being interpreted as template code, or do you somehow guarantee that your data never contains "{{", "{%" or "{#"?

Adrian - yes, that's what I was getting at - storing the username in a signed cookie is a great complement to this kind of two-phased rendering as it allows you to avoid having to even hit the database or lookup their session - you pull from cache, extract the username from the cookie, render the two together and you're done.

We're planning on doing something similar using ESI (edge side includes): some smart reverse HTTP proxies support ESI, which lets you replace some parts of a cached page with the response of another HTTP GET.

So, we cache the output HTML at the Varnish level (faster than doing it in RoR or even a Metal), get the session data via a 2nd HTTP request and call a JS function that applies user customizations (we do a little bit more than displaying the username).

We are doing it with a JS request instead of ESI because Varnish won't gunzip the HTML returned by Apache to check for the esi:include tags, but that's going to be supported soon.

The only downside is that the app is not 100% usable without javascript...

Also Mnot created some Javascript functions to replace some parts of a HTML document with data from other HTTP reqs: http://www.mnot.net/javascript/hinclude/

I did use them as an inspiration for a presentation of the ORM part of Django at the French Perl Workshop in Paris this weekend. My slides (in French) are here: http://o.mengue.free.fr/blog/2006/11/...