There are many features you expect from a high-volume web service (like Wikidot):

low response time

works for (hundreds of) thousands of clients at once

makes use of the client cache (i.e. sends a Not Modified header instead of the full page if the page has not changed since the last visit)

fault tolerance (if something crashes, it does not affect the rest of the system)

After talking with Michał, reading some documents about HTTP, browsing the Django documentation and testing tools and programs like Lighttpd, Nginx and Varnish, I thought of an architecture pattern for hosting Django applications that can easily be adapted to other frameworks and languages (including PHP).

The architecture

I'll describe the whole architecture from the application to the user's browser.

[backend] Application served over WSGI, SCGI or FastCGI

[gateway] Web server talking to the WSGI, SCGI or FastCGI service

[frontend] Cache proxy
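The backend level can be as small as a single WSGI callable. A minimal sketch (the response body here is just illustrative, not part of any real application):

```python
# Minimal WSGI application -- the "backend" level.
# A gateway (Nginx with a WSGI module, or any WSGI server) calls this
# function once per request and streams the result to the client.
def application(environ, start_response):
    body = b"hello"  # generated by the application in milliseconds
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The important property is that the callable returns as soon as the content is generated; delivering the bytes to a slow client is the gateway's job.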

The main reason for separating these things is the different nature of the particular levels:

The application may introduce a big memory overhead (if it is a PHP or Python application, for example)

A separate cache server can be precisely configured depending on cookies and other factors

Why is joining the backend and the gateway (for example using mod_python or mod_php) bad? The perfect example is a small, fast script (like echo "hello";) and an extremely slow, unreliable client (which may be some sort of attacker as well). The script should run in milliseconds, but sending the data to the client may take a few seconds. Having the backend and the gateway in one process makes the high memory consumption of the PHP/Python interpreter last until the client receives the last byte of data.

Keeping the backend and the gateway separate solves this problem. The client requests a page from the gateway, the gateway asks the backend to generate the content, the backend does so in milliseconds, and the gateway then frees the backend and keeps the data (using much less memory than the backend process would) until it is completely sent to the client. While the response is being (possibly slowly) sent to the client, some other gateway process may use the backend that was just freed.

HTTP Cache

The application should at least set the ETag header (which may be computed as simply as an md5 of the whole response sent to the client). In Django, this can be done automatically using the cache system bundled with Django. Setting this header means the client often receives only the headers, which greatly limits the amount of data sent to the browser (if the site is visited more often than it is modified).
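A sketch of the mechanism in plain Python (this is not Django's actual middleware, just the idea): the ETag is an md5 of the rendered body, and a matching If-None-Match header lets us answer with 304 Not Modified and an empty body.

```python
import hashlib

def respond(body, if_none_match=None):
    """Return (status, headers, body) with a simple md5-based ETag.

    If the client's If-None-Match header matches the current ETag,
    reply 304 Not Modified with no body instead of the full page.
    """
    etag = '"%s"' % hashlib.md5(body).hexdigest()
    if if_none_match == etag:
        return "304 Not Modified", {"ETag": etag}, b""
    return "200 OK", {"ETag": etag}, body
```

On the first visit the client gets the full page plus the ETag; on the next visit it sends the ETag back and, if nothing changed, receives only headers.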

Using a separate cache system you can easily deal with the so-called "Digg load peak". This is a situation where a huge number of people visit a page after it is mentioned on a popular website like Digg. You can get as many as 1000 or 2000 requests per second. If you asked the application to generate the content for every one of those requests, requests could simply time out.

There are two or three answers to this: a simple one and a precise one, at least. The simple one is adjusting the frontend to cache the response for each URL for a minute or two, for every user not sending a cookie (so not logged in). The downside is that a non-logged-in user may see a delay before actually getting the updated page. The upside is that 90% (or so) of the traffic does not hit your application (and is served from the cache).
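A toy version of that rule (the class and parameter names are mine, not any real cache server's): cache each URL's response for a short TTL, but only for requests that carry no cookie.

```python
import time

class MiniFrontendCache:
    """Cache responses per URL for `ttl` seconds, skipping cookied users."""

    def __init__(self, backend, ttl=60, clock=time.time):
        self.backend = backend  # callable: url -> response body
        self.ttl = ttl
        self.clock = clock      # injectable clock, handy for testing
        self.store = {}         # url -> (expires_at, body)

    def get(self, url, has_cookie=False):
        if has_cookie:
            return self.backend(url)   # logged-in users always hit the backend
        now = self.clock()
        hit = self.store.get(url)
        if hit and hit[0] > now:
            return hit[1]              # served from cache; backend untouched
        body = self.backend(url)
        self.store[url] = (now + self.ttl, body)
        return body
```

During a traffic peak only the first cookieless request per minute reaches the backend; everyone else gets the cached copy.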

The precise solution would be setting up the application to first check the ETag of the content and return ONLY a Not Modified header, without actually running ANY heavy operations. This would work well, because after the first visitor asks for the page, the cache frontend saves the ETag for the page and attaches it to consecutive requests for the same URL (even from another user). This, however, can be non-trivial to implement in some particular cases. Also, this method does not eliminate the application level, it only minimizes the time spent in it. On the good side, it should be completely safe to use and help performance even under non-peak load. This means it is worth implementing!

Between the precise and the simple solution there is a variation of the simple one that requires more work from the caching server: applying the cache-for-one-minute rule only after reaching some threshold request rate (like 20 reqs/s), either for a single URL or for the whole system. This should be quite a good compromise between serving the most accurate data and actually being fast. I mean, if trying to be as accurate as you can results in high load and not serving anything at all, it is better to temporarily serve less accurate data.
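A rough sketch of the threshold rule (the window size and limit are arbitrary choices of mine): count requests per URL in a sliding window and switch that URL into cached mode once the rate crosses the limit.

```python
from collections import defaultdict, deque

class RateThreshold:
    """Decide per URL whether to start caching, based on request rate."""

    def __init__(self, limit=20, window=1.0):
        self.limit = limit              # hits per window that trigger caching
        self.window = window            # window length in seconds
        self.hits = defaultdict(deque)  # url -> timestamps of recent hits

    def should_cache(self, url, now):
        q = self.hits[url]
        q.append(now)
        while q and q[0] <= now - self.window:
            q.popleft()                 # drop hits older than the window
        return len(q) > self.limit
```

Under normal traffic every request goes to the application; once a URL gets "Digg-ed" the frontend starts serving it from cache automatically.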

The actual caching rules may be a combination of these, depending on client location, referrer, cookie settings and so on.

Notes on application level

In order to have your application behave well under high load, you'll need to consider the following design steps:

exporting non-database tasks (like making thumbnails of images) to another machine

exporting static file serving to another machine (you'll need a separate domain and an IP address)

exporting other expensive tasks (searching, summing big sets of data) to another machine

finally, database sharding (dividing the database schema or data among a bunch of machines)

Using Memcached for caching is a really good choice, because it was designed from the start to support many caching hosts. This means that adding more "cache power" is as easy as adding a new machine to the caching cluster.
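The core trick behind that is hashing keys across hosts; a bare-bones sketch of the idea (real clients such as python-memcached do this for you, with smarter hashing schemes):

```python
import hashlib

def pick_host(key, hosts):
    """Map a cache key to one of the hosts deterministically.

    As long as the host list is stable, every client computes the
    same host for the same key, so no coordination is needed.
    """
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return hosts[h % len(hosts)]
```

Adding a machine to the `hosts` list remaps some keys (which just means some cache misses right after the change), but the cluster's total capacity grows linearly.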

Bottom line

The nice thing about this design is that it works like a stack, with each layer adding some features to the system. This means that you can start with the simplest mod_python setup and, as the service grows, add a smart cache server, then replace mod_python with WSGI and Nginx, for example.

Also, the design allows you to run two or three levels on one computer and move some services out later. This gives much flexibility from the start, which means you don't have to worry about problems you'll encounter some day, because you already know you will be able to solve them. Not having to, you can concentrate on the real thing, the application, and care about maintenance later.

My experiment

I would like to test Nginx (gateway) with its experimental mod_wsgi module to host a sample Django application (backend) in WSGI mode, using Varnish (frontend) as the cache proxy. The nice thing about Nginx is that it uses a really small amount of system resources to talk to each client. The WSGI module makes a "pool" of backend processes that Nginx can use. I would also like to try the caching options of Nginx instead of setting up varnishd (but I assume Varnish would be better).