Users of
Presskit’n have been asking for traffic statistics on their press releases so I decided I would get them the most recent data possible. At first I was parsing the access log once a minute and when I was testing that I decided it wasn’t updating fast enough. I’ve gotten used to everything being instant on the internet and I didn’t want to wait a minute to see how many more views there were. In this post I show how I got it to update on page load using Apache, python, Django, and memcached.

Apache Access Logs

Apache is installed with rotatelogs. This program can be used to rotate the logs after they get too large. However I wanted a few more features. Cronolog will update a symlink everytime it creates a new log file so that you can always have the most recent stats.

CustomLog and ErrorLog directives in apache will let you pipe output to a command. So I put the full path to cronolog and then specified the parameters to cronolog. –symlink will point the named symlink to the most recent log created with cronolog. After the options, the path to the log location is specified and date formats can be used. I decided to break mine up by day.

Piping Apache Log info to a Python Script

Apache can have multiple log locations and log multiple times. So I wrote my own logging script in python that would insert into memcached. Here is the extra line I added to apache:

The page views are being added to memcached (with
cache.incr() which is new in django 1.1) for quick retrieval and the logs will still be created by cronolog so no data will be lost when the cache expires. Those logs are used in the next part.

Parsing the Logs

The hit counts will expire from the cache after 24 hours so I
parse the logs once a day and put that information into my database. For this I wrote a
django management command (I didn’t do a management command before because I wasn’t sure how it would handle the parent and child processes). This command is called by ./manage.py parse_log

The hits_today method requires that you define
get_absolute_url which is useful in other places as well. @property is a decorator that makes it possible to access the data with object.hits and leave off the parenthesis.

The hits method uses the content type framework again to look up the hits in the database.

Just the Basics

There is a lot more that can be done with this. This barely touches the raw data available in the logs. A few ways I’ve already started improving this is to not include known bots as hits, check the referrer to see where traffic is coming from, and save the keywords used in search engines.

Letting users put static files and php files in a public_html folder in their home directory has been a common convention for some time. I created a way for users to have a public_python folder that will allow for python projects.

In the apache configuration files I created some regular expression patterns that will look for a wsgi file based on the url requested. To serve this url: http://domain/~user/p/myproject, the server will look for this wsgi file: /home/user/public_python/myproject/deploy/myproject.wsgi

It is set up to run wsgi in daemon mode so that each user can touch their own wsgi file to restart their project instead of needing to reload the apache config and inconvenience everyone.

This is the code I added to the apache configuration (in a virtual host, other configs might be different):

Sometimes it’s good to hide things instead of deleting them. Users may accidentally delete something and this way there will be an extra backup. The way I’ve been doing this is I set a flag in the database, deleted = 1. I wrote this code to automatically hide records from django if they are flagged.

from django.dbimport models
class SoftDeleteManager(models.Manager):
''' Use this manager to get objects that have a deleted field '''def get_query_set(self):
returnsuper(SoftDeleteManager, self).get_query_set().filter(deleted=False)def all_with_deleted(self):
returnsuper(SoftDeleteManager, self).get_query_set()def deleted_set(self):
returnsuper(SoftDeleteManager, self).get_query_set().filter(deleted=True)

This is usable by many models by adding this line to the model (it needs a deleted field) objects = SoftDeleteManager()

This will hide deleted records from django completely, even the django admin and even if you specify the id directly. The only way to find it is through the database itself or an app like phpMyAdmin. This might be good for some cases, but I went a step further to make it possible to undelete things in the django admin.

The next thing to do was override the queryset method in the default ModelAdmin. I copied the code from the django source and changed it from using get_query_set to make it use all_with_deleted() which was a method added to the ModelManager. The following code was added to SoftDeleteAdmin.

def queryset(self, request):
""" Returns a QuerySet of all model instances that can be edited by the
admin site. This is used by changelist_view. """# Default: qs = self.model._default_manager.get_query_set()
qs = self.model._default_manager.all_with_deleted()# TODO: this should be handled by some parameter to the ChangeList.
ordering = self.orderingor()# otherwise we might try to *None, which is bad ;)if ordering:
qs = qs.order_by(*ordering)return qs

The list of objects in the admin will start to look like this.

A screenshot of the django admin interface

They are showing up there now, but won’t be editable yet because django is using get_query_set to find them. There are two methods I added to SoftDeleteManager so that django can find the deleted records.

With those updated methods, django will be able to find records if the primary key is specified, not only in the admin section, but everywhere in the project. Lists of objects will only return deleted records in the admin section still.

This code can be applied to a bunch of models and easily allow soft deletes of records to prevent loss of accidentally deleted objects.

I’ve been working on a project for a while and it has recently started to expand to an additional domain name. The domains will be using the same user base and I want to make it simple for users to be logged in at both applications. With a little research I dug up a few options I could go with. There is a redirect option, a javascript option, or a single sign on option.

With the redirect option I could redirect users to the main domain, check for cookies, and redirect them back so that they could get new cookies for the additional domain. The downside to this method is it will increase traffic for every pageload from a new visitor even if they will never need to log in. And since the sites this was for will have pages being viewed many more times than there will be logged in users, it wasn’t worth all of the extra traffic. It might be possible to minimize this traffic by only redirecting on login pages, but if the login form is at the top of all pages then it doesn’t help much.

Facebook uses a javascript method on all of the sites where you see facebook connect so you can use your facebook credentials to comment on blogs and other things. This method may be fine for their case, but again it will cause the extra traffic since the javascript is still connecting to the main server to get cookie info. I also don’t want to rely on javascript for my sessions.

I wanted a solution where it would only keep users logged in when they needed to be kept logged in. One way of knowing if they need to be kept logged in is: they are on one domain and click a link to go over to the other domain. Using a single-sign-on link to the other domain, the user would stay logged in at the new domain. The only use case that this doesn’t account for is someone is logged in at one domain and then types the other domain into the address bar. However that is a minimal case and I think the sso link will be the best way to keep users logged in most of the time and keep the overhead down.

I plan on open sourcing the django sso code so that other people can use it in their projects. It will allow a django site to accept single sign on requests and it will also help to create single sign on links to other sites. Both ends of the process don’t need to be a django site since it should work with other applications that use this type of process to authenticate users.

I’ll write a post on here about how to use the code once I get it set up at google code so if you are interested in that, you should probably
subscribe to the rss so you don’t miss it.

A lot of these steps will speed up any kind of application, not just django projects, but there are a few django specific things. Everything has been tested on
IvyLees which is running in a Debian/Ubuntu environment.

These three simple steps will speed up your server and allow it to handle more traffic.

Reducing the Number of HTTP Requests

Yahoo has developed a
firefox extension called
YSlow. It analyzes all of the traffic from a website and gives a score on a few categories where improvements can be made.

It recommends reducing all of your css files into one file and all of your js files into one file or as few as possible. There is a pluggable, open source django application available to help with that task. After setting up
django-compress, a website will have css and js files that are minified (excess white space and characters are removed to reduce file size). The application will also give the files version numbers so that they can be cached by the web browser and won’t need to be downloaded again until a change is made and a new version of the file is created.
How to setup the server to set a far future expiration is shown below in the lightweight server section.

Setting up Memcached

Django makes it really simple to set up caching backends and memcached is easy to install.

sudo aptitude install memcached, python-setuptools

We will need setuptools so that we can do the following command.

sudo easy_install python-memcached

Once that is done you can start the memcached server by doing the following:

sudo memcached -d -u www-data -p 11211 -m 64

-d will start it in daemon mode, -u is the user for it to run as, -p is the port, and -m is the maximum number of megabytes of memory to use.

Now open up the settings.py file for your project and add the following line:

CACHE_BACKEND = 'memcached://127.0.0.1:11211/'

Find the MIDDLEWARE_CLASSES section and add this to the beginning of the list:

'django.middleware.cache.UpdateCacheMiddleware',

and this to the end of the list:

'django.middleware.cache.FetchFromCacheMiddleware',

For more about caching with django see the
django docs on caching. You can reload the server now to try it out.

sudo /etc/init.d/apache2 reload

To make sure that memcached is set up correctly you can telnet into it and get some statistics.

telnet localhost 11211

Once you are in type stats and it will show some information (press ctrl ] and then ctrl d to exit). If there are too many zeroes, it either isn’t working or you haven’t visited your site since the caching was set up. See
the memcached site for more information.

Don’t Use Apache for Static Files

Apache has some overhead involved that makes it good for serving php, python, or ruby applications, but you do not need that for static files like your images, style sheets, and javascript. There are a few options for lightweight servers that you can put in front of apache to handle the static files.
Lighttpd (lighty) and
nginx (engine x) are two good options. Adding this layer in front of your application will act as an application firewall so there is a security bonus to the speed bonus.

Edit the config file for your site (sudo nano /etc/apache2/sites-available/default) and change the port from 80 to 8080 and change the ip address (might be *) to 127.0.0.1. The lines will look like the following

NameVirtualHost 127.0.0.1:8080
<VirtualHost 127.0.0.1:8080>

Also edit the ports.conf file (sudo nano /etc/apache2/ports.conf) so that it will listen on 8080.