Webapps on App Engine, Part 5: Sessions

This is part of a series on writing a webapp framework for App Engine Python. For details, see the introductory post here.

Sessions are another component that's regularly required by webapps, but isn't really a core part of a framework. In this post, we'll discuss the session mechanisms available for App Engine and how they work, and settle on a recommendation for our own lightweight framework.

The basic mechanism behind a session library is straightforward: A random session ID is generated for the user, which is embedded in an HTTP cookie and sent to the user. Meanwhile, a record is created on the server with the same ID, containing any data the webapp wants to store about this user. When the user makes a subsequent request, the session library decodes the session ID from the cookie header, and loads the corresponding session record from permanent storage.
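The flow described above can be sketched in a few lines. This is a minimal illustration rather than any particular library's implementation: the in-memory SESSION_STORE dict stands in for permanent storage, and the cookie name "sid" and both function names are assumptions made for the example.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory stand-in for the server-side session records.
SESSION_STORE = {}
COOKIE_NAME = "sid"  # assumed cookie name, not taken from any library


def create_session(data):
    """Create a server-side record and return a Set-Cookie header value."""
    session_id = secrets.token_urlsafe(32)  # random, hard-to-guess ID
    SESSION_STORE[session_id] = dict(data)
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = session_id
    cookie[COOKIE_NAME]["path"] = "/"
    cookie[COOKIE_NAME]["httponly"] = True
    return cookie[COOKIE_NAME].OutputString()


def load_session(cookie_header):
    """Decode the session ID from a Cookie header and fetch its record."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get(COOKIE_NAME)
    if morsel is None:
        return None
    # Unknown or expired IDs simply yield no session.
    return SESSION_STORE.get(morsel.value)
```

Only the opaque ID travels to the browser; everything in the record stays on the server.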

There are three major advantages of handling sessions this way, rather than naively storing session data directly in the cookie:

We can store data that the client shouldn't be able to modify, such as the user's access flags.

We can store data the client shouldn't even be able to read, or shouldn't be sent in the clear, such as their credentials.

We can store more data than can be practically carried in an HTTP cookie.

There are, of course, situations in which some of these constraints don't apply. Sometimes none of them apply, such as in the case of preference cookies; sometimes size is not an issue, but we want integrity and/or confidentiality. In these cases, many session libraries provide "cookie only" sessions, which store the data entirely in the cookie, while adding signing and/or encryption to prevent tampering or reading of the cookie data by the user.

Cookie-only sessions have one major advantage: You no longer need to retrieve the user's data from storage on each request. This has to be balanced against the limited space available in a cookie, and against the need to embed a secret key in your code that is used to sign and verify the cookies.
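As a rough sketch of how signing works, the example below stores the data in the cookie value itself and appends an HMAC computed with a server-side secret. The key, the "payload.signature" layout and the function names are all assumptions chosen for illustration, and the sketch provides integrity only: the payload is merely base64-encoded, so the user can still read it.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"change-me"  # placeholder; a real key must be long, random and private


def _sign(payload):
    """Return the hex HMAC-SHA256 of the payload bytes."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def encode_cookie(data):
    """Serialize data and append a signature: '<payload>.<signature>'."""
    payload = base64.urlsafe_b64encode(json.dumps(data).encode("utf-8"))
    return payload.decode("ascii") + "." + _sign(payload)


def decode_cookie(value):
    """Return the stored data, or None if the value was tampered with."""
    payload, sep, signature = value.rpartition(".")
    if not sep:
        return None
    expected = _sign(payload.encode("utf-8"))
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Confidentiality would additionally require encrypting the payload, which is outside the scope of this sketch.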

For non-cookie sessions, the session data needs to be stored somewhere. Many systems use the local filesystem, but this isn't practical in the case of distributed systems like App Engine. On App Engine, that leaves us with two main options: Memcache, and the datastore.

Memcache initially seems like quite an attractive option, as it's substantially faster to query and update than the datastore. This comes with a major caveat, though: Since memcache makes no guarantees about how long values will persist, there's absolutely no guarantee that your session will still be around when the user comes back for it.

Losing the occasional user session doesn't seem like too much of a problem at first, but there are several factors that need consideration. If you're using sessions to implement a shopping site, losing a user's session is the absolute last thing you want to do: A user who suddenly loses the contents of their shopping cart is quite likely to simply leave, costing you one or even multiple sales. Even if sales aren't involved, if users regularly get logged out of their accounts, they may grow frustrated with your site and leave. Ad-hoc testing isn't sufficient to establish how much of an issue this is, either: Problems may not come up when you test the system with a reasonable load, but may instead occur when the system's under heavy load.

The other option, of course, is the datastore. At the cost of a little extra latency, we gain near-perfect reliability. With proper design, fetching the session should only require a single datastore get operation, too - no queries, which are much more expensive.

Hybrid approaches are of course also possible: We can store to both memcache and the datastore, and fetch from one, then the other, thus minimizing latency whenever memcache is available.
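A minimal sketch of the hybrid approach follows. To keep it self-contained, plain dicts stand in for memcache and the datastore, and the class and method names are hypothetical; a real version would call the App Engine APIs in the same places.

```python
class HybridSessionStore:
    """Write to both backends; read from the fast one, then the durable one."""

    def __init__(self, cache, datastore):
        self.cache = cache          # fast, but entries may vanish at any time
        self.datastore = datastore  # slower, but durable

    def save(self, session_id, data):
        # Write the durable copy first so a cache failure never loses data.
        self.datastore[session_id] = data
        self.cache[session_id] = data

    def load(self, session_id):
        data = self.cache.get(session_id)
        if data is not None:
            return data  # cache hit: no datastore round trip needed
        data = self.datastore.get(session_id)
        if data is not None:
            self.cache[session_id] = data  # repopulate for the next request
        return data
```

If the cache entry is evicted, the next load falls back to the durable copy, so the user never notices the eviction.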

Solutions

Enough theorizing - let's take a look at the ready-made sessions libraries for App Engine. The first one is Beaker.

Beaker is a standalone library for session handling, implemented as WSGI middleware. It also includes caching middleware. Beaker supports App Engine datastore sessions, though an open bug means that the library currently needs some modification to work on App Engine (Edit: Now fixed!).

Using Beaker is a matter of downloading and unpacking it into your app's root directory, then opening beaker/cache.py and deleting or commenting out lines 29 through 51 - the section dealing with pkg_resources, which is unavailable on App Engine - as well as line 10, which imports pkg_resources. To use Beaker, we simply insert it as middleware, like this:

The session.type configuration option tells Beaker what session storage to use - in this case, the Google datastore. Beaker also supports memcached - although not App Engine's implementation of it - and cookie-only sessions, which can be enabled by setting session.type to "cookie", and session.secret to a secret key for hashing and encrypting the cookie.
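To make the middleware idea concrete, here is a rough, standard-library-only sketch of what WSGI session middleware does in general. It is not Beaker's code or API: the class name, the "example.session" environ key, the "sid" cookie name and the in-memory store are all assumptions made for illustration.

```python
import secrets
from http.cookies import SimpleCookie


class SimpleSessionMiddleware:
    """Illustrative only: exposes a per-user dict to the wrapped app."""

    def __init__(self, app, store=None, environ_key="example.session"):
        self.app = app
        self.store = {} if store is None else store  # stand-in for real storage
        self.environ_key = environ_key

    def __call__(self, environ, start_response):
        cookie = SimpleCookie()
        cookie.load(environ.get("HTTP_COOKIE", ""))
        morsel = cookie.get("sid")
        session_id = morsel.value if morsel is not None else None
        is_new = session_id not in self.store
        if is_new:
            # Never trust an unknown ID from the client; issue a fresh one.
            session_id = secrets.token_urlsafe(32)
            self.store[session_id] = {}
        environ[self.environ_key] = self.store[session_id]

        def wrapped_start_response(status, headers, exc_info=None):
            if is_new:
                cookie_value = "sid=%s; Path=/; HttpOnly" % session_id
                headers = list(headers) + [("Set-Cookie", cookie_value)]
            return start_response(status, headers, exc_info)

        return self.app(environ, wrapped_start_response)


def counter_app(environ, start_response):
    """Tiny WSGI app that counts requests per session."""
    session = environ["example.session"]
    session["hits"] = session.get("hits", 0) + 1
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [str(session["hits"]).encode("ascii")]


application = SimpleSessionMiddleware(counter_app)
```

The wrapped application never touches cookies directly; it only reads and writes the dict the middleware places in the WSGI environ.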

Another sessions implementation is provided by gaeutilities. gaeutilities supports datastore, memcache, and pure-cookie sessions. Using gaeutilities' sessions is even simpler:

As you can see, no middleware is required in this case - gaeutilities takes advantage of App Engine's use of the system environment to retrieve session data. gaeutilities also includes features such as token rotation to make session hijacking more difficult, while Beaker uses only a single session ID.
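The environment-based approach can be illustrated with a short sketch. In a CGI-style runtime the request headers are exposed as process environment variables, so a library can find the session cookie without being handed a request object. The function below is a hypothetical illustration of that idea, not gaeutilities' actual code; the function name and the "sid" cookie name are assumptions.

```python
import os
from http.cookies import SimpleCookie


def session_id_from_environment(cookie_name="sid", environ=None):
    """Read the session ID from the CGI-style HTTP_COOKIE variable."""
    # Default to the real process environment; accept a dict for testing.
    environ = os.environ if environ is None else environ
    cookie = SimpleCookie()
    cookie.load(environ.get("HTTP_COOKIE", ""))
    morsel = cookie.get(cookie_name)
    return morsel.value if morsel is not None else None
```

Because the lookup needs nothing from the caller, a session object built this way can be constructed anywhere in a request handler.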

One quick caveat: Unlike Beaker, gaeutilities does not encrypt or sign cookie-only sessions, so they should only be used for data where user tampering is not a concern.

Conclusion

We've looked at two good sessions libraries for App Engine. Either one would be a good choice for our framework. Given the current state, however, gaeutilities seems like the better choice: It doesn't require modifications to work, it's built specifically for App Engine, and if you need a lightweight library, you can easily take just the session handling code and not include the rest of the library.