Saturday, September 19, 2009

Twisted Web in 60 seconds: static URL dispatch

Welcome to the third installment of "Twisted Web in 60 seconds". The goal of this installment is to show you how to serve different content at different URLs using APIs from Twisted Web (the first and second installments covered ways in which you might want to generate this content).

Key to understanding how different URLs are handled with the resource APIs in Twisted Web is understanding that any URL can be used to address a node in a tree. Resources in Twisted Web exist in such a tree, and a request for a URL will be responded to by the resource which that URL addresses. The addressing scheme considers only the path segments of the URL. Starting with the root resource (the one used to construct the Site) and the first path segment, a child resource is looked up. As long as there are more path segments, this process is repeated using the result of the previous lookup and the next path segment. For example, to handle a request for /foo/bar, first the root's "foo" child is retrieved, then that resource's "bar" child is retrieved, then that resource is used to create the response.
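The lookup described above can be sketched without Twisted at all. This is only an illustration of the traversal idea, not Twisted's implementation: the `Node` class, `put_child`, and `lookup` are hypothetical names invented for this sketch, and it skips empty path segments for simplicity, which a real server may treat differently.

```python
class Node:
    """A toy stand-in for a resource: a name plus named children."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def put_child(self, segment, child):
        """Attach a child under one path segment and return it."""
        self.children[segment] = child
        return child


def lookup(root, path):
    """Walk the tree one path segment at a time, starting at the root."""
    node = root
    # "/foo/bar" -> ["foo", "bar"]; empty segments are skipped in this sketch.
    for segment in [s for s in path.split("/") if s]:
        node = node.children[segment]  # raises KeyError if there is no such child
    return node


root = Node("root")
foo = root.put_child("foo", Node("foo"))
bar = foo.put_child("bar", Node("bar"))

print(lookup(root, "/foo/bar").name)  # bar
```

Each pass through the loop uses the result of the previous lookup and the next segment, so `/foo/bar` resolves to the root's "foo" child and then to that node's "bar" child.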

With that out of the way, let's consider an example that can serve a few different resources at a few different URLs.

First things first: we need to import Site, the factory for HTTP servers, Resource, a convenient base class for custom pages, and reactor, the object which implements the Twisted main loop. We'll also import File to use as the resource at one of the example URLs.

from twisted.web.server import Site
from twisted.web.resource import Resource
from twisted.internet import reactor
from twisted.web.static import File

Now we create a resource which will correspond to the root of the URL hierarchy: all URLs are children of this resource.

root = Resource()

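Here comes the interesting part of this example. I'm now going to create three more resources and attach them to the three URLs /foo, /bar, and /baz:

root.putChild(b"foo", File("/tmp"))
root.putChild(b"bar", File("/lost+found"))
root.putChild(b"baz", File("/opt"))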

Last, all that's required is to create a Site with the root resource, associate it with a listening server port, and start the reactor:

factory = Site(root)
reactor.listenTCP(8880, factory)
reactor.run()

With this server running, http://localhost:8880/foo will serve a listing of files from /tmp, http://localhost:8880/bar will serve a listing of files from /lost+found, and http://localhost:8880/baz will serve a listing of files from /opt.

I'm looking at the Twisted source. It looks like file operations are blocking... it's just using the standard Python "open" call and isn't setting the os.O_NONBLOCK flag.

If file I/O is blocking, and everything runs in a single OS thread under Twisted, how performant is this? Should we still delegate serving of static content to a separate web server (whether it's nginx, lighttpd, etc.)?

Yes, twisted.web.static.File is implemented in terms of the normal, blocking file I/O calls. If reading data from the filesystem is slow, then this can be a problem. Generally, "slow" could reasonably mean that the filesystem is actually mounted from the network (e.g. NFS). Since this typically isn't the case, it's not very common to need to worry about it. For a normal, local filesystem, file I/O only blocks for a very short period. Overall, it's something to worry about later, not now.

There are a variety of options for dealing with the issue when you do need to consider it. Serving static content with another web server is certainly one. At some point, you don't even really care what the web server is, you just want to dump your static data onto a CDN and move on. :) However, it's also possible to implement something like File that makes use of some platform's asynchronous I/O APIs. O_NONBLOCK doesn't really help here - POSIX more or less lets systems pretend that file-based I/O is always non-blocking. O_NONBLOCK is around mostly for FIFOs and for its interaction with fcntl(2). However, Windows does have real asynchronous file I/O (via IOCP) and it's possible that someday Linux and other POSIX platforms will too. Any of these could be used to create a more event-driven-friendly version of File.

I have a server with an ad hoc, informally-specified, bug-ridden, slow implementation of half of Twisted, which I want to supplement with an internal webserver. (Preferably on the same port as its other protocol, evil as that may sound...) After looking at Twisted's documentation, I gave up and started looking at other asynchronous frameworks; you've given me hope that I can use it after all.

Isn't there AIO for such purposes? Though I don't know if it's stable enough.

Another solution would be a separate worker thread which would take requests out of a queue, serve them and report the results in a loop; the Twisted reactor would then fill the queue and poll for interesting events from that worker.

I don't know if it smells too much like a duct-tape solution, but at least we would have more or less complete asynchronous support for all I/O-bound operations.

The AIO APIs available for POSIX and on Linux are among the APIs which may someday be good enough to use to solve this problem, yes. :) There seems to be little interest among Linux kernel developers in actually solving this problem, though, so it may take a while.

You're absolutely right that a userspace threadpool could also be used to do this, though. Great point.
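The worker-thread idea from the comment above might look roughly like this, using only the standard library. It is a minimal sketch, not something Twisted ships: `file_worker` and the two queues are hypothetical names, and a single worker thread stands in for a full pool.

```python
import os
import queue
import tempfile
import threading


def file_worker(requests, results):
    """Read each file named on the request queue and report the outcome."""
    while True:
        path = requests.get()
        if path is None:  # sentinel value: shut the worker down
            break
        try:
            with open(path, "rb") as f:
                results.put((path, f.read(), None))
        except OSError as e:
            results.put((path, None, e))


requests = queue.Queue()
results = queue.Queue()
threading.Thread(target=file_worker, args=(requests, results), daemon=True).start()

# Demo: create a small file, ask the worker to read it, then collect the result.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
requests.put(tmp.name)

# This demo simply waits; an event loop would poll with results.get_nowait()
# between handling other events so the main thread never blocks on the disk.
path, data, error = results.get(timeout=5)

os.unlink(tmp.name)
requests.put(None)  # stop the worker
print(data)  # b'hello'
```

Only the worker thread ever blocks on the filesystem here. Within Twisted itself, twisted.internet.threads.deferToThread offers a similar pattern by running a function in the reactor's thread pool and delivering the result through a Deferred.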
