If the site requires authentication, things are slightly more complicated, but not by much, at least if you have Python 2.4 installed. To handle cookie-based authentication, you need to import a few utilities from urllib2:

>>> from urllib2 import build_opener, HTTPCookieProcessor, Request

Notice that HTTPCookieProcessor is new in Python 2.4: if you have an older version of Python you need a third-party library such as ClientCookie.

Together, build_opener and HTTPCookieProcessor create an opener object that stores the cookies sent by the web server and sends them back with subsequent requests:

>>> opener = build_opener(HTTPCookieProcessor)
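Passing the handler class, as above, gives the opener a private cookie jar that you cannot easily reach. You may want to look at the cookies afterwards, for instance to check that a login actually set a session cookie. In that case you can create the jar yourself and hand it to HTTPCookieProcessor.

Here is a minimal sketch in Python 3 spelling. In Python 3, urllib2 became urllib.request, and cookielib (also new in Python 2.4) became http.cookiejar.

```python
# Python 3 spelling: urllib2 -> urllib.request, cookielib -> http.cookiejar
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

jar = CookieJar()                     # will hold the cookies the server sends
processor = HTTPCookieProcessor(jar)  # stores incoming cookies, adds them to requests
opener = build_opener(processor)      # every opener.open() call shares this jar

print(len(jar))  # 0: nothing has been requested yet
```

After logging in, an expression such as [c.name for c in jar] lists the names of the cookies the server has set.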

The opener object has an open method that retrieves the web page corresponding to a given request. The request itself is encapsulated in a Request object, which is built from the URL, the query string, and optional HTTP headers. To generate the query string, it is convenient to use the urlencode function defined in urllib (not in urllib2):

>>> from urllib import urlencode

urlencode generates the query string from a dictionary or a list of pairs, taking care of the quoting and escaping that URLs and form submissions require.
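The following sketch is written for current Python 3, where urlencode lives in urllib.parse and Request in urllib.request. It shows the escaping at work and then wraps the result in a Request. The URL, form fields and user agent string are made up for illustration.

```python
# Python 3 locations of the functions discussed above.
from urllib.parse import urlencode
from urllib.request import Request

# A list of pairs keeps the fields in the given order; the default
# escaping turns spaces into "+" and "&" into "%26".
query = urlencode([("user", "MyName"), ("password", "My Pass&Co")])
print(query)  # user=MyName&password=My+Pass%26Co

# In Python 3 the body of a POST must be bytes, hence encode().
req = Request("http://www.example.com/login", query.encode("ascii"),
              {"User-Agent": "Mozilla/5.0"})
print(req.get_method())  # POST
# response = opener.open(req)  # this would actually contact the server
```

Because data was supplied, the request is a POST. Opening it with a cookie-aware opener would keep whatever session cookie the login page sends back.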

If you just need to perform a GET, simply omit the second argument to urlopen2, or pass an empty dictionary or tuple. You can even impersonate a browser by passing a suitable user agent string, such as the one sent by Mozilla or Internet Explorer. This is useful if you want to check that your application responds correctly to different browsers.
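The definition of urlopen2 is not reproduced here. As a stand-in, here is a hypothetical make_request helper for Python 3; its name, signature and default user agent are assumptions, and it is not the author's function. It illustrates the rule just stated.

One detail makes a helper like this worthwhile. Request treats any data other than None as a POST body, so even an empty body turns the request into a POST. An empty dictionary or tuple therefore has to be mapped to None explicitly.

```python
from urllib.parse import urlencode
from urllib.request import Request


def make_request(url, data=None, user_agent="Mozilla/5.0"):
    """Hypothetical helper: build a GET or POST Request for an opener.

    Missing or empty data (None, {}, ()) yields a GET; a non-empty
    dictionary or sequence of pairs is urlencoded and sent as a POST.
    """
    body = urlencode(data).encode("ascii") if data else None
    return Request(url, body, {"User-Agent": user_agent})


get_req = make_request("http://www.example.com/", {})
post_req = make_request(
    "http://www.example.com/login", {"user": "MyName"},
    user_agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)")
print(get_req.get_method(), post_req.get_method())  # GET POST
```

Each request would then be sent with opener.open(...), so the cookie handling stays in one place while the helper only decides the method and headers.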

Using these two recipes, it is not that difficult to write your own web testing framework. Still, you may be better off leveraging somebody else's work.
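To give a rough idea of where such a home-grown framework could start, the sketch below combines unittest with the opener and cookie machinery described above. It is written for Python 3: urllib2 became urllib.request, urlencode moved to urllib.parse, and cookielib became http.cookiejar.

To stay self-contained, the sketch starts a throwaway local server in place of a real site. LoginHandler, the /login and /private paths, and the credentials are all hypothetical.

```python
import threading
import unittest
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.parse import parse_qs, urlencode
from urllib.request import HTTPCookieProcessor, ProxyHandler, Request, build_opener


class LoginHandler(BaseHTTPRequestHandler):
    """Hypothetical site: POST /login sets a cookie, GET /private requires it."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("ascii"))
        self.send_response(200)
        if form == {"user": ["MyName"], "password": ["MyPass"]}:
            self.send_header("Set-Cookie", "session=ok; Path=/")
        self.end_headers()

    def do_GET(self):
        if "session=ok" in self.headers.get("Cookie", ""):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"Welcome back")
        else:
            self.send_error(403)

    def log_message(self, *args):
        pass  # keep the test output quiet


class LoginTest(unittest.TestCase):
    def setUp(self):
        # Port 0 lets the operating system pick a free port.
        self.server = HTTPServer(("127.0.0.1", 0), LoginHandler)
        threading.Thread(target=self.server.serve_forever, daemon=True).start()
        self.base = "http://127.0.0.1:%d" % self.server.server_address[1]
        self.jar = CookieJar()
        # ProxyHandler({}) bypasses any proxy configured in the environment.
        self.opener = build_opener(ProxyHandler({}), HTTPCookieProcessor(self.jar))

    def tearDown(self):
        self.server.shutdown()
        self.server.server_close()

    def test_private_page_requires_login(self):
        with self.assertRaises(HTTPError) as ctx:
            self.opener.open(self.base + "/private")
        self.assertEqual(ctx.exception.code, 403)

    def test_login_then_private_page(self):
        data = urlencode({"user": "MyName", "password": "MyPass"}).encode("ascii")
        self.opener.open(Request(self.base + "/login", data)).read()
        self.assertEqual([c.name for c in self.jar], ["session"])
        page = self.opener.open(self.base + "/private").read()
        self.assertIn(b"Welcome back", page)
```

Saved as, say, test_login.py, the file can be run with python -m unittest test_login. Against a real application you would drop LoginHandler and point self.base at the site under test.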

I am a big fan of mini languages, small languages written to perform a specific task. (See, for instance, my O'Reilly article on the graph-generation language dot.) I was very happy when I discovered that there is a nice little language expressly designed to test web applications. Actually, there are two implementations of it: Titus Brown's twill and Cory Dodt's Python Browser Poseur (PBP).

PBP came first, but twill seems to be developing faster. At the time of this writing, twill is still pretty young (I am using version 0.7.1), but it already works well in most situations. Both PBP and twill are based on tools by John J. Lee, such as mechanize (inspired by Perl's WWW::Mechanize), ClientForm, and ClientCookie. Twill also uses Paul McGuire's pyparsing. However, you don't need to install these libraries yourself: twill ships them as zipped libraries, relying on the zipimport module introduced in Python 2.3. As a consequence, installing twill is painless; it takes nothing more than the usual python setup.py install.

The simplest way to use twill is interactively from the command line. Here's a simple session example:

Twill recognizes a handful of intuitive commands, including go, show, find, notfind, echo, code, back, reload, agent, and follow. The example shows how to access a particular HTML page and display its content.

The find command matches the page against a regular expression, thus:

>>> find("Example Web Page")

is a test asserting that the current page contains the text you expect. Conversely, the notfind command asserts that the current page does not match the given regular expression.

The other twill commands are self-explanatory:

echo <message> prints a message on standard output.
code <http_status_code> checks that the server returned the expected HTTP status code (200 if everything is all right).
back goes back to the previously visited page.
reload reloads the current page.
agent <user-agent> changes the current user agent, thus faking different browsers.
follow <regex> finds the first matching link on the page and visits it.
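To make the idea of a line-oriented mini language concrete, here is a toy interpreter in Python 3 for four of the commands above: echo, code, find and notfind. It is a hypothetical illustration, not twill's implementation. It checks a single page supplied up front instead of fetching pages with go or follow.

Its syntax is also a simplification that may differ from how twill actually parses commands: the argument is simply the rest of the line, and # starts a comment. As in twill, find and notfind take a regular expression, so a pattern like Example\s+Web\s+Page tolerates extra whitespace in the markup.

```python
import re


def run_script(script, status, html, out=print):
    """Run toy twill-like commands against one fixed page (hypothetical).

    status is the page's HTTP status code, html its content; out
    receives the text of echo commands.
    """
    for lineno, line in enumerate(script.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        command, _, arg = line.partition(" ")
        arg = arg.strip()
        if command == "echo":
            out(arg)
        elif command == "code":
            if status != int(arg):
                raise AssertionError("line %d: got code %d, expected %s"
                                     % (lineno, status, arg))
        elif command == "find":
            if not re.search(arg, html):
                raise AssertionError("line %d: no match for %r" % (lineno, arg))
        elif command == "notfind":
            if re.search(arg, html):
                raise AssertionError("line %d: unexpected match for %r"
                                     % (lineno, arg))
        else:
            raise ValueError("line %d: unknown command %r" % (lineno, command))


script = r"""
# hypothetical checks for a made-up page
code 200
find Example\s+Web\s+Page
notfind Internal Server Error
echo all checks passed
"""
run_script(script, 200, "<title>Example   Web Page</title>")  # prints "all checks passed"
```

A failed check raises AssertionError with the offending line number. That is roughly the kind of feedback you want from a batch run.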

To see the full list of commands, type help at the prompt; type EOF or press Ctrl-D to exit.

Once you have tested your application interactively, it is easy to cut and paste your twill session into a twill script, which you can then run as a batch process:

$ twill-sh mytests.twill

As you may imagine, you can list more than one script on the command line and run them all in one go. Because twill is written in Python, you can also drive it entirely from Python, and you can even extend its command set simply by adding new commands to the commands.py module.

At the moment, twill is still young, and it cannot automatically convert scripts into unit tests, which would make it easy to run entire suites of regression tests. However, it is not difficult to implement that capability yourself, and twill will likely gain good integration with unittest and doctest in the future.
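One way to do it is sketched below for Python 3. Each script becomes a unittest test case that runs the script in a separate process, which also keeps the scripts isolated from one another. The helper names are hypothetical.

The sketch assumes that the runner (twill-sh by default) exits with a nonzero status when a script fails. Check that assumption against the twill version you are using, or pass a different command.

```python
import subprocess
import unittest


def make_script_test(path, command):
    """Wrap one script in a unittest test case (hypothetical helper)."""
    def run_script():
        proc = subprocess.run(list(command) + [path],
                              stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        # Assumption: the runner signals failure with a nonzero exit status.
        if proc.returncode != 0:
            raise AssertionError("%s failed:\n%s"
                                 % (path, proc.stdout.decode(errors="replace")))
    return unittest.FunctionTestCase(run_script, description=path)


def make_suite(script_paths, command=("twill-sh",)):
    """Build a regression suite with one test per script file."""
    suite = unittest.TestSuite()
    for path in script_paths:
        suite.addTest(make_script_test(path, command))
    return suite
```

For example, unittest.TextTestRunner(verbosity=2).run(make_suite(["mytests.twill"])) would run the script from the batch example as a one-test suite and report any output from a failing script.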