At PyCon ’07, I gave a talk on testing tools in which I “performed” nine live demos of features from twill, scotch, pinocchio, and figleaf. These are tools for Web testing, Web recording/playback, nose unittest extensions, and code coverage.

This is the source code for that talk.

The Python code should work on any UNIX-like machine; I gave the talk on my MacBook, running Python 2.3.

Note that the only demo that didn’t work during the talk was the first one, in which I used ‘subprocess’ to start a CherryPy server. Since this approach is contraindicated IMO (I suggest using wsgi_intercept; see Demo 2), I was more or less happy that it failed. (It failed because I made a last-minute adjustment to the command that ran app.py and didn’t test it before the talk… ;)

This was billed as an “intermediate” talk at PyCon, and I jumped right into code rather than giving a detailed introduction. This document follows that strategy — so go look at the code and read what I have to say afterwards!

You will need twill 0.9b1 and nose 0.9.1 (both the latest versions available through easy_install), as well as the latest scotch, figleaf, and pinocchio. You will also need CherryPy 3.x and Django 0.95 installed.

You may need to adjust your PYTHONPATH to get everything working. Check out env.sh to see what I put in my path before running everything.

The twill test script for this app is in cherrypy/simple-test.twill. All it does is go to the main page, confirm that it loaded successfully and contains the words “Type something”, fill in the form with the string “python is great”, and submit it. The final command verifies that the output is as expected.
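For reference, here is a hedged reconstruction of what simple-test.twill could look like in the twill language (the form index and field name are guesses; the real script is in cherrypy/simple-test.twill):

```
# twill-sh -u supplies the initial URL, so we start with the checks
code 200
find "Type something"

# fill in the first form's text field (field name is a guess) and submit
fv 1 inp "python is great"
submit

# confirm the app echoed our string back
find "python is great"
```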

If you wanted to run all of this stuff manually, you would type the following (in UNIX):

python app.py &
twill-sh -u http://localhost:8080/ simple-test.twill

So, how do you do it with twill and nose?

Take a look at the unit test source code in cherrypy/demo1/tests.py. This is a nose test that you can run by typing

nosetests -w demo1/ -v

tests.py is discovered and imported by nose (because it has the magic ‘test’ in its name); then setup(), test(), and teardown() are run (in that order) because they are names understood by nose.

setup executes the application app.py, capturing its stdout and stderr into a file-like object (which is accessible as pipe.stdout). setup has to wait a second for app.py to bind the port, and then sets the URL of the Web server appropriately.

test then runs the twill script via twill.execute_file, passing it the initial URL to go to.

teardown calls a special URL, exit, on the Web app; this causes the app to shut down (by raising SystemExit). It then waits for the app to exit.

A few notes:

setup and teardown are each run once, before and after any test functions. If you added in another test function — e.g. test2 — it would have access to url and pipe and an established Web server.

Note that url is not a hardcoded string in the test; it’s available as a global variable. This lets any function in this module (and any module that can import tests) adjust to a new URL easily.

Also note that url is not hardcoded into the twill script, for the same reason. In fact, because this twill script doesn’t alter anything on the server (mainly because the server is incredibly dumb ;), you could imagine using this twill script as a heartbeat check for the site, too, i.e. to check that the site is minimally alive and processing Web requests properly.

What if the Web server is already running, or something else is running on the port?

More generally, what happens when the Popen call goes awry? How do you debug it?

(Answer: you’ve got to figure out how to get hold of the stdout/stderr and print it out, which can be a bit ugly.)

What happens if /exit doesn’t work, in teardown?

(Answer: the unit tests hang.)

The last few notes (port conflicts, Popen debugging, hanging teardowns) are the reason why you should think about using the wsgi_intercept module (discussed in Demo 2) to test your Web apps.

The use of subprocess in Demo 1 was a big ugly problem: once you shell out to a command, doing good error handling is difficult, and you’re at the mercy of the environment. But you needed to do this to run the Web server, right?

Well, yes and no. If your goal is to test the entire Web stack — from your OS’s socket recv, through the CherryPy Web server, down to your application — then you really do need to do things this way.

But that’s silly. In general, your unit and functional tests should be testing your code, not CherryPy and your OS; the time for testing that everything works together is later, during your staging and end-to-end testing phase(s). Generally speaking, though, your OS and Web server are not going to be simple things to test and you’re better off worrying about them separately from your code. So let’s focus on your code.

Back to the basic question: how do you test your app? Well, there’s a nifty new Python standard for Web app/server interaction called WSGI. WSGI lets you establish a nicely wrapped application object that you can serve in a bunch of ways. Conveniently, twill understands how to talk directly to WSGI apps. This is easier to show than it is to explain: take a look at cherrypy/demo2/tests.py. The two critical lines are in setup().

The first line asks CherryPy to convert your application into a WSGI application object, wsgi_app. The second line tells twill to talk directly to wsgi_app whenever a twill function asks for localhost:80.
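Concretely, those two lines of setup() might look something like this sketch (the exact CherryPy 3 mounting call and the application class name are assumptions; add_wsgi_intercept takes a host, a port, and a callable returning the app):

```
import cherrypy
import twill

def setup():
    # 1. wrap the CherryPy application in a WSGI application object
    #    (assumed mounting call -- check demo2/tests.py for the real one)
    wsgi_app = cherrypy.tree.mount(HelloWorld())

    # 2. shunt any twill request for localhost:80 directly into wsgi_app,
    #    bypassing the network entirely
    twill.add_wsgi_intercept('localhost', 80, lambda: wsgi_app)
```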

Note that the test itself is the same, so you can actually use the test script simple-test.twill to do tests however you want — you just need to change the test fixtures (the setup and teardown code).

Note also that it’s quite a bit faster than demo1, because it doesn’t need to wait for the server to start up.

And, finally, it’s much less error prone. There’s really no way for any other process to interfere with the one running the test, and no network port is bound; wsgi_intercept completely shunts the networking code through to the WSGI app.
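If you haven’t met WSGI before: a WSGI application object is just a callable that takes the request environ and a start_response function and returns the response body. A minimal, self-contained illustration (not taken from the demos):

```python
def simple_app(environ, start_response):
    # report the status line and headers via start_response,
    # then return the body as an iterable of byte strings
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Type something: hello, world']
```

Anything that can call this object directly (a real Web server, or wsgi_intercept) can serve it; that’s what makes the shunt trick possible.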

(For those of you who unwisely use your own Web testing frameworks, wsgi_intercept is a generic library that acts at the level of httplib, and it can work with every Python Web testing library known to mankind, or at least to me. See the wsgi_intercept page for more information.)

The basic idea behind code coverage analysis is to figure out what lines of code are (and more importantly aren’t) being executed under test. This can help you figure out what portions of your code need to be tested (because they’re not being tested at all).

figleaf does this by hooking into the CPython interpreter and recording which lines of code are executed. Then you can use figleaf’s utilities to do things like output an HTML page showing which lines were and weren’t executed.

Again, it’s easier to show than it is to explain, so read on!

First, start the app with figleaf coverage:

figleaf app.py

Now, run the twill script (in other window):

twill-sh -u http://localhost:8080/ simple-test.twill

Then CTRL-C out of the app.py Web server, and run

figleaf2html

This will create a directory html/; open html/app.py.html in a Web browser. You should see a bunch of green lines (indicating that these lines of code were executed) and two red lines (the code for page2 and exit). There’s your basic coverage analysis!

Note that class and function definitions are executed on import, which is why def page2(self): is green; it’s just the contents of the functions themselves that aren’t executed.

If you open html/index.html you’ll see a general summary of code files executed by the Python command you ran.

Purpose: demonstrate the figleafsections plugin that’s part of pinocchio.

The figleafsections plugin to pinocchio lets you do a slightly more sophisticated kind of code analysis. Suppose you want to know which of your tests runs which lines of code? (This could be of interest for several reasons, but for now let’s just say “it’s neat”, OK?)

For this demo, I’ve constructed a new pair of unit tests: take a look at cherrypy/demo3/tests.py. The first test function (test()) is identical to Demo 2, but now there’s a new test function — test2(). All that this function does is exercise the page2 code in the CherryPy app.

Running the tests with the figleafsections plugin enabled keeps track of which tests are executing what sections of app.py, and then annotates app.py with the results. The annotated file is app.py.sections; take a look at it!

What this output shows is that tests.test executed the index() and form() functions, while tests.test2 executed the page2() function only — just as you know from having read cherrypy/demo3/tests.py. Neat, eh?

Since twill is written in Python, it’s very easy to extend with Python. All you need to do is write a Python module containing the function(s) that you want to use within twill, and then call extend_with <module>. From that point on, those functions will be accessible from within twill. (Note that extension functions need to take string arguments, because the twill mini-language only operates on strings.)

For example, take a look at cherrypy/demo4/randomform.py. This is a simple extension module that lets you fill in form text fields with random values; the function fuzzfill takes a form name, a min/max length for the values, and an optional alphabet from which to build the values. You can call it like this:

extend_with randomform
fuzzfill <form> 5 15 [ <alphabet> ]

If you look at the randomform.py script, the only real trickiness in the script is where it uses the twill browser API to retrieve the form fields and fill them with text. Conveniently, this entire API is available to twill extension modules.
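The value-generation half of such an extension is plain Python; here is a sketch of just that piece (the name fuzzvalue and its defaults are mine, and the twill browser calls that actually locate and fill the form fields are omitted — see randomform.py for the real thing):

```python
import random
import string

def fuzzvalue(minlen, maxlen, alphabet=string.ascii_letters + string.digits):
    # twill extension functions receive all arguments as strings,
    # so convert the lengths first
    minlen, maxlen = int(minlen), int(maxlen)
    # build a random string whose length lies in [minlen, maxlen]
    length = random.randint(minlen, maxlen)
    return ''.join(random.choice(alphabet) for _ in range(length))
```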

Let’s try running it! The twill script cherrypy/fuzz-test.twill is a simple script that takes the CherryPy HelloWorld application and fills in the main page form field with a random alphanumeric string. As in Demo 2, we can put this all together in a simple unit test framework; see cherrypy/demo4/tests.py for the actual code.

You can run the demo code in the usual way:

nosetests -w demo4/ -v

If you run it without output capture (nose’s -s option), you’ll even see the random text that was inserted.

Purpose: show how to use wsgi_intercept and twill to test a simple Django app.

OK, I’ve shown you how to write automated tests for CherryPy Web apps. Let’s try it out for Django, now!

Since I don’t actually know any Django, let’s just try the Django intro app, a simple poll site that lets users select choices in a poll. The admin interface is a bit tough to test with twill, because it uses JavaScript, but we can test the main poll site easily enough.

The first function you should look at is actually the last function in the file: TestDjangoPollSite.test. This function goes to “/polls”, clicks on the “pycon” choice in the poll, submits it, and verifies that “pycon” has received 1 vote. (Unlike the CherryPy demos, here we’re using the twill Python API, rather than the scripting language.)

The TestDjangoPollSite.setup() function is run before the test() function, and it serves to reset the vote count in the database; it’s very much like a unittest fixture, in that it’s run prior to each test* function in TestDjangoPollSite. (If there were a teardown() function in the class, it would be run after each test* function.)

tests.setup() and tests.teardown() serve the same purpose as their CherryPy analogs in Demo 2: setup() initializes Django and sets up the wsgi_intercept shunt so that twill can talk to the Django app directly through WSGI, and teardown() removes the shunt.
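A pseudocode-level sketch of those module-level fixtures (WSGIHandler is Django’s WSGI entry point; the settings-module name here is an assumption):

```
import os
import twill

def setup():
    # point Django at the poll site's settings, then wrap it in WSGI
    os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'   # assumed name
    from django.core.handlers.wsgi import WSGIHandler
    app = WSGIHandler()

    # shunt twill's requests for localhost:80 straight into the Django app
    twill.add_wsgi_intercept('localhost', 80, lambda: app)

def teardown():
    # remove the shunt
    twill.remove_wsgi_intercept('localhost', 80)
```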

Demos 1/2 and Demo 6 collectively demonstrate (hah!) how easy it is to use twill to start testing your Django and CherryPy apps. Even the simple level of testing demonstrated here serves an important purpose: you can be sure that, at a minimum, your application is configured properly and handling basic HTTP traffic. (More complicated tests will depend on your application, of course.)

All right, now hit reload! If everything is working right, you should see the same “polls” page, but this time you’ll be going through the scotch proxy server. Check out the window in which you ran scotch — it should say something like

Already you can see that this is moderately useful for “watching” HTTP sessions, right? (It gets better!)

OK, now hit CTRL-C in the proxy server shell, to cancel. It should say something like “saved 5 records!” These records are saved into the file recording.pickle by default, and you can look at some of the files in the scotch distribution (especially those under the bin/ directory) for some simple ideas of what to do with them.

All right, so you’ve seen that you can record HTTP traffic. But what can you do with the recording?

Don’t be shy — save this to a file and run it with twill-sh! It should work.
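To give a flavor of it, a translated recording of the poll-site session from Demo 6 might come out as a twill script along these lines (the URL, field name, and value here follow that example, not actual scotch output):

```
go http://localhost:8000/polls/
code 200

# vote for the "pycon" choice (field name/value are illustrative)
fv 1 choice 1
submit
code 200
find "1 vote"
```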

So that’s pretty convenient, right? It’s not a cure-all — generating tests from recording can get pretty ugly, and with scotch I don’t aim to provide a complete solution, but I do aim to provide you with something you can extend yourself. (There are lots of site-specific issues that make it likely that you’ll need to provide custom translation scripts that understand your URL structure — these aren’t terribly hard to write, but they are site specific.)

Purpose: use scotch to play back the Web traffic directly and compare.

OK, and now for the last demo: the ultimate regression test!

Leave the Django site running (or start it up again) and, in the proxy window, type play-recorded-proxy recording.pickle. This literally replays the recorded session directly to the Django Web app and compares the actual output with the expected output.

What’s happening is clear: because we’re not resetting the database to a clean state, the vote counts are being incremented each time we run the recording — after all, in each recording we’re pushing the “submit” button after selecting “pycon”.

Anyway, this is a kind of neat regression test: does your Web site still return the same values it should? Note that it’s very fragile, of course: if your pages have date/time stamps, or other content that changes dynamically due to external conditions, you’re going to have to write custom filter routines that ignore that content in the comparisons. But it’s at least a neat concept.

(Again, I should note that this is neat, but it’s not clear to me how useful it is. scotch is very much a programmer’s toolkit at the moment, and I’m still feeling my way through its uses. I do have some other ideas that I will reveal by next year’s PyCon…)