Monday, February 28, 2005

Here's a good interview question for a tester: how do you define performance/load/stress testing? Many times people use these terms interchangeably, but they have in fact quite different meanings. This post is a quick review of these concepts, based on my own experience, but also using definitions from testing literature -- in particular: "Testing computer software" by Kaner et al, "Software testing techniques" by Loveland et al, and "Testing applications on the Web" by Nguyen et al.

Update July 7th, 2005

From the referrer logs I see that this post comes up fairly often in Google searches. I'm updating it with a link to a later post I wrote called 'More on performance vs. load testing'.

Performance testing

The goal of performance testing is not to find bugs, but to eliminate bottlenecks and establish a baseline for future regression testing. To conduct performance testing is to engage in a carefully controlled process of measurement and analysis. Ideally, the software under test is already stable enough so that this process can proceed smoothly.

A clearly defined set of expectations is essential for meaningful performance testing. If you don't know where you want to go in terms of the performance of the system, then it matters little which direction you take (remember Alice and the Cheshire Cat?). For example, for a Web application, you need to know at least two things:

expected load in terms of concurrent users or HTTP connections

acceptable response time

Once you know where you want to be, you can start on your way there by constantly increasing the load on the system while looking for bottlenecks. To take again the example of a Web application, these bottlenecks can exist at multiple levels, and to pinpoint them you can use a variety of tools:

at the application level, developers can use profilers to spot inefficiencies in their code (for example poor search algorithms)

at the database level, developers and DBAs can use database-specific profilers and query optimizers

at the operating system level, system engineers can use utilities such as top, vmstat, iostat (on Unix-type systems) and PerfMon (on Windows) to monitor hardware resources such as CPU, memory, swap, disk I/O; specialized kernel monitoring software can also be used

at the network level, network engineers can use packet sniffers such as tcpdump, network protocol analyzers such as ethereal, and various utilities such as netstat, MRTG, ntop, mii-tool

From a testing point of view, the activities described above all take a white-box approach, where the system is inspected and monitored "from the inside out" and from a variety of angles. Measurements are taken and analyzed, and as a result, tuning is done.

However, testers also take a black-box approach in running the load tests against the system under test. For a Web application, testers will use tools that simulate concurrent users/HTTP connections and measure response times. Some lightweight open source tools I've used in the past for this purpose are ab, siege, httperf. A more heavyweight tool I haven't used yet is OpenSTA. I also haven't used The Grinder yet, but it is high on my TODO list.
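At their core, tools like ab, siege and httperf all do the same basic thing: fire a configurable number of concurrent requests and record response times. Here is a minimal Python sketch of the idea; make_request is a hypothetical stand-in for an actual HTTP call, so the numbers are purely illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def make_request():
    """Hypothetical stand-in for an HTTP request; replace with a real call."""
    time.sleep(0.01)  # simulate 10 ms of server latency
    return 200

def run_load_test(concurrency, total_requests):
    """Fire requests with the given concurrency; return (status, seconds) pairs."""
    timings = []

    def timed_request():
        start = time.time()
        status = make_request()
        timings.append((status, time.time() - start))

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in range(total_requests):
            pool.submit(timed_request)
    # leaving the with-block waits for all submitted requests to finish
    return timings

results = run_load_test(concurrency=10, total_requests=50)
print("requests: %d, slowest: %.3f s" % (len(results), max(t for _, t in results)))
```

In practice you would ramp the concurrency up step by step while watching the response-time distribution for the knee where throughput stops scaling.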

When the results of the load test indicate that the performance of the system does not meet its expected goals, it is time for tuning, starting with the application and the database. You want to make sure your code runs as efficiently as possible and your database is optimized for a given OS/hardware configuration. TDD practitioners will find a framework such as Mike Clark's jUnitPerf very useful in this context; it enhances existing unit test code with load test and timed test functionality. Once a particular function or method has been profiled and tuned, developers can then wrap its unit tests in jUnitPerf and ensure that it keeps meeting its performance requirements for load and timing. Mike Clark calls this "continuous performance testing". I should also mention that I've done an initial port of jUnitPerf to Python -- I called it pyUnitPerf.
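The core jUnitPerf idea -- wrapping an existing unit test so that it also fails when a time budget is exceeded -- fits in a few lines of Python. This is an illustrative sketch, not the actual jUnitPerf or pyUnitPerf API, and test_search is a made-up example test:

```python
import time

class TimedTest:
    """Wrap an existing test callable and fail it if it exceeds max_seconds."""

    def __init__(self, test_func, max_seconds):
        self.test_func = test_func
        self.max_seconds = max_seconds

    def run(self):
        start = time.time()
        self.test_func()  # run the original unit test unchanged
        elapsed = time.time() - start
        assert elapsed <= self.max_seconds, (
            "test took %.3f s, budget was %.3f s" % (elapsed, self.max_seconds))
        return elapsed

def test_search():
    """Pretend unit test for a hypothetical search function."""
    time.sleep(0.01)

print("passed in %.3f s" % TimedTest(test_search, max_seconds=1.0).run())
```

The appeal of the pattern is that the original test body is untouched: the timing constraint is layered on from the outside, so functional tests double as performance regression tests.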

If, after tuning the application and the database, the system still doesn't meet its expected performance goals, a wide array of tuning procedures is available at all the levels discussed before. Here are some examples of things you can do to enhance the performance of a Web application outside of the application code per se:

Performance tuning can sometimes be more art than science, due to the sheer complexity of the systems involved in a modern Web application. Care must be taken to modify one variable at a time and redo the measurements; otherwise multiple changes can have subtle interactions that are hard to quantify and reproduce.

In a standard test environment such as a test lab, it will not always be possible to replicate the production server configuration. In such cases, a staging environment is used which is a subset of the production environment. The expected performance of the system needs to be scaled down accordingly.

The cycle "run load test->measure performance->tune system" is repeated until the system under test achieves the expected levels of performance. At this point, testers have a baseline for how the system behaves under normal conditions. This baseline can then be used in regression tests to gauge how well a new version of the software performs.
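Checking a new build against that baseline can be automated trivially: store the baseline response times, re-measure, and flag anything that slipped beyond an agreed tolerance. A sketch with made-up operation names and numbers:

```python
def check_against_baseline(baseline, current, tolerance=0.10):
    """Return names of operations whose current timing exceeds the baseline
    by more than `tolerance` (both dicts map operation name -> seconds)."""
    regressions = []
    for op, base_time in baseline.items():
        if current.get(op, float("inf")) > base_time * (1 + tolerance):
            regressions.append(op)
    return regressions

baseline = {"login": 0.20, "search": 0.35, "checkout": 0.50}
current = {"login": 0.21, "search": 0.60, "checkout": 0.52}
print(check_against_baseline(baseline, current))  # prints ['search']
```

A check like this can run after every load test, so a performance regression shows up as a failing test rather than as a user complaint months later.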

Another common goal of performance testing is to establish benchmark numbers for the system under test. There are many industry-standard benchmarks such as the ones published by TPC, and many hardware/software vendors will fine-tune their systems in such ways as to obtain a high ranking in the TPC top tens. It is common knowledge that one needs to be wary of any performance claims that do not include a detailed specification of all the hardware and software configurations that were used in that particular test.

Load testing

We have already seen load testing as part of the process of performance testing and tuning. In that context, it meant constantly increasing the load on the system via automated tools. For a Web application, the load is defined in terms of concurrent users or HTTP connections.

In the testing literature, the term "load testing" is usually defined as the process of exercising the system under test by feeding it the largest tasks it can operate with. Load testing is sometimes called volume testing, or longevity/endurance testing.

Examples of volume testing:

testing a word processor by editing a very large document

testing a printer by sending it a very large job

testing a mail server with thousands of user mailboxes

a specific case of volume testing is zero-volume testing, where the system is fed empty tasks

Examples of longevity/endurance testing:

testing a client-server application by running the client in a loop against the server over an extended period of time

Goals of load testing:

expose bugs that do not surface in cursory testing, such as memory management bugs, memory leaks, buffer overflows, etc.

ensure that the application meets the performance baseline established during performance testing. This is done by running regression tests against the application at a specified maximum load.

Although performance testing and load testing can seem similar, their goals are different. On one hand, performance testing uses load testing techniques and tools for measurement and benchmarking purposes and uses various load levels. On the other hand, load testing operates at a predefined load level, usually the highest load that the system can accept while still functioning properly. Note that load testing does not aim to break the system by overwhelming it, but instead tries to keep the system constantly humming like a well-oiled machine.

In the context of load testing, I want to emphasize the extreme importance of having large datasets available for testing. In my experience, many important bugs simply do not surface unless you deal with very large entities such as thousands of users in repositories such as LDAP/NIS/Active Directory, thousands of mail server mailboxes, multi-gigabyte tables in databases, deep file/directory hierarchies on file systems, etc. Testers obviously need automated tools to generate these large datasets, and fortunately any scripting language worth its salt will do the job.
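For example, a few lines of Python can generate an arbitrarily large CSV of synthetic user records; the field layout here is made up purely for illustration:

```python
import csv
import random

def generate_users(path, count):
    """Write `count` synthetic user records to a CSV file."""
    domains = ["example.com", "example.org", "example.net"]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["uid", "username", "email"])
        for i in range(count):
            name = "user%06d" % i
            writer.writerow([i, name, "%s@%s" % (name, random.choice(domains))])

generate_users("users.csv", 10000)  # 10,000 records in well under a second
```

The same approach scales to LDIF entries for LDAP, batches of SQL INSERT statements for databases, or deep directory trees for file system testing.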

Stress testing

Stress testing tries to break the system under test by overwhelming its resources or by taking resources away from it (in which case it is sometimes called negative testing). The main purpose behind this madness is to make sure that the system fails and recovers gracefully -- this quality is known as recoverability.

Where performance testing demands a controlled environment and repeatable measurements, stress testing joyfully induces chaos and unpredictability. To take again the example of a Web application, here are some ways in which stress can be applied to the system:

double the baseline number for concurrent users/HTTP connections

randomly shut down and restart ports on the network switches/routers that connect the servers (via SNMP commands for example)

take the database offline, then restart it

rebuild a RAID array while the system is running

run processes that consume resources (CPU, memory, disk, network) on the Web and database servers

I'm sure devious testers can enhance this list with their favorite ways of breaking systems. However, stress testing does not break the system purely for the pleasure of breaking it, but instead it allows testers to observe how the system reacts to failure. Does it save its state or does it crash suddenly? Does it just hang and freeze or does it fail gracefully? On restart, is it able to recover from the last good state? Does it print out meaningful error messages to the user, or does it merely display incomprehensible hex codes? Is the security of the system compromised because of unexpected failures? And the list goes on.
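As a taste of the "run processes that consume resources" item in the list above, here is a deliberately crude Python sketch that holds memory and spins the CPU for a bounded interval; scale the numbers up (and run multiple copies) with care on a real stress run:

```python
import time

def hog_resources(seconds, megabytes):
    """Hold `megabytes` of memory and busy-spin the CPU for `seconds` seconds."""
    ballast = bytearray(megabytes * 1024 * 1024)  # memory pressure
    deadline = time.time() + seconds
    iterations = 0
    while time.time() < deadline:                 # CPU pressure
        iterations += 1
    return len(ballast), iterations

held, spins = hog_resources(seconds=0.5, megabytes=64)
print("held %d bytes, spun %d times" % (held, spins))
```

Launching several of these on the Web and database servers while the load test is running is a quick way to see how the application behaves when it has to compete for resources.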

Conclusion

I am aware that I only scratched the surface in terms of issues, tools and techniques that deserve to be mentioned in the context of performance, load and stress testing. I personally find the topic of performance testing and tuning particularly rich and interesting, and I intend to post more articles on this subject in the future.

The main advantage that Selenium has over the other tools I mentioned is that it's cross-platform, cross-browser. The main disadvantage is that it requires server-side instrumentation. The syntax used by the Selenium JavaScript engine inside the browser is called "Selenese". In Bret Pettichord's words from his blog:

"You can also express Selenium tests in a programming language, taking advantage of language-specific drivers that communicate in Selenese to the browser bot. Java and Ruby drivers have been released, with dot-Net and Python drivers under development. These drivers allow you to write tests in Java, Ruby, Python, C#, or VB.net."

Jason Huggins, the main Selenium developer, is at the same time a Plone developer. He pointed me to the Python code already written for Selenium. Right now it's only available via subversion from svn://beaver.codehaus.org/selenium/scm/trunk. I checked it out, but I haven't had a chance to try it yet. It's high on my TODO list though, so stay tuned...

One issue that almost all browser simulator tools struggle with is dealing with JavaScript. In my experience, their HTML parsing capabilities tend to break down when faced with rich JavaScript elements. This is one reason why Wilkes Joiner, one of the creators of jWebUnit, said that jWebUnit ended up being used for simple "smoke test"-type testing that automates basic site navigation, rather than for more complicated acceptance/regression testing. No browser simulator tool I know of supports all of the JavaScript constructs yet. But if the Web application you need to test does not make heavy use of JavaScript, then these tools might prove enough for the job.

Browser driver tools such as Watir, Samie and Pamie do not have the JavaScript shortcoming, but of course they are limited to IE and Windows. This may prove too restrictive, especially in view of the recent Firefox resurgence. I haven't used the Mozilla-based JSSh tool yet.

The tool I want to talk about in this post is MaxQ. I found out about it from Titus Brown's blog. MaxQ belongs to the browser simulator category, but it is different from the other tools I mentioned in that it uses a proxy to capture HTTP requests and replies. One of its main capabilities is record/playback of scripts that are automatically written for you in Jython while you are browsing the Web site under test. The tests can then be run either using the GUI version of the tool (which also does the capture), or from the command line.

MaxQ is written in Java, but the test scripts it generates are written in Jython. This is a typical approach taken by other tools such as The Grinder and TestMaker. It combines the availability of test libraries for Java with the agility of a scripting language such as Jython. It is a trend that I see gaining more traction in the testing world as Jython breaks more into the mainstream.

MaxQ's forte is in automating HTTP requests (both GET and POST) and capturing the response codes, as well as the raw HTML output. It does not attempt to parse HTML into a DOM object, as other tools do, but it does offer the capability of verifying that a given text or URI exists in the HTTP response. There is talk on the developer's mailing list about extending MaxQ with HttpUnit, so that it can offer more finely-grained control over HTML elements such as frames and tables. MaxQ does not support HTTPS at this time.

One question you might have (I know I had it) is why should you use MaxQ when other tools offer more capabilities, at least in terms of HTML parsing. Here are some reasons:

The record/playback feature is very helpful; the fact that the tool generates Jython code makes it very easy to modify it by hand later and maintain it

MaxQ retrieves all the elements referenced on a given Web page (images, CSS), so it makes it easy to test that all links to these objects are valid

Form posting is easy to automate and verify

The fact that MaxQ does not do HTML parsing is sometimes an advantage, since HTML parsing is brittle (especially when dealing with JavaScript), and relying on HTML parsing makes your tests fragile and prone to break whenever the HTML elements are modified

In short, I would say that you should use MaxQ whenever you are more interested in testing the HTTP side of your Web application, and not so much the HTML composition of your pages.

Short MaxQ tutorial

As an example of the application under test, I will use a fresh installation of Bugzilla, and I will use MaxQ to test a simple feature: running a Bugzilla query with a non-existent summary results in an empty results page.

Install MaxQ

I downloaded and installed MaxQ on a Windows XP box. I already had the Java SDK installed. To run MaxQ, go to a command prompt, cd to the bin sub-directory and type maxq.bat. This will launch the proxy process, which by default listens on port 8090. It will also launch the MaxQ Java GUI tool.

In the GUI tool, go to File->New to start either a "standard" script or a "compact" script. The difference is that the standard script will include HTTP requests for all the elements referenced on every Web page you visit (such as images or CSS), whereas the compact script will only include one HTTP request per page, to the page URL. The compact script also lives up to its name by aggregating the execution of the HTTP request and the validation of the response in one line of code.

To start a recording session, go to Test->Start Recording.

Now configure your browser to use a proxy on localhost:8090.

Record the test script

For my first test, I created a new standard script and MaxQ generated this code:

Note that the test class is derived from PyHttpTestCase, a Jython class that is itself derived from a Java class: HttpTestCase. What HttpTestCase does is encapsulate the HTTP request/response functionality. Its two main methods are get() and post(), but it also offers helper methods such as responseContains(text) or responseContainsURI(uri), which verify that a given text or URI is present in the HTTP response.
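To make those helper methods concrete, here is a toy Python stand-in. This is illustration only, not the actual MaxQ/HttpTestCase code: the real class performs HTTP requests, which I fake here with a canned response body:

```python
class FakeHttpTestCase:
    """Toy stand-in for MaxQ's HttpTestCase: keeps the last response body."""

    def __init__(self):
        self.response = ""

    def get(self, url):
        # A real implementation would perform an HTTP GET against `url`;
        # we just store a canned page for illustration.
        self.response = ('<html><a href="/query.cgi">Search</a>'
                         'Zarro Boogs found.</html>')

    def getResponse(self):
        return self.response

    def responseContains(self, text):
        return text in self.response

    def responseContainsURI(self, uri):
        return 'href="%s"' % uri in self.response

t = FakeHttpTestCase()
t.get("http://example.com/bugs")
print(t.responseContains("Zarro Boogs found"))  # prints True
print(t.responseContainsURI("/query.cgi"))      # prints True
```

Everything the generated scripts do is built on these few primitives: issue a request, then assert on the raw response text.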

I started recording, then I went to http://example.com/bugs in my browser (real URL omitted) and got to the main Bugzilla page. I then clicked on the "Query existing bug reports" link to go to the Search page. I entered "nonexistentbug!!" in the Summary field, then clicked Search. I got back a page containing the text "Zarro Boogs found."

While I was busily navigating the Bugzilla pages and posting the Search query, this is what MaxQ automatically recorded for me:

This is where the compact script form comes in handy. The equivalent compact expression is:

self.get('http://example.com/bugs', None, 301)

MaxQ shines at retrieving form fields (even hidden ones), filling them with the values given by the user and submitting the form via an HTTP POST operation. This is what the second part of the generated Jython script does.

I manually added this line before the "Insert new recordings" line:

assert self.responseContains("Zarro Boogs found")

This shows how to use the responseContains helper method from the HttpTestCase class in order to verify that the returned page contains a given text.

You can also do an ad-hoc validation on the returned HTML by using a regular expression applied to the raw HTML (which can be retrieved via the getResponse() method). So you can do something like this:

assert re.search(r'Zarro', self.getResponse())

Caveat: a simple "import re" will not work; you need to import the re module like this:

from org.python.modules import re

Run the test script

When you are done browsing the target Web site for the functionality you want to test, go to Test->Stop Recording. You will be prompted for a file name. I chose test_bugzilla_empty_search.py. At this point, you can run the Jython test script inside the MaxQ GUI by going to Test->Run. The output is something like:

I think MaxQ is a useful tool for regression-testing simple Web site navigation and form processing. Its record/playback feature is very helpful in taking the tedium out of manually writing test scripts (as an aside, TestMaker uses the MaxQ capture/playback engine for its own functionality). The fact that the script language is Jython is a big plus, since testers can enhance the generated scripts with custom Python logic. The source code is clean and easy to grasp, and development is active at maxq.tigris.org.

Another nifty feature I haven't mentioned is that it is easy to add your own script generator plugins. All you need to do is write a Java class derived from AbstractCodeGenerator, put it in java/com/bitmechanic/maxq/generator under the main maxq directory, recompile maxq.jar via ant, then add the class to conf/maxq.properties in the generator.classnames section. The MaxQ GUI tool will then automatically pick up your generator at run time and offer it in the File->New menu. For an example of a custom generator, see Titus Brown's PBP script generator.

On the minus side, MaxQ is not the best tool to use if you need fine-grained control over HTML elements such as links, tables and frames. If you need this functionality, you are better off using a tool such as HttpUnit or HtmlUnit and driving it from Jython. If instead of Jython you want to use pure Python, you can use mechanize or webunit, which I'll discuss in a future post.

Wednesday, February 16, 2005

This post was inspired by an article I read in the Feb. 2005 issue of Better Software: "Double Duty" by Brian Button. The title refers to having unit tests serve the double role of testing and documentation. Brian calls this Agile Documentation. For Python developers, this is old news, since the doctest module already provides what is called "literate testing" or "executable documentation". However, Brian also introduces some concepts that I think are worth exploring: Test Lists and Tests Maps.

Test Lists

A Test List tells a story about the behavior expected from the module/class under test. It is composed of one-liners, each line describing what a specific unit test tries to achieve. For example, in the case of a Blog management application, you could have the following (incomplete) Test List:

Deleting all entries results in no entries in the blog.

Posting single entry results in single valid entry.

Deleting a single entry by index results in no entries in the blog.

Posting new entry results in valid entry and increases the number of entries by 1.

Etc.

I find it very valuable to have such a Test List for every Python module that I write, especially if the list is easy to generate from the unit tests that I write. I will show later in this post how the combination of doctest and epydoc makes it trivial to achieve this goal.

Test Maps

A Test Map is a list of unit tests associated with a specific function/method under test. It helps you see how that specific function/method is being exercised via unit tests. A Test Map could look like this:

Testmap for method delete_all_entries:

test_delete_all_entries

test_delete_single_entry

test_post_single_entry

test_post_two_entries

test_delete_first_of_two_entries

test_delete_second_of_two_entries

Generating Test Lists

As an example of a module under test, I will use the Blog management application that I discussed in several previous posts. The source code can be found here. I have a directory called blogmgmt which contains a module called blogger.py. The blogger module contains several classes, the main one being Blogger, and a top-level function called get_blog. I also created an empty __init__.py file, so that blogmgmt can be treated as a package. I wrote a series of doctest-based tests for the blogger module in a file I called testlist_blogger.py. Here is part of that file:

"""Doctest unit tests for module L{blogger}"""

def test_get_blog():
    """
    get_blog() mimics a singleton by always returning the same object.

Each unit test function is composed of a docstring and nothing else. The docstring starts with a one-line description of what the unit test tries to achieve. The docstring continues with a list of methods/functions tested by that unit test. Finally, the interactive shell session output is copied and pasted into the docstring so that it can be processed by doctest.

For the purpose of generating a Test List, only the first line in each docstring is important. If you simply run

epydoc -o blogmgmt testlist_blogger.py

you will get a directory called blogmgmt that contains the epydoc-generated documentation. I usually then move this directory somewhere under the DocumentRoot of one of my Apache Virtual Servers. When viewed in a browser, the epydoc page for the summary of the testlist_blogger module looks like this (also available here):

This is exactly the Test List we wanted. Note that epydoc dutifully generated it for us, since in the Function Summary section it shows the name of every function it finds, plus the first line of that function's docstring. The main value of this Test List for me is that anybody can see at a glance what the methods of the Blogger class are expected to do. It's a nice summary of expected class behavior that enhances the documentation.

So all you need to do to get a nicely formatted Test List is to make sure that you have the test description as the first line of the unit test's docstring; epydoc will then do the grungy work for you.
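What epydoc does for the Test List is easy to mimic in plain Python: collect each test function's name together with the first line of its docstring. The two toy unit tests below stand in for the real ones in testlist_blogger.py:

```python
def build_test_list(module_namespace):
    """Return (name, summary) pairs for all test_* functions in a namespace,
    where summary is the first line of each function's docstring."""
    test_list = []
    for name in sorted(module_namespace):
        obj = module_namespace[name]
        if name.startswith("test_") and callable(obj) and obj.__doc__:
            summary = obj.__doc__.strip().splitlines()[0]
            test_list.append((name, summary))
    return test_list

# Two toy unit tests standing in for the ones in testlist_blogger.py:
def test_delete_all_entries():
    """Deleting all entries results in no entries in the blog."""

def test_post_single_entry():
    """Posting a single entry results in a single valid entry."""

for name, summary in build_test_list(globals()):
    print("%s: %s" % (name, summary))
```

This is the same convention epydoc relies on, so keeping the one-line summary first in every docstring pays off twice.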

If you click on the link with the function name on it, you will go to the Function Detail section and witness the power of doctest/epydoc. Since all the tests are copied and pasted from an interactive session and included in the docstring, epydoc will format the docstring very nicely and it will even color-code the blocks of code. Here is an example of the detail for test_delete_all_entries.

Generating Test Maps

Each docstring in the testlist_blogger module contains lines such as these:

(the L{...} notation is epydoc-specific and represents a link to another object in the epydoc-generated documentation)

The way I wrote the unit tests, each of them actually exercises several functions/methods from the blogger module. Some unit test purists might think these are not "real" unit tests, but in practice I found it is easier to work this way. For example, the get_blog function is called by each and every unit test in order to retrieve the same "blog" object. However, I am not specifically testing get_blog in every unit test, only calling it as a helper function. The way I see it, a method is tested when there is an assertion made about its behavior. All the other methods are merely called as helpers.

So whenever I write a unit test, I manually specify the list of methods/functions under test. This makes it easy to then parse the testlist file and build a mapping from each function/method under test to a list of unit tests that test it, i.e. what we called the Test Map.

For example, in the testlist_blogger module, the Blogger.delete_all_entries method is listed in the docstrings of 6 unit tests: test_delete_all_entries, test_delete_single_entry, test_post_single_entry, test_post_two_entries, test_delete_first_of_two_entries, test_delete_second_of_two_entries. These 6 unit tests represent the Test Map for Blogger.delete_all_entries. It's easy to build the Test Map programmatically by parsing the testlist_blogger.py file and creating a Python dictionary with the methods under test as keys and the corresponding lists of unit tests as values.
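The construction boils down to scanning each unit test's docstring for L{...} references and inverting the mapping. Here is a minimal sketch that works on docstrings directly (my actual script parses the testlist_blogger.py file itself, and the docstrings below are abbreviated examples):

```python
import re

def build_test_map(tests):
    """Map each L{...}-referenced method to the unit tests that mention it.

    `tests` maps unit test names to their docstrings.
    """
    test_map = {}
    for test_name, docstring in tests.items():
        for method in re.findall(r"L\{([^}]+)\}", docstring or ""):
            test_map.setdefault(method, []).append(test_name)
    return test_map

tests = {
    "test_delete_all_entries":
        "Method(s) tested: L{Blogger.delete_all_entries}",
    "test_post_single_entry":
        "Method(s) tested: L{Blogger.post_new_entry} "
        "L{Blogger.delete_all_entries}",
}
test_map = build_test_map(tests)
print(sorted(test_map["Blogger.delete_all_entries"]))
# prints ['test_delete_all_entries', 'test_post_single_entry']
```

The resulting dictionary is all that is needed to generate the testmap module discussed next.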

An issue I had while putting this together was how to link a method in the Blogger class (for example Blogger.delete_all_entries) to its Test Map. One way would have been to programmatically insert the Test Map into the docstring for that method. But this would mean that every time a new unit test is added that tests that method, the Test Map would change and thus the module containing the Blogger class would get changed. This is unfortunate, especially when the files are under source control. I think a better solution, and the one I ended up implementing, is to have a third module called for example testmap_blogger that is automatically generated from testlist_blogger. A method M in the Blogger class will then link to a single function in testmap_blogger. That function will contain in its docstring the Test Map for the Blogger method M.

Again, an example to make all this clearer. Here is the docstring of the Blogger.delete_all_entries method in the blogger module:

delete_all_entries(self)

I manually inserted in the docstring an epydoc link to a function called testmap_Blogger_delete_all_entries in a module called testmap_blogger. Assuming that the testmap_blogger module was already generated and epydoc-documented, clicking on the link will bring up the epydoc detail for that particular function, which contains the 6 unit tests for the delete_all_entries method:

2. We start by writing a unit test for the method C.M1 from the P module. We write the unit test by copying and pasting a Python shell session output in another Python module called testlist_P. We call the unit test function test_M1. It looks something like this:

def test_M1():
    """
    Short description of the behavior we're testing for M1.

    Method(s) tested:
    - L{P.C.M1}

    >>> from P import C
    >>> c = C()
    >>> rc = c.M1()
    >>> print rc
    True

    """

The testlist_P module has a "main" section of the form:

if __name__ == "__main__":
    import doctest
    doctest.testmod()

This is the typical doctest way of running unit tests. To actually execute the tests, we need to run "python testlist_P.py" at a command line (for more details on doctest, see a previous blog post).

3. At this point, we fleshed out an initial implementation for method M1 in module P. In its docstring, we add a link to the test map:

4. We programatically generate the Test Map for module P by running something like this: build_testmap.py. It will create a file called testmap_P.py with the following content:

def testmap_C_M1():
    """
    Testmap for L{P.C.M1}:

    - L{testlist_P.test_M1}
    """

5. We run epydoc:

epydoc -o P_docs P.py testlist_P.py testmap_P.py

A directory called P_docs will be generated; we can move this directory to a public area of our Web server and thus make the documentation available online. When we click on the testlist_P module link, we will see the Test List for module P. It will show something like:

testmap_C_M1()

10. Repeat steps 2-5 for each unit test that you add to the testlist_P module.

Conclusion

I find the combination doctest/epydoc very powerful and easy to use in generating Agile Documentation, or "literate testing", or "executable documentation", or whatever you want to call it. The name is not important, but what you can achieve with it is: a way of documenting your APIs by means of unit tests that live in your code as docstrings. It doesn't get much more "agile" than this. Kudos to the doctest developers and to Edward Loper, the author of epydoc. Also, kudos to Brian Button for his insightful article which inspired my post. Brian's examples used .NET, but hopefully he'll switch to Python soon :-)

If you want to see the full documentation I generated for my blogmgmt package, you can find it here.

Thursday, February 10, 2005

Here are some ideas on why I think Python is an agile language. I use the term agile as in "agile software development practices", best exemplified by Extreme Programming. I find this definition by Ron Jeffries, from his article "What is Extreme Programming", particularly illuminating:

Extreme Programming is a discipline of software development based on values of simplicity, communication, feedback, and courage. It works by bringing the whole team together in the presence of simple practices, with enough feedback to enable the team to see where they are and to tune the practices to their unique situation.

Let's see how Python fares in light of the 4 core XP values: simplicity, communication, feedback and courage.

1. Python fosters simplicity

Clean and simple syntax reads like pseudo-code

Built-in high level data types (strings, lists, tuples, dictionaries) make it possible to pack a lot of functionality in very few lines of code, without sacrificing readability

As an exercise, try to port Java code to Jython: you will see a significant reduction in line count (as much as 40% in my experience)

This really stems from the other 3 values: if you can write code that is simple, provides quick feedback and can be easily understood by your peers, then you have the courage to "go confidently in the direction of your dreams", to quote Thoreau

Courage in the XP sense also means having the guts to throw away code that doesn't work and start afresh; since the simple act of coding in Python produces pure pleasure, it follows that throwing code away and starting to code anew will be felt not as a chore, but as a chance to improve and, why not, attain enlightenment

These are just a few ideas and I'm sure people can come up with many more. How did Python improve your life as an agile software developer? Jump in with a comment or send me email at grig at gheorghiu dot net.

Wednesday, February 09, 2005

Troy Frever from Aviarc Corporation posted a message to the fitnesse mailing list announcing that he created a Google group for topics related to both Python and Agile methodologies (Extreme Programming and others).

Wednesday, February 02, 2005

There's been a lot of talk recently about "dynamic Java", by which people generally mean driving the JVM by means of a scripting language (see Tim Bray's post and Sean McGrath's post on this topic). One of the languages leading the pack in this area is Jython (the other one is Groovy). In fact, a Java Republic poll asking "What is your scripting language for Java for 2004?" has Jython as the winner with 59% of the votes.

Jython is also steadily making inroads into the world of test frameworks. It is perhaps no coincidence that in a talk given at Stanford, Guido van Rossum lists "Testing (popular area for Jython)" on the slide that talks about Python Sample Use Areas. Because Jython combines the agility of Python with easy access to the Java libraries, it is the scripting language of choice for test tools such as The Grinder v3, TestMaker, Marathon and STAF/STAX.

I want to show here how to use Jython for interactively driving a Java test tool (HttpUnit) in order to verify the functionality of a Web application.

HttpUnit is a browser simulator written in Java by Russell Gold. It is used in the Java world for functional, black-box type testing of Web applications. Although its name contains "Unit", it is not a unit test tool, but it is often used in conjunction with the jUnit framework. The canonical way of using HttpUnit is to write jUnit tests that call various HttpUnit components in order to mimic the actions of a browser. These individual tests can then be aggregated into test suites that will be run by the jUnit framework. Building all this scaffolding takes some time, and compiling the Java code after each change adds other delays.

In what follows, I want to contrast the Java-specific HttpUnit usage with the instantaneous feedback provided by working in the Jython shell and with the near-zero overhead that comes with writing Python doctest tests. The functionality I will test is a search for Python books on amazon.com.

Step 1: Install Jython

- The machine I ran my tests on is a Linux server running Red Hat 9 which already had the Java 1.4.2_04 SDK installed in /usr/java/j2sdk1.4.2_04
- I downloaded Jython 2.1 from its download site and I put the file jython_21.class in /usr/local
- I cd-ed into /usr/local and ran the command-line installer (java jython_21), specifying Jython-2.1 as the target directory

Step 2: Install HttpUnit

- I downloaded HttpUnit 1.6 from its download site and I unzipped the file httpunit-1.6.zip under /root
- The main HttpUnit functionality is contained in the httpunit.jar file in /root/httpunit-1.6/lib and other optional jar files are in /root/httpunit-1.6/jars, so I added all the jar files in these two directories to the CLASSPATH environment variable in .bash_profile. Here is the relevant portion from .bash_profile:
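The exact lines depend on the install locations; for the layout described above, the addition might look like this (a sketch, not the verbatim original):

```shell
# append every HttpUnit jar to CLASSPATH (paths match the install above)
HTTPUNIT_HOME=/root/httpunit-1.6
for jar in $HTTPUNIT_HOME/lib/*.jar $HTTPUNIT_HOME/jars/*.jar; do
    CLASSPATH=$CLASSPATH:$jar
done
export CLASSPATH
```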

- I verified that I can import the httpunit Java package from within a Jython shell session:

>>> from com.meterware.httpunit import *
>>>

- Nothing was printed to the console, which means that the import succeeded. If CLASSPATH had not been set correctly and Jython had not been able to process the httpunit.jar file, I would have seen an ImportError instead.

Step 3: Use Jython interactively to drive HttpUnit

This is not a full-fledged HttpUnit tutorial. For people who want to learn more about HttpUnit, I recommend the HttpUnit cookbook and this article by Giora Katz-Lichtenstein on O'Reilly's ONJava.com site.

I will however show you some basic HttpUnit usage patterns. The first thing you do in HttpUnit is open a WebConversation, then send an HTTP request to your Web application and get back the response. Let's do this for www.amazon.com inside a Jython shell:
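The session might look like this (a sketch reconstructing the original transcript, which is not preserved here; it assumes a Jython shell with HttpUnit on the CLASSPATH and network access to amazon.com):

```
>>> from com.meterware.httpunit import *
>>> conversation = WebConversation()
>>> request = GetMethodWebRequest("http://www.amazon.com")
>>> response = conversation.getResponse(request)
>>> response != None
1
```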

We're already seeing some advantages of using Jython over writing Java code: no type declarations necessary! We're also testing that we get a valid response back by expecting to see 1 when we type response != None.

If we were to print the response variable, we would see the HTTP headers.

We could also look at the raw HTML output via response.getText() (I will omit the output, since it takes a lot of space).

At this point, I want to say that testing a Web application via its GUI is a very error-prone endeavor. Any time the name or the position of an HTML element under test changes, the test will break. Generally speaking, testing at the GUI level is notoriously brittle and should only be done when there is a strong chance that the GUI layout and element names will not change. It's almost always better to test the business logic underneath the GUI (assuming the application was designed to clearly separate the GUI logic from the business logic) via a tool such as FitNesse, which can simulate GUI actions without actually going through the GUI.

However, there certainly are cases when one simply cannot skip testing the GUI, and HttpUnit is a decent tool for achieving this goal in the case of a Web application. Let's continue our example and test the search functionality of the main amazon.com Web page. If we were part of a QA team at amazon.com, we would probably expect the HTML design team to hand us a document detailing the layout of the main HTML pages comprising the site and the names of their main elements (forms, frames, etc.). As it is, we need to hunt for this information ourselves by playing with the live site itself and carefully poring through the HTML source of the pages we want to test.

I said before that in HttpUnit we can also get the raw HTML output via response.getText(). The response variable is an instance of the HttpUnit WebResponse class, which offers many useful methods for dealing with HTML elements. We can obtain collections of forms, tables, links, images and other HTML elements, then iterate over them until we find the element we need to test. We can alternatively get a specific element directly from the response by calling methods such as getLinkWithID() or getTableWithID().

If we search for the word "form" inside the HTML page source on the main amazon.com Web page, we see that the search form is called "searchform". We can retrieve this form from the response variable via the getFormWithName() method:
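A sketch of the retrieval (continuing the session above, with response holding the main page):

```
>>> form = response.getFormWithName("searchform")
>>> form != None
1
```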

We can also see from the HTML page source that the form has two input fields: a drop-down list of values called "url" and an entry field called "field-keywords". We will use the form's setParameter() method to fill both fields with our information: "Books" (which actually corresponds to the value "index=stripbooks:relevance-above") for the drop-down list and "Python" for the entry field:
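A sketch of those calls, ending with the form submission that produces the search results page:

```
>>> form.setParameter("url", "index=stripbooks:relevance-above")
>>> form.setParameter("field-keywords", "Python")
>>> search_response = form.submit()
```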

At this point, search_response represents the HTML page containing the 3 most popular search results for "Python", followed by the first 10 of the relevant results (370 in total when I tried it).

The HTML source for this page looks confusing to say the least. It's composed of a myriad of tables, which can be eyeballed by this code:

>>> tables = search_response.getTables()
>>> print tables

Let's pretend we're only interested in the 3 most popular search results. If we look carefully through the output returned by print tables, we see that the first cell in the table containing the 3 most popular results is "1.". We can use this piece of information in retrieving the whole table via the getTableStartingWith() method:
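A sketch of the retrieval:

```
>>> most_popular_table = search_response.getTableStartingWith("1.")
>>> most_popular_table != None
1
```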

From the output of print most_popular_table we also see that the second column in each row contains information about the book: title, authors, new price and used price. If we look at the live page on amazon.com, we notice that each title is actually a link. Let's say we want to test the link for each of the 3 top titles. We expect that by clicking on the link we will get back a page with details corresponding to the selected title.

For starters, let's test the first title, the one at row 0. We can retrieve its link by calling the getLinkWith() method of the search_response object, and passing to it the title of the book (which we need to retrieve from the contents of the cell in column 2 via a regular expression):
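A sketch of this step; the exact column index and regular expression are my assumptions, since the cell layout is not preserved here:

```
>>> import re
>>> cell_text = most_popular_table.getCellAsText(0, 1)
>>> title = re.search(r"(.*?) by ", cell_text).group(1)   # hypothetical pattern
>>> title.find("Python") != -1
1
>>> link = search_response.getLinkWith(title)
>>> link != None
1
```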

Note that we also tested that the title contains "Python". Although this test may fail, it's nevertheless a pretty sure bet that each of the 3 top selling books on Python will have the word "Python" somewhere in their title.

We can now simulate clicking on the link via the link object's click() method. We verify that we get back a non-empty page and also that the HTML title of the book detail page contains the title of the book:
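Extending this to all 3 rows gives a loop along these lines (again a sketch; the column index and regular expression are my assumptions):

```
>>> import re
>>> for row in range(3):
...     cell_text = most_popular_table.getCellAsText(row, 1)
...     title = re.search(r"(.*?) by ", cell_text).group(1)  # hypothetical pattern
...     print title.find("Python") != -1
...     link = search_response.getLinkWith(title)
...     print link != None
...     detail_response = link.click()
...     print detail_response != None
...     print detail_response.getTitle().find(title) != -1
```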

We have 4 test statements which expect 1 as a result in the body of the loop. Since there are 3 rows to inspect, we should expect 12 1's to be printed.

I'll stop here with my example. In a real-life situation, you would want to test much more functionality, but this example should be sufficient to get you going with both HttpUnit and Jython.

Step 4: Use the doctest module to write functional tests

Using the Python doctest module, we can save the Jython interactive session conducted so far into a docstring inside a function that we can call for example test_amazon_search. We can put this function (with an empty body) inside a module called test_amazon.py:

Note that we need to keep in the docstring only those portions of the Jython interactive session which do not change from one test run to another. We can't include things like print statements that reveal book or title specifics, since these specifics are almost guaranteed to change in the future. We want our test to serve as a functional regression test for the bare-bones search functionality of amazon.com.

An interesting note is that the doctest module is used here to conduct a black-box type of test, whereas traditionally it is used for unit testing.

To fully take advantage of the interactive Jython session in order to later include it in a doctest string, I used the "script" trick. On a Unix system, if you type script at a shell prompt, a file called typescript is generated which will contain everything you type afterwards. When you are done with your "script" session, type exit to go back to the normal shell operation. You can then copy and paste the lines saved in the file typescript. This is especially useful for large outputs which can sometimes make other lines scroll past the current window of the shell.
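The sequence looks like this (jython here stands for however you start your interpreter):

```
$ script            # start capturing everything into ./typescript
$ jython            # run the interactive session, then quit the interpreter
$ exit              # stop capturing
$ less typescript   # review and copy-paste from the saved session
```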

A few lessons learned from this exercise:

1. Porting Java code to Jython is a remarkably smooth and painless process. I ported the ONJava.com example to Jython and in the process got a 40% reduction in line count (you can find the original Java code here and the Jython code here). While doing this, I gleefully got rid of ugly Java idioms such as:
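To give a flavor of the difference, here is my own illustrative comparison (not the specific idiom from the original code):

```
# Java: explicit types and 'new' everywhere
#     WebConversation conversation = new WebConversation();
#     WebResponse response = conversation.getResponse(
#             new GetMethodWebRequest("http://www.amazon.com"));
# Jython: the same two lines, minus the ceremony
conversation = WebConversation()
response = conversation.getResponse(GetMethodWebRequest("http://www.amazon.com"))
```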

2. My one-to-one porting from Java to Jython used unittest, which naturally corresponds to the original jUnit code. However, when I started using Jython interactively in a shell session, I realized that doctest is the proper test framework to use in this case.

3. I wish Jython could keep up with CPython. For example, the doctest version shipped with Jython 2.1 does not have the testfile functionality which allows you to save the docstrings in separate text files and add free-flowing text.

4. HttpUnit offers limited Javascript support. This can be a problem in practice, since a large number of sites are heavy on Javascript. While trying to find a good example for this post, I tried a number of sites and had HttpUnit bomb when trying to either retrieve the main page or post via a search form (such sites include monster.com, hotjobs.com, freshmeat.net, sourceforge.net).

In conclusion, I think there is a real advantage in using Jython over Java in order to quickly prototype tests that use third-party Java libraries. The combination of Jython and doctest proves to be extremely "agile": it simplifies the test code, enhances its clarity, and provides instantaneous feedback -- all eminently agile qualities.