Py6S now has Continuous Integration & better tests

March 3, 2014

As a Fellow of the Software Sustainability Institute I’m always trying to make my software more sustainable – and one element of this is ensuring that my software works correctly. Although crashes might annoy users (which generally isn’t a good plan if you want your software to be well-used), a far worse problem is your software producing subtly-incorrect results – which may not be noticed until papers have been published, sensors designed and large research projects started. Definitely not something I want to happen with Py6S!

So, for a while now I’ve been writing various tests for Py6S. You can find them all in the tests folder of the Py6S code, and you can run them with nosetests from the root of the Py6S source code tree. Adding these tests was definitely an improvement, but there were two problems:

1. I kept forgetting to run the tests after I’d changed things. Then, just before a release I’d remember to run the tests (hopefully!) and find all sorts of problems and have to try and work out how I’d created them.

2. These tests were mostly regression tests. That means that I ran something in Py6S at the Python console, and then created a test to ensure that the same code would produce the same output that I’d just got. This is useful – as it protects against ‘regressions’, where changes to one part of the code also break things elsewhere – but it doesn’t test that Py6S itself actually produces the right answers. After all, it might have been wrong all along, and a regression test wouldn’t pick that up!
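As a rough illustration of the kind of test described here, the sketch below pins a function's output to a value copied from an earlier run. The function toy_reflectance and the pinned number are hypothetical stand-ins chosen for this example; they are not part of Py6S.

```python
import math
import unittest


def toy_reflectance(solar_zenith_deg, surface_albedo):
    """Hypothetical stand-in for a model call; NOT a Py6S function."""
    return surface_albedo * math.cos(math.radians(solar_zenith_deg))


class RegressionTest(unittest.TestCase):
    def test_output_unchanged(self):
        # Value pasted from an earlier console run of toy_reflectance(60, 0.5).
        # It pins the current behaviour; it does not prove the behaviour is right.
        previously_observed = 0.25
        self.assertAlmostEqual(
            toy_reflectance(60, 0.5), previously_observed, places=6
        )
```

A test like this fails as soon as a later change alters the output, but it would keep passing if the original output had been wrong from the start.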

So, I decided to have a big push on testing Py6S and try and fix both of these problems.

Firstly, I set up a Continuous Integration server called Jenkins on my nice shiny Rackspace cloud server. Continuous Integration tools like this are often used to compile software after every commit to the source control system, to ensure that there aren’t any compiler errors that stop everything working, and then to run the test suite on the software. Of course, as Py6S is written in Python it doesn’t need compiling, but using Jenkins is a good way to ensure that the tests are run every time a modification to the code is committed. So now I simply alter the code, commit it and push to GitHub, and Jenkins automatically runs all of the tests and sends me an email if anything has broken. Jenkins also provides a public status page that shows that the Py6S build is currently passing all of the tests, along with graphs of test failures over time (shown below – hopefully automatically updating from the Jenkins server) and test coverage.
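The core of what a Continuous Integration server does after each push can be sketched in a few lines: run the test command and act on its exit status. This is only an illustration of the idea, not how Jenkins is implemented; the function name run_test_suite and the placeholder command are assumptions made for the example.

```python
import subprocess
import sys


def run_test_suite(command):
    """Run a test command and report whether it exited successfully."""
    completed = subprocess.run(command, capture_output=True, text=True)
    # Test runners conventionally exit with 0 on success and non-zero on failure.
    return completed.returncode == 0, completed.stdout + completed.stderr


# Placeholder command for the sketch. On a real build server this would be the
# project's own test runner, e.g. ["nosetests"] run in the root of the source tree.
passed, log = run_test_suite([sys.executable, "-c", "print('all tests ran')"])
if not passed:
    print("Build broken; a CI server would send a notification here.")
```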

Using Jenkins to provide test coverage reports (which show which lines of code were executed during the tests, and therefore which lines of code haven’t been tested at all) showed me that quite a lot of important bits of Py6S weren’t being tested at all (even with regression tests), and of course I still had the problem that the majority of my tests were just regression tests.
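To show what "which lines were executed" means in practice, here is a toy line tracer built on Python's sys.settrace. It is a minimal sketch for illustration only; real coverage tools are far more thorough, and the helper executed_lines and the example function clamp are invented for this example.

```python
import sys


def executed_lines(func, *args):
    """Return the line numbers inside func that run during func(*args)."""
    target = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        # Record 'line' events only for the function under inspection.
        if event == "line" and frame.f_code is target:
            seen.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return seen


def clamp(value):
    if value < 0:
        return 0  # only runs for negative input
    return value


first = clamp.__code__.co_firstlineno
only_positive = executed_lines(clamp, 5)
# The 'return 0' line sits two lines below the def. It never ran, so a test
# suite that only passes positive values leaves that line untested.
untested = (first + 2) not in only_positive
```

A coverage report is essentially this information gathered across the whole test run and presented per file.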

I wasn’t sure what to do about this, as I couldn’t replicate all that 6S does by hand and check that it is giving the right results (even if I knew enough to do this, it’d be very time-consuming), so how could I do anything other than regression tests? Suddenly, the answer came to me: replicate the examples! The underlying 6S model comes with example input files, and the expected outputs for those input files. All I needed to do was to write Py6S code that replicates the parameterisation used in the input files, and check that the outputs were the same. Of course, I wouldn’t do this for every parameter in the output files (again, it’d take a long time to manually set up all of the tests – although it may be worth doing sometime) – but checking a few of the most-used parameters should give me high confidence that Py6S is giving the same results as 6S.
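The comparison step of such an example-based test might look like the sketch below. The output names, the numbers and the tolerance are all hypothetical: in practice the expected values would be transcribed from an example output file shipped with the model, and the computed values would come from running the wrapper with the same parameterisation.

```python
import math

# Hypothetical reference values, standing in for numbers read from an
# example output file.
expected = {"apparent_reflectance": 0.1234, "total_gaseous_transmittance": 0.9876}

# Hypothetical results, standing in for values produced by the wrapper.
computed = {"apparent_reflectance": 0.12341, "total_gaseous_transmittance": 0.98759}


def mismatches(computed, expected, rel_tol=1e-3):
    """Return the names of outputs that differ from the reference values."""
    return [
        name
        for name, reference in expected.items()
        if not math.isclose(computed[name], reference, rel_tol=rel_tol)
    ]
```

Comparing with a tolerance rather than exact equality allows for the rounding in printed reference outputs. Unlike a pinned regression value, the reference here comes from an independent source, so a match is evidence that the answer is actually right.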

So, that’s what I did. The code is available on GitHub, and all of these example-based tests pass (well, they do now – as part of writing the tests I found various bugs in Py6S which I fixed).

Overall, I now have far more confidence that Py6S is producing correct results, and using Continuous Integration through Jenkins means that I get notified by email as soon as anything breaks.

One Comment

What you’ve labelled as regression testing is actually characterization testing. Regression testing just seeks to uncover new software bugs, or regressions, in existing areas of a system after changes. Regression tests can themselves be example-based tests.