Public service announcement: signals implies reentrant code even in Python

This is a tiny PSA prompted by my digging into a deadlock condition in the Launchpad application servers.

We were observing a small number of servers stopping cold when we did log rotation, with no particularly rhyme or reason.

tl;dr: do not call any non-reentrant code from a Python signal handler. This includes the signal handler itself, queueing tools, multiprocessing, anything with locks (including RLock).

Tracking this down I found we were using an RLock from within the signal handler (via a library…) – so I filed a bug upstream: http://bugs.python.org/issue13697

Some quick background: when a signal is received by Python, the VM sets a status flag saying that signal X has been received and returns. The next chance that thread 0 gets to run bytecode, (and its always thread 0) the signal handler in Python itself runs. For builtin handlers this is pretty safe – e.g. for SIGINT a KeyboardInterrupt is raised. For custom signal handlers, the current frame is pushed and a new stack frame created, which is used to execute the signal handler.

Now this means that the previous frame has been interrupted without regard for your code: it might be part way through evaluating a multi-condition if statement, or between receiving the result of a function and storing it in a variable. Its just suspended.

If the code you call somehow ends up calling that suspended function (or other methods on the same object, or variations on this theme), there is no guarantee about the state of the object; it becomes very hard to reason about.

Consider, for instance, a writelines() call, which you might think is safe. If the internal implementation is ‘for line in lines: foo.write(line)’, then a signal handler which also calls writelines, could have what it outputs appear between any two of the lines in writelines.

True reentrancy is a step up from multithreading in terms of nastiness, primarily because guarding against it is very hard: a non-reentrant lock around the area needing guarding will force either a deadlock, or an exception from your reentered code; a reentrant lock around it will provide no protection. Both of these things apply because the reentering occurs within the same thread – kindof like a generator but without any control or influence on what happens.

Safe things to do are:

Calling code which is threadsafe and only other threads will be concurrently calling.

Performing ‘atomic’ (any C function is atomic as far as signal handling in Python is concerned) operations such as list.append, or ‘foo = 1′. (Note the use of a constant: anything obtained by reading is able to be subject to reentrancy races [unless you take care ])

In Launchpad’s case, we will be setting a flag variable unconditionally from the signal handler, and the next log write that occurs will lock out other writers, consult the flag, and if needed do a rotation, resetting the flag. Writes after the rotation signal, which don’t see the new flag, would be ok. This is the only possible race, if a write to the variable isn’t seen by an in-progress or other-thread log write.

While some folk look down on fakeraid (that is BIOS based RAID-until-OS-takes-over) solutions, I think they are pretty neat: they let a user get many of the benefits of dedicated controller cards at a fraction of the cost. The benefits include the usual ones for RAID – more spindles to handle IO, tolerance of disk failures. And unlike pure LVM solutions, you can boot from a degraded RAID 1 / 5 / 10 set because the BIOS knows how.

In some ways this is better than dedicated cards, because we have the software take over, so we can change the algorithms for IO dispatch all the way down to the individual devices

However, these RAID volumes are in a pretty awkward spot for installers and bootloaders: inside a running Linux environment they look like software RAID which cannot be depended on for booting, but at boot time they look like hard disks which cannot be looked under the hood.

I recently got a new desktop machine which has one of these motherboards, and fortuitously my old desktop I was replacing had the same size disks – so I had 4 disks and the option of using a RAID setup. Apparently I’m a sucker for punishment because I went for a RAID 10 (that is two RAID volumes made up of two-disk mirrors (the RAID 1 component), and then those two volumes are combined via striping (the RAID 0 component). This has the potential for pretty nice performance: in principle any read can come from one of 2 disks, and every 64KB (the stripe size) of linear data will switch to the other mirror set, giving a nice boost. Writes need to write to 2 disks always, but every 64KB worth of data will alternate mirror sets, also giving a boost.

Sadly we (Ubuntu) aren’t ready for this yet: there are two key bugs that make this layout almost impossible to install into. This blog post is for my exo-memory, I want to be able to figure out what I did next time around .

Firstly parted_devices, a helper used by Ubiquity and debian-installer to determine which block devices are actually disk drives that one can partition and install onto, has a confused heuristic – when dealing with dmraid it looks for devices which are not layered on other dmraid devices. This handily excludes partitions, but has the undesirable effect of excluding that striped device – because it is layered on the two mirrored devices. Bug 560748 was filed about that, and I’ve added a workaround to it – basically disabling the filtering, so its not suitable as a long term fix, but it will let one select the RAID volume correctly.

Secondly, grub2, which needs to figure out what the name at boot time of the RAID volume will be currently gets confused. I don’t know enough to really explain – and be correct in my explanation – but I do have a fugly patch which worked for me. Bug 803658 tracks this defect. The basic approach I took was to say that dmraid devices should be an abstraction layer we don’t peek under: if it claims to be a disk, well then its a disk. As grub does actually work that way - it talks to INT 13h – the BIOS support for booting off of the RAID volume is entirely sufficient.

Sadly neither bug is at the point where the patches can be rolled into Ubuntu itself, but the workaround should let folk get up and running.

In both cases, build the package locally in the installer, install it, then after than run ubiquity and things should install.

After the install, you will need to reapply the patch in the resulting installed environment, or things like update-grub will die on you!

(huge thanks to cjwatson and ev for giving me some tips while I investigated this)

Ok, so micro rant time: this is the effect of not taking things upstream: hardware doesn’t work Out Of The Box.

Very briefly, I purchased a Vodafone prepaid mobile broadband package today, which comes with a modem and SIM. The modem is a K3571-Z, and Ubuntu *thinks* it knows how they work (it doesn’t). So it fails to connect in NetworkManager with a rather opaque ‘NO CARRIER’ message.

Thanks to excellent assistance from ﻿﻿Matt Trudel, we tracked this down to a theory that perhaps modemmanager is using the wrong serial port – and voila, it is. From there, the config file (/lib/udev/rules.d/77-mm-zte-port-types.rules) was an obvious next step – and indeed there is no entry in there for the 19d2:1010 – the K3571-Z. Google found one immediately though, on a Vodafone research site.

The awful shame is this: that was committed to the bcm project in March this year. If Vodafone had shipped off a patch to modemmanager, we could have had that in 10.10, and possibly even in 10.04. There are plenty of users having trouble on Whirlpool etc with this model who would have had a better experience – helping Vodafone’s users be happier.

All it would have taken is an email

I’m sure Vodafone want a great experience for their users, but I think they’re failing to separate out platform improvements – share and share alike, and branding / custom facilities. The net impact is harmful, not helpful.

Tesetrepository has a really nice workflow for fixing a set of failing tests:

Tell it about the failing tests (e.g. by doing a full test run, or running a single known failing test)

Run just the known failing tests (testr run –failing)

Make a change

Goto step 2

As you fix up the tests testr will just give your test runner a smaller and smaller list of tests to run.

However I haven’t been able to use that feature when developing (most) Python programs.

Today though, I added the necessary support to testtools, and as a result subunit (which inherits its thin test runner shim from testtools) now supports –load-list. With this a simple .testr.conf can support this lovely workflow. This is the one used in testrepository itself: it runs the testrepository tests, which are regular unittest tests, using subunit.run – this gives it subunit output, and tells testrepository how to run a subset of tests.
[DEFAULT]
test_command=python -m subunit.run $IDOPTION testrepository.tests.test_suite
test_id_option=--load-list $IDFILE

So a while back I blogged about maintainable test suites. One of the things I’ve been doing since is fiddling with the heart of the fixtures concept.

To refresh your memory, I’m defining fixture as some basic state you want to reach as part of doing a test. For instance, when you’ve mocked out 2 system calls in preparation for some test code – that represent a state you want to reach. When you’ve loaded sample data into a database before running the actual code you want to make assertions about – that also represents a state you want to reach. So does simply combining three or four objects so you can run some code.

Now, there are existing frameworks in python for this sort of thing. testresources and testscenarios both go some way towards this (and I and to blame for them ), so does the zope testrunner with layers, and the testfixtures project has some lovely stuff as well. And this is without even mentioning py.test!

There are a few things that you need from the point of view of running a test and establishing this state:

You need to be able to describe the state (e.g. using python code) that you wish to achieve.

The test framework needs to be able to put that state into place when running the test. (And not before because that might interfere with other tests)

And the state needs to be able to be cleaned up.

Large test suites or test suites dealing with various sorts of external facilities will also often want to optimise this process and put the same state into place for many tests. The (and I’m not exaggerating) terrible setUpClass and setUpModule and other similar helpers are often abused for this.

Why are they terrible? They are terrible because they are fragile; there is no (defined in the contract) way to check that the state is valid for the next test, and its common to see false passes and false failures in tests using setUpClass and similar.

So we also need some way to reuse such expensive things while still having a way to check that test isolation hasn’t been compromised.

Having looked around, I’ve come to the conclusion we’ll all benefit if there is a single core protocol for doing these things, something that can be used and built on in many different ways for many different purposes. There was nothing (that I found) that actually met all these requires and was also tasteful enough that folk might really like using it.

I give you ‘fixtures‘. Or on Launchpad. This small API is intended to be a common contract that all sorts of different higher level test libraries can build on. As such it has little to no policy or syntatic sugar.

It does have a nice core, integration with pyunit.TestCase, and I’m going to add a library of useful generic fixtures (like temporary directories, environment isolators and so on) to it. I’d be delighted to add more committers to the project, and intend to have it be both Python 2.x and 3.x compatible (if its not already – my CI machine isn’t back online after the move yet, I’m short of round tuits).

I do hope that the major frameworks (nose, py.test, unittest2, twisted) will include the useFixture glue themselves shortly; I will offer it as a patch to the code after giving it some time to settle. Further possibilities include declared fixtures for tests, and we should be able to make setUpClass better by letting fixtures installed during it get reset between tests.

I recently moved withing Canonical from being a paid developer of Bazaar to take on a larger challenge Technical Architect for Launchpad. Its been two months now, and its time to put my head up out of the coal face, have a look around and regroup.

When I worked on Bazaar, every day when I started work got up I was working on a tool anyone can use, designed for collaboration upon sourcecode, for people writing software. This is a toolchain component right at the heart of the free software world. Bazaar and tools like it get used everyday to manage, distribute and collaborate on the sourcecode that makes up the components of Ubuntu, Debian, Fedora and so forth. Every time someone new starts using Bazaar for a new free or open source project, well I felt happy – happy that in my small part I’m helping with this revolution we’re carrying out.

Launchpad is pretty similar to Bazaar in some ways. Obviously they are both free software, both are written in Python, and both are sponsored by Canonical, my employer. And they both are designed to assist in collaboration and communication between free software developers – albeit in rather different ways.

Bazaar is a tool anyone can install locally, run as a command line, GUI, or local webserver, and share code either centrally (e.g. by pushing to Launchpad), or in a peer to peer fashion, acting as their own server.

Launchpad, by contrast is a website which (usually) folk will use as a service – in their browser, from the comand line – FTP (for package building), ssh (for Bazaar branch pushing or pulling), or even local GUI programs using the Launchpad API service. This makes it more approachable for first time collaborators, but its less able to be used offline, and it has all the usual caveats of web sites : it needs a username and password, it’s availability depends on the operators – on the team I’m part of. So there’s a lot less room for error: if we do something wrong, the system is unavailable, and users can’t just ‘apt-get install’ an older release.

With Launchpad our goal is to to get all the infrastructure that open source need out of the way, so that they can focus on their code, collaboration within their team – and almost uniquely – collaboration with other teams. As well as being open source, Launchpad is free for all open source projects to use – Ubuntu is our single biggest user – they use it for all bugtracking, translation and package building, and have a hugefraction of the total storage overhead in the database.

Launchpad is a pretty nice system, so people use it, and as a result (on a technical basis) it is suffering from its own success: small corner cases in the code turn up every day or two, code written years ago to deal with a relatively small data set now has to deal with data sets a thousand or more times larger (one table, for instance, has over 600,000,000 rows in it.

For the last two months then, I’ve been working on Launchpad. As Technical Architect, I need to ensure that the things that we (users, stakeholders and developers of Launchpad) want to do are supported by the structure of the system : the platform(s) we’re building on, the way we approach problems, coding standards and diagnostic tools. That sounds pretty dry and hands off, but I’m finding its actually very balanced. I wrote a presentation when I started the job, which encapsulated the challenges I saw in front of the team on this purely technical front, and what I thought I needed to do.

I think I was about right in my expectations: On a typical day, I’ll be hands on in a problem helping get it diagnosed, talking long term structural changes with someone around how to make things more efficient / flexible / maintainable, and writing a small patch here or there to help move things along.

In the two months since I took on this challenge, we’ve made significant headway on the problem of performance for Launchpad : many inefficient code paths have been identified and removed, some new infrastructure has been created as is being rolled out to make individual pages faster, and we’ve massively increased the diagnostic data we get when things go wrong. We’ve introduced facilities for responding more rapidly to issues in the software (but they have to be rolled out across the system) and I hope, over the next 4 months we’ll reach the first of my performance goals: for any webpage in Launchpad, it will complete rendering in 99% of the time. (Note that we already meet this goal if you measure the whole system, but this is biased by some pages being very frequently hit and also being very small).

In their post the author notes that subunit is not easy_installable at the moment. It will be shortly. Thanks to Tres Seaver there is a setup.py for the python component of Subunit, and he has offered to maintain that going forward. His patch is in trunk, and the next release will include a pypi upload.

The next subunit release should be pretty soon too – the unicode support in testtools has been overhauled thanks to Martin[gz], and so we’re in much better shape on Python 2.x than we were before. Python3 for testtools is trouble free in this area because confused strings don’t exist there

There’s a test code maintenance issue I’ve been grappling with, and watching others grapple with for a while now. I’ve blogged about some infrastructural things related to it before, but now I think its time to talk about the problem itself. The problem shows up as soon as you start writing setUp functions, or custom assertThing functions. And the problem is – where do you put this code?

If you have a single TestCase, its easy. But as soon as you have two test classes it becomes more difficult. If you choose either class, the other class cannot use your setUp or assertion code. If you create a base class for your tests and put the code there you end up with a huge base class, and every test paying the total overhead of your test needs, rather than just the overhead needed to test the particular system you want to test. Or with a large and growing list of assertions most of which are irrelevant for most tests.

The reason the choices have to be made is because test code is just code; and all the normal issues there – separation of concerns, composition often being better than inheritance, do-one-thing-well – all apply to our test code. These issues are exacerbated by pyunit (that is the Python ‘unittest’ module included with the standard library and extended by various projects)

Lets look some (some) of the concerns involved in a test environment: Test execution, fixture management, outcome decision making. I’m using slightly abstract terms here because I don’t want to bind the discussion down to an existing implementation. However the down side is that I need to define these terms a little.

Test execution – by this I mean the basic machinery of running a single test: the test framework calling into user code and receiving back an outcome with details. E.g. in pyunit your test_method() code is called, success is determined by it returning successfully, and other outcomes by raising specific exceptions. Other languages without exceptions might do this returning an outcome object, or passing some object into the user code to be called by the test.

Fixture management – the non trivial code that prepares a situation where you can make assertions. On the small side, creating a few object instances and glueing them together, on the large end, loading data into a database (and creating the database instance at the same time). Isolation issues such as masking out environment variables and creating temp directories are included in this category in my opinion.

Outcome decision making – possibly the most obtuse label I’ve ever given this, I’m referring the process of deciding *what* outcome you wish to have happen. This takes different forms depending on your testing framework. For instance, in Python’s doctest:

>>> x

45

provides a specification – the test framework calls str(x) and then compares that to the string ’45′. In pyunit assertions are typically used:

self.assertEqual(45, x)

Will call 45 == x and if the result is not True, raise an exception indicating a Failure has occured. Unexpected exceptions cause Errors, and in the most recent pyunit, and some extensions, other exceptions can signal that a test should not be run, or should have failed.

So, those are the three concerns that we have when testing; where should each be expressed (in pyunit)? Pragmatically the test execution code is the hardest to separate out: Its partly outside of ‘user control’, in that the contract is with the test framework. So lets start by saying that this core facility, which we should very rarely need to change, should be in TestCase.

That leaves fixture management and outcome decision making. Lets tackle decision making… if you consider the earlier doctest and assertion examples, I think its fairly clear that there are multiple discrete components at play. Two in particular I’d like to highlight are: matching and signalling. In the doctest example the matching is done by string matching – the reference object(s) are stringified and compared to an example the test writer provides. In the pyunit example the matching is done by the __eq__ protocol. The signalling in the doctest example is done inside the test framework (so we don’t see any evidence of it at all). In the pyunit example the signalling is done by the assertion method calling self.fail(), that being the defined contract for causing a failure. Now for a more complex example: testing a float. In doctest:

>>> “%0.3f” % x

0.123

In pyunit:

self.assertAlmostEqual(0.123, x, places=3)

This very simple check – that a floating point number is effectively 0.123 exposes two problems immediately. The first, in doctest, is that literal string comparisons are extremely limited. A regex or other language would be much more powerful (and there are some extensions to doctest; the point remains though – the … operator is not enough). The second problem is in pyunit. It is that the contract of assertEqual and assertAlmostEqual are different: you cannot substitute one in where the other was expected without partial function application – something that while powerful is not the most obvious thing to reach for, or to read in code. The JUnit folk came up with a nice way to address this: they decoupled /matching/ and /deciding/ with a new assertion called ‘assertThat’ and a language for matching – expressed as classes. The initial matcher library, hamcrest, is pretty ugly in Python; I don’t use it because it tries too hard to be ‘english like’ rather than being honest about being code. (Aside, what would ‘is_()’ in a python library mean to you? Unless you’ve read the hamcrest code, or are not a Python programmer, you’ll probably get it wrong. However the concept is totally sound. So, ‘outcome decision making’ should be done by using a matching language totally seperate from testing, and a small bit of glue for your test framework. In ‘testtools’ that glue is ‘assertThat’, and the matching language is a narrow Matcher contract (in testtools.matchers) which I’m going to describe here, in case you cannot or don’t want to use the testtools one.

This permits composition and inheritance within your matching code in a pretty clean way. Using == only permits this if you can simultaneously define an __eq__ for your objects that matches with arbitrarily sensitivity (e.g. you might not want to be examining the process_id value for a process a test ran, but do want to check other fields).

Now for fixture management. This one is pretty simple really: stop using setUp (and other similar on-TestCase methods). If you use them, you will end up with a hierarchy like this:

Note that there are some things around that offer this sort of convention already: thats all it is – convention. Pick one, and run with it. But please don’t use setUp; it was a conflated idea in the first place and is a concrete problem. Something like testresources or testscenarios may fit your needs – if it does, great! However they are not the last word – they aren’t convenient enough to replace just calling a simple helper like I’ve presented here.

To conclude, the short story is:

use assertThat and have a seperate hierarchy of composable matchers

use or create a fixture/resouce framework rather than setUp/tearDown

any old TestCase that has the outcomes you want should do at this point (but I love testtools).