Thursday, 23 September 2010

I meant to spend a good part of the day on the long overdue Stackless 2.6.6 merge. I had come in to work on a public holiday with time set aside for it, but I had forgotten my SSH key, which put an extra obstacle in the way. So I moved on to another project: looking into reports of problems when using my stackless socket module with the xmlrpc standard library modules. However, I was unable to reproduce the problems, so I spent the time cleaning up the mechanisms that let me run the standard library unit tests against my module.

Most of these mechanisms are monkey-patches of thread-blocking operations. When any of these calls (time.sleep, select.select, etc.) is made, it blocks the Stackless scheduler running on that thread, and with it every other tasklet on that thread. So the general brute-force solution is to make each of these calls block only the current microthread, while the actual call is made in another thread. This is rather naive, but with a bit of special-casing to hide harmless errors caused by race conditions, it works out fine for the purpose of running suites of unit tests.
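A minimal sketch of that brute-force approach follows. It is not the actual code from stackless socket: `farm_out` and its `wait` hook are hypothetical names. Stackless is not imported, so the default hook blocks the whole calling thread; under Stackless the hook would instead park only the current tasklet, for example with a loop that calls stackless.schedule() until the event is set.

```python
import functools
import threading


def farm_out(func, wait=None):
    """Return a wrapper that runs func on a fresh worker thread.

    `wait` is a hypothetical hook that receives a threading.Event and must
    return once the event is set. Under Stackless it would park only the
    calling tasklet; the default, Event.wait, blocks the whole calling
    thread so that this sketch also runs on plain CPython.
    """
    if wait is None:
        def wait(event):
            event.wait()

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        done = threading.Event()
        outcome = {}

        def worker():
            # The genuinely blocking call happens here, off the caller's thread.
            try:
                outcome["value"] = func(*args, **kwargs)
            except BaseException as exc:
                outcome["error"] = exc
            finally:
                done.set()

        threading.Thread(target=worker, daemon=True).start()
        wait(done)  # block the caller until the worker has finished
        if "error" in outcome:
            raise outcome["error"]  # re-raise the worker's exception in the caller
        return outcome["value"]

    return wrapper
```

Patching would then be a matter of something like `time.sleep = farm_out(time.sleep)`. The race-condition special-casing mentioned above is left out of the sketch.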

However, the more I work on something like this, the more uneasy I feel about the long-held ideal of a suite of these monkey-patching mechanisms that users can simply install so that any use of the standard or third-party libraries just works. It's not that it wouldn't work for the most part: look at stackless socket, which has been a cobbled-together mess for many years and yet was more than enough for almost everyone who used it. It's that it is actually important for people writing code to know where their microthreads block, and what kind of blocking each point is. Is it a safe, voluntary yield to the scheduler, after which state may have changed by the time the microthread is scheduled again and resumes? Or is it an unsafe block of the whole thread, which may also block the very mechanism that would trigger the condition needed to reawaken it?

With this in mind, I am now thinking of an alternative solution. Rather than writing a library that monkey-patches all thread-blocking calls into tasklet-blocking calls, I am now leaning towards monkey-patching them to raise an exception. They would all be guarded against being called at all, and any use of them would have to be explicitly allowed. So you could, for instance, allow stackless socket to call select.select via asyncore.poll, but when you subsequently invoked xmlrpc_server.serve_forever, it would raise an error on its call to select.select.
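The guard idea could look roughly like the sketch below. `guard`, `BlockingCallError` and the `allowed_callers` parameter are hypothetical names, not the author's implementation. Permission is decided by the module that makes the call directly, found with the CPython-specific `sys._getframe`. That is a simplification: allowing "asyncore" would permit every asyncore.poll call, not only those made on behalf of stackless socket, so a real version would probably need finer-grained rules.

```python
import functools
import sys


class BlockingCallError(RuntimeError):
    """Raised when a guarded, thread-blocking function is called without permission."""


def guard(module, name, allowed_callers=()):
    """Replace module.name with a wrapper that refuses to run unless the
    immediate caller's module name is in allowed_callers.

    Returns the original function so it can be restored later.
    """
    original = getattr(module, name)
    allowed = set(allowed_callers)

    @functools.wraps(original)
    def guarded(*args, **kwargs):
        # One frame up is the code that actually called the guarded function.
        caller = sys._getframe(1).f_globals.get("__name__")
        if caller not in allowed:
            raise BlockingCallError(
                "%s.%s called from %r, which is not allowed to block the thread"
                % (module.__name__, name, caller))
        return original(*args, **kwargs)

    guarded.allowed_callers = allowed  # mutable, so more callers can be allowed later
    setattr(module, name, guarded)
    return original
```

Installing it might look like `guard(select, "select", allowed_callers={"asyncore"})`, after which any other module calling select.select through the module attribute would get BlockingCallError. Code that grabbed its own reference to the function before the guard was installed would bypass it, which is the problem discussed next.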

At this stage, no work would be farmed out at all; guards would simply be added to the applicable functions. References to those functions would also be tracked down, to make sure there are no corner cases where potentially thread-blocking callables escape into the wild. By this I mean cases where, for instance, threading imports time and keeps its own reference to sleep before the guards are put in place, so that later calls through that reference bypass the guard. This is a case I had to fix today to get the xmlrpc unit tests to run against stackless socket.
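One way to track down such escaped references is to scan every loaded module's namespace for names bound to the original function and rebind them to the guarded one. This is a hedged sketch with a hypothetical `rebind_references` helper. It only covers module-level aliases like the threading case above; references captured in closures, default arguments, class attributes or containers would still escape.

```python
import sys


def rebind_references(original, replacement):
    """Point every module-level name bound to `original` at `replacement`.

    Catches aliases such as `from time import sleep as _sleep` made before
    the guards were installed. Does not reach closures, default arguments,
    class attributes or containers.
    Returns a list of (module name, attribute name) pairs that were rebound.
    """
    rebound = []
    for mod_name, module in list(sys.modules.items()):
        namespace = getattr(module, "__dict__", None)
        if not isinstance(namespace, dict):
            continue  # skips None entries and objects without a plain dict namespace
        for attr, value in list(namespace.items()):
            if value is original:
                namespace[attr] = replacement
                rebound.append((mod_name, attr))
    return rebound
```

One caveat: a module-level variable used to keep the original function, for example for later restoration, will itself be rebound by this scan, so the original is better held in a local variable or a container.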

Once this is implemented and tested, the next steps might be a generic mechanism to farm guarded calls out to worker threads, and an embedded web server that summarises which blocking calls are being made, along with additional information to help a programmer understand what is really going on in their Stackless application.
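The bookkeeping behind such a summary could start as simply as counting calls per caller module and function. The sketch below is hypothetical (`BlockingCallLog` is an invented name) and leaves out the web server entirely; `summary()` returns data that a status page could render.

```python
import collections
import functools
import sys
import threading


class BlockingCallLog:
    """Hypothetical bookkeeping for a report on blocking calls.

    Counts calls to wrapped functions per (caller module, function name).
    """

    def __init__(self):
        self._counts = collections.Counter()
        self._lock = threading.Lock()  # calls may arrive from several threads

    def record(self, func):
        """Wrap func so that every call is counted before it runs."""
        label = "%s.%s" % (getattr(func, "__module__", None), func.__name__)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            caller = sys._getframe(1).f_globals.get("__name__")
            with self._lock:
                self._counts[(caller, label)] += 1
            return func(*args, **kwargs)

        return wrapper

    def summary(self):
        """Return [((caller, function), count), ...], most frequent first."""
        with self._lock:
            return self._counts.most_common()
```

Wrapping a blocking function, for example `time.sleep = log.record(time.sleep)`, would then show in `log.summary()` which modules call it and how often.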

Right, time to start coding.

Edit: The first draft is checked in. It has a minimal blacklist and can currently only handle simple functions. Before moving on to the next steps listed above, I should make it handle more than that.