When there are failures in the Python unit tests in the Stackless build, a good first step in tracking them down is to define STACKLESS_OFF and compile the Stackless source code as standard Python. If the tests still fail there, then it might be a mismerge, or it might be that the tests also fail in the standard Python branch that was merged from. If compiling and testing the standard Python branch shows no test failures there, that then points to a mismerge.

In this case, the same tests failed to pass in the standard Python branch. While these false negatives increase the time and effort required to do the merge, they are of course better than having mismerged. They can also be ignored, as the goal of Stackless is to be backwards compatible with standard Python...

The first Python unit test failure was in the distutils tests. It looks like the exported symbol prefix for module initialisation has changed from init to PyInit_, but a unit test had not been changed to generate a method with the new prefix.

Recompiling the merged source code with Stackless enabled showed an additional unit test failure in test_pickle.py. Stackless allows pickling of running code and, in order to do this, it alters what gets pickled. In this case, the fix was simply pickling the problematic object, taking the result and updating the unit test to include it. Of course, the case where Stackless is disabled still needs to be handled.

There was one failure in the Stackless unit tests, to do with pickling a recursive function. Editing the test and reconciling what comes out of pickletools.dis with the test failure stack trace made it clear what the problem was. It was dynamic behaviour in the unittest.py module: because there were self references in the recursive function being pickled, the test suite instance was being included with the pickled function. On unpickling, a __getattr__ hook in the _WritelnDecorator text UI test running support class was entering an infinite loop. The correct fix here was to remove the self reference, to stop the test case instance being dragged into the pickle.

TortoiseSVN problems

It used to be that TortoiseSVN just worked. I'd go through the trauma of merging a Python branch and building an installer, and then I could rely on it to do its job. But these days, every time I do a merge, it seems that TortoiseSVN gets flakier and flakier.

When I went to commit the merged Stackless code into our py3k branch, TortoiseSVN complained that a new file from the original Python branch was already present. I tried to do a "clean up" on the relevant directory, which did not work. I tried to exclude the relevant directory from the commit operation, which resulted in another unclear and apparently irrelevant error about recursive commits needing to be done a certain way. Eventually, what worked was reverting the directory and re-adding it.

With the merged changes committed, there was one more code change to do. The configure script is a unix shell script and is generated from another file, configure.in. It does not merge well, so the best approach is to not merge it and, once all the other merges are checked in, fetch the branch on a Linux machine and rebuild configure there. Then I download the regenerated file to my local machine via the web, and commit it locally. Normally this is as straightforward as it sounds and just works, but this time TortoiseSVN complained about inconsistent line endings. The file has the subversion property of native line endings, and in the past it was just accepted, but not any more. What worked was loading the file up in a decent editor that allowed me to save it out in "unix" format.

Doing the release

I am not releasing Stackless Python 3.1 yet. In order for extensions to be compatible with official Python builds, there is a specific version of Visual Studio that needs to be used to build both. And I do not have access to an install of Visual Studio 2008 at this time. At some later stage, I might have the time and interest to download Visual C++ Express 2008 and build the relevant binaries. But I cannot guarantee that the express version of Visual Studio can build installers anyway, which might make it pointless.

Friday, 31 July 2009

In the last attempt, I managed to get a minimal amount of standard socket code, making use of the functionality Stackless Python provides, working within my script. But if I am to make a proper replacement socket module, I need to be able to use it in the same way stacklesssocket.py is used. To this end, the use case I want to support is Andrew Dalke's "more naturally usable" example script.

As can be seen in the script, the following code was the way the replacement socket module was put in place:

import stacklesssocket
sys.modules["socket"] = stacklesssocket

Back in Python 2.5, when it was being developed, the socket module was a lot simpler and the approach of making a complete replacement module was feasible. These days however, in later versions of Python 2 and also in Python 3, the socket module has become somewhat more complicated, and the approach of leaving the standard version in place and substituting parts of it became the practical one. This led to new code to substitute in that functionality:

import stacklesssocket
stacklesssocket.install()

The goal

Andrew's example script, the supporting of which is my goal, contains the following code:

The most important aspect of this is how it uses another higher level standard library module (urllib) that in turn uses the standard socket module. By using my replacement functionality in this way, I can verify how compatible and stable it might be.

The implementation

The original stacklesssocket.py module was based on asyncore, another framework that wraps select. Back when I wrote it, I was glad to have it there to use, but the fact is that adopting a framework brings in dependencies and arbitrary abstraction. If I had the time and interest, I would rewrite stacklesssocket.py to use select directly.

At this time, I am developing for Stackless Python 2.6.2. Right off the bat, it was obvious that this new module was going to be somewhat simpler. stacklesssocket.py made several substitutions into the standard socket module, but in this case it looks like I just need to substitute my own _realsocket object.

def install():
    stdsocket._realsocket = socket
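To make the substitution concrete, here is a minimal runnable sketch of what install() does. The stdsocket module and socket class here are stand-ins (the real code patches the standard library's socket module with the IOCP-backed socket class):

```python
import types

# Stand-in for the imported standard socket module ("import socket as stdsocket").
stdsocket = types.ModuleType("socket")
stdsocket._realsocket = object  # the original low-level implementation

class socket:
    """Stand-in for the IOCP-backed replacement socket class."""
    pass

def install():
    # Leave the standard module in place, but substitute the low-level
    # socket object it wraps with our own implementation.
    stdsocket._realsocket = socket
```

After calling install(), any higher-level code going through the standard module picks up the replacement implementation transparently.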

The first obstacle was the lack of a connect method on my replacement socket object. Checking out the available functions on MSDN, ConnectEx was the one which suited my overlapped IO needs. However, I was unable to find it exported from any Winsock related DLL. It turns out you need to call WSAIoctl in order to get a pointer to it. I had been avoiding calling WSAIoctl, as I assumed it was only used to make faster function calls, but as it is the gateway to functionality I need, avoiding it is no longer something I can do.

The appropriate dwIoControlCode value to get a function pointer is SIO_GET_EXTENSION_FUNCTION_POINTER. The value of lpvInBuffer that indicates the pointer to ConnectEx is the one I want is the following GUID.
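A ctypes sketch of this lookup might look as follows. The GUID is WSAID_CONNECTEX from the mswsock.h header, and the SIO_GET_EXTENSION_FUNCTION_POINTER value is from the Winsock headers; fixed-width ctypes types are used so the Windows GUID layout is reproduced exactly. The WSAIoctl call itself will of course only work on Windows:

```python
import ctypes

class GUID(ctypes.Structure):
    # Fixed-width fields so sizeof(GUID) == 16 on any platform.
    _fields_ = [
        ("Data1", ctypes.c_uint32),
        ("Data2", ctypes.c_uint16),
        ("Data3", ctypes.c_uint16),
        ("Data4", ctypes.c_ubyte * 8),
    ]

SIO_GET_EXTENSION_FUNCTION_POINTER = 0xC8000006

# WSAID_CONNECTEX, as defined in mswsock.h.
WSAID_CONNECTEX = GUID(
    0x25A207B9, 0xDDF3, 0x4660,
    (ctypes.c_ubyte * 8)(0x8E, 0xE9, 0x76, 0xE5, 0x8C, 0x74, 0x06, 0x3E),
)

def get_connect_ex_ptr(sock_handle):
    """Ask Winsock for the ConnectEx function pointer (Windows only)."""
    fn_ptr = ctypes.c_void_p()
    bytes_returned = ctypes.c_ulong()
    ret = ctypes.windll.ws2_32.WSAIoctl(
        sock_handle,
        SIO_GET_EXTENSION_FUNCTION_POINTER,
        ctypes.byref(WSAID_CONNECTEX), ctypes.sizeof(WSAID_CONNECTEX),
        ctypes.byref(fn_ptr), ctypes.sizeof(fn_ptr),
        ctypes.byref(bytes_returned),
        None, None,  # no overlapped object or completion routine
    )
    if ret != 0:
        raise OSError("WSAIoctl failed")
    return fn_ptr
```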

At this point I took note of the fact that operations performed with an overlapped object may actually complete right away. However, my mistake was in assuming that if this happened, a completion packet wouldn't also arrive through GetQueuedCompletionStatus. This led to crashes where I examined the channel attribute on arrived packets whose overlapped object had already been garbage collected, due to the exit of the function that started the operation.

In order to clean things up, I decided to store a reference to the overlapped object in a global variable and, in addition, to move the channel reference I hold outside and alongside that overlapped object. I could have stored the socket handle in the overlapped object, but since CreateIoCompletionPort allows me to associate a completion key with the socket, I store it there instead.

The idea was that I could then ignore packets that arrived for earlier operations which had actually completed immediately. Of course, logic that has gotten an immediate result for an IO operation may try another operation, which might this time block. This meant that unwanted packets for the operations that had completed immediately would come in and be taken as being for the follow-up blocking operation. Errors ensued. So each completed packet needs to be able to be matched up with the operation state stored for it.
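The bookkeeping described above can be sketched as follows. The names here are illustrative, not the actual module's: a module-level table keeps the overlapped object (and its channel) alive until its completion packet is processed, and packets for entries no longer in the table are ignored:

```python
# Keep overlapped objects (and their channels) alive until their
# completion packet arrives, so garbage collection cannot reclaim them
# when the function that started the operation exits.
activeOperations = {}  # id(overlapped) -> (overlapped, channel)

def register_operation(overlapped, channel):
    activeOperations[id(overlapped)] = (overlapped, channel)

def on_completion_packet(overlapped_id):
    # Packets for operations that completed immediately (and were
    # already handled and removed) come back as None and are ignored.
    return activeOperations.pop(overlapped_id, None)
```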

One note: accessing ovCompletedPtr.contents will raise an error if the value is NULL. The safe way with ctypes to ascertain whether the value is NULL is to check whether its boolean value is False.
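A small demonstration of this: dereferencing a NULL ctypes pointer via .contents raises ValueError, while truth-testing the pointer is safe.

```python
import ctypes

# A NULL pointer: calling a pointer type with no argument gives NULL.
ovCompletedPtr = ctypes.POINTER(ctypes.c_int)()

def is_null(ptr):
    # bool(ptr) is False for a NULL ctypes pointer, so this is the
    # safe check; ptr.contents would raise ValueError instead.
    return not ptr
```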

Running Andrew's script at this point results in it exiting cleanly, but mid-execution. The reason for this is the condition stackless.runcount > 2. The theory was that this would keep the loop running as long as there are at least two tasklets in the scheduler, but this is just plain wrong and ill thought out. The fact is that at any given time, at least for Andrew's script, the runcount is only likely to be one. The reason is that all the other tasklets doing IO will be outside of the scheduler, blocked on a channel. So the lowest possible runcount value that still requires continued looping is one.
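This reasoning can be sketched as a runnable condition. The stub runcount and the should_keep_polling helper here are illustrative, not the actual module's code; the point is that with every IO-bound tasklet parked on a channel outside the scheduler, only the pump tasklet itself contributes to runcount:

```python
class FakeStackless:
    # Stand-in for the stackless module: with all IO tasklets blocked
    # on channels, only the pump tasklet remains in the scheduler.
    runcount = 1

stackless = FakeStackless()

def should_keep_polling(liveSocketCount):
    # runcount > 2 exits too early; runcount > 1 is the lowest valid
    # threshold, and a count of live sockets stands in for the tasklets
    # blocked on channels that runcount cannot see.
    return stackless.runcount > 1 or liveSocketCount > 0
```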

Taking this into account, I added a weakref dictionary to track the socket objects which were live and added it as an additional reason to loop. This in no way addresses the above problem, but it does the trick. Andrew's script now runs, and gives the following output:

Of course the module in no way provides the full functionality of the standard version yet. Trivial aspects no-one in their right mind should be using, like UDP and so forth, are completely unsupported.

Yesterday, the pork was turned and had more of the salt/sugar mix applied again. Today, it had sat long enough and it was time to smoke it.

The first step was to wash off the sugar. The idea is that the sugar coating, if left on the pork, would cause it to go off sooner. That, and it probably would not taste as nice. So both pieces of pork were washed in a bucket of warm water and scrubbed to remove the sugar. Then they were hung outside to drip dry, so that they wouldn't drip inside the garage when they were hung there.

After having hung outside for around ten minutes, they were brought inside and hung in front of a large fan to be blown dry. The smaller piece turned in the wind, but the larger piece didn't and had to be hand turned after several hours. After several more hours, they were left to hang with the fan turned off overnight.

The next morning, being dry, they were ready to be taken down.

They were both then hung in the smoker and left for several hours.

After which, they were done.

And stored away in the fridge to sit for a few days, before being tried.

The goal of this step is to implement a partial amount of a Stackless-compatible socket module that provides the basic functionality the standard one has. It should be usable to provide blocking socket IO within Stackless tasklets.

The most straightforward approach is to base the code on how it was done with stacklesssocket.py. In stacklesssocket.py, a tasklet is launched to poll asyncore.

managerRunning = False

def ManageSockets():
    global managerRunning

    while len(asyncore.socket_map):
        # Check the sockets for activity.
        asyncore.poll(0.05)
        # Yield to give other tasklets a chance to be scheduled.
        stackless.schedule()

    managerRunning = False

There I was looping and polling asyncore. I can easily substitute in the polling I already have built up with GetQueuedCompletionStatus.

As in the asyncore polling, I regularly call stackless.schedule() in order to yield to the scheduler and allow any other tasklets that might be present within it to get a chance to execute.

The custom attribute channel in the OVERLAPPED structure is expected to have had a stackless.channel instance assigned to it. In the case of an error, I can wake up the tasklet that initiated the asynchronous IO and raise an exception on it. In the case of success, I can return important information to the logic that triggered the IO, so that it can more easily handle the success case. The only piece of information I can see being of use is numberOfBytes, as that saves me from having to query it manually using WSAGetOverlappedResult.
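The shape of that dispatch might be sketched as follows. The real code uses a stackless.channel stored on the OVERLAPPED structure; here a minimal stand-in Channel class records what would be sent, so the logic can be shown without Stackless installed:

```python
class Channel:
    """Stand-in for stackless.channel, recording what was sent."""
    def __init__(self):
        self.result = None
    def send(self, value):
        # The real channel.send wakes the blocked tasklet with a value.
        self.result = ("value", value)
    def send_exception(self, exc_type, *args):
        # The real channel.send_exception raises on the blocked tasklet.
        self.result = ("exception", exc_type, args)

def dispatch_completion(channel, errorCode, numberOfBytes):
    if errorCode != 0:
        # Error: wake the initiating tasklet with an exception.
        channel.send_exception(IOError, errorCode)
    else:
        # Success: hand back the byte count, saving the caller a
        # manual WSAGetOverlappedResult query.
        channel.send(numberOfBytes)
```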

Now I need to make a socket class that is compatible with the standard Python one.

This was a pretty direct conversion. The only additions were the setting of the channel attribute, blocking on the channel until we are given the number of bytes sent or an exception is raised through it, and returning the number of bytes sent to the caller.

This was also straightforward. Again, a channel was provided and blocked upon. A received data buffer is created on the first call and reused on subsequent calls, and the amount of data that is allowed to be received is capped to the size of that buffer. On success and a returned number of bytes received, the appropriate segment of the data buffer is sliced and returned to the caller.
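The buffer handling described can be sketched like this. The names and sizes are illustrative: one ctypes buffer is created on first use, reused across calls, and sliced to the byte count the completion reported:

```python
import ctypes

RECV_BUFFER_SIZE = 4096  # illustrative cap on received data
recvBuffer = None

def get_recv_buffer():
    # Created on the first call, reused on subsequent calls.
    global recvBuffer
    if recvBuffer is None:
        recvBuffer = ctypes.create_string_buffer(RECV_BUFFER_SIZE)
    return recvBuffer

def extract_received(numberOfBytes):
    # Slice out just the segment that arrived.  A 0-byte result gives
    # an empty string, the standard signal of a disconnected socket.
    return get_recv_buffer().raw[:numberOfBytes]
```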

In the event that the remote connection disconnected, the channel would have indicated that 0 bytes were received and this would have resulted in an empty string being returned. This is fine, as it is the right way to indicate a socket has disconnected on a recv call.

This was refactored in much the same way as the other asynchronous methods (send and recv), with a channel to block upon for the result. However, it got a little more complex when it came to providing the expected address part of the return value. When I connect to a standard Python socket listening on my local machine, with a telnet connection from the same machine, I see the host as the standard localhost address (127.0.0.1). However, this did not give the same result:

host = inet_ntoa(localSockaddr.sin_addr)

Instead it would give the address 0.0.0.0. So I looked at how the low-level Python C source code did it, and saw it was calling getnameinfo. My next attempt, based on that, was the following code:

As a foreign function interface solution, ctypes is extremely impressive and allows you to do pretty much anything you want. Union structures, however, seem to be strangely hard to get right. With this change, the in_addr field of sockaddr_in is now working correctly, although it did not fix my host address problem.
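For reference, a ctypes layout of the kind involved here might look as follows. Field names follow the Winsock headers, and fixed-width types are used so the Windows layout is reproduced on any platform; this is a sketch of the structure definitions, not the actual module's code:

```python
import ctypes
import sys

class _S_un_b(ctypes.Structure):
    # The four dotted-quad bytes of an IPv4 address.
    _fields_ = [("s_b1", ctypes.c_uint8), ("s_b2", ctypes.c_uint8),
                ("s_b3", ctypes.c_uint8), ("s_b4", ctypes.c_uint8)]

class _S_un(ctypes.Union):
    # The union: the same 4 bytes viewed per-byte or as one 32-bit value.
    _fields_ = [("S_un_b", _S_un_b), ("S_addr", ctypes.c_uint32)]

class in_addr(ctypes.Structure):
    _fields_ = [("S_un", _S_un)]

class sockaddr_in(ctypes.Structure):
    _fields_ = [("sin_family", ctypes.c_int16),
                ("sin_port", ctypes.c_uint16),
                ("sin_addr", in_addr),
                ("sin_zero", ctypes.c_char * 8)]
```

Getting the union nested correctly inside in_addr is the part that is easy to get wrong; a mistake there silently misaligns sin_addr and produces results like the 0.0.0.0 above.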

This completes the basic range of socket functionality. A good next step from here is to write a simple echo server, with standard Python blocking socket usage within Stackless Python.

This works perfectly, however there is one problem. When control-c is pressed, the Python interpreter will be killed due to an access violation or some such thing.

A reasonable assumption would be that Stackless tasklets blocked on IOCP related actions are not being cleaned up properly. In order to make sure the required clean up happens, I wrapped the main polling loop with suitable logic.

# Any tasklets blocked on IO are killed silently.
v.send_exception(TaskletExit)

Raising a real exception on the blocked tasklets causes the interpreter to exit on the given tasklet with that exception. I want them to be killed silently, having cleaned up properly, and for the exception on the main tasklet to be the one that causes the interpreter to exit and be displayed.

In order for the cleanup method to know about the channels in use and tasklets blocked on them, any method that started asynchronous IO had the following line added before its channel receive call.

At this point, I have a socket object that can be used with Stackless Python to do synchronous IO that is actually transparently doing asynchronous IO using IO completion ports behind the scenes. It still needs to be fleshed out into a module that can be monkey-patched in place of the standard socket module, as the stacklesssocket.py module can.

Monday, 27 July 2009

Didn't get around to doing this yesterday. I also didn't bother taking a picture today. As the curing liquid only covered half the pork, today I turned over the meat ensuring the piece that was on top was now on the bottom. Tomorrow, the liquid will be drained and the meat rerubbed with the salt/sugar mix.

Sunday, 26 July 2009

In order to ensure the meat has an appealing red colour, and is not a greyish one, we're injecting a liquid solution into the meat that gives it that colour and ensures that the curing process happens in the thicker parts of the meat.

Weighing an appropriate amount of the "Honey-Glo cure" powder.

The powder is then added to warm water, and stirred until it is dissolved. Then salt is repeatedly added and stirred until the water has 55% salinity.

This is the current state of the pork after two days of curing. The lighting is unfortunately a little dark. The accrued liquid was poured down the drain, as it was going to be replaced by the "Honey-glo Cure" liquid.

Another piece of pork turned up, in this case it is wild pork, hunted somewhere up in the hills.

The curing liquid is injected into both pieces of meat. The liquid was poured from the plastic jug into a butcher's injection doodad. Then the doodad was sealed and air pumped in to get the pressure up so that the liquid would get pumped out.

The meat is penetrated in a lot of places to ensure that the liquid is present throughout the inside of the meat. I didn't get a photo of it in action, but the injection spike has holes down the sides, meaning liquid gets squirted sideways in all directions inside the meat - not forward.

Injecting in one of the many other locations.

With both pieces of meat injected with the colouring liquid, the remaining liquid is poured over them in the container to sit for a day.

There's some plan to shuffle them round tomorrow, so they both get to soak in the liquid.

The goal this time is to write an echo server, which will drive the addition of handling receiving data from connected sockets and a more general polling loop.

In order to keep track of the current accept socket and any other connected sockets, I need a dictionary to store the relevant state. The intention is that the key will be the completion key and the value will consist of an arbitrary number of elements, the first two of which are the current state ID and then the socket.

stateByKey = {}

I will only be looking up values for unknown completion keys in the dictionary, for two states, reading and writing.

STATE_WRITING = 1
STATE_READING = 2

I can identify incoming connections being assigned to the accept socket by the LISTEN_COMPLETION_KEY. I create an initial accept socket ready for use when the polling starts.

stateByKey[LISTEN_COMPLETION_KEY] = CreateAcceptSocket()

After which the main loop of the script can pump the polling for completed packets.

The Pump function needs to be able to start an overlapped WSARecv operation on a newly connected socket. When it receives the completed packet for that operation, it needs to start an overlapped WSASend operation to send back what it received on the same socket. It also needs to start another overlapped WSARecv operation when the write completes, so it can repeat the process.
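The state transitions this describes can be sketched in runnable form, with stand-ins for the overlapped WSARecv/WSASend calls so the echo logic shows without the IOCP machinery. The StartOverlapped* names and the stateByKey layout follow the fragments in this post, but the stubs are illustrative:

```python
STATE_WRITING = 1
STATE_READING = 2

stateByKey = {}  # completion key -> (state ID, socket, ...)

def StartOverlappedRead(sock):
    # The real function issues WSARecv and also returns the receive
    # buffer and overlapped object; this stub returns just the state.
    return (STATE_READING, sock)

def StartOverlappedWrite(sock, data):
    # The real function issues WSASend for the echoed bytes.
    return (STATE_WRITING, sock)

def handle_completion(completionKey, numberOfBytes=0, data=b""):
    stateData = stateByKey[completionKey]
    if stateData[0] == STATE_WRITING:
        # Completion of a write starts the next read on the socket.
        stateByKey[completionKey] = StartOverlappedRead(stateData[1])
    elif stateData[0] == STATE_READING:
        if numberOfBytes == 0:
            # 0 bytes read: the socket disconnected, forget about it.
            del stateByKey[completionKey]
        else:
            # Echo the received bytes back on the same socket.
            stateByKey[completionKey] = StartOverlappedWrite(
                stateData[1], data[:numberOfBytes])
```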

With regard to the overlapped WSASend operation, I've already defined and used that function in a previous post, so I can fetch and repurpose the relevant logic.

The idea is that the function gets told the socket to send to and the message to send, and returns anything that needs to be stored in the state dictionary. The expected return values of the state ID and socket are there, but a reference to the overlapped object is also returned and stored, so that it does not get garbage collected as mentioned in the previous post.

Using it is almost identical to the use of WSASend. The only difference is that I need a buffer for incoming data to be placed into, and I need to return that buffer so that I can see what arrived when the completed packet arrives.

The special case outside of the echo handling of connected sockets, is the initial handling of newly connected sockets. Here I start an overlapped read operation on the newly connected socket and create a new accept socket for the next incoming connection.

When a write operation completes a new read operation is started for the socket.

if stateByKey[completionKey.value][0] == STATE_WRITING:
    # We'll use the completion of the write to start the next read.
    stateByKey[completionKey.value] = StartOverlappedRead(stateByKey[completionKey.value][1])

And when a read operation completes, I see how many bytes were read, and write that many bytes from the stored read buffer out to the relevant socket. If 0 bytes were read, this signals that the socket has been disconnected, and it can be forgotten about.

numberOfBytes is an output from the call to GetQueuedCompletionStatus which would be above this code.

if stateByKey[completionKey.value][0] == STATE_READING:
    # We'll use the completion of the read to do the corresponding write.
    _socket, recvBuffer, ovRecv = stateByKey[completionKey.value][1:]

state = stateData[0]
if state == STATE_WRITING:
    # We'll use the completion of the write to start the next read.
    stateByKey[completionKey.value] = StartOverlappedRead(stateData[1])
elif state == STATE_READING:
    # We'll use the completion of the read to do the corresponding write.
    _socket, recvBuffer, ovRecv = stateData[1:]

And at this point the script implements a functional echo server, handling disconnected sockets as intended.

There are a few things that could be polished. For instance, there's no need to allocate new buffers or overlapped objects for each socket for every operation; these could easily be reused. But that's not so important.

The tentative long term goal is to build up a base of more or less reasonable code that covers the required range of IO completion port based networking support, in order to potentially write a new stacklesssocket.py module.