
Thursday, April 28, 2005

I discovered the hard way that, even with stat_float_times enabled, mtime does not have very fine resolution: about a millisecond on my machine (actually ever so slightly worse, and it seems to vary with CPU clock speed). This would be fine if I were not relying on comparing times taken from the mtime field of file metadata with times taken from gettimeofday(2). Too bad for me.

What this ended up leading to was this fun sequence:

import time

x = time.time()
y = modifyAFileAndReturnMTime()
z = time.time()

assert x <= y <= z  # Whoops, this fails

Woe is me. So after a few hours of figuring out that mtime resolution was what was getting me, I figured I'd just kludge it and stuff time.time() into the mtime field at the crucial moment. This works because the field itself can hold values down to the nanosecond; the operating system just doesn't keep track of it that carefully. Having replaced the actual mtime with my app's notion of the mtime, all my times will sort correctly and I won't accidentally toss files out the window.

So how do I do this? Well, with utimes(2) of course. Heck, it's even wrapped in Python, as os.utime.
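A minimal sketch of the stamp-it-myself idea in modern Python 3, assuming a POSIX system. write_and_stamp is a hypothetical stand-in for the modifyAFileAndReturnMTime function above; it sets the mtime to a time.time() reading so that the x <= y <= z ordering holds by construction. Whether the fractional part actually survives depends on the platform, the Python build, and the filesystem.

```python
import os
import tempfile
import time

def write_and_stamp(path, data):
    """Write data to path, then set its atime/mtime to time.time().

    Hypothetical stand-in for modifyAFileAndReturnMTime: the returned
    value is the app's own notion of the mtime, so it orders correctly
    against other time.time() readings.
    """
    with open(path, "wb") as f:
        f.write(data)
    now = time.time()
    # os.utime(path, (atime, mtime)) accepts floats; whether the fraction
    # is kept depends on the platform, the Python build, and the filesystem.
    os.utime(path, (now, now))
    return now

fd, path = tempfile.mkstemp()
os.close(fd)
x = time.time()
y = write_and_stamp(path, b"some data")
z = time.time()
stored = os.stat(path).st_mtime  # the mtime as the filesystem reports it
os.remove(path)
assert x <= y <= z
```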

Okay, so Python's screwed up :( A little digging later, I see that maybe it's just Ubuntu's build of Python. HAVE_UTIMES isn't being defined, so os.utime is compiled without utimes(2) support and all my nice double-precision floating goodness is dropped on the floor; only the integer parts are carried over. Time to file a bug report...
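One way to find out whether a given Python build (and filesystem) keeps fractional mtimes is a round-trip check. This is a hypothetical helper, not part of any library: it sets a timestamp with a .25 fraction via os.utime and reads it back with os.stat. A truncating setup like the one described above would make it return False.

```python
import os
import tempfile

def utime_keeps_fractions(directory=None):
    """Return True if a fractional mtime survives an os.utime round trip.

    Hypothetical check: a build or filesystem that silently drops the
    fractional part would read back a whole number of seconds instead.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        wanted = 1000000000.25  # arbitrary timestamp with a .25 fraction
        os.utime(path, (wanted, wanted))
        got = os.stat(path).st_mtime
        # Allow for float rounding; a truncating setup would give ...000.0.
        return abs(got - wanted) < 0.01
    finally:
        os.remove(path)

print(utime_keeps_fractions())
```

Passing a directory lets you test the filesystem you actually care about, since resolution can differ between mounts.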

Unfortunately, having these times be accurate is fairly important to the application. Maybe I can fudge it by either adding a few seconds to or subtracting a few seconds from each mtime (I'm not sure which yet... hopefully adding, because subtraction is hard). At worst this should mean I copy a few extra files... I hope. More thought is required.
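A sketch of the "fudge it" idea, under the assumption that the application decides whether to copy a file by comparing its mtime with the time of the last sync. needs_copy and slack are hypothetical names. Subtracting slack from the sync time is equivalent to adding it to each mtime, and it errs toward copying: a truncated mtime is still caught, at the price of the occasional extra copy.

```python
def needs_copy(mtime, last_sync, slack=2.0):
    """Return True if a file should be treated as changed since last_sync.

    Hypothetical helper. Stored mtimes may have lost up to a second to
    truncation, so compare against last_sync - slack (equivalently, add
    slack to each mtime). This can copy a few unchanged files, but it
    will not skip a changed one whose mtime lost less than slack seconds.
    """
    return mtime >= last_sync - slack

# A write at 100.9 whose mtime was truncated to 100.0 is still caught:
print(needs_copy(100.0, last_sync=100.5))  # True
# A file untouched since well before the sync is skipped:
print(needs_copy(90.0, last_sync=100.5))   # False
# An unchanged file just inside the window is copied anyway:
print(needs_copy(99.0, last_sync=100.5))   # True
```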

Saturday, April 9, 2005

I wrote most of a new SMTP client from-scratch tutorial for Twisted a couple of weekends ago. I wanted to polish it up some more before asking for comments, but it doesn't look like I'm going to get to it as soon as I'd hoped. It is in the style of the Twisted from-scratch tutorial, though quite a bit shorter. I still need to write an introduction and a conclusion, but I'd like to hear whether the doc is clear and useful, and if not, what I can do to improve it. I hope to write more of these for various parts of Twisted in the near future, if people do indeed find them useful.

Tuesday, April 5, 2005

While some people are busy worrying about how to make Python's built-in sockets less efficient, one might wonder whether the reverse is possible: how do you make them more efficient? After all, you usually want your program to run more quickly, tax your CPU less heavily, or consume fewer resources, not the reverse. Fortunately, I have just the solution for you.[1] The approach explored below avoids allocating new memory when reading from the socket. Since malloc() is a relatively expensive operation, this will save us a bunch of CPU time, and save memory too by reducing the chances for heap fragmentation and so forth.

As you can see, the handy readinto method of file objects can be used to provide a pre-allocated memory space for a read to use. Unfortunately, it is a file method, not a socket method (also, its documentation recommends strongly against its use, though I can't imagine why!). We can get around this, though, since a file descriptor is just a file descriptor. os.fdopen will happily give us a file object wrapped around the socket we're really interested in. Then it's a simple matter of calling readinto on the resulting file object with an array we have previously allocated.
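A minimal sketch of this first approach in modern Python 3, assuming a POSIX system (on Windows, socket handles are not descriptors os.fdopen can wrap). read_into_via_fdopen is a hypothetical name, and socketpair() stands in for a real network connection so the example runs on its own. The closefd=False argument asks the wrapper not to close the socket's descriptor when the wrapper itself is closed.

```python
import array
import os
import socket

def read_into_via_fdopen(sock, buf):
    # Wrap the socket's descriptor in an unbuffered binary file object.
    # closefd=False keeps the wrapper from closing the socket's descriptor
    # when the with-block closes the wrapper.
    with os.fdopen(sock.fileno(), "rb", buffering=0, closefd=False) as f:
        # readinto writes directly into buf and returns the byte count.
        return f.readinto(buf)

a, b = socket.socketpair()
buf = array.array("B", bytes(20))  # 20 pre-allocated zero bytes
a.sendall(b"hello")
n = read_into_via_fdopen(b, buf)
print(n, buf[:n].tobytes())  # prints: 5 b'hello'
```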

"Great!" you say. "Why even bother with the other two examples?" you wonder. Well, there are a few problems. Even if we accept the os.fdopen hack, and even if we do not let the strong words in the file.readinto docstring dissuade us, there's still a tiny problem. file.readinto closes the file descriptor before returning! Damn, there goes our socket. Maybe the next solution will fare better.

Solution the second: recv(2)

Okay, that stuff with file.readinto was just silly. Let's get serious here. libc already provides the functionality we need here, and has for decades. This is basic BSD sockets 101. Stevens would cry (if he were still with us) if he saw us doing anything else. So let's cut the funny business and just do what a C programmer would do: call recv.

Sweet. We open libc so we can call recv in it, create a socket as usual, and another array object to act as our pre-allocated memory location. Note we use the buffer_info method this time, because recv() does not expect a "read-write buffer object" (like file.readinto did), but a pointer to a location in memory, which is exactly what buffer_info()[0] gives us. Then we just call recv. Easy as eatin' pancakes. We can even do it twice, demonstrating that recv isn't doing anything ridiculous, like closing the socket for us (I did it with the same array object, overwriting the previous contents, demonstrating that our no-allocation trick is working just fine).
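A sketch of the same steps using the standard-library ctypes module, which is an assumption on my part (the original listing isn't shown here), on a POSIX system where recv lives in libc. recv_into_array is a hypothetical name. As described above, buffer_info()[0] supplies the raw pointer, and the same array is reused for two reads.

```python
import array
import ctypes
import ctypes.util
import os
import socket

# Load the C library. If find_library returns None, CDLL(None) exposes the
# symbols already loaded into the process, which include libc on POSIX.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
# ssize_t recv(int sockfd, void *buf, size_t len, int flags);
libc.recv.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.recv.restype = ctypes.c_ssize_t

def recv_into_array(sock, buf):
    """Call recv(2) directly, writing into buf's existing memory."""
    address, count = buf.buffer_info()  # (raw pointer, number of items)
    n = libc.recv(sock.fileno(), address, count * buf.itemsize, 0)
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return n

a, b = socket.socketpair()
buf = array.array("B", bytes(20))  # 20 pre-allocated bytes

a.sendall(b"hello")
first = recv_into_array(b, buf)
print(buf[:first].tobytes())  # b'hello'

a.sendall(b"world")
second = recv_into_array(b, buf)  # same array; old contents overwritten
print(buf[:second].tobytes())  # b'world'
```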

I know what you're thinking, though. array objects? What the hell can you do with an array object? Well, here's what. All kinds of stuff! Why, you can build one from a string. Or build a string from one. Or, uh, swap the byte order... umm, oh yea you can reverse them too. Cool deal, eh? Err, no, maybe not actually... None of those cool string methods are around, unfortunately. You can create a string from the array but that kind of defeats the purpose... in doing so you've just allocated a pile of memory. Nuts. Well, wait, don't give up yet, we may be able to improve upon this situation...
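The array operations listed above, sketched in Python 3, where the old fromstring/tostring methods are spelled frombytes/tobytes. The last lines show the catch: there are no string methods, and getting a bytes object back out makes a copy.

```python
import array

chars = array.array("B", b"hello")  # build an array from a bytes string
print(chars.tobytes())              # build a bytes string from it: b'hello'
chars.reverse()                     # reverse in place
print(chars.tobytes())              # b'olleh'

words = array.array("H", [0x0102])  # one 16-bit unsigned item
words.byteswap()                    # swap byte order in place
print(hex(words[0]))                # 0x201

print(hasattr(chars, "upper"))      # False: no string methods here
snapshot = chars.tobytes()          # a copy: new memory is allocated
chars[0] = ord("O")
print(snapshot)                     # still b'olleh'
```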

Solution the ultimate: recv(2) (uh yea, again).

The only problem we really have with recv isn't actually with recv: it's with array! Let's not throw the baby out with the bathwater, then. Solution: drop array, keep recv. We want a string. Well, let's use a string.

It's the perfect solution. No wasted memory allocation, but the same level of convenience as a normal call to socket.recv. Rarely are we lucky enough to find such elegant and flawless solutions in computer science. The astute reader might object to the magical 20 in the recv call as being inelegant or flawed; however, the value can easily be computed at runtime. The code to do so is extremely simple and is only omitted because it is slightly too large to fit in the margin.
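Rather than poke at a string's internals (Python treats strings as immutable), here is a sketch of the same no-allocation goal using pieces later Python versions added: socket.recv_into (Python 2.5) reads directly into a caller-supplied writable buffer, and bytearray (Python 2.6) carries most of the familiar string methods. When no byte count is given, recv_into uses the buffer's length, so the magic 20 goes away. socketpair() again stands in for a real connection.

```python
import socket

a, b = socket.socketpair()
buf = bytearray(20)  # allocated once and reused for every read

a.sendall(b"hello world")
n1 = b.recv_into(buf)  # nbytes defaults to len(buf), so no magic 20
# bytearray has most bytes/str-style methods; start/end bound the search
print(n1, buf.startswith(b"hello"), buf.find(b"world", 0, n1))  # 11 True 6

a.sendall(b"bye")
n2 = b.recv_into(buf)  # same buffer; the first three bytes are overwritten
print(bytes(buf[:n2]))  # b'bye' (slicing and bytes() copy, for display only)

a.close()
b.close()
```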