mu's views on program and recipe! design

Links

I recently decided to try out Epiphany for real after the umpteenth time Firefox went haywire when my gtk theme changed. As a base browser, the only things it has going for it are the ability to drag tabs around, and the tag-based bookmarks system.

On the downside, using that tag-based bookmarks system means starting over when you have hundreds of poorly categorized bookmarks like I did. I'm still not sure if I like it, but it's definitely neat.

On the plus side, Epiphany can be tweaked with Python extensions. In particular Stefan Stuhr's Only One Close Button is a blessing to those of us who don't like scrolling through tabs.

With this in mind, I set out to fix a problem I've been experiencing. One of the sites I visit takes user submitted content, including a link and a comment. Lots of submissions include a link in the comment field. However in order to preserve their table layout without having it overflow, the site embeds U+200B Zero-Width-Space characters into the shown text. When I select the text and paste it to a new tab, the url will contain a '%E2%80%8B' substring (URL-escaped UTF-8 encoded U+200B) which invalidates the destination, and tends to yield either a 404 or a redirection.

I wrote up a simple extension based on the tutorial, but debugging it was a real pain. My initial session of epiphany was running somewhere with no stdout or stderr. My first draft of my extension had several errors which resulted in thrown exceptions. A bug with Epiphany causes it to fail to exit properly when Python extensions throw exceptions. It was not only failing to reload my extension, but when I thought I was running Epiphany in a terminal, it was really resuming the existing session. Let this be a lesson to Epiphany Python extension developers: be willing to kill your running Epiphany process!

After figuring that out with the help of jfr on #epiphany, I was then able to complete the extension handily. It's not ideal: it causes a second automatic page load to happen, and going back to the page with the zero width space in the URL will again trigger the redirect. But it's good enough for me. If anyone else wants it, either to use or to fix, you can get it from EpiphanyExtensions.

I love working in Python. When I was first coming from Perl everything felt weird. Now I can't imagine going back voluntarily. However there are a few laments that come up time and time again to trouble me, the Python using developer.

Cross-version compatibility

The developers of Python have a pretty rigorous backwards-compatible culture, but the actual implementation leaves something to be desired. While many modules, especially those outside of the standard library, but maintained by those who develop Python, are able to be compatible across many versions, the recent new Python 2.5 managed to create two completely separate incompatibilities in Quod Libet and mutagen. As Martin v. Löwis said, someone always notices. If only they had put as much thought into the changes that hit us as they seem to be putting into whether floating point +0.0 and -0.0 can be distinguished.

Quality of bindings

Every binding of a library implemented is implemented separately, and generally by different authors. This leads to python modules that feel wildly different, from different naming conventions, to support for pythonic methods. Support for pythonic methods range themselves from supports iteration to the right mixture of functions and objects with methods. Sometimes collections of similar modules group behind a common API such as the DB API.

When I first started using python, I happened to pick Pygame, a very easy to use binding. It was well documented. The parts you needed regularly were small enough to fit in your head. There were helpers in the right places. It accepted reasonable duck-typing in convenient places, such as accepting any 4-element list or tuple where a pygame.Rectangle was going to be used, implicitly upcasting it.

By contrast, recently I've tried to find a good binding for working with subversion, and have had poor luck. The initial binding, a poorly documented SWIG wrapper, was very tied to a low level C interface. Nobody using the python bindings would want to deal with the APR directly. More recently, pysvn seemed to address this. The functions seemed to expose the right level to let me get my work done.

My Ideal PySvn

After working with it for a couple days, however, I've changed my mind: pysvn still has a long way to go. I'm going to present a couple steps that would bring pysvn closer to my idealized version, shamelessly ignoring that implementing it would require many API-incompatible changes.

While trying write a simple subversion browser, there are two road blocks I keep bumping into.

Revisions are way too hard to use.

Dictionaries of attributes are filled with inconsistent names.

I'm not sure which of the above road blocks bothers me more. On the one hand, the Revision object reads like a straight translation of a C struct with a union inside it. This allows it to represent any of several kinds of revision. But I would contend that it should just accept simple python types.

Types head, base, working, committed, and previous should be represented with strings.

Type number should be represented as an int (or long)

Type date should be represented as a datetime, or if non-builtins are unacceptable, a (time) tuple.

Type unspecified should be represented as None

For simplicity, the constructor should accept these, and any function that requires them should run them through the constructor as necessary. For an attempt at bonus points the constructor could also accept strings like '137' or '1986-01-28 17:39:13.620000' in place of integers or time objects, but it's probably better not to accept those. Better yet, these objects should always be used, and the Revision object should be dropped from the Python interface.

On the other hand, inconsistent names in the properties dictionaries are really hard to keep straight. The info() call returns an Entry object, with attributes including commit_author commit_revision commit_time kind name revision url. The info2() call returns items with attributes including last_changed_author last_changed_rev last_changed_date kind rev URL. Note that the commit prefix became last_changed, the time suffix became date, revision became rev, url became URL, and name disappeared. The log() and ls() calls suffer similar transformations, although the amount of information each contains is smaller.

I'm willing to assume that the actual subversion API is similarly fickle, and that the pysvn authors didn't create this mess, but I really want it to be cleaned up. I'd like to get an object, say a subclass of a mythical PySvnItem, which knows its url and name and revision, and can look up various other items that may not have been returned by the corresponding C API call when you access the appropriate attribute so I don't have to worry about which call provided the original PySvnItem. And the names of these attributes should be normalized so I don't have to worry about last vs last_changed vs commit prefixes.

Bugs

No lament is complete without mention of at least the one latest bug that delayed some work. In this case it was pysvn's diff() method, which fails with a weird C++ error if you pass it revisions as positional arguments. Peter found that named arguments work fine, so it's no longer in my way.

So many things I forego writing about to fill up your feed, but here's one that I just have to. A while ago I worked on a program I was calling tansei. The eventual goal was to be a personal planet-style feed aggregator in an application instead of a browser window. But I got tired of it after some of the simplest bits of it fought me for too long. And as usual when I write software, I distracted myself with less important parts of it first - in this case trying to render all sorts of crappy html the way I wanted it.

So finally a few months ago Joe tells me liferea has a compact view which does exactly what I want...at least when it manages to pull the entries separately. And that means I no longer have to write the rest of the program, so I'm happy. Mostly. I don't actually like liferea all that much. The interface is somehow clunky in ways I can't put my finger on. The one I can is where it fails to act like another application which doing something weird - if, like pan's error window, liferea would put a close window button next to the mark-items-read button, I'd stop accidentally refreshing everything.

Then yesterday I hear that Google (they don't need the miniscule PageRank boost I can offer on a link) has released an upgrade to their feed reader service. The interface is pretty slick. It lets me delete the 2000 odd feeds I had accidentally imported into the old interface in under 10 tries (the previous interface would have required me to delete each one; the new one just kept failing in the middle of each delete all). It's about as nice as the way I read my mail. And then I find the important part: if I put subscriptions in folders, then I can select to read an entire folder. It then interleaves the posts by date. Yay! Planet behavior! This is on top of all the other odd benefits, such as practically infinite history for as long as anyone has subscribed to the same feed.

Today I try to load it. I can't seem to get past the loading animation. Since it's just an animated GIF I have no way to tell if it's making progress. I'm sad. I try another browser. No go. I'm sure they'll fix it later. Mmm, proprietary serviceware indeed.

Sometimes there are just a few too many bikesheds out there just begging to be painted. While I've managed to not quite join the discussions themselves, I just had to share my viewpoint somewhere...

pygame vs ctypes

There have been a couple threads on what to name what may or may not become the newpygame. Most recently a cutesy name pistol was suggested. All because of some hopes to avoid confusion over the original pygame name.

I think the worry about confusion is a red herring, so long as things are properly namespaced. The ctypes implementation is split into two levels: a SDL wrapper, and a pygame compatibility layer. The first comes under the SDL namespace, with entries like SDL.image, SDL.mixer, etc. The pygame compatibility layer should just be another one of these: SDL.pygame.

Then existing code can switch from import pygame to from SDL import pygame if it wants to leverage this layer. No crazy names that nobody understands. No confusion over which is being used. We're done.

TurboGears vs Django

There's a lot of kerfuffle on various python blogs about what Guido van Rossum did or did not pronounce about either TurboGears or Django being the official BDFL choice, and so forth. I have yet to see a single link to text that came from Guido himself. Please stop mountainizing this rumor until there is an email or blog post or video from Guido that the rest of us can read or watch.

As for which I prefer? I'm completely a roll-my-own type, as I only do websites like this one.

I've just released Cankiri 0.2. It's not all that different from 0.1, as most of the changes are to improve the user experience by decreasing unnecessary clicks.

I've made the assumption that when you start Cankiri you want to record, so instead of waiting for you to click the icon, it will automatically bring up the save dialog. I've also added fallbacks so that if you don't have a place for the notification area icon, it will create a little window for just its icon. This isn't pretty, but it ensures that if you can start a recording you can also stop it. The actual recording is then the same. If it worked for you in 0.1, it should still work; if it didn't work, I doubt it will suddenly start working. Let me know if any of these assumptions are a problem for you. I think next release I'll add command line arguments so defaults can be changed; if so I'll probably allow deferring the save dialog.

I have not been able to address a problem in the area selector. Both my friend Pete, and bloodsk are unable to resize it. Pete uses KDE with xinerama on FreeBSD where not only can he not resize the area selector, it shades the part inside. But when he tried Metacity without xinerama it was still immobile. Bloodsk reports being on Ubuntu Dapper, and hopefully will follow up here with what window manager or unusual X settings he might be using.

As if I didn't have enough other projects on my hands, I've just put enough finishing touches on Cankiri to release it into the wild licensed under the GPLv2. It came about after looking at Istanbul around version 1.2 and being disgusted with the limited features and overengineering in the code. Come on, you don't need two directories and at least five files for this functionality. Really. Since then both Istanbul and I have added screen area selectors and audio recording. I've got all my code in one 400 line file; Istanbul now spans many more files. Where this really shows: ls -l.

It's amazing how concise you can be with python once you know what you're doing. I hope you find Cankiri easy to use, as a single-file distribution leaves no room for documentation. Let me know what you think.

(All the jabs at Istanbul's code aside, I'm very grateful for the GStreamer plumbing I've been able to take and reapply. Since I've yet to really learn GStreamer, this has been critical for me.)

Just dropping a quick note here because Independence Week (I took a few extra vacation days after the 4th) gave me time to start more projects than I normally do. And now I'm really busy from them. So in that vein, three quick notes.

The first is a really cool patch to GTK+ which is the beginnings Inplace-Tooltips — the next generation version of my TreeViewHints hack. I have an Ogg Theora screencast of what it can already do in Ex Falso (with TreeViewHints disabled), and am hoping to make a MNG in case it's smaller. I think the hard parts of this battle are making sure the behavior is perfect, that it doesn't leak references, and getting the decision makers to recognize its importance.

The second is a request to those holding copyright interests in GPL-licensed GStreamer-using projects Fluendo has asked to consider relicensing under terms compatible with plugins with which they wouldn't otherwise be compatible. Don't. You chose to license your code under the GPL for a reason. You are being asked to weaken your support for Free Software for commercial reasons, and being offered only the opportunity to be distributed with a particular Linux distribution. You are being asked to trade your belief in Freedom for an imaginary increase in your user base. Freedom is worth much more than that. Stand up for your beliefs and don't worry about people using encumbered formats in distributions that can't handle this a better way.

The last is I've started learning Cairo and have put together the beginnings of a Go client (so far it's just the board display). Cairo is a real treat to work with, but there's not any good introductory documentation. I received a great explanation of the basics just last night from Carl Worth and I hope to put it and the rest of what I've learned into a series of blog posts, and later perhaps a three-in-one tutorial of PyGTK gobjects and cairo.

The last couple days I went through one of the more painful debugging experiences of my life. The symptom was that one of our OggFlac unit tests in Mutagen was failing. But not on x86; that would be too easy. Instead it was only failing on AMD64 systems, which is reminiscent of a bug I covered before. Thankfully Pete was able to provide direct network access to an AMD64 machine for me to test on; otherwise we'd undoubtedly still be fighting this bug. As it was, we spent at least four hours each on Thursday and Friday.

So knowing that this was a painful bug, how do we debug such symptoms?

Tighten our tests

First things first, try to isolate the problem as much as possible. The failing test did a small sequence of things, starting from a nearly blank file, adding some stuff, deleting it, adding other stuff, and then failing the test on line 39. So pare it down. As it turns out all we need are the three lines 37–39: add, save, test.

Examine the code in question

Now that we know which methods reveal the bug, we can look into the code behind them. This can help find many kinds of errors. I think it helped us tighten our code a bit. But after many hours of staring at it and trying various things it was still no help.

Compare to passing code

We had a passing version of the same test on x86. Was the problem in our code, or in the crosscheck we were performing? Taking the output produced on x86 and testing it with flac on x64 (and vice versa) showed it was a problem in our code, and not the tool. Drat. But hey, the file is the same up until that last step. That confirms everything else so far is good.

Since looking at the file gave us no good clues, other than the broken file in question looked like garbage, it was time to delve deeper. I coded up a tracing wrapper, which injects tracing primitives into python code. I invoked this from the OggFLAC tests, and traced ogg, oggvorbis, and oggflac modules under mutagen.

Note: the linked version is the final version. Earlier versions lacked tracefile, function return values, and indentation.

Confuse yourself over and over

The only differences for the longest time, after filtering out inconsequential differences, were at the failure spots themselves. The passing code would continue; the failing code would report an exception. I would add more tracing. The same. What else can I trace? There has to be a difference somewhere. So I added tracefile.

The dreaded heisenbug

And the size of the tracelog exploded. I spent time coping with that when I should have been realizing that there is no longer any difference. The good and bad trace were too different (because of our fallback code for the AMD64 mmap problems in python) so make them more similar. Somehow the failing code started working. What did I do? I added tracefile. Huh? Great...now I can't even observe my bug without it morphing.

In concurrent programming, or code with extreme speed needs, heisenbugs can be a real problem. Fortunately Mutagen is neither.

The final push

The thing about the final push is you never know when it will hit. When you first start debugging you're sure it's right around the corner. Then the more you work at it without solving it, the more convinced you become that something insane is wrong with your tools. So here's a reconstructed stream of consciousness from when I first saw the heisenbug.

Take all tracing out, but leave in tracefile. Okay, it's still working. Take out tracefile. Okay, it fails. Hey, it fails on x86 too. Hallelujah! Wait. Huh? Pare down tracefile. Use it and take out #read. It fails. Put #read back and take out #write. It works. Leave in #read but simplify it to just recurse the call. It fails. What's left?

Eureka

I fiddle with it and find the only piece I need is self.tell() before calling self.read. So I mention my bewilderment to Joe and he finds a gem: this bug has been reported to Python before. And denied. Apparently rightfully so. Six month old tested code is to blame. I fix it up by using explicit seek calls, and everything passes.

Lessons learned

There's always another dark corner of your toolchain left to be learned, often the hard way. In our case it was the fact that interleaved read and write operations on a file pointer, without explicitly resetting the stream position, are undefined. They work most of the time on the implementations we run on, but they fail sometimes. I feel we did an okay job of challenging and proving our assumptions right or wrong, but it was really hard to make the leap that let us trust the new barely tested code that was revealing the failure, and stop trusting the old tested code that was causing it. I'd like to do better about that in the future.

I need a trick. I want to build a plugin system in C# that supports internationalization. I want to be able to create various interfaces (called IPlugin here for simplicity) which classes in the plugins can implement so the framework knows how to load them. The framework must also able to get metadata from them in order to describe them to the user. What are my options?

Here's what I've come up with so far. I can create two interfaces, IPluginFactory and IPlugin when I only really want one. This is potentially useful anyway for a number of reasons so I might go with it. It would look something like this:

Here IPluginFactory is implemented by a thin wrapper class which gives me data about the class implementing IPlugin which it knows how to manage. In the Nameget method I can easly load and return Strings.MyString to retrieve a localized value. And all the meat can go into the IPlugin class. Thus it's cheap to instantiate all IPluginFactory classes, and put off instantiating a given IPlugin until I need it.

But isn't this sort of static metadata and creation needs exactly what attributes and static factory functions are for? Rather than having to implement two classes for every IPlugin, it would be nicer to just attribute one class.

Between the two it means I can't easily enforce (compile-time) an implementation of the static methods, and I can't put localized strings into Attributes at all, as they all require runtime resource lookup. Because I can't put static properties in an interface, I also can't simply work around the second by putting the strings into static properties, although since unattributed classes aren't errors either, it's less of a compile time issue.

What ideas am I missing? I can use, but don't particularly like, the two interface approach. An inventive developer could probably implement both interfaces on one class, although it would only rarely be worth it, and would miss the point of having a thin wrapper class. I really like the look of the attributed single interface sample code. How close can I get?

I've been playing with C# lately with an eye towards using it at work instead of the nasty legacy C++ codebase. And one of the pieces I'm still struggling with is internationalization. On unix we use the infamous gettext _() macro, which works in the source as a marker that its string needs to be translated, and at runtime as a function which looks up the translated string. In MFC we use numeric resources and utility functions like CString::LoadString(IDS_MYSTRING) or constructors like CString(MAKEINTRESOURCE(IDS_MYSTRING)). In C#, at least as presented in Visual Studio 2005, the idea is similar to the latter.

So I tried to use it. You create a resource file, add some strings to it, and then it's time to use them. You can try something like the following (which I'm not sure is correct):

But there's one major quirk here. Unlike the unix or C++ methods, this is using a raw literal which is not a desired value. If you mess up the value passed to GetString(), you'll catch it at runtime. Clearly that's not the right way. You could make constants for these lookup strings, but that doubles the work. Turns out there's a better way.

Whatever you named your .resx resource file, there's an automatic type-safe class generated behind the scenes for you with static property accessors. So if I called it Strings.resx and had a string MyString, there's an accessor already set up so all I need is:

string mystring = Strings.MyString;

I'm new to C# in .NET 2.0, and the documentation isn't quite clear on this point, but I get the feeling this support is new to VS2005. It's definitely a nice experience for resource-style internationalization.