Ben Laurie blathering

(It recently occurred to me that I rarely talk about what I do best, which is write code. So, this is an experimental kind of post wherein I write in far too much detail about some piece of coding. I’d be interested to know whether people want to read this kind of stuff.)

Today I got annoyed enough about this to decide to do something about it. Since, of course, I am using open source tools, I can fix them. Normally the way I would proceed with this would be to compare the version I am currently running against the original source, using diff, of course, then upgrade to the latest version and apply my changes to it.

This is always a slightly painful process, so over the years I have played with a couple of ways to make it less painful. Well, usually. Sometimes you have to do a make distclean or some other variant to avoid getting generated files in the diff.

One early experiment was to use CVS vendor branches. I’ve never really got on well with this, for various reasons. Firstly, the standard advice for merging vendor changes into the main tree is to run

$ cvs checkout -jFSF:yesterday -jFSF wdiff

Pretty obviously this only works if you don’t import more than once a day, though you can fix that by using tags. My main problem, though, is that I’ve always found this command completely meaningless. Which is perhaps why I suffered from my other problem with this approach, which was that over time my tree appeared to gradually drift away from both the vendor source and my patches, in apparently random ways.

More recently, I’ve tended to just grab the tarball, unpack it, rename it (typically to <package>-ben), unpack it a second time and make my changes to the -ben version. Then when I’m done I can do

$ make clean
$ cd ..
$ diff -urN <package> <package>-ben

and presto, a patch. One snag with this scheme has always been that you then end up with one monolithic patch for everything. This causes two issues. Firstly, when I want to apply the patch to a new version, it’s hard to see which changes go together, especially when they span multiple files, so it can get tricky to make sure you resolve conflicts correctly. Secondly, if I want to contribute the patches back upstream, which I often do, developers usually want patches separated by functionality, so they can review them more easily.
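For anyone who hasn’t looked closely at what diff -u actually emits (-u asks for unified format, -r recurses into directories, -N treats missing files as empty), here is a minimal sketch of the same output for a single file, using Python’s difflib rather than diff itself. The function name make_patch, the directory names and the file contents are all made up for illustration.

```python
import difflib


def make_patch(old_text, new_text, path):
    """Return a unified diff (as one string) between two versions of a file."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="package/" + path,      # stands in for the pristine tree
        tofile="package-ben/" + path,    # stands in for the modified tree
    )
    return "".join(diff)


# Made-up file contents, purely for illustration.
pristine = "HTML_MAIL = 0\nVERBOSE = 0\n"
modified = "HTML_MAIL = 0\nVERBOSE = 1\n"

if __name__ == "__main__":
    print(make_patch(pristine, modified, "config.py"))
```

Every change in every file ends up as hunks like these in one stream, which is exactly why the result is hard to pick apart by functionality afterwards.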

It turns out that this is hardly a new problem, and a friend of mine recently turned me on to quilt. quilt is pretty cool. It automates the production of diffs. It has the idea of a “stack” of patches, so I can divide stuff up according to functionality, and have a patch for each, which I can apply and unapply at the drop of a hat. The patches themselves just live as, well, patchfiles, so I can send them in emails and stuff without any problems. So, for my inaugural use of quilt, I decided to attempt my rss2email upgrade using it.

Unfortunately, despite my claim above to be somewhat organised about patching software, it turns out that I didn’t actually save the original version of rss2email that I started from, and I can’t find it on the web, either. I blame rss2email’s somewhat eccentric distribution method, which doesn’t start with a tarball, but instead just hands you links to individual files. I seem to remember I had to seek some of them out first time around, too. In the end I decided to just start from scratch. I know what I want, so I just need to keep hacking until I get it.

Step one is to add the convenience script I use to run rss2email, r2e. First off, tell quilt we’re making a new patch

$ quilt new add_r2e.diff

now add the new file to the patch

$ quilt add r2e

once that’s done, I can create r2e (apparently I have to do the add before the actual creation), and get quilt to update the patch accordingly

an interesting thing to note here is that as I went along I wanted to make further changes to config.py even though I now had other patches stacked on top of this one. A cute feature of quilt is that you can still do that, so long as later patches don’t make conflicting changes, by making the edit, then doing

$ quilt refresh my_config.diff

If later patches do conflict, then you can either pop patches until you get back to this one, make your change, refresh, then push, resolving conflicts as you go, or create a new patch at the top of the stack that makes the change. Which I’d do would depend on whether the change fits logically in the existing patch or not. The patch isn’t very fascinating, but for completeness, here it is

Next, I wanted to be able to make changes to the config for debugging, without having to keep different versions of the config file for “production” and debug versions. So, I decided to add a second “local config” file, called, amazingly, local_config.py.
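As a rough illustration of the local-override idea (this is not the actual patch, and not how rss2email itself loads its config), here is a minimal sketch in Python. The function name load_settings and the setting names in the usage example are assumptions made up for the sketch; it simply treats every upper-case name in the local file as an override.

```python
import os
import runpy


def load_settings(defaults, local_path):
    """Return a copy of `defaults`, overridden by UPPER_CASE names
    defined in the Python file at `local_path`, if that file exists."""
    settings = dict(defaults)
    if os.path.exists(local_path):
        # run_path executes the file and returns its global namespace.
        overrides = runpy.run_path(local_path)
        for name, value in overrides.items():
            # Skip dunder names and lower-case helpers; keep only settings.
            if name.isupper():
                settings[name] = value
    return settings


if __name__ == "__main__":
    # Hypothetical defaults; the real settings may be named differently.
    production = {"VERBOSE": 0, "HTML_MAIL": 1}
    print(load_settings(production, "local_config.py"))
```

With no local_config.py present, the production settings come back unchanged, so the debug file can simply be deleted (or never installed) on the production side.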

Slightly cheating here, I am anticipating my next change, which is to add more verbosity, so I can see what’s going on. Here’s the output from quilt when asked to show this patch a bit later in the process

I try to avoid ever having to rely on my memory (though I do still find I sometimes have to think hard to remember the name of a piece of software I only occasionally use, so I can find it on my disk again – any suggestions?), so the next thing I do is add a Makefile for testing

(quilt maintains the patches/ directory for you). Finally I’m ready to do some real work! I want to know what would be sent in email, and what the parsed RSS looks like. I think you have got the hang of creating patches by now, so I’ll just show you the patch itself…

(At this point, I get less Popper and more Feyerabend, as I am now writing this post as I work on the code, instead of after the fact)

I can’t actually remember the changes I made to the original rss2email so, as I said, I am results-oriented here. My first complaint is that the author no longer appears in the output, and if I do a make, I can see that this is still true, even using the updated version, as this sample shows

Our RSS feeds are not broken, nor are they the only ones affected, not by a long shot. According to various reports, authorities in China are attempting to block *all* RSS feeds to keep out information that may be critical of the nation’s government. Link to item on Ars Technica.

Note that this isn’t quite exactly what was output – I removed FeedBurner’s snoopy images. More on that later. But as you can see, no mention of an author (though the output is quite a bit prettier than I’m used to). Looking at the parsed RSS feed, though, I see

At this point I should note that the version of rss2email I’ve been running up to now did not, as far as I can tell, in any way process this field. Also, I’ve exchanged email with BoingBoing and they say they haven’t changed anything. I conclude, therefore, that FeedBurner has, as people suspect, probably changed the format (from including the author version in the post content to only having it in the markup). However, the new version does look for author information, which it tries to include as the “From” field in the email. Here’s what it does

Yay, we have an author! I even like the idea of it being in the from field. At this point I could probably stop but another thing has been irritating me, and that’s FeedBurner’s web bugs at the end of each post. So, I’m going to remove them. They look like this

It’s always a bit tricky removing something like this – you want to be sure you don’t accidentally remove some other similar-looking stuff. Regular expressions are the answer, of course. They are, however, a bastard to debug, since a mistake anywhere causes the whole thing to not match. My technique is to start at the left-hand end and extend the expression a piece at a time as I get it working. The one hint I have for Python is that prefixing the string with an “r”, like this, r'\w+', makes it a raw string, so backslashes reach the regular expression engine untouched. Anyway, here’s the patch…
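Separately from the patch, the left-to-right technique can be sketched roughly like this. The sample markup, the feeds.example.com host and the names STAGES, first_failing_stage and strip_web_bugs are all hypothetical; the real web bug may well look different, so the final pattern would need adjusting against actual feed output.

```python
import re

# Hypothetical sample: a post body followed by a 1x1 tracking image.
SAMPLE = (
    '<p>Post body.</p>'
    '<img src="http://feeds.example.com/~r/somefeed/~4/12345" height="1" width="1"/>'
)

# Grow the expression from the left, one piece at a time. Each stage is
# checked against the sample, so a mistake shows up at the stage that
# introduced it instead of silently breaking the whole match.
STAGES = [
    r'<img',
    r'<img\s+src="',
    r'<img\s+src="http://feeds\.example\.com/',
    r'<img\s+src="http://feeds\.example\.com/~r/[^"]+"',
    r'<img\s+src="http://feeds\.example\.com/~r/[^"]+"\s+height="1"\s+width="1"\s*/>',
]


def first_failing_stage(stages, text):
    """Return the index of the first stage that does not match, or None."""
    for index, pattern in enumerate(stages):
        if re.search(pattern, text) is None:
            return index
    return None


WEB_BUG = re.compile(STAGES[-1])


def strip_web_bugs(html):
    """Remove every occurrence of the (hypothetical) tracking image."""
    return WEB_BUG.sub('', html)


if __name__ == "__main__":
    print(first_failing_stage(STAGES, SAMPLE))
    print(strip_web_bugs(SAMPLE))
```

Anchoring the pattern on the host, the path prefix and the 1x1 dimensions is what keeps it from eating ordinary images that merely look similar.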

If I’d done this in feedparser.py then it would also work for text-mode emails, probably. I could probably be persuaded to put the patch there instead.

Anyway, now I’m done, so some retroactive tweakery of the makefile, to include an install target, and also to make sure that the patches are available to anyone reading this, yielding a new version of the makefile patch…

This entry was posted on Sunday, October 7th, 2007 at 17:31 and is filed under Open Source, Programming.

They are an extension to Mercurial that allows you to maintain a patch set just like quilt does. But once a patch is applied, normal Mercurial commands like ‘hg log’ and ‘hg diff’ work as if the patches were normal revisions of the repo.

And it’s fast too. And you can version control your patches. And you get the mercurial merge support, etc.