June 8, 2008

Tweaking an RSS Feed in Python

I’ve been teaching myself a bit of Python by the just-in-time learning method: start programming, wait for the interpreter to complain, and go check the reference manual; keep the API docs on your hard disk and sift through them when you need a probably-existing function. Recently, I wanted to write a very simple script to manipulate some XML (see below) and I was surprised (though it has been noted before) at the relatively confused state of the art in Python and XML.

First of all, the Python XML API documentation is more or less “go read the W3C standards.” Which is fine, but… make the easy stuff easy, people.

Secondly, the supposedly-standard PyXML library has been deprecated in some form or fashion, so some of the examples from the tutorial I was working with have stopped working. (In particular, the xml.dom.ext module has gone somewhere; where, I do not know.)

So, in the interest of producing more and better code samples for future lazy programmers, here’s how I managed to solve my little problem.

The Problem: Twitter’s RSS feeds don’t provide clickable links.

The Solution: A script suitable for use as a “conversion filter” in Liferea (and maybe other feed readers too, who knows?). The script should:

Read and parse an RSS/Atom feed from the standard input.

Grab the text from each feed item and “linkify” it, i.e., wrap every bare URL in an HTML link.

Print the modified feed on the standard output.
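For future lazy programmers, here is a minimal sketch of those three steps for the RSS case. It is not the original script: it uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra installs, and the names URL_RE, linkify, linkify_rss, and main are made up for illustration. The URL pattern is deliberately naive and assumes the item text contains no existing links.

```python
import re
import sys
import xml.etree.ElementTree as ET

# Naive URL pattern (an assumption): "http(s)://" up to whitespace, a quote,
# or an angle bracket. Trailing punctuation will end up inside the link.
URL_RE = re.compile(r'(https?://[^\s<>"\']+)')


def linkify(text):
    """Wrap each bare URL in an HTML anchor."""
    return URL_RE.sub(r'<a href="\1">\1</a>', text)


def linkify_rss(xml_text):
    """Parse an RSS document, linkify each item's description, return XML text."""
    root = ET.fromstring(xml_text)
    for item in root.iter('item'):
        desc = item.find('description')
        if desc is not None and desc.text:
            # The serializer escapes the markup, which is how RSS
            # descriptions usually carry HTML.
            desc.text = linkify(desc.text)
    return ET.tostring(root, encoding='unicode')


def main(infile=sys.stdin, outfile=sys.stdout):
    """Filter: read a feed from infile, write the modified feed to outfile."""
    outfile.write(linkify_rss(infile.read()))
```

To use this as a conversion filter, call main() under the usual `if __name__ == '__main__':` guard.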

Easy, right? Well, yeah. The only tricky bit was using the right namespace references for the Atom feed, but again that’s only because I refuse to read and comprehend the W3C specs for something so insignificant. I ended up using the lxml library, because it worked. (The script would be about 50% shorter if I hadn’t added a command-line option, --strip-user, to strip the username from the beginning of items in a single-user feed, and a third shorter than that if it handled only RSS or only Atom rather than both.)
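The namespace business and the --strip-user option can be sketched roughly as follows. Again, this is an illustration rather than the original script: it uses the standard library's ElementTree (lxml accepts the same {uri}tag notation), it assumes an Atom 1.0 feed, and the names item_text_elements, strip_user, parse_args, and process are hypothetical. The choice of which elements hold the item text (title plus description or content) is also an assumption.

```python
import argparse
import re
import xml.etree.ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'  # the Atom 1.0 namespace URI

# Keep Atom output free of generated "ns0:" prefixes when serializing.
ET.register_namespace('', ATOM_NS)


def item_text_elements(root):
    """Yield the elements holding item text, for either RSS 2.0 or Atom."""
    if root.tag == '{%s}feed' % ATOM_NS:
        # Atom elements are namespaced, so every tag needs the {uri}name form.
        for entry in root.iter('{%s}entry' % ATOM_NS):
            for name in ('title', 'content'):
                el = entry.find('{%s}%s' % (ATOM_NS, name))
                if el is not None:
                    yield el
    else:
        # RSS 2.0 core elements have no namespace.
        for item in root.iter('item'):
            for name in ('title', 'description'):
                el = item.find(name)
                if el is not None:
                    yield el


def strip_user(text):
    """Drop a leading 'username: ' prefix, if there is one."""
    return re.sub(r'^\s*\w+:\s+', '', text, count=1)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description='Tweak a feed read from stdin.')
    parser.add_argument('--strip-user', action='store_true',
                        help='strip the username from the start of each item')
    return parser.parse_args(argv)


def process(xml_text, strip=False):
    """Return the feed as XML text, optionally with usernames stripped."""
    root = ET.fromstring(xml_text)
    for el in item_text_elements(root):
        if strip and el.text:
            el.text = strip_user(el.text)
    return ET.tostring(root, encoding='unicode')
```

Checking the root tag is one simple way to tell the two formats apart; feed-level elements such as the channel or feed title are left alone because only items and entries are searched.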


2 Comments

Interesting post.

Please note that you should not use mutable objects as default values for function arguments, e.g. the “ns” argument being assigned an empty dictionary. The reason is that the default value is evaluated only once, when the function is defined, rather than on every call, so any changes made to it persist into later calls.
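A small illustration of the pitfall, using two hypothetical helpers (add_prefix_shared and add_prefix are invented for this example and are not taken from the script):

```python
def add_prefix_shared(prefix, uri, ns={}):
    # Intentional bug: this {} is created once, when the def statement runs,
    # and the same dictionary is reused by every call that omits ns.
    ns[prefix] = uri
    return ns


def add_prefix(prefix, uri, ns=None):
    # Usual idiom: default to None and build a fresh dictionary per call.
    if ns is None:
        ns = {}
    ns[prefix] = uri
    return ns


first = add_prefix_shared('atom', 'http://www.w3.org/2005/Atom')
second = add_prefix_shared('dc', 'http://purl.org/dc/elements/1.1/')
print(first is second)   # True: both names refer to the one shared default
print(sorted(second))    # ['atom', 'dc']: the first call leaked into the second

print(add_prefix('atom', 'http://www.w3.org/2005/Atom'))
print(add_prefix('dc', 'http://purl.org/dc/elements/1.1/'))  # only the 'dc' entry
```

If the function only ever reads from the default dictionary, the shared object does no harm, but the None idiom avoids having to remember that.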
