Erik Wilde on Services and APIs

Friday, November 23, 2007

Atom Browsing

feeds are becoming increasingly important as information resources on the web. while Atom is gaining momentum and will hopefully replace RSS eventually, today there is a mix of RSS and Atom feeds on the web. some sites even publish several variants of their feeds, with different numbers of items in them, or one with summaries and one with complete entries. this assumes that readers are not smart enough to do this kind of customization, which is not smart, because instead of being able to just configure your reader, you have to unsubscribe from one variant of the feed and subscribe to another. if you can find it...

today's browsers mostly support feeds, and there basically are two possible interactions with a feed:

look at the feed in its current version.

subscribe to the feed and add it to a feed reader.

feed subscription is something i will look at in another post; here i am simply interested in how you can look at a feed's contents. firefox, safari, and IE7 all support feed rendering, with safari probably providing the best interface of these three. opera does not render feeds at all; it simply displays the source code in the same way it displays XML documents. not very helpful.

IE7 seems to have issues with feeds using prefix-based namespaces, which is a pretty good indication that the implementation has been hacked together by some scripting kid with string-based tools instead of a namespace-aware XML parser. that's bad. of course, processing feeds takes a bit of care because there are so many ill-formed feeds out there (RSS feeds are typically worse than Atom, but there are also many ill-formed Atom feeds). so i think it makes sense to process feeds in a forgiving way, but if that means that you are having trouble processing valid feeds, then something is very wrong.
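
to make the namespace point concrete, here is a minimal sketch in python (my choice of language for illustration; it says nothing about how IE7 is actually implemented). the two sample feeds and the helper names entry_titles and naive_has_entries are made up for this example. both feeds are the same Atom document, one using a default namespace and one using a prefix. a namespace-aware parser sees them as identical, while a string-based check only finds the unprefixed one.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# the same (deliberately tiny, hypothetical) feed written in two ways
DEFAULT_NS_FEED = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<title>example</title>'
    '<entry><title>first post</title></entry>'
    '</feed>'
)
PREFIXED_FEED = (
    '<a:feed xmlns:a="http://www.w3.org/2005/Atom">'
    '<a:title>example</a:title>'
    '<a:entry><a:title>first post</a:title></a:entry>'
    '</a:feed>'
)


def entry_titles(feed_xml):
    """return the titles of all Atom entries, whatever prefix the feed uses."""
    root = ET.fromstring(feed_xml)
    # ElementTree expands every name to {namespace-uri}local-name,
    # so the prefix chosen by the feed author never shows up here
    if root.tag != "{%s}feed" % ATOM_NS:
        raise ValueError("not an Atom feed")
    return [
        entry.findtext("{%s}title" % ATOM_NS, default="")
        for entry in root.iter("{%s}entry" % ATOM_NS)
    ]


def naive_has_entries(feed_xml):
    """string-based check: only matches the unprefixed spelling."""
    return "<entry>" in feed_xml
```

entry_titles returns the same list for both variants, whereas naive_has_entries is true only for the feed with the default namespace. that is the kind of behavior you would expect from string-based processing.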

for any kind of feed processing, the following scheme should be employed:

1. read the feed and test it for validity using XML tools. if it is valid, go to step 3.

2. for invalid feeds, try to salvage some of the feed contents. this can involve quite a bit of guessing and heuristics, thanks to the good old days of RSS. the result of this step must be a valid feed.

3. convert the feed to Atom.

with this process, it should be rather simple to avoid the kind of thing that is happening in IE7, which seems to rely on string-based processing.
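
the three steps above can be sketched roughly as follows, again in python and under heavy assumptions. the function names (parse_strict, salvage, rss_to_atom, normalize) and the sample feed are hypothetical. the "validity" test is reduced to a well-formedness check, the only salvage heuristic is escaping stray ampersands, and the conversion handles just a small subset of RSS 2.0. it copies titles and links only, so the result still lacks elements such as id and updated that a valid Atom feed requires.

```python
import re
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# hypothetical broken feed: the bare "&" makes it ill-formed XML
BROKEN_RSS = (
    '<rss version="2.0"><channel><title>news & views</title>'
    '<item><title>a</title><link>http://example.org/a</link></item>'
    '</channel></rss>'
)


def parse_strict(text):
    """step 1: parse with a real XML parser; None means not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


def salvage(text):
    """step 2: one crude heuristic, escaping ampersands that do not
    start an entity or character reference, then parsing again."""
    repaired = re.sub(r"&(?!(?:#\d+|#x[0-9a-fA-F]+|\w+);)", "&amp;", text)
    return parse_strict(repaired)


def rss_to_atom(root):
    """step 3: map a small subset of RSS 2.0 to Atom elements."""
    if root.tag == "{%s}feed" % ATOM_NS:
        return root  # already Atom, nothing to convert
    channel = root.find("channel")
    if root.tag != "rss" or channel is None:
        raise ValueError("unsupported feed format")
    feed = ET.Element("{%s}feed" % ATOM_NS)
    title = ET.SubElement(feed, "{%s}title" % ATOM_NS)
    title.text = channel.findtext("title", default="")
    for item in channel.findall("item"):
        entry = ET.SubElement(feed, "{%s}entry" % ATOM_NS)
        entry_title = ET.SubElement(entry, "{%s}title" % ATOM_NS)
        entry_title.text = item.findtext("title", default="")
        link = item.findtext("link")
        if link:
            ET.SubElement(entry, "{%s}link" % ATOM_NS, href=link)
    return feed


def normalize(text):
    """run the three steps and return the root element of an Atom tree."""
    root = parse_strict(text)
    if root is None:
        root = salvage(text)
    if root is None:
        raise ValueError("could not salvage feed")
    return rss_to_atom(root)
```

calling normalize(BROKEN_RSS) goes through all three steps: the strict parse fails on the bare ampersand, the salvage step repairs it, and the conversion produces an Atom tree. a real implementation would need far more heuristics in step 2 and a complete mapping in step 3.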

i am pretty sure that there are packages out there which implement this three-step process and read any kind of feed and come up with a good Atom feed as a result. any opinions on which package does the best job of cleaning up messy feeds?