Ramblings from the creator of HomeSite, TopStyle, FeedDemon and Glassboard Android.

Friday, October 05, 2007

Earlier this week I wrote about sanitizing CSS, and I've been thinking about it a bit more. Like many RSS aggregators, for security and presentation reasons the current version of FeedDemon strips all inline styles before displaying a feed, and I thought this was the best approach. But after seeing the Wikipedia feed that Sam Ruby pointed me to, I'm rethinking that.

Just so you know what I'm talking about, take a look at the screenshots linked below. The first shows the Wikipedia feed in FeedDemon with all inline styles removed, while the second shows the same feed with styles intact:

As you can see, the feed is far more useful with the styles intact. So rather than blindly strip all inline styles, the next version of FeedDemon will use a "whitelist" of allowed CSS properties and values. FeedDemon's whitelist will be based on the same rules that Bloglines uses, as outlined in the Sanitization Rules wiki. However, I may make FeedDemon's whitelist even stricter, since I'm not convinced that it's wise to enable things like background images and CSS cursors in feed content.
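
Here is a minimal sketch of the whitelist idea in Python. The `ALLOWED_PROPERTIES` set and the `filter_inline_style` function are hypothetical names chosen for illustration; the list is not the Bloglines rules and not FeedDemon's actual whitelist. It only checks property names, so a real filter would also need to vet values (for example, rejecting `url(...)`).

```python
# Hypothetical whitelist for illustration only; not the Bloglines rules
# and not FeedDemon's actual list.
ALLOWED_PROPERTIES = {
    "color",
    "font-weight",
    "font-style",
    "text-align",
    "text-decoration",
}


def filter_inline_style(style):
    """Keep only declarations whose property name is on the whitelist."""
    kept = []
    # Naive split on ';' is enough for a sketch, but it ignores quoted
    # values and backslash escapes.
    for declaration in style.split(";"):
        name, colon, value = declaration.partition(":")
        name = name.strip().lower()
        value = value.strip()
        # Declarations without a colon or without a value are dropped.
        if colon and value and name in ALLOWED_PROPERTIES:
            kept.append(f"{name}: {value}")
    return "; ".join(kept)


print(filter_inline_style("color: red; background-image: url(x.gif); cursor: wait"))
# prints: color: red
```

Anything not explicitly listed is discarded, which is the point of a whitelist: new or unknown properties are unsafe until proven otherwise.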

At this point, you might be wondering why RSS aggregators need to bother whitelisting inline styles - why not just leave all the inline styles intact? Beyond the security issues, one problem is that some people will use things like excessively large font sizes to make their posts stand out. Other people will deliberately insert "prank" CSS, like a page full of offensive images designed to ruin the reading experience.

These annoyances aren't really a problem when the post is viewed by itself or within its feed - after all, if you subscribe to a feed that annoys you, you'll simply unsubscribe from it. But it's a different story when it's combined with posts from other feeds in a "river of news" view, or in a search feed from Technorati or Google. The latter issue is the one that concerns me the most, since theoretically someone could ruin a ton of RSS search feeds by littering their blog with popular keywords, and then injecting some nasty CSS into the blog's feed.

Luckily for me, I've already got a ton of CSS parsing code which I wrote for TopStyle, so it won't be a big deal to add inline style whitelisting to FeedDemon. But if you're an aggregator developer who'd also like to whitelist inline styles and you don't have a background in CSS, you might appreciate a few tips I learned the hard way:

Assuming valid CSS is an invalid assumption. Trust me: just like HTML and RSS, plenty of people use completely invalid CSS. Things like unclosed quotes and declarations without colons can trip up your parser if you assume that inline styles will be correctly written.

Quotes can be escaped. Although rarely used in practice, characters inside CSS values may be escaped with a backslash. This is most commonly used in the box model hack, which relies on escaped quotes to trick outdated browsers into ignoring specific styles (ex: <p style="width:400px;voice-family: "\"}\"";voice-family:inherit;width:300px;">). In other words, your parser can't assume that quotes always mark the start or end of a value.
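
As a rough illustration of the point about escapes, the sketch below splits an inline style into declarations while respecting quoted strings and backslash escapes. `split_declarations` is a hypothetical helper, not code from TopStyle or FeedDemon, and it deliberately tolerates an unclosed quote by letting it run to the end of the string.

```python
def split_declarations(style):
    """Split an inline style on ';', ignoring semicolons inside quotes."""
    parts = []
    current = []
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(style):
        ch = style[i]
        if ch == "\\" and i + 1 < len(style):
            # A backslash escapes the next character, so copy both through
            # without treating the second one as a quote or separator.
            current.append(ch)
            current.append(style[i + 1])
            i += 2
            continue
        if quote is not None:
            if ch == quote:
                quote = None  # closing quote
        elif ch in ("'", '"'):
            quote = ch  # opening quote
        elif ch == ";":
            parts.append("".join(current).strip())
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    # Whatever is left is the final declaration; an unclosed quote simply
    # runs to the end of the string instead of raising an error.
    parts.append("".join(current).strip())
    return [p for p in parts if p]


hack = r'width:400px;voice-family: "\"}\"";voice-family:inherit;width:300px;'
print(split_declarations(hack))
```

With the box model hack as input, this yields four declarations, and the escaped quotes stay inside the second one rather than ending the string early.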

Quotes are optional, and single quotes are allowed. Although XHTML requires attribute values to be inside quotes, browsers don't enforce this requirement. In addition, it's fine to use single quotes instead of double quotes around values. So make sure your parser handles all three variants (ex: <p style="color:red">, <p style='color:red'> and <p style=color:red>).
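
To make the three variants concrete, here is a small regex-based sketch that pulls the style value out of a single tag. `STYLE_ATTR` and `extract_style` are hypothetical names, and a regex is an assumption made for brevity; it will not cope with every malformed tag, so a tolerant HTML parser is the safer choice in practice.

```python
import re

# One alternative per quoting style: double quotes, single quotes, no quotes.
# The lookbehind avoids matching attributes such as data-style.
STYLE_ATTR = re.compile(
    r"""(?<![\w-])style\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)


def extract_style(tag):
    """Return the style attribute value of a tag, or None if it has none."""
    match = STYLE_ATTR.search(tag)
    if match is None:
        return None
    # Exactly one of the three groups took part in the match.
    return next(group for group in match.groups() if group is not None)


for tag in ('<p style="color:red">', "<p style='color:red'>", "<p style=color:red>"):
    print(extract_style(tag))  # prints color:red three times
```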

Pixels are the default length unit. One of the things I'm doing in FeedDemon is stripping excessively large font sizes (ex: <p style="font-size: 800px">), which requires enforcing a max size based on the length unit. If you plan to do the same thing, keep in mind that when the length unit is missing, browsers (particularly in quirks mode) may assume that pixels (px) were intended, even though a unitless length isn't valid CSS. So <p style="font-size: 12"> may be rendered the same as <p style="font-size: 12px">.
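
A sketch of what such a size check could look like follows. The cap `MAX_FONT_PX`, the conversion table `PX_PER_UNIT` (which assumes a 16px base for em and %), and the function `font_size_allowed` are all assumptions made for this example, not FeedDemon's actual limits.

```python
import re

MAX_FONT_PX = 48.0  # hypothetical cap, chosen only for this example

# Rough conversions to pixels. The em and % factors assume a 16px base font,
# which is an assumption and not something a sanitizer can know for sure.
PX_PER_UNIT = {"px": 1.0, "pt": 96 / 72, "em": 16.0, "%": 0.16}

LENGTH = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([a-z%]*)\s*$", re.IGNORECASE)


def font_size_allowed(value):
    """Return True if a numeric font-size value is within the cap."""
    match = LENGTH.match(value)
    if match is None:
        # Keywords such as 'smaller' are not handled by this sketch.
        return False
    number = float(match.group(1))
    unit = match.group(2).lower() or "px"  # a missing unit is treated as px
    factor = PX_PER_UNIT.get(unit)
    if factor is None:
        return False  # unknown unit: reject rather than guess
    return number * factor <= MAX_FONT_PX


print(font_size_allowed("800px"))  # False
print(font_size_allowed("12"))     # True, handled like 12px
```

Rejecting unknown units instead of guessing keeps the check on the conservative side, in keeping with the whitelist approach.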

Font sizes can get you into trouble. If you "flow" multiple posts in the same newspaper page (like FeedDemon), you have to be careful that a font size declared in an unclosed tag in one post doesn't affect subsequent posts. The problem gets worse with relative font sizes (ex: <p style="font-size: smaller">), since improperly nested relative font sizes could result in a tiny single-pixel font size (or a huge font size when "larger" is used).

Floats can also get you into trouble. If your aggregator uses a multi-column newspaper view, be careful that floated elements don't overlap posts in adjacent columns (ex: <img style="float:right" src="http://nick.typepad.com/images/basil.gif" />). And you might want to consider only permitting images to be floated, to avoid having floating DIVs, etc., causing problems.

Strip class and id attributes. If your newspaper view relies on classes and/or ids to identify items in the page, I recommend removing class and id attributes from the actual posts - otherwise a post could use the same class names that you use in your newspaper, potentially creating all kinds of havoc.
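
One way to sketch this in Python is shown below. `strip_class_and_id` is a hypothetical helper, and the regex approach is an assumption made to keep the example short: it can be fooled by a `>` or by text like ` id=x` inside another attribute's value, so a real implementation would be better off rewriting attributes with a proper HTML parser.

```python
import re

# Matches an opening tag (closing tags carry no attributes).
TAG = re.compile(r"<[a-zA-Z][^>]*>")

# Matches a class or id attribute in any of the three quoting styles.
# The leading whitespace requirement keeps data-id and similar names intact.
CLASS_OR_ID = re.compile(
    r"""\s+(?:class|id)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def strip_class_and_id(html):
    """Remove class and id attributes from every opening tag."""
    return TAG.sub(lambda m: CLASS_OR_ID.sub("", m.group(0)), html)


print(strip_class_and_id('<div class="item" id=post1><p ID=\'x\'>hi</p></div>'))
# prints: <div><p>hi</p></div>
```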

Remove top-level tags. Although they shouldn't be there, I've seen some feeds that contain top-level tags such as BODY and HTML in their posts. Imagine the impact on your river of news if some prankish feed author inserts a styled BODY tag into their feed.
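
A minimal sketch of removing these tags, again using a regex for brevity, is shown here. `remove_top_level_tags` is a hypothetical name, and the choice of HTML, HEAD and BODY as the tags to drop is an assumption based on the examples above. Only the tags themselves are removed; their contents are kept.

```python
import re

# The \b stops 'head' from matching inside tags such as <header>.
TOP_LEVEL = re.compile(r"</?(?:html|head|body)\b[^>]*>", re.IGNORECASE)


def remove_top_level_tags(html):
    """Drop HTML, HEAD and BODY tags (and their attributes) from a post."""
    return TOP_LEVEL.sub("", html)


post = '<html><BODY style="background:black"><p>Hi</p></body></html>'
print(remove_top_level_tags(post))
# prints: <p>Hi</p>
```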

If your aggregator embeds IE, get out of the local zone. This applies more to script than it does to CSS, but it bears repeating: if you're embedding the WebBrowser object, don't allow locally displayed content to operate in the local zone. If you're not sure what I'm talking about, refer to my earlier post on this topic.

Wednesday, October 03, 2007

"There has been a huge push in recent years to move away from the old habits of early HTML and to leverage CSS for presentation - the fact that it doesn't work in feed readers is a major pain for people trying to do the right thing. It's good that we identified a security threat and dealt with it quickly - but it's not acceptable to stop there. We need to work to get the functionality that we used to have back without reintroducing the security risks. It's not simple, but it is important."

That's a valid point, and I'm glad Adrian raises it. As the author of both an HTML/CSS editor (TopStyle) and an RSS aggregator (FeedDemon), this is something that I've wrestled with quite a bit. On the one hand, I've promoted the power of CSS by creating a web authoring tool tailored for building CSS-based sites, yet on the other hand I'm taking that power away by creating an RSS reader that removes CSS from feeds. What gives?

It all started back in 2003, when Mark Pilgrim's "platypus prank" illustrated how feeds containing CSS could be a problem. Most RSS aggregator developers (myself included) tackled this problem by completely removing all styles from feed content. Since then, I've experimented with stripping only "unsafe" CSS from feeds, and despite Adrian's claim that doing so requires a lot of work, it's actually quite easy to do (especially for me, since I already have code in TopStyle that could do this, and it would be painless to plunk it into FeedDemon).

The real problem isn't security, though: it's presentation (ironically). Leaving styles intact makes sense if you're reading one post at a time, but it makes less sense in a river of news where posts from multiple feeds flow down the page. The purpose of a river of news isn't to retain the presentation of any single post, but instead to provide a common presentation for all posts, making it easy to pick out the ones that interest you. If each post had its own style, you could end up with a river of news that looks like a ransom note. Given how some bloggers and MSM outlets will do anything to grab your attention, I'll wager that outcome is far from unlikely.

Another problem - and this is one that bothers me when I don the TopStyle hat - is that if I followed Bloglines' approach and permitted a whitelist of inline styles, then feed authors couldn't use classes defined in an external style sheet. In other words, they'd be forced to resort to using style attributes on individual HTML tags, which kills the maintenance benefit of using CSS in the first place. To me, the best thing about CSS is that it enables storing a site's presentation in a single file - just change the external style sheet, and that change will be reflected site-wide. This benefit is lost when you use inline styles.

So, perhaps the real question isn't whether RSS aggregators should support inline styles, but whether they should also support external styles. Despite my love for CSS, my vote would be no - not because it would be hard to do, but because of the potential impact on the feed-reading experience.

And if only inline styles are supported, which ones make the cut? Personally, I'd want a smaller whitelist than the one Bloglines supports, and I'd also want to make sure that properties such as "float" don't impact subsequent posts in a river of news view.

Wednesday, September 12, 2007

I had the good fortune to lose my outdated cell phone recently, which gave me the perfect excuse to splurge on a new iPhone (I waited until Sept 5 to order it, of course).

I have to admit that I love my iPhone, despite not being able to "officially" develop native apps for it. It's the mobile device I've been waiting for. After a few days of using the iPhone's touch screen, my desktop computer's mouse suddenly seems antiquated.

Not long after I got the iPhone, I decided to give our free iPhone reader a try - and it's now by far my most-used iPhone app. Being able to read my feeds when I'm on the go is great, and I really appreciate how bandwidth-friendly our iPhone reader is. The best thing, of course, is that everything is synchronized with our RSS platform, which means that stuff I read on my iPhone is automatically marked read in FeedDemon.

Thursday, August 16, 2007

I've been writing about attention for quite a while now, ever since Steve Gillmor introduced me to the concept at Gnomedex 2004. Since then I've experimented with various ways to improve RSS aggregation by examining what you're paying attention to, but I've rarely been satisfied with what I've come up with.

The basic problem with RSS aggregators is that once you subscribe to enough feeds, you've got too much information to keep up with. Sure, on a slow day you can read everything you're subscribed to, but when you're busy, you might just want to read the stuff that's important to you.

This was the impetus for the popular topics feature in FeedDemon 2.5, which shows the most talked about items in your subscribed feeds. I know I rely on this "personal memetracker" feature a lot - when I'm nose down in the code and don't have time to read my feeds, I just mark everything as read and then view FeedDemon's popular topics to see the most commonly linked articles. Overall I've found this more effective than the various online memetrackers because it's personalized with only the feeds that I'm subscribed to.

Lately I've noticed that my popular topics have been bringing me a ton of articles that are of interest to me, articles I might've missed if FeedDemon didn't have this feature. And I've also noticed that the single biggest reason I'm getting so many interesting articles is that I'm subscribed to a number of link blogs.

If you're not familiar with link blogs, they're basically collections of articles that someone finds interesting. For example, FeedDemon 2.5 added the ability to share your favorite links as an RSS feed, and I have a link blog feed of my own. FeedDemon isn't the only aggregator that offers this feature, either - NewsGator Online has had it for quite some time, and it's also available in Google Reader.

Whenever I read something interesting, I copy it to my link blog. So my link blog is like my attention stream - it contains the stuff that I'm paying attention to.

And now that I'm subscribed to several link blogs, I can see what others are paying attention to. When an article appears in more than one of the link blogs that I'm subscribed to, it shows up in FeedDemon's popular topics. This consistently brings me new articles that I never would've found by myself.
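
The idea of surfacing articles that several link blogs point to can be sketched in a few lines of Python. This is only an illustration of the counting step, not FeedDemon's actual popular topics algorithm; `popular_links`, the `min_feeds` threshold, and the example URLs are all hypothetical.

```python
from collections import Counter


def popular_links(feeds, min_feeds=2):
    """Rank URLs by how many distinct feeds link to them.

    feeds maps a feed name to the list of URLs that feed links to.
    """
    counts = Counter()
    for links in feeds.values():
        # Use a set so a feed that links the same URL twice counts once.
        counts.update(set(links))
    # Most-linked first; ties are broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(url, n) for url, n in ranked if n >= min_feeds]


feeds = {
    "link blog A": ["http://example.com/1", "http://example.com/2"],
    "link blog B": ["http://example.com/1"],
    "link blog C": ["http://example.com/1", "http://example.com/3"],
}
print(popular_links(feeds))
# prints: [('http://example.com/1', 3)]
```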

That to me is the holy grail of "attention." One of the main goals of the attention concept is to enable you to filter out the noise and just see the stuff that's important to you, and I'm finding that FeedDemon's popular topics combined with subscribing to link blogs is consistently doing just that.

I think there's a lot more that can be done here - not just in FeedDemon, but in other aggregators as well. If you use an RSS aggregator, I believe you could really benefit from seeing the things that the people you pay attention to are paying the most attention to.

Friday, July 20, 2007

"I see too many people I know getting caught up in the breathless hype and forgetting to think about whether the latest shiny new thing really matters in the grand scheme of things. Sooner or later the treadmill is going to tire you out..."

When I hear someone complaining about all the feeds competing for their attention, I have to wonder why they don't just unsubscribe from most of them. Are their aggregators not helping them find the feeds they're paying the least attention to so they can figure out which ones to unsubscribe from? I regularly weed out the feeds that I don't spend any time with, so catching up with my unread posts every morning doesn't turn into an all-day affair.

In my case, part of my "feed weeding" involves getting rid of a bunch of single-topic feeds, then subscribing to one feed that points out the interesting articles in those feeds. Scoble's link blog, for example, saves me from subscribing to a ton of tech-related feeds. In this situation, Scoble assumes the position of an editor (and I do the same thing with my link blog).

In fact, I could easily get rid of 90% of my feeds if I could find better editors on topics other than technology. Syndication lends itself to the rise of sites that point out the interesting stuff to us, so I'll wager we'll see more editors and fewer over-subscribed feed readers over the next few years.

Thursday, March 29, 2007

I've seen too many articles about RSS that define it as "Really Simple Syndication" and "Rich Site Summary" and "RDF Site Summary." As best I can tell, the authors of these articles feel the need to define RSS multiple ways to avoid getting flamed by advocates of a particular feed format.

Look, I'm aware of the tortured history of feed formats, but how on earth does it help any of us if the press can't even use a single definition of RSS? That's insane.

Regardless of which RSS format is your favorite, I'm pretty sure you'll concur that RSS 2.0 is the most widely used version. RSS 2.0 is defined as "Really Simple Syndication," and that's a much friendlier definition for non-geeks to deal with.

So let's get rid of every other definition of RSS. RSS = Really Simple Syndication, Q.E.D.