Ramblings from the creator of HomeSite, TopStyle, FeedDemon and Glassboard Android.

Tuesday, October 30, 2007

Here's an example: Steve Rubel's "The Web 2.0 World is Skunk Drunk on Its Own Kool-Aid" rant caught my eye yesterday because of its great title. After reading Rubel's post, I added it to my link blog, where it was spotted by Steven Hodson. As Hodson writes, he unsubscribed from Rubel's feed a while ago, but he just resubscribed based on the strength of that one post - and I'll wager that the post's title is why it got his attention.

The more feeds people subscribe to, the less likely they are to read every unread item; instead, they just skim the titles looking for something that interests them. If you use boring titles for your posts, skimmers like me are likely to skip right over them.

In addition, once people get used to reading feeds, they start subscribing to link blogs and search feeds which aggregate content from all over the web. People who aren't subscribed to your feed often find you through these aggregate feeds, and it's the strength of your titles that leads them to read what you have to say.

Now, I'm not about to recommend using sensationalist, "National Enquirer"-like titles - that would just pollute your name/brand, leading people to unsubscribe from your feed. But descriptive, catchy titles get the attention of readers who might otherwise never see your words of wisdom.

So if you're going to take the time to write a blog post, make sure to also take the time to give it a good title. Yeah, I know that sounds painfully obvious, but a quick glance at your unread items should provide plenty of examples of interesting posts that go ignored because of lousy titles.

Monday, October 29, 2007

One of the features I've been planning to add to the upcoming FeedDemon 2.6 is an inline search toolbar similar to the one in Firefox, and I'm pleased to announce that I've finally coded it. But it was harder than I anticipated, primarily due to a weird IE bug (more about that later).

In the current version of FeedDemon, hitting Ctrl+F in the browser displays IE's "Find" dialog, shown below:

While this does the job, it's not nearly as useful as Firefox's inline search, which enables highlighting every instance of a keyword on the current page. So I decided to intercept Ctrl+F and display my own inline search toolbar.

After a few hours of coding, I had it working great - or so I thought. Then I did an inline search for the word "the" on CNN's web site, and FeedDemon came crashing down. An inline search on Techmeme had the same result. Boom!

Note: The rest of this post is intended for frustrated developers Googling for help on the problem I ran into.

I traced the crash to a call to the Select method of MSHTML's IHTMLTxtRange interface. Now, you might think I'd be happy to figure out where the bug was, but I wasn't - I'm never happy to discover that a bug resides in code I didn't write, since that means hours of trial-and-error (literally) trying to resolve it.

After numerous false starts, I finally discovered that the crash occurred when IHTMLTxtRange.FindText locates a match in text that isn't visible (inside a hidden DIV, for example). Using the Select method on a visible text range works fine, but using it on a range that's hidden results in the less-than-helpful exception "could not complete the operation due to error 800a025e."

Luckily, once I realized the problem, I was able to work around it with some hackish code. The first step was to catch the exception, and then figure out the next visible element after the hidden text range so I could perform a "FindText" on it (in other words, skip over the hidden match). In Delphi code, it looks something like this:

// ...code prior to this calls IHTMLTxtRange.FindText to locate the keyword...
// SafeSelectRange wraps IHTMLTxtRange.Select and returns False when it raises an exception
if SafeSelectRange(txtr) then
  Result := True
else
begin
  // get the parent element of this text range
  if (txtr.parentElement = nil) then
    exit;
  // find the element after this one
  // (indexes into "all" are zero-based, so the last valid index is length - 1)
  nNextSrcIdx := txtr.parentElement.sourceIndex + 1;
  if nNextSrcIdx >= iDoc2.all.length then
    exit;
  elNext := iDoc2.all.item(nNextSrcIdx, null) as IHTMLElement;
  if elNext = nil then
    exit;
  // move the text range to this element
  txtr.moveToElementText(elNext);
  // retry the 'FindText' call (but don't retry forever)
  inc(nNumRetries);
  if nNumRetries < MAX_RETRIES then
    bRetry := True;
end;

I've stripped a lot of the logic from this code to make it more presentable, but hopefully this is enough to help other developers running into the same hair-pulling bug.

Wednesday, October 24, 2007

Dave Winer writes about how he'd like a way to exclude specific items in his RSS feed from appearing on TechMeme, and suggests a TechMeme namespace for RSS as one possibility.

Rather than create a TechMeme-specific namespace, I'd prefer to see the existing noindex meta tag adapted for use on a per-item basis. For example, right now you can add this to your feed to prevent search engines from spidering it:

<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />

I use this on the feed for my link blog (since my link blog contains items from other feeds, using noindex helps feed search engines prevent duplication) and Yahoo and Google both honor it.

So how about we adapt this for use on a per-item basis, so that individual items can be excluded without excluding the entire feed? Search engines and sites like TechMeme could simply ignore any items that are flagged with noindex.
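To make the idea concrete, here's a rough sketch of how a consumer might honor a per-item noindex flag. It's written in Python rather than FeedDemon's Delphi, and it rests on assumptions: placing the xhtml:meta element inside an individual item is only the proposal above (not an existing convention), and the sample feed and the names is_noindex and indexable_titles are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical feed: the second item carries the proposed per-item flag.
SAMPLE_FEED = """<rss version="2.0" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <channel>
    <title>Example link blog</title>
    <item>
      <title>Index me</title>
    </item>
    <item>
      <title>Skip me</title>
      <xhtml:meta name="robots" content="noindex" />
    </item>
  </channel>
</rss>"""


def is_noindex(element):
    """Return True if a direct xhtml:meta robots child lists noindex."""
    for meta in element.findall("{%s}meta" % XHTML_NS):
        if meta.get("name", "").lower() != "robots":
            continue
        directives = [d.strip().lower() for d in meta.get("content", "").split(",")]
        if "noindex" in directives:
            return True
    return False


def indexable_titles(feed_xml):
    """Return the titles of the items an indexer would be allowed to keep."""
    channel = ET.fromstring(feed_xml).find("channel")
    # A feed-level flag excludes everything, as it does today.
    if channel is None or is_noindex(channel):
        return []
    return [
        item.findtext("title", default="")
        for item in channel.findall("item")
        if not is_noindex(item)
    ]


print(indexable_titles(SAMPLE_FEED))
```

The same check works at both levels, which is the appeal of reusing the existing tag: an indexer that already looks for noindex on the feed would only need to repeat that lookup for each item.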

Of course, this approach wouldn't prevent only TechMeme from indexing an item, so it doesn't entirely fulfill Dave's request. But if preventing a specific site from indexing an item is something that feed creators want, then perhaps a user-agent attribute is needed (similar to the User-agent line in robots.txt).

Tuesday, October 23, 2007

Now that TopStyle 3.5 is out, a number of customers have asked us where they can download the trial version.

I generally like to wait a few weeks after a product release before putting out a trial version, so right now a trial of TopStyle 3.5 isn't available. But there will be one in a few weeks, once the rollout of TopStyle 3.5 has settled down. I'll post an announcement here when it's available.

Thursday, October 18, 2007

At long last, the final release of TopStyle 3.5 is now available. My thanks go out to everyone who helped during the lengthy beta cycle - I hope you enjoy the new version!

As you can see from the release notes, version 3.5 isn't a minor upgrade - it offers a ton of new features, improvements and fixes. It could easily have been called TopStyle 4.0, but I know customers have been waiting a long time for a new TopStyle, so we've named it v3.5 and made it a free upgrade from v3.0.

To get it, just stop by the TopStyle Registered Download page, then enter the serial number you received if you purchased TopStyle through NewsGator. If you purchased through another online store (such as RegNow), enter the email address you used when you placed your order. Note that there's no need to uninstall the previous 3.x version before upgrading - just install v3.5 directly on top of it.

Top five new features in TopStyle 3.5:

Box Spy - exposes an HTML tag's margins, padding and content box as you mouse over it in the preview

Monday, October 15, 2007

If you're a regular reader of this blog, you probably know that attention is a recurring theme here. I've long been convinced that RSS aggregators can help people overcome information overload by first paying attention to what a user is reading, and then using that information to make better decisions about what might (or might not) be important to that user.

And I've also long been convinced that knowing what everyone is paying attention to, what your friends are paying attention to, etc., is a great way to uncover new and potentially important trends.

A big roadblock in this process has been the lack of a standardized format for storing attention data. While Attention.XML has been around for quite a while, for various reasons it hasn't really caught on. As a result, many attention-related tools (including FeedDemon) rely on their own proprietary attention formats - which works fine for individual tools, but it doesn't enable customers to easily share their attention data between services.

I looked into using APML as an attention format a while ago, but at the time I didn't like the idea of storing a customer's feeds in an OPML file, and then storing their attention data in a separate APML file. That's easy enough for software to deal with, of course, but it would be a burden for less-geeky end users wishing to share their data. For that reason, I proposed an attention namespace for OPML - basically a way to store attention data within OPML. But that idea also never caught on for various reasons.

Fast forward to a couple of weeks ago, when I was prompted to take another look at APML by NewsGator Inbox's Nick Harris. This led to a wider discussion about APML among NewsGator folks, including NetNewsWire's Brent Simmons. We liked the "Really Simple Attention" approach proposed by the APML Workgroup, so we decided to tinker with APML despite my misgivings about storing subscriptions and attention separately.

The first step will be APML export, which is already underway. Once we've worked out a few details, we'll also support APML import. This means you'll be able to share your attention data not only between our tools, but also with any service that supports APML. Our hope is that by supporting APML, more third parties will be convinced to support it.

In addition, we're working on ideas for supporting APML in NewsGator Online (not just for individual users who want to access their private attention data, but also for the service as a whole – ex: "what's everyone paying attention to?"). The NewsGator Online piece is still in the baby stages so there's nothing solid to announce yet, but we do want to let everyone know that it's in the works.

Last but hopefully not least, as Read/WriteWeb reports, I've joined the APML Workgroup and will offer my assistance and input on defining and implementing the APML specification.

Friday, October 12, 2007

Earlier this week, Nick Harris blogged about the "My Reading Habits" feature he's adding to the next version of NewsGator Inbox (which, btw, is shaping up to be an excellent release).

This simple report gives an overview of the attention you're spending on each of your subscriptions. It's a pretty cool feature, which is why it's also being added to the upcoming FeedDemon 2.6.

As you can see from the screenshot, lately the majority of my attention has been going to the TopStyle Support Forum, followed closely by the active cases in my FogBugz queue. That's the way it usually goes when there's a beta version of one of my applications available - and pretty soon now, my attention report is going to show that I'm paying the most attention to the FeedDemon Support Forum, 'cos the first beta of FeedDemon 2.6 is in the works :)

Friday, October 05, 2007

Earlier this week I wrote about sanitizing CSS, and I've been thinking about it a bit more. Like many RSS aggregators, for security and presentation reasons the current version of FeedDemon strips all inline styles before displaying a feed, and I thought this was the best approach. But after seeing the Wikipedia feed that Sam Ruby pointed me to, I'm rethinking that.

Just so you know what I'm talking about, take a look at the screenshots linked below. The first shows the Wikipedia feed in FeedDemon with all inline styles removed, while the second shows the same feed with styles intact:

As you can see, the feed is far more useful with the styles intact. So rather than blindly strip all inline styles, the next version of FeedDemon will use a "whitelist" of allowed CSS properties and values. FeedDemon's whitelist will be based on the same rules that Bloglines uses, as outlined in the Sanitization Rules wiki. However, I may make FeedDemon's whitelist even stricter, since I'm not convinced that it's wise to enable things like background images and CSS cursors in feed content.
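For anyone wondering what a whitelist like this looks like in code, here's a minimal sketch in Python (not FeedDemon's actual Delphi code). The properties and value checks in ALLOWED are assumptions chosen for illustration; they are not the Bloglines rules or FeedDemon's real list. The sketch splits naively on semicolons, which is safe here only because every value containing quotes, parentheses or url() fails the checks and gets dropped.

```python
import re

# Hypothetical value checks, applied to the lowercased value.
SAFE_COLOR = re.compile(r"^(#[0-9a-f]{3}|#[0-9a-f]{6}|[a-z]+)$")
SAFE_LENGTH = re.compile(r"^\d+(\.\d+)?(px|em|%|pt)?$")

# Hypothetical whitelist: property name -> check that the value must pass.
ALLOWED = {
    "color": SAFE_COLOR.match,
    "background-color": SAFE_COLOR.match,
    "font-weight": lambda v: v in {"normal", "bold"},
    "font-style": lambda v: v in {"normal", "italic"},
    "text-align": lambda v: v in {"left", "right", "center", "justify"},
    "padding": SAFE_LENGTH.match,
    "margin": SAFE_LENGTH.match,
}


def sanitize_style(style):
    """Keep only declarations whose property and value are both whitelisted."""
    kept = []
    for declaration in style.split(";"):
        name, colon, value = declaration.partition(":")
        name = name.strip().lower()
        value = value.strip().lower()
        check = ALLOWED.get(name)
        # Drop anything malformed, unknown, or carrying an unexpected value.
        if colon and value and check and check(value):
            kept.append(f"{name}: {value}")
    return "; ".join(kept)


print(sanitize_style("color: red; background-image: url(javascript:alert(1)); font-weight: bold"))
```

Checking values as well as property names matters: allowing a property by name alone would let an otherwise harmless-looking declaration carry a url() or some other unexpected value.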

At this point, you might be wondering why RSS aggregators need to bother whitelisting inline styles - why not just leave all the inline styles intact? Beyond the security issues, one problem is that some people will use things like excessively large font sizes to make their posts stand out. Other people will deliberately insert "prank" CSS, like a page full of offensive images designed to ruin the reading experience.

These annoyances aren't really a problem when the post is viewed by itself or within its feed - after all, if you subscribe to a feed that annoys you, you'll simply unsubscribe from it. But it's a different story when it's combined with posts from other feeds in a "river of news" view, or in a search feed from Technorati or Google. The latter issue is the one that concerns me the most, since theoretically someone could ruin a ton of RSS search feeds by littering their blog with popular keywords, and then injecting some nasty CSS into the blog's feed.

Luckily for me, I've already got a ton of CSS parsing code which I wrote for TopStyle, so it won't be a big deal to add inline style whitelisting to FeedDemon. But if you're an aggregator developer who'd also like to whitelist inline styles and you don't have a background in CSS, you might appreciate a few tips I learned the hard way:

Assuming valid CSS is an invalid assumption. Trust me: just like HTML and RSS, plenty of people use completely invalid CSS. Things like unclosed quotes and declarations without colons can trip up your parser if you assume that inline styles will be correctly written.

Quotes can be escaped. Although rarely used in practice, characters inside CSS values may be escaped with a backslash. This is most commonly used in the box model hack, which relies on escaped quotes to trick outdated browsers into ignoring specific styles (ex: <p style="width:400px;voice-family: "\"}\"";voice-family:inherit;width:300px;">). In other words, your parser can't assume that quotes always mark the start or end of a value.

Quotes are optional, and single quotes are allowed. Although XHTML requires attribute values to be inside quotes, browsers don't enforce this requirement. In addition, it's fine to use single quotes instead of double quotes around values. So make sure your parser handles all three variants (ex: <p style="color:red">, <p style='color:red'> and <p style=color:red>).

Pixels are the default length unit. One of the things I'm doing in FeedDemon is stripping excessively large font sizes (ex: <p style="font-size: 800px">), which requires enforcing a max size based on the length unit. If you plan to do the same thing, keep in mind that although the CSS spec requires a unit on non-zero lengths, browsers in quirks mode may assume that pixels (px) were intended when the unit is missing. So <p style="font-size: 12"> may well be rendered the same as <p style="font-size: 12px">.

Font sizes can get you into trouble. If you "flow" multiple posts in the same newspaper page (as FeedDemon does), you have to be careful that a font size declared in an unclosed tag in one post doesn't affect subsequent posts. The problem gets worse with relative font sizes (ex: <p style="font-size: smaller">), since improperly nested relative font sizes could result in a tiny single-pixel font size (or a huge font size when "larger" is used).

Floats can also get you into trouble. If your aggregator uses a multi-column newspaper view, be careful that floated elements don't overlap posts in adjacent columns (ex: <img style="float:right" src="http://nick.typepad.com/images/basil.gif" />). And you might want to consider only permitting images to be floated, to avoid having floating DIVs, etc., causing problems.

Strip class and id attributes. If your newspaper view relies on classes and/or ids to identify items in the page, I recommend removing class and id attributes from the actual posts - otherwise a post could use the same class names that you use in your newspaper, potentially creating all kinds of havoc.

Remove top-level tags. Although they shouldn't be there, I've seen some feeds that contain top-level tags such as BODY and HTML in their posts. Imagine the impact on your river of news if some prankish feed author inserts a styled BODY tag into their feed.

If your aggregator embeds IE, get out of the local zone. This applies more to script than it does to CSS, but it bears repeating: if you're embedding the WebBrowser object, don't allow locally displayed content to operate in the local zone. If you're not sure what I'm talking about, refer to my earlier post on this topic.
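A few of the parsing tips above (invalid CSS, escaped quotes, missing length units and oversized fonts) can be sketched in a few dozen lines. This is Python rather than the Delphi I use in FeedDemon, and it's only an illustration under assumptions: the function names and the limits in MAX_FONT_SIZE are invented for the example, and relative keywords such as "smaller" are simply rejected rather than resolved.

```python
import re


def split_declarations(style):
    """Split on ';' outside quotes, treating a backslash as an escape."""
    parts, current, quote, i = [], [], None, 0
    while i < len(style):
        ch = style[i]
        if ch == "\\" and i + 1 < len(style):
            current.append(style[i:i + 2])  # keep the escaped pair together
            i += 2
            continue
        if quote:
            if ch == quote:
                quote = None  # closing quote
        elif ch in "\"'":
            quote = ch  # opening quote
        elif ch == ";":
            parts.append("".join(current))
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    parts.append("".join(current))  # an unclosed quote swallows the rest
    return parts


def parse_style(style):
    """Return (name, value) pairs, silently skipping malformed declarations."""
    declarations = []
    for part in split_declarations(style):
        name, colon, value = part.partition(":")
        name, value = name.strip().lower(), value.strip()
        if colon and name and value:
            declarations.append((name, value))
    return declarations


# Hypothetical per-unit limits; pick whatever suits your newspaper view.
MAX_FONT_SIZE = {"px": 36.0, "pt": 27.0, "em": 2.5, "%": 250.0}
LENGTH = re.compile(r"^(\d+(?:\.\d+)?)\s*([a-z%]*)$")


def font_size_allowed(value):
    """Accept a numeric font size only if it is within the limit for its unit."""
    match = LENGTH.match(value.strip().lower())
    if not match:
        return False  # keywords and garbage are rejected in this sketch
    number = float(match.group(1))
    unit = match.group(2) or "px"  # a missing unit is treated as pixels
    limit = MAX_FONT_SIZE.get(unit)
    return limit is not None and number <= limit


print(parse_style("color red; font-size: 12"))
print(font_size_allowed("12"), font_size_allowed("800px"))
```

Note that an unclosed quote leaves everything after it inside a single declaration, so the damage stays contained: a whitelist check on the value can then throw that one declaration away instead of misreading the rest of the style.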