Sunday, November 25, 2007

How Does FeedDemon Calculate Attention?

"I'd be interested in more detail on how you compute the scores [which determine a feed's attention]. Nothing that gives away your competitive edge of course but just some generalizations of what you are tracking that amounts to attention."

FeedDemon's algorithm for determining a feed's attention rank has changed since I first wrote about it, but it's still very simple. I certainly don't think I'll be giving away any competitive edge by posting details, so here it is:

One of Paul's concerns was that high-output blogs that he skims without reading would be ranked too highly. I attempt to counteract this in several ways, with admittedly mixed success. The most obvious is giving post visits the lowest weight in the algorithm (NumPostVisits div 5). I give the highest weight to actions such as flagging, clipping or emailing a post, since those actions are proof that you find the post valuable.
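To make the weighting concrete, here is a minimal Python sketch. Only the `NumPostVisits div 5` term comes from the post; the feed-visit weight, the multiplier for flagging, clipping and emailing, and every name below are assumptions for illustration, not FeedDemon's actual code.

```python
from dataclasses import dataclass


@dataclass
class FeedActivity:
    """Per-feed counters (hypothetical field names)."""
    num_post_visits: int = 0
    num_feed_visits: int = 0
    num_flagged: int = 0
    num_clipped: int = 0
    num_emailed: int = 0


# Assumed weights: the post only says these actions get the highest weight.
FEED_VISIT_WEIGHT = 1
STRONG_ACTION_WEIGHT = 5


def attention_score(activity: FeedActivity) -> int:
    # Post visits get the lowest weight: integer division by 5,
    # the same thing Pascal's `div 5` does for non-negative counts.
    post_visit_points = activity.num_post_visits // 5
    feed_visit_points = activity.num_feed_visits * FEED_VISIT_WEIGHT
    strong_actions = (
        activity.num_flagged + activity.num_clipped + activity.num_emailed
    )
    return (
        post_visit_points
        + feed_visit_points
        + strong_actions * STRONG_ACTION_WEIGHT
    )


# A feed that is only skimmed versus one whose posts get flagged and shared.
skimmed = FeedActivity(num_post_visits=200)
valued = FeedActivity(
    num_post_visits=10, num_flagged=4, num_clipped=2, num_emailed=3
)
```

With these assumed weights, 200 skimmed post visits are worth less than 10 visits plus a handful of flags, clippings and emails, which is the effect the weighting is meant to have.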

One potentially important thing that's missing here is that I don't "decay" attention over time, but in reality this happens automatically. For example, if you stop paying attention to a feed that has a high attention rank, its rank will stop increasing, whereas the rank of feeds you do still pay attention to will continue to increase.
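A small simulation shows how this works without any explicit decay. The numbers and feed names below are made up for illustration; the only assumption about the ranking is that feeds are sorted by a cumulative score.

```python
def rank_feeds(totals):
    """Return feed names sorted by cumulative attention, highest first."""
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical starting totals and weekly gains.
totals = {"old_favorite": 500, "current_focus": 100}
weekly_gain = {"old_favorite": 0, "current_focus": 60}

history = []  # top-ranked feed at the end of each week
for week in range(1, 11):
    for feed, gain in weekly_gain.items():
        totals[feed] += gain
    history.append(rank_feeds(totals)[0])
```

In this run the neglected feed stays on top for six weeks and is overtaken in week seven. In other words, a feed you stop visiting sinks in the ranking even though its own score never drops.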

This is illustrated by the screenshot from my recent post about the attention report in FeedDemon 2.6, which shows that I was paying the most attention to the feed for the TopStyle Support Forum (since TopStyle 3.5 was in beta at the time). Now that TopStyle 3.5 has been released and I'm working on FeedDemon 2.6, the TopStyle feed has fallen to second place behind my feed for the FeedDemon Support Forum:

I'm curious as to how accurate FeedDemon customers find the new attention report. Does it for the most part reflect the attention you're paying to your feeds, or do you find it wildly out of sync with the feeds you're really paying attention to?

I find it to be pretty accurate, but there are 2 items that do seem skewed.

I have a few high-post feeds that I only subscribe to so I can run a watch on them to extract the minuscule number of items I am interested in. Those feeds seem to be skewed higher because of the watch, when in actuality I never check the feed itself or the other 99.9% of the items in it.

The other thing that seems skewed too low is feeds that don't publish very often. I have a few that FD says I "pay the least attention to." Actually, I read every post in those feeds; they just publish very infrequently. In one sense I pay a lot of attention to them, just not very often.

@Marcus: when the attention integers overflow, I come out with a new version and charge an astronomical upgrade fee ;) Seriously, these numbers are 64-bit values, so it would take a lifetime for them to overflow.
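For a sense of scale, here is a back-of-the-envelope calculation. It assumes a signed 64-bit counter and a deliberately absurd rate of attention points per second; both are assumptions, since the comment above only says the values are 64-bit.

```python
MAX_INT64 = 2**63 - 1  # largest value of a signed 64-bit integer
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_until_overflow(points_per_second: float) -> float:
    """Years needed to reach MAX_INT64 at a constant accumulation rate."""
    return MAX_INT64 / points_per_second / SECONDS_PER_YEAR


# Even at one million points per second, around the clock:
years = years_until_overflow(1_000_000)
```

At that rate the counter would take on the order of hundreds of thousands of years to overflow.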

@Peter: you may want to simply exclude your "watch" feeds from the attention collection. To do this, go to each feed's properties and turn off the "Collection attention data for this feed" checkbox.

I'm not sure yet how to handle low-traffic feeds that you value highly, though. At one point I experimented with including a feed's posting frequency in the calculation, but this skewed down high-traffic feeds in a way I wasn't happy with.

Why did you go with an over/under scoring system (mult/div) instead of giving each element a multiplier?

What about using ratios as a way to further normalize the posting volume of feeds? i.e. If I read every single post from a blog then that ratio would be 1 otherwise it would be something like .4. I guess the easiest way to do that would be to also track NumPosts and divide each metric by that.
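A rough sketch of that ratio idea, with hypothetical names: FeedDemon is not known to track `NumPosts` this way, so treat this as an illustration of the proposal rather than anything the product does.

```python
def read_ratio(num_posts_read: int, num_posts_total: int) -> float:
    """Fraction of a feed's posts that were read, clamped to [0, 1]."""
    if num_posts_total <= 0:
        return 0.0  # no posts yet, so no ratio to speak of
    return min(num_posts_read / num_posts_total, 1.0)


# A low-traffic feed that is read completely versus a
# high-volume feed that is mostly skipped.
low_traffic = read_ratio(4, 4)
high_volume = read_ratio(200, 500)
```

Here the low-traffic feed scores 1.0 and the high-volume feed 0.4, even though far more posts were read in the latter. That helps rarely-updated feeds, but it also pushes down any high-volume feed that is only partly read.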

@Jackson: Honestly, I'm not sure why I went with mul/div - that algorithm was simply the one that seemed to bring accurate results in my testing, so I stuck with it.

I could use a ratio to counter high-volume feeds, but I'm not convinced that would be any better. If we're measuring per-post importance, it would make sense to take the ratio of NumPostsRead/NumPostsTotal into account. But isn't attention really more a measurement of time than it is importance?

Also, I'm subscribed to a number of high-volume feeds whose posts I usually skip over unless I spot one that looks interesting, yet I still consider these feeds important.

My only problem with the attention report is the same as Peter's - I still use the generic NG Forum feed, and a watch for FeedDemon posts. The attention rank for that feed is so high it seems like everything is counted twice (once when I mark each item as read in the watch, and again when I do a "mark feed as read" without even viewing the feed newspaper).

@Nick That makes sense. With statzen I am looking at attention given to each individual post (but then I am looking at attention from the other side). I take into account the tags/categories for the posts. So, the high volume feeds that you usually skip over may have certain tags/categories that usually catch your eye.

Granted, that wouldn't help with sorting feeds by attention as much as it would help with sorting individual posts by attention in a "river of news" view.

I like your take that attention is a measure of time and not of importance, therefore applying attention information is kinda like predicting the future. i.e. sorting feeds by attention is saying "these are the feeds you will want to read first". If you ever sort posts by attention it could also help with the "panic button" situation by predicting that "If you only read 1 post from the 5 new posts in this feed, it would likely be this one".
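As a sketch of that "panic button" idea: given per-tag attention scores, pick the one new post whose tags have historically drawn the most attention. The data shapes, names and scores are hypothetical, and this is not how statzen or FeedDemon is known to work.

```python
def pick_post(posts, tag_attention):
    """Return the post whose tags carry the highest total attention."""
    def score(post):
        # Tags never seen before contribute nothing.
        return sum(tag_attention.get(tag, 0) for tag in post["tags"])
    return max(posts, key=score)


# Hypothetical attention accumulated per tag from past reading.
tag_attention = {"feeddemon": 12, "css": 3}

new_posts = [
    {"title": "Stylesheet tips", "tags": ["css"]},
    {"title": "Attention report changes", "tags": ["feeddemon", "rss"]},
    {"title": "Untagged musings", "tags": []},
]

best = pick_post(new_posts, tag_attention)
```

With these made-up scores the pick is "Attention report changes", because its "feeddemon" tag outweighs everything else in the batch.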

Of course, diminishing returns apply.

BTW, I am thinking I might start reading feeds in Parallels just to play with the features you talk about here.

@Andrew: marking a feed as read shouldn't impact its attention score, unless you first click on the feed (since that would count as an "explicit feed visit"). And marking an item as read in a watch also won't impact the feed's attention score.

My biggest problem with this is that I have a number of high-volume feeds (support feeds) that trigger watches, which makes them show up as having high attention. What would be great is the ability to exclude feeds or groups of feeds from watches, or include them. This at least would solve my issues.