There's some discussion happening on an internal MSFT mailing list about blog statistics. I don't check my web statistics more than once a month, as I'm more interested in blog comments or what's going on in the forum. If I get a lot of comments on a post I feel good. I like to get discussions going and bounce ideas back and forth.

That said, some blogs at Microsoft track their statistics and need to know if a particular post or new theme brings in more readers. One particular blog (not mine) recently saw a 16x increase in "hits" which is probably a good thing. A discussion started, and here's part of an email I wrote with my ideas that I thought you might find interesting, Dear Reader. I've made a few [edits] to make things clearer.

I think it's killer, to be clear, so in no way do I want to take away from [that blog's] most excellent work, but the web stats [in this case] specifically "smell" wrong. Possibly a bot, spammer, something, but still, a 16x increase in web traffic [in a single] month feels exceptional. It's the ratios [of GETs to projected humans] that are confusing to me.

It'd be interesting to use some heuristics to turn the RSS Feed HTTP GETs into Unique users. For example, most RSS Readers poll, so one individual will hit your feed (in my experience) between 8 and 16 times a day, depending on their reader and how long their computer is on. Online readers are smarter than Smart Client readers like Outlook and FeedDemon. This usually means one has fewer readers than they think, if they are looking at GETs.

Additionally, online readers [usually] only hit once (here's how that works) [and rather] "tunnel" your subscriber numbers in the HTTP User Agent like "NewsGatorOnline/2.0+(http://www.newsgator.com;+250+subscribers)". Meaning, you might get one hit or 10 hits, but regardless they are representative of 250 individuals. This usually means one has more readers than they think, if they are looking at GETs.
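Here's a rough sketch of how those two heuristics might be combined, assuming you already have one day's worth of User-Agent strings from your feed GETs. The function names, the regular expression, and the default of 12 polls per day (the midpoint of my 8 to 16 range) are all assumptions for illustration, not how any particular stats package does it. It also assumes each online reader reports a single subscriber count per feed.

```python
import re

# Matches the "N subscribers" fragment some online readers embed in
# their User-Agent, e.g. "...;+250+subscribers)". Assumed pattern.
SUBSCRIBERS_RE = re.compile(r"(\d+)[+ ]subscribers?", re.IGNORECASE)


def subscribers_from_user_agent(user_agent):
    """Return the subscriber count advertised in a User-Agent, or None."""
    match = SUBSCRIBERS_RE.search(user_agent)
    return int(match.group(1)) if match else None


def estimate_readers(user_agents, polls_per_day=12):
    """Estimate humans from one day's feed GETs (one User-Agent per GET)."""
    online = {}        # online reader name -> largest count it reported
    polling_gets = 0   # GETs from clients that don't report subscribers
    for ua in user_agents:
        count = subscribers_from_user_agent(ua)
        if count is None:
            polling_gets += 1
        else:
            name = ua.split("/", 1)[0]
            online[name] = max(online.get(name, 0), count)
    # Online readers count once each; polling clients are divided down.
    return sum(online.values()) + round(polling_gets / polls_per_day)


ua = "NewsGatorOnline/2.0+(http://www.newsgator.com;+250+subscribers)"
print(subscribers_from_user_agent(ua))  # 250
```

Ten GETs carrying that NewsGator User-Agent still count as 250 people, while 24 GETs from a polling desktop reader count as roughly 2 people at 12 polls a day.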

Why do I mention this? I mention it because looking at HTTP GETs isn't representative of people, but of GETs. It took me a few years to figure this out, and I've been thrilled with the analysis work done by FeedBurner (my RSS Feed is hosted there, saving me over 400 gigs of bandwidth a month) to turn GETs into Humans.

Here's a real world example. FeedBurner says I have around 22,000 regular readers [as of today...it varies based on weekday/weekend]. That's aggregated across all News Readers:

My stats package shows about 50,000 page views a day or about 1.6 million a month. This varies, confirming [an earlier] comment about folks hanging around [a site] and reading stories, which is cool. However, if I look at "hits" I see 16.5 million. Of course, that's not [a useful stat], because that includes images, css, etc. Visits, on the other hand, are one individual hanging around for a period of time and reading. For example (these stats don't include RSS anywhere, including bandwidth):

For me, these stats make sense, because I have a readership of about 20,000 that show up every few days and hang out, representing [roughly] 50% of my traffic. The other 50% comes from Search Engines and [incoming] links from other blogs. So it's important that one distinguishes between hits, page views, and visitors, and tries to correlate those back to readership, IMHO.

The question that we need Blog Stats to answer is that of readership. What does [a] 600,000 RSS hits number mean? 600k/30days is about 20k hits a day, so how often are these readers hitting the feed per day? Once we come up with a standard-ish formula, blogs could get a rough +/-30% idea of how many human eyeballs [are actually reading].
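As a back-of-the-envelope sketch of what such a formula might look like, here's the 600,000 number run through the 8 to 16 polls-a-day range from earlier. The function name and parameters are made up for illustration, and it deliberately ignores online readers that report subscriber counts, which would push the real number up.

```python
def readers_from_monthly_gets(monthly_gets, days=30, polls_low=8, polls_high=16):
    """Return (GETs per day, low reader estimate, high reader estimate)."""
    daily = monthly_gets / days
    # More polls per reader means fewer readers behind the same GETs.
    return daily, daily / polls_high, daily / polls_low


daily, low, high = readers_from_monthly_gets(600_000)
print(daily, low, high)  # 20000.0 1250.0 2500.0
```

So under these assumptions, 600k RSS hits a month is somewhere between 1,250 and 2,500 polling readers, which is a long way from 20,000.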

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

These blogging statistics make as much sense as TV’s Nielsen ratings. The big difference is: While the Nielsen ratings may not truly be accurate – everyone is using the same “inaccurate” ratings. So it’s a “fair” comparison.

Because there are so many different blogging systems – you cannot accurately compare statistics.

I personally use Windows Live Spaces, which has next to NO statistics. So while I know I’ve had close to 4 million page views – I have little clue as to who is reading me or the number of RSS feeds etc.

What about tracking a 1x1 gif in each post? Most feed readers display images, so this would give a more accurate count of readership. Users wouldn't necessarily have to visit your site to be counted as a reader, which I think is a good thing, because honestly I don't visit a site anymore unless I'm commenting on a post. And forget partial RSS feeds to entice users to visit the site; I don't subscribe to these feeds at all.

@Monsur - good idea, except that browsers cache images meaning you'll get the count once per person, but subsequent reads won't register.

As for you, Scott, great analysis - I couldn't agree more. I love how FeedBurner helps give a much more accurate view of "readership", but even then it includes bots and such (typically falling under "other readers") so the number isn't entirely accurate. 'Course for little guys like me, you can bet I'm going to claim every one of those bots as a "reader" when I talk to my blogging friends! :)

On the up-side, if *everybody* used FeedBurner, we'd have a much better relative number to work with that could help you compare and contrast, albeit without a true absolute number still.

The gist of the article is to provide counts about the number of current subscribers, the number of new subscribers, the number of lost subscribers, and the number of regained subscribers. FeedBurner provides some of these stats, but they don't seem to measure them in the way that I would like them to be measured (it could just be that I am not familiar enough with the data yet).

Unfortunately I don't have any clever solution as to how to gather this information (Perhaps some variation of @Monsur's idea?).

Hey Now Scott, FeedBurner is great & the analytics are really nice. It makes it easier to subscribe for the user too, like when using Google Reader for example (my preferred reader). Thx 4 the info, Catto

It truly would be nice if there were a standard that defined web readership but I think we are years from it. As soon as the industry starts to understand hits vs. page views vs. unique users, a new technology comes along (like RSS caching sites) which invalidates the prior 'standard'. If we all used FeedBurner we would have standard numbers, but for everyone that likes a company/technology there are an equal number of 'haters' just because it's popular.

To me this is just another incarnation of what Web Developers have been dealing with since the early days, trying to take fairly abstract raw numbers and explain them in non-technical terms.

I thought I was the only one who came to your blog to hang out.... now I don't feel alone anymore.

But I think that is a testament to your knowledge and hard work that your readership is even more than 5. I have two kids (both of whom are about the same age as yours). I have a wife, a full time job, and I have barely enough time to write a good blog entry once a month... and by good, I mean more than 5 words. Keep it up Scott,

Mickey Keenan

Wednesday, 16 January 2008 16:42:39 UTC

Mickey - Thanks, that's a very nice thing for you to say. I appreciate it.

Now that I know most of your readers use Google Reader, here is a request I have long wanted to make.

Is it possible to format the code to be Google Reader friendly? Looks like you have some nice <pre> tags on the webpage to format code, but when viewed as a feed, the code is almost unreadable.

This was an issue only in the older posts [Weekly source code 10 and earlier]. If you have fixed it since then, thanks.

pradeep

Thursday, 17 January 2008 15:34:56 UTC

Hi Scott,

Great post. You didn't answer the most obvious question though. What is a hit? A hit is registered for every downloadable object on a page. If you have a page with 10 images and a JavaScript link and a CSS link (living on the same server), you'll have 13 hits for that one page view (one of the hits is the page itself).
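To make that arithmetic concrete, here's a tiny sketch. The function name is made up, and it assumes nothing is cached and every asset lives on the same server as the page.

```python
def hits_for_page_view(images, scripts, stylesheets):
    """Count server hits for one page view: the page plus each asset."""
    return 1 + images + scripts + stylesheets


print(hits_for_page_view(images=10, scripts=1, stylesheets=1))  # 13
```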

So, in the early/mid 90s, the term hit was used all the time at first, but then the terminology changed and people started to use page views or visitors, which better represented their actual traffic. As you mention, things have changed again with RSS readership. (In the 90s, some people were putting dozens or hundreds of tiny hidden images on their pages to bring their hit count up. . . obviously completely defeating the purpose, but it gave them bragging rights for people wondering what their 'hit' count was.) So, the term 'hit' is an old term that has always been unclear to most people.