This is a weblog related to community activities going on in Brighton, MA, one of the neighborhoods of Boston. Ours is a large and diverse community including many long-term residents, recent immigrants, and students attending the local universities.

Thursday, August 16, 2007

Boston is Bloggiest City -- According to Lousy, Statistically-Invalid Sampling

Boston is apparently the "bloggiest" city in the country, according to a "study" by outside.in, a blogging clearinghouse, as reported in the Boston Globe:

OutsideIn.com said it tracks blogging activity in about 60 urban areas. It based its rankings on a "blogging quotient" that factored in a metropolitan area's population with the number of blog posts tied to specific locations.

By that measure, Greater Boston had 89 posts per 100,000 residents, edging out Greater Philadelphia, which had 88 posts.

What is wrong with this statistic? Here in Allston-Brighton we have nearly 70,000 residents as of the 2000 census. I sampled a number of Allston-Brighton blogs that I know about and counted each one's posts for July 2007. Where my aggregator's cache did not cover all of July, I instead counted a full month starting with the earliest post still in the cache, or estimated a full month from a partial month's cache. I believe all of these to be A-B blogs, based on how their locations are listed on other websites. Any mis-attribution of location is likely unimportant, since the total number of posts is dominated by a few blogs near the top, which are known to be A-B blogs about A-B.

outside.in's rate of 89 posts per 100,000 people is low by a factor of at least seven in Allston-Brighton! It's probably off by a much larger factor once you include all the blogs I don't read or even know about.
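To make the arithmetic concrete, here is a rough sketch using only the figures quoted above. It assumes, as the comparison itself does, that outside.in's rate covers a period comparable to the one month I counted; the Globe excerpt does not say. The "implied minimum" is simply what a factor of seven works out to, not an exact tally.

```python
# Figures taken from the post and the quoted Globe article.
OUTSIDE_IN_RATE = 89      # posts per 100,000 residents (Greater Boston)
AB_POPULATION = 70_000    # "nearly 70,000" residents, 2000 census
UNDERCOUNT_FACTOR = 7     # "low by a factor of at least seven"


def expected_posts(rate_per_100k, population):
    """Posts you would expect if the per-100,000 rate applied to this population."""
    return rate_per_100k * population / 100_000


# What outside.in's rate predicts for Allston-Brighton.
expected = expected_posts(OUTSIDE_IN_RATE, AB_POPULATION)

# The smallest monthly count consistent with "at least seven times" that rate.
# Assumption: the rate and the count cover the same length of time.
implied_minimum = expected * UNDERCOUNT_FACTOR

print(f"Predicted by outside.in's rate: {expected:.1f} posts")
print(f"Implied by a factor of seven:   at least {implied_minimum:.0f} posts")
```

In other words, outside.in's rate predicts only about 62 posts for a neighborhood of this size, while a factor of seven corresponds to well over 400.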

Note: I did not count the outside.in RSS feed for Brighton, since it only repeats verbatim other content. But I did count Universal Hub for Brighton, since they add comments to their posts about other Brighton-related content.

The big problem with outside.in's statistical sampling? They require: (1) a blog to be entered into their database; (2) each blog post to have specific information to pinpoint its location. Brighton Centered is entered for #1, but only a (small) fraction of Brighton Centered's posts gets picked up as "Brighton" content by outside.in. Harry Mattison seems to have a higher hit rate with his Allston blog. (Maybe he'll pass along advice!)

A couple of weeks ago, I experimented with appending the "georss" tag to all my posts, but neither my aggregator (reader) nor Google Reader recognized the tag, and outside.in still didn't seem to pick it up properly. To get postings correctly identified by outside.in, I find it necessary to provide a full street address in the body of the post. But for many posts, it's just silly and clumsy to try to enter a street address, even though the content is clearly centered on Brighton.
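For readers wondering what a "georss" tag looks like, here is a minimal sketch that builds one feed item carrying a GeoRSS-Simple point element. It is an illustration only: the title, link, and coordinates (roughly Brighton) are hypothetical, the point form is assumed rather than the exact variant I tried, and nothing here implies that outside.in or any reader will act on the tag.

```python
import xml.etree.ElementTree as ET

# Namespace used by GeoRSS-Simple elements such as <georss:point>.
GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)


def build_item(title, link, lat, lon):
    """Build an RSS <item> with a GeoRSS-Simple point ("latitude longitude")."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    point = ET.SubElement(item, f"{{{GEORSS_NS}}}point")
    point.text = f"{lat} {lon}"
    return item


# Hypothetical post; the coordinates are only approximate for Brighton, MA.
item = build_item("Hypothetical Brighton post", "http://example.com/post", 42.35, -71.16)
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

The tag itself is tiny; the difficulty described above is that the services consuming the feed have to recognize it.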

Many bloggers just haven't done step #1 above. That requirement introduces a systematic error into outside.in's methodology, since some regions of the country may rely on (or visit) outside.in's service at far higher rates than other regions. Other bloggers fail to use location-specific tags (#2 above). This is also a systematic error, since usage rates of location-specific tags -- and bloggers' knowledge of how to use them -- can easily vary by region of the country and by educational background. Both errors bias outside.in's sub-sample, although the direction and size of the bias are unknown without further investigation.
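A toy calculation shows how these two filters can manufacture a ranking on their own. Every number below is invented for illustration: two hypothetical cities blog at exactly the same true rate, but differ in how many bloggers have registered (#1) and in what share of posts carry usable location information (#2). The simple multiplicative model is also an assumption, not outside.in's actual method.

```python
def observed_rate(true_rate_per_100k, registered_fraction, tagged_fraction):
    """Posts per 100,000 residents that survive both filters.

    Assumes (hypothetically) that the filters act independently, so the
    observed rate is the true rate scaled by both fractions.
    """
    return true_rate_per_100k * registered_fraction * tagged_fraction


# Invented figures: both cities have the same true blogging rate.
TRUE_RATE = 600

city_a = observed_rate(TRUE_RATE, registered_fraction=0.50, tagged_fraction=0.30)
city_b = observed_rate(TRUE_RATE, registered_fraction=0.20, tagged_fraction=0.25)

print(f"City A observed: {city_a:.0f} posts per 100,000")
print(f"City B observed: {city_b:.0f} posts per 100,000")
print(f"Apparent ratio:  {city_a / city_b:.1f}x, despite identical true rates")
```

Under these made-up fractions, City A looks three times "bloggier" than City B even though the two are identical, which is why a ranking built on the filtered counts says little about actual blogging activity.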

Boston may be a bloggy place, but outside.in's statistical sampling -- which may be systematically biased -- is not definitive proof. They are only picking up an unrepresentative tip of the iceberg and extrapolating from there.