Shhh! Don’t Tell Google News You’re a Blog!

Google recently rolled out some enhancements to its Google News site, including settings that allow users to say whether they want to see more or less news from “blogs.” But how does the search giant define the term “blog”? After all, the lines between traditional media and the blogosphere have blurred a lot over the past few years, with traditional media entities launching blogs, and some blog sites becoming major media entities. According to Google, it looks at a bunch of different factors the company won’t specify, but the main one is whether a source calls itself a blog or not, which just reinforces the point that drawing a distinction between blogs and non-blogs is a mug’s game when it comes to the news.

Google first started drawing a distinction between regular news sources and blogs in 2009, but it was never really clear how the search company was defining the term “blog,” or why it included some obvious blogs but not others in that category. According to the Google blog post announcing the change, readers complained that it wasn’t clear whether something was a blog or not when they did a news search. Was this important because readers felt that blog sources were less newsworthy or less reliable? Who knows. Google still hasn’t said.

Zachary Seward, who is now at the Wall Street Journal, wrote a post for the Nieman Journalism Lab at the time the original changes were made, arguing (persuasively, I think) that it didn’t make much sense to draw a line between what was a blog and what wasn’t for news purposes, either from a technical standpoint or a philosophical one. As Seward put it:

Dividing content along these lines is like classifying brownies based on whether they were baked in aluminum or glass pans. There’s no difference, and it obscures what you really want to know: if they contain chocolate chips.

In other words, the only real criterion that should matter when it comes to searching Google News is whether something actually, you know, contains news. M.G. Siegler makes a similar point in his recent post about the changes at Google News, a site he notes has never been very good at surfacing actual technology news. What it tends to do, he says, is give precedence to mainstream news sites that report the same thing a blog reported, but several days after the fact. Is that what a news aggregator should really be doing?

According to a spokesman for Google, the search company “examines a variety of signals” from websites to determine whether they are blogs or not, but for the purposes of Google News, “we primarily rely on self-identification.” In other words, if a site has the word “blog” in its name, then Google News defines it as a blog. So since GigaOM and TechCrunch don’t use the term blog, they aren’t designated as blogs in Google News, while the New York Times Bits blog and the Wall Street Journal On Media blog are. If sites want to be reclassified, the spokesman said, they can contact the Publisher Support team.

That’s not all, though. As Danny Sullivan at Search Engine Land notes, Google also classifies blogs for the purposes of what it calls Google Blog Search — which you can get either from the dedicated blog-search site or by choosing “blogs” in the left-hand navigation menu on the main search page. In those results, GigaOM and TechCrunch are both classified as blogs. Why? Apparently, because they publish their content via RSS feeds, which (as Sullivan notes) means Google should really change the name of the search to Google Feed Search instead of Google Blog Search.

Presumably an RSS feed is also one of the “signals” that Google looks at when classifying blogs for Google News purposes, although that’s not explicitly stated anywhere. On the help page for the news site under blogs, it says:

Blogs typically identify themselves as such, and adhere to standard blog formatting by displaying regular entries in order from newest to oldest. In many instances, blog posts are excerpted on the blog’s homepage instead of summarized by an editor or author. Finally, websites that organize their articles in a more editorial fashion and employ a complex layout are generally not considered blogs.

So from the sounds of it, Google treats things as blogs if they either identify themselves as such, or if they have a certain design — i.e., posts in reverse chronological order and a lack of a “complex layout.” This makes no sense whatsoever. Not only are many news websites adopting a distinctly cleaner and blog-inspired layout, but some things that are clearly blogs have moved away from the chronological ranking of posts as well, such as Nick Denton’s Gawker Media network. Some, such as The Huffington Post, have a mish-mash of both news-type pages and blog pages. And why would someone expect the NYT and WSJ blogs to show up in a blog search, but not actual blog sites like GigaOM or TechCrunch?

On its help page, Google says it acknowledges “the difficulty in characterizing blogs and the rapidly changing publishing landscape,” but is trying to help readers choose which sources they want to read. So why not just include everything that actually contains news in the site called Google News, and let readers sort out what to call it?