March 6, 2011

In January, Google promised that it would take action against content farms that were gaining top listings with “shallow” or “low-quality” content. Now the company is delivering, announcing a change to its ranking algorithm designed to take out such material.
New Change Impacts 12% Of US Results

The new algorithm (Google’s “recipe” for how to rank web pages) started going live yesterday, the company told me in an interview today.

Google changes its algorithm on a regular basis, but most changes are so subtle that few notice. This is different. Google says the change impacts 12% (11.8% is the unrounded figure) of its search results in the US, a far higher impact on results than most of its algorithm changes. The change only impacts results in the US. It may be rolled out worldwide in the future.

While Google has come under intense pressure in the past month to act against content farms, the company told me that this change has been in the works since last January.
Officially, Not Aimed At Content Farms

Officially, Google isn’t saying the algorithm change is targeting content farms. The company specifically declined to confirm that, when I asked. However, Matt Cutts — who heads Google’s spam fighting team — told me, “I think people will get the idea of the types of sites we’re talking about.”

Well, there are two types of sites “people” have been talking about in a way that Google has noticed: “scraper” sites and “content farms.” It mentioned both of them in a January 21 blog post:

We’re evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content. We’ll continue to explore ways to reduce spam, including new ways for users to give more explicit feedback about spammy and low-quality sites.

As “pure webspam” has decreased over time, attention has shifted instead to “content farms,” which are sites with shallow or low-quality content.

I’ve bolded the key sections, which I’ll explore more next.
The “Scraper Update”

About a week after Google’s post, Cutts confirmed that an algorithm change targeting “scraper” sites had gone live:

This was a pretty targeted launch: slightly over 2% of queries change in some way, but less than half a percent of search results change enough that someone might really notice. The net effect is that searchers are more likely to see the sites that wrote the original content rather than a site that scraped or copied the original site’s content.

“Scraper” sites are those widely defined as not having original content but instead pulling content in from other sources. Some do this through legitimate means, such as using RSS files with permission. Others may aggregate small amounts of content under fair use guidelines. Some simply “scrape” or copy content from other sites using automated means — hence the “scraper” nickname.

In short, Google said in January that it was going after sites with low levels of original content, and delivered a week later.

By the way, sometimes Google names big algorithm changes, such as in the case of the Vince update. Often, they get named by WebmasterWorld, where a community of marketers watches such changes closely, as happened with last year’s Mayday Update.

In the case of the scraper update, no one gave it any type of name that stuck. So, I’m naming it myself the “Scraper Update,” to help distinguish it from the “Farmer Update” that Google announced today.
But “Farmer Update” Really Does Target Content Farms

“Farmer Update?” Again, that’s a name I’m giving this change, so there’s a shorthand way to talk about it. Google declined to give it a public name, nor do I see one given in a WebmasterWorld thread that started noticing the algorithm change as it rolled out yesterday, before Google’s official announcement.

Postscript: Internally, Google told me this was called the “Panda” update, but they didn’t want that on the record when I wrote this original story. About a week later, they revealed the internal name in a Wired interview.

How can I say the Farmer Update targets content farms when Google specifically declined to confirm that? I’m reading between the lines. Google previously had said it was going after them.

Since Google originally named content farms as something it would target, some of the companies labeled with that term have pushed back, saying they are no such thing. Most notable has been Demand Media CEO Richard Rosenblatt, who previously told AllThingsD about Google’s planned algorithm changes to target content farms:

It’s not directed at us in any way.

I understand how that could confuse some people, because of that stupid “content farm” label, which we got tagged with. I don’t know who ever invented it, and who tagged us with it, but that’s not us…We keep getting tagged with “content farm”. It’s just insulting to our writers. We don’t want our writers to feel like they’re part of a “content farm.”

I guess it all comes down to what your definition of a “content farm” is. From Google’s earlier blog post, content farms are places with “shallow or low quality content.”

In that regard, Rosenblatt is right that Demand Media properties like eHow are not necessarily content farms, because they do have some deep and high quality content. However, they clearly also have some shallow and low quality content.

That content is what the algorithm change is going after. Google wouldn’t confirm it was targeting content farms, but Cutts did say again it was going after shallow and low quality content. And since content farms do produce plenty of that, along with good quality content, they’re being targeted here. If they have lots of good content, and that good content is responsible for the majority of their traffic and revenues, they’ll be fine. If not, they should be worried.
More About Who’s Impacted

As I wrote earlier, Google says it has been working on these changes since last January. I can personally confirm that several of Google’s search engineers were worrying about what to do about content farms back then, because I was asked about this issue, and for thoughts on how to tackle it, when I spoke to the company’s search quality team in January 2010. And no, I’m not suggesting I had any great advice to offer, only that people at Google were concerned about it over a year ago.

Since then, external pressure has accelerated. For instance, start-up search engine Blekko blocked sites that were most reported by its users to be spam, which included many sites that fall under the content farm heading. It gained a lot of attention for the move, even if the change didn’t necessarily improve Blekko’s results.

In my view, that helped prompt Google to finally push out a way for Google users to easily block sites they dislike from showing in Google’s results, via a Chrome browser extension to report spam.

Cutts, in my interview with him today, made a point to say that none of the data from that tool was used to make changes that are part of the Farmer Update. However, he went on to say that of the top 50 sites that were most reported as spam by users of the tool, 84% of them were impacted by the new ranking changes. He would not confirm or deny if Demand’s eHow site was part of that list.

“These are sites that people want to go down, and they match our intuition,” Cutts said.

In other words, Google says it crafted a ranking algorithm to tackle the “content farm problem” independently of the new tool, and it feels like the tool is confirming that it’s getting the changes right.
The Content Farm Problem

By the way, my own definition of a content farm that I’ve been working on is like this:

* Looks to see what are popular searches in a particular category (news, help topics)
* Generates content specifically tailored to those searches
* Usually spends very little time and/or money, perhaps as little as possible, to generate that content

The problem I think content farms are currently facing is with that last part — not putting in the effort to generate outstanding content.

For example, last night I did a talk at the University Of Utah about search trends and touched on content farm issues. A page from eHow ranked in Google’s top results for a search on “how to get pregnant fast,” a popular search topic. The advice:

The class laughed at the “Enjoyable Sex Is Key” advice as the first tip for getting pregnant fast. Actually, the advice that you shouldn’t get stressed makes sense. But this page is hardly great content on the topic. Instead, it seems to fit the “shallow” category that Google’s algorithm change is targeting. And the page, there last night when I was talking to the class, is now gone.

Perhaps the new “curation layer” that Demand talked about in its earnings call this week will help in cases like these. In that call, Demand also again defended the quality of its content.

Will the changes really improve Google’s results? As I mentioned, Blekko now automatically blocks many content farms, a move that I’ve seen hailed by some. What I haven’t seen is any in-depth look at whether what remains is that much better. When I do spot checks, it’s easy to find plenty of other low quality or completely irrelevant content showing up.

Cutts tells me Google feels the change it is making does improve results according to its own internal testing methods. We’ll see if it plays out that way in the real world.