Google's Great Spam Quest

Google is working on ways to rid its search results of “content farms”—sites that create many pages of very cheap content crafted to appear high up in Google’s results. Speaking this week at Farsight 2011, a one-day event in San Francisco on the future of search, the firm’s principal search engineer, Matt Cutts, said that Google is considering tweaks to the algorithms that guide its search results. It’s also considering more radical tactics, such as letting users blacklist certain sites from the results they see.

In recent months, tech industry insiders have criticized Google for allowing content farms to occupy high rankings in results for common searches. The operators of such sites create articles containing common search keywords and phrases as a way of luring visitors to their online ads. Much of the content on such sites, including those operated by Demand Media, is created by very low-paid freelancers.

Search engines are currently being bested by those tactics, said Vivek Wadhwa, a visiting researcher in technology and business at Berkeley, Duke, and Harvard universities, at Tuesday’s event. “Over the last 15 years, search has changed very little,” he said, “but the Web has changed and become pretty clogged by spam.” Wadhwa said he realized the scale of the problem after small-scale experiments with his students showed that shortcomings in Google’s results turned up frequently for common searches.

Cutts announced last week that Google’s algorithms had been altered to penalize sites that copied content from other sites as a way of climbing higher in search rankings. But he acknowledged that it was a challenge to identify and demote low-quality content. “Someone recently found five articles on how to tie shoes on one of these sites,” he said. “We want to find an algorithmic solution to this and are working on it.”

Some question whether an algorithmic approach can work. Startup search company Blekko uses a different approach: yesterday it announced that it had excluded 20 “spam” sites from its index entirely, based on which pages its users had marked as spam when they appeared in search results. The 20 sites include many often described as content farms, including Demand Media’s eHow site. Blekko, which launched last November, uses Wikipedia-like functionality to allow users to mark pages as spam, and to work together on filters (dubbed “slashtags”) that include or exclude sites from searches on particular topics.
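
The crowd-sourced filtering described above can be illustrated with a rough sketch. This is not Blekko's code: the function names, the example host names, and the mark threshold are all assumptions made for illustration, since the company has not said how many user marks it takes to exclude a site. The sketch counts spam marks per site, blocks sites that cross the threshold, and optionally restricts results to a slashtag-style list of allowed sites.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical threshold: the article does not say how many marks Blekko requires.
SPAM_MARK_THRESHOLD = 3


def host_of(url):
    """Return the lowercase host name of a URL."""
    return urlparse(url).netloc.lower()


def blocked_hosts(spam_marks, threshold=SPAM_MARK_THRESHOLD):
    """Count user spam marks per host; return hosts at or above the threshold."""
    counts = Counter(host_of(url) for url in spam_marks)
    return {host for host, n in counts.items() if n >= threshold}


def filter_results(results, blocked, slashtag_hosts=None):
    """Drop results from blocked hosts.

    If slashtag_hosts is given, also keep only results from those hosts,
    loosely mimicking a topic filter that includes specific sites.
    """
    kept = []
    for url in results:
        host = host_of(url)
        if host in blocked:
            continue
        if slashtag_hosts is not None and host not in slashtag_hosts:
            continue
        kept.append(url)
    return kept
```

In this sketch, three marks against pages on the same made-up host, say farm.example, would remove every page on that host from later results, while a slashtag list would narrow the remaining results to hand-picked sites.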