Blekko AdSpam Algorithm Finds, Eliminates Spam

Blekko is touting AdSpam as “the first search algorithm ever created to find spam rather than rank results.” Its purpose is to detect signals of spammy web pages (e.g., pages with multiple display ad positions and very little content).
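Blekko has not published how AdSpam works, so the following is only a rough Python sketch of the kind of signal described above: many ad positions combined with very little content. The `ad-slot` class name, the thresholds, and the names `PageStats` and `looks_spammy` are all hypothetical, chosen purely for illustration.

```python
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Counts ad slots and visible words in an HTML page."""

    def __init__(self, ad_class="ad-slot"):
        super().__init__()
        self.ad_class = ad_class  # hypothetical marker for an ad position
        self.ad_slots = 0
        self.words = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        # A valueless class attribute parses as None, hence the "or".
        classes = (dict(attrs).get("class") or "").split()
        if self.ad_class in classes:
            self.ad_slots += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that a reader would actually see.
        if self._skip_depth == 0:
            self.words += len(data.split())


def looks_spammy(html, min_ad_slots=3, max_words=100):
    """Flag pages with several ad slots and very little text.

    Both thresholds are assumptions, not values from Blekko.
    """
    stats = PageStats()
    stats.feed(html)
    stats.close()
    return stats.ad_slots >= min_ad_slots and stats.words < max_words


thin_page = '<div class="ad-slot"></div>' * 3 + "<p>Buy cheap fridges</p>"
print(looks_spammy(thin_page))
```

A real system would presumably combine many such signals and weigh them at the domain level rather than rely on a single page-level rule.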

While Blekko hasn’t yet become a household name (it reported 30 million queries in January), the startup search engine’s emphasis on spam and search quality continues to make waves, and is obviously pushing Google to clean up its own results. While Google’s Panda update demoted “low-quality” sites, Blekko announced that it has completely eliminated 1.1 million domains from its search results, such as cheap-refrigerators.net, best-weddinggifts and Boston.diningguide.com.

“Google didn’t actually take anyone out, they just reshuffled the deck,” Rich Skrenta, chief executive of Blekko, told the New York Times. “Instead of demoting these sites to No. 5 or No. 7, we’re just throwing them out.”

Blekko’s attitude is that better search results come from killing spam, not crawling it. Not long ago, Blekko began its war on spam by purging its search results of 20 content farms.

“If you make a machine to print money [an ad network], people will exploit it,” Skrenta said. “All you have to do is put some words on your page, do some link building and get listed on search engines. Then traffic will come and checks will come, and lo and behold, most of the people who did that are not substantive sites.”

And just as Blekko uses human curators to influence results, Google has turned to its users, first by introducing a Chrome extension and then, last week, by extending the ability to block search results from entire domains to signed-in Google users. We also found out that part of the Panda update involved Google seeking human raters as a check on its results.

Adding humans to the mix is a good move, writes ClickZ’s Sean Carton in “Move Over Google.”

“Humans are (for the time being) the real answer to search engine relevance. It can’t just be done algorithmically, because computers don’t do a very good job determining intention and real relevance…at least not right now.”