blekko donates search data to Common Crawl

I am very excited to announce that blekko is donating search data to Common Crawl! 

blekko was founded in 2007 to pursue innovations that would eliminate spam in search results. blekko has created a new type of search experience that enlists human editors in its efforts to eliminate spam and personalize search. blekko has raised $55 million in VC and currently has 48 employees, including former Google and Yahoo! Search engineers. 

For details of their donation and collaboration with Common Crawl see the post from their blog below. Follow blekko on Twitter and subscribe to their blog to keep abreast of their news (lots of cool stuff going on over there!) and be sure to check out there search. 

From the blekko blog:




At blekko, we believe the web and search should be open and transparent — it’s number one in the blekko Bill of Rights. To make web data accessible, blekko gives away our search results to innovative applications using our API. Today, we’re happy to announce the ongoing donation of our search engine ranking metadata for 140 million websites and 22 billion webpages to the Common Crawl Foundation.

Common Crawl has built an open crawl of the web that can be accessed and analyzed by everyone. The goal is building a truly open web, with open access to information that enables more innovation in research, business, and education. Common Crawl will use blekko’s metadata to improve its crawl quality, while avoiding webspam, porn, and the influence of excessive SEO (search engine optimization). This will ensure that Common Crawl’s resources and engineering time are spent on webpages that are written by, and are useful to, humans.

We’re putting our full-fledged support behind Common Crawl’s crawl and mission with this donation. We’re not doing this because it makes us feel good (OK, it makes us feel a little good), or because it makes us look good (OK, it makes us look a little good), we’re helping Common Crawl because Common Crawl is taking strides towards our shared vision of an open and transparent Internet.

Just take a look at this excerpt from Common Crawl’s website:

“As the largest and most diverse collection of information in human history, the web grants us tremendous insight if we can only understand it better. For example, web crawl data can be used to spot trends and identify patterns in politics, economics, health, popular culture and many other aspects of life. It provides an immensely rich corpus for scientific research, technological advancement, and innovative new businesses. It is crucial for our information-based society that the web be openly accessible to anyone who desires to utilize it.”

Who could disagree with that?






  1. Y K says:

    This is amazing! I love seeing the project growing. Is there anyway I can help?

  2. Robert Meusel says:

    Great … more data to work with. Do you already know, when it will be available to download from S3?

  3. Siddharth Mathur says:

    Search Engine Ranking,This is amazing! I love seeing the project growing. Is there anyway I can help out.

  4. Axat Tech says:

    Awesome, I like to search new thing and happy to read this good blog. continue growing is best in search engine ranking

  5. Ty Karny says:

    Great, this will level the field a bit for new ventures.