Search results

Common Crawl - Blog - Twelve steps to running your Ruby code across five billion web pages

The following is a guest blog post by Pete Warden, a member of the Common Crawl Advisory Board. Pete is a British-born programmer living in San Francisco.

Common Crawl - Blog - WikiReverse- Visualizing Reverse Links with the Common Crawl Archive

Ross Fairbanks is a software developer based in Barcelona. He mainly develops in Ruby and is interested in open data and cloud computing. This guest post describes his open data project and why he built it. What is WikiReverse?

Common Crawl - Blog - Analysis of the NCSU Library URLs in the Common Crawl Index

You can see the simple Ruby scripts I used for parsing out the Common Crawl URL index and the Web Data Commons N-Quads in this gist. Last week we announced the Common Crawl URL Index.