Common Crawl Blog

CC-Citations: A Visualization of Research Papers Referencing Common Crawl




July 2016 Crawl Archive Now Available




June 2016 Crawl Archive Now Available




May 2016 Crawl Archive Now Available




April 2016 Crawl Archive Now Available




Welcome, Sebastian!




August 2015 Crawl Archive Available




November 2015 Crawl Archive Now Available




5 Good Reads in Big Open Data: February 27 2015




Web Image Size Prediction for Efficient Focused Image Crawling




September 2015 Crawl Archive Now Available




July 2015 Crawl Archive Available




June 2015 Crawl Archive Available




5 Good Reads in Big Open Data: March 6 2015




April 2015 Crawl Archive Available




March 2015 Crawl Archive Available




Announcing the Common Crawl Index!




Evaluating graph computation systems




February 2015 Crawl Archive Available




5 Good Reads in Big Open Data: March 20 2015




5 Good Reads in Big Open Data: March 26 2015




5 Good Reads in Big Open Data: March 13 2015




Analyzing a Web graph with 129 billion edges using FlashGraph




January 2015 Crawl Archive Available




Lexalytics Text Analysis Work with Common Crawl Data




5 Good Reads in Big Open Data: Feb 13 2015




5 Good Reads in Big Open Data: Feb 20 2015




WikiReverse: Visualizing Reverse Links with the Common Crawl Archive




5 Good Reads in Big Open Data: Feb 6 2015




The Promise of Open Government Data & Where We Go Next




December 2014 Crawl Archive Available




Please Donate To Common Crawl!




November 2014 Crawl Archive Available




October 2014 Crawl Archive Available




Winter 2013 Crawl Data Now Available




Web Data Commons Extraction Framework for the Distributed Processing of CC Data




September 2014 Crawl Archive Available




August 2014 Crawl Data Available




July 2014 Crawl Data Available




March 2014 Crawl Data Now Available




April 2014 Crawl Data Available




Navigating the WARC file format




New Crawl Data Available!




Common Crawl's Move to Nutch




Hyperlink Graph from Web Data Commons




URL Search Tool!




Startup Profile: SwiftKey’s Head Data Scientist on the Value of Common Crawl’s Open Data




Professor Jim Hendler Joins the Common Crawl Advisory Board!




Strata Conference + Hadoop World




A Look Inside Our 210TB 2012 Web Corpus




Analysis of the NCSU Library URLs in the Common Crawl Index




The Norvig Web Data Science Award




The Winners of The Norvig Web Data Science Award




Common Crawl URL Index




Towards Social Discovery - New Content Models; New Data; New Toolsets




blekko donates search data to Common Crawl




Winners of the Code Contest!




Common Crawl Code Contest Extended Through the Holiday Weekend




TalentBin Adds Prizes To The Code Contest




2012 Crawl Data Now Available




Amazon Web Services sponsoring $50 in credit to all contest entrants!




Mat Kelcey Joins The Common Crawl Advisory Board




Still time to participate in the Common Crawl code contest




Big Data Week: meetups in SF and around the world




OSCON 2012




The Open Cloud Consortium’s Open Science Data Cloud




Twelve steps to running your Ruby code across five billion web pages




Common Crawl's Brand Spanking New Video and First Ever Code Contest!




Learn Hadoop and get a paper published




Data 2.0 Summit




Common Crawl's Advisory Board




Common Crawl on AWS Public Data Sets




Web Data Commons




SlideShare: Building a Scalable Web Crawler with Hadoop




Video: Gil Elbaz at Web 2.0 Summit 2011




Video: This Week in Startups - Gil Elbaz and Nova Spivack




Video Tutorial: MapReduce for the Masses




Common Crawl Enters A New Phase




Gil Elbaz and Nova Spivack on This Week in Startups




MapReduce for the Masses: Zero to Hadoop in Five Minutes with Common Crawl




Answers to Recent Community Questions