CommonCrawl


Common Crawl is a non-profit foundation dedicated to providing an open repository of web crawl data that can be accessed and analyzed by everyone.

 

Fresh Data Available!

The latest dataset is from March 2014, contains approximately 2.8 billion web pages, and is located in Amazon Public Data Sets at /common-crawl/crawl-data/CC-MAIN-2014-10.
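
The location above is a key prefix made of a fixed directory and a crawl ID. The following is a minimal sketch of how that prefix might be assembled and checked in Python. Several things here are assumptions not stated in the announcement: the ID pattern `CC-MAIN-<year>-<two digits>` is inferred from the single ID shown, the helper names (`parse_crawl_id`, `crawl_key_prefix`, `crawl_s3_uri`) are hypothetical, and the bucket name is left as a caller-supplied placeholder because the announcement does not name it.

```python
import re

# Directory portion of the path quoted in the announcement.
CRAWL_DATA_PREFIX = "common-crawl/crawl-data"

# Assumption: crawl IDs look like CC-MAIN-<4-digit year>-<2-digit number>,
# inferred from the one ID shown (CC-MAIN-2014-10).
CRAWL_ID_PATTERN = re.compile(r"^CC-MAIN-(\d{4})-(\d{2})$")


def parse_crawl_id(crawl_id):
    """Split a crawl ID such as 'CC-MAIN-2014-10' into (year, number)."""
    match = CRAWL_ID_PATTERN.match(crawl_id)
    if match is None:
        raise ValueError(f"unrecognized crawl ID: {crawl_id!r}")
    return int(match.group(1)), int(match.group(2))


def crawl_key_prefix(crawl_id):
    """Return the key prefix under which a crawl's files are expected."""
    parse_crawl_id(crawl_id)  # validate the ID before building the path
    return f"{CRAWL_DATA_PREFIX}/{crawl_id}/"


def crawl_s3_uri(crawl_id, bucket):
    """Build an s3:// URI; the bucket name is a placeholder chosen by the caller."""
    return f"s3://{bucket}/{crawl_key_prefix(crawl_id)}"


if __name__ == "__main__":
    crawl = "CC-MAIN-2014-10"
    print(parse_crawl_id(crawl))
    print(crawl_key_prefix(crawl))
    # "example-bucket" is hypothetical; substitute the real bucket name.
    print(crawl_s3_uri(crawl, bucket="example-bucket"))
```

Validating the ID before building the path is a design choice that catches typos early, before any request is made against the data set.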