We crawl the web.
You crunch the data.

Free web crawl data archive from June 2017

3.16 billion URLs

250+ terabytes uncompressed

71,839 WARC, WAT, and WET files

Access these datasets

Common Crawl makes it possible for researchers, data scientists, developers and entrepreneurs to explore and model ideas using massive quantities of recent high-quality web data.

We are the nonprofit that provides free web crawl data for anyone to access and analyze. Our vast and growing open repository contains nine years of web page data, available at no cost.
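The archive is published as WARC, WAT, and WET files. To illustrate what working with the data can look like, here is a minimal, stdlib-only Python sketch that iterates over the records in a gzip-compressed WARC-format file. WET files use the same record framing and carry extracted plain text in `conversion` records. The sketch assumes WARC/1.x framing in which every record has a `Content-Length` header. It does not handle folded (multi-line) header values. The function name `iter_warc_records` and the sample record are hypothetical. For real workloads, a dedicated library such as `warcio` is probably the better choice.

```python
import gzip
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, block) pairs from a decompressed WARC-format byte stream.

    Hypothetical helper, a minimal sketch: assumes WARC/1.x framing with a
    Content-Length header and does not handle folded header values.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate consecutive records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if not line.strip():
                break  # an empty line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        length = int(headers["Content-Length"])
        block = stream.read(length)
        if len(block) != length:
            raise ValueError("truncated record block")
        yield headers, block


if __name__ == "__main__":
    # Hypothetical sample: one plain-text "conversion" record, as found in WET files.
    sample = (
        b"WARC/1.0\r\n"
        b"WARC-Type: conversion\r\n"
        b"WARC-Target-URI: http://example.com/\r\n"
        b"Content-Length: 13\r\n"
        b"\r\n"
        b"Example text.\r\n\r\n"
    )
    # The archive files are assumed to be gzip-compressed; gzip.open accepts an
    # existing file object and also reads streams made of several gzip members.
    with gzip.open(io.BytesIO(gzip.compress(sample)), "rb") as f:
        for headers, block in iter_warc_records(f):
            print(headers["WARC-Target-URI"], block.decode("utf-8"))
```

To read a downloaded file, pass `gzip.open(path, "rb")` instead of the in-memory sample. Because the parser streams one record at a time, memory use stays bounded even for multi-gigabyte files.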

Big data leads to big discoveries. People around the world use Common Crawl data to advance new ideas, research and learning. Discover something new with Common Crawl.


Support our work. Make a donation so that we can continue to provide access to web data.

The Common Crawl Foundation is a registered 501(c)(3) nonprofit. Donations are tax-deductible.
