We crawl the web.
You crunch the data.
Free web crawl data archive from June 2017
3.16 billion URLs
250+ terabytes uncompressed
71,839 WARC, WAT, and WET files
Common Crawl makes it possible for researchers, data scientists, developers and entrepreneurs to explore and model ideas using massive quantities of recent high-quality web data.
We are the nonprofit that provides free web crawl data for anyone to access and analyze. Our vast and growing open repository contains nine years of web page data, available at no cost.
Big data leads to big discoveries. People around the world use Common Crawl data to advance new ideas, research and learning. Discover something new with Common Crawl.
Partners & supporters of Common Crawl include...

Support our work. Make a donation so that we can continue to provide access to web data.
The Common Crawl Foundation is a registered 501(c)(3) nonprofit. Donations are tax deductible.
Access to data is a good thing, right?
Please donate today, so we can continue to provide you and others like you with this priceless resource.
Don't forget, Common Crawl is a registered 501(c)(3) nonprofit, so your donation is tax deductible!