Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non-profit founded in 2007. We make wholesale extraction, transformation and analysis of open web data accessible to researchers.
We are pleased to announce that the crawl archive for April 2026 is now available, containing 2.19 billion web pages or 379.2 TiB of uncompressed content.