Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non-profit founded in 2007. We make wholesale extraction, transformation and analysis of open web data accessible to researchers.
We are pleased to announce a new release of host-level and domain-level Web Graphs based on the crawls of August, September, and October 2024. The crawls used to generate the graphs were CC-MAIN-2024-33, CC-MAIN-2024-38, and CC-MAIN-2024-42.
Thom Vaughan
Thom is Principal Technologist at the Common Crawl Foundation.