Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non-profit founded in 2007. We make wholesale extraction, transformation, and analysis of open web data accessible to researchers.
We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of August, September, and October 2025. The host-level graph consists of 468.4 million nodes and 8.0 billion edges, and the domain-level graph of 97.7 million nodes and 6.0 billion edges.
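To illustrate the relationship between the two graph levels, here is a minimal sketch of how a host-level edge list might be collapsed to the domain level. For simplicity it maps a hostname to its last two labels; the actual domain-level graphs are built using the public suffix list (so that, e.g., `.co.uk` is handled correctly), so this naive rule and the sample edges are assumptions for demonstration only.

```python
def host_to_domain(host: str) -> str:
    """Naively map a hostname to a domain: keep the last two dot-separated
    labels. (Illustrative only; real aggregation uses the public suffix list.)"""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host

def collapse_edges(host_edges):
    """Aggregate host-level edges into domain-level edges, deduplicating
    and dropping the self-loops that aggregation creates."""
    domain_edges = set()
    for src, dst in host_edges:
        s, d = host_to_domain(src), host_to_domain(dst)
        if s != d:
            domain_edges.add((s, d))
    return domain_edges

# Hypothetical host-level edges for illustration
edges = [
    ("blog.example.com", "www.example.com"),  # same domain: becomes a self-loop, dropped
    ("blog.example.com", "news.other.org"),
    ("www.example.com", "news.other.org"),    # duplicate at domain level, deduplicated
]
print(collapse_edges(edges))  # {('example.com', 'other.org')}
```

This kind of aggregation is why the domain-level graph is much smaller than the host-level graph: many hosts map to one domain, and parallel edges between their hosts merge into a single domain-level edge.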
Hande Çelikkanat
Hande is a Senior ML Engineer with the Common Crawl Foundation.