Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non-profit founded in 2007. We make wholesale extraction, transformation and analysis of open web data accessible to researchers.
Stanford HAI and Common Crawl are joining forces to explore how open data can shape the future of AI. On 22 October 2025, their seminar will address privacy, safety, and security while showcasing new ways to preserve and share humanity’s knowledge.
Common Crawl Foundation
Common Crawl builds and maintains an open repository of web crawl data that can be accessed and analyzed by anyone.