Common Crawl maintains a free,open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non–profit founded in 2007. We make wholesale extraction, transformation and analysis of open web data accessible to researchers.
The Common Crawl Foundation team took part in the United Nations Open Source Week in New York City this June, meeting with global developers, researchers, and policymakers to discuss all things open source and AI.
Common Crawl Foundation
Common Crawl builds and maintains an open repository of web crawl data that can be accessed and analyzed by anyone.