Common Crawl is a non-profit foundation established to democratize access to web information by producing and maintaining an open repository of web crawl data that anyone can access and analyze.

Free access to web crawl data encourages collaboration and interdisciplinary research, as organizations, academia, and non-profits can work together to address complex challenges. Collaboration built on Open Data accelerates progress and helps find solutions to pressing global issues, such as climate change, public health, and social equality.

By embracing Open Data, we promote an inclusive and thriving knowledge ecosystem, where the collective intelligence of the global community can lead to transformative discoveries and positive societal impact.

To prevent Common Crawl from crawling your website, include the following in your robots.txt:
User-agent: CCBot
Disallow: /
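
If you want to confirm that the rule above behaves as intended before deploying it, one option is to parse it locally. The following is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed`, the example.com URL, and the `OtherBot` user agent are illustrative assumptions, not part of Common Crawl's tooling, and the standard-library parser may not match CCBot's own robots.txt handling in every edge case.

```python
from urllib.robotparser import RobotFileParser

# The two directives shown above, exactly as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper for illustration; it only wraps the standard-library parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # CCBot is blocked from the whole site; other agents are unaffected.
    print(is_allowed(ROBOTS_TXT, "CCBot", "https://example.com/page"))
    print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/page"))
```

Because the rule names only `CCBot`, crawlers with other user agents fall outside it and remain allowed unless your robots.txt contains additional rules for them.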

Please see our FAQ for further information.
