CCBot
Common Crawl is a non-profit foundation established to democratize access to web information by producing and maintaining an open repository of web crawl data that anyone can access and analyze.
Enabling free access to web crawl data encourages collaboration and interdisciplinary research, as organizations, academia, and non-profits can work together to address complex challenges. Collaborating using Open Data accelerates progress and helps find solutions to pressing global issues, such as climate change, public health, and social equality.
By embracing Open Data, we promote an inclusive and thriving knowledge ecosystem, where the collective intelligence of the global community can lead to transformative discoveries and positive societal impact.
To prevent Common Crawl from crawling your website, include the following in your robots.txt:
User-agent: CCBot
Disallow: /
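To check that a rule set like the one above has the intended effect before deploying it, a robots.txt body can be parsed locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed, the example.com URL, and the "OtherBot" agent are illustrative assumptions, not part of Common Crawl's tooling.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the text above.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules.

    Hypothetical helper for illustration: it parses the robots.txt text
    in memory, so no network request is made.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder site, not taken from the original page.
    print("CCBot allowed:", is_allowed(ROBOTS_TXT, "CCBot", "https://example.com/page"))
    print("OtherBot allowed:", is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/page"))
```

With these rules, CCBot is denied every path, while agents that are not named in the file remain unaffected.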
Please note that we are aware of crawlers falsely identifying themselves as CCBot. We recommend verifying that requests carrying the CCBot User-Agent string are authentic.
CCBot is now run on dedicated IP address ranges with reverse DNS. This allows webmasters to verify whether a logged request stems from the real CCBot, for example:
$> host 18.97.14.84
84.14.97.18.in-addr.arpa domain name pointer 18-97-14-84.crawl.commoncrawl.org.
$> host 18-97-14-84.crawl.commoncrawl.org
18-97-14-84.crawl.commoncrawl.org has address 18.97.14.84
$> dig -x 18.97.14.84
;; ANSWER SECTION:
84.14.97.18.in-addr.arpa. 276 IN PTR 18-97-14-84.crawl.commoncrawl.org.
$> dig 18-97-14-84.crawl.commoncrawl.org A
;; ANSWER SECTION:
18-97-14-84.crawl.commoncrawl.org. 275 IN A 18.97.14.84
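The two lookups shown above (reverse, then forward) can also be automated when scanning server logs. The following is a minimal sketch in Python using only the standard socket module. The function names and the injectable resolver parameters are hypothetical, and the ".crawl.commoncrawl.org" suffix check is an assumption inferred from the example output above rather than a documented rule; consult the FAQ for authoritative details.

```python
import socket

# Assumption: genuine CCBot hosts resolve to names under this suffix,
# as seen in the host/dig example output above.
CRAWL_SUFFIX = ".crawl.commoncrawl.org"


def reverse_lookup(ip):
    """Return the PTR hostname for `ip`, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the IPv4 addresses for `hostname` (empty list on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_real_ccbot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS check for a logged client IP.

    1. Reverse-resolve the IP to a hostname.
    2. Require the hostname to end with the expected crawler suffix.
    3. Forward-resolve that hostname and require it to map back to the IP.

    The resolver functions are parameters so the logic can be exercised
    without network access.
    """
    hostname = reverse(ip)
    if hostname is None:
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(CRAWL_SUFFIX):
        return False
    return ip in forward(hostname)


if __name__ == "__main__":
    # IP address taken from the example above; requires working DNS.
    print(is_real_ccbot("18.97.14.84"))
```

The forward step matters because the owner of an IP range controls its PTR records and could point them at any name; only a matching A record under the crawler's own domain confirms the claim.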
Please see our FAQ for further information.
