Common Crawl builds and maintains an open crawl of the web that can be accessed and analyzed by everyone.
Our vision is of a truly open web that allows universal open access to information and enables greater innovation in research, business and education.
Common Crawl Foundation is a California 501(c)3 registered non-profit founded by Gil Elbaz with the goal of democratizing access to web information by producing and maintaining an open repository of web crawl data that is universally accessible and analyzable.
As the largest and most diverse collection of information in human history, the web grants us tremendous insight if we can only understand it better. For example, web crawl data can be used to spot trends and identify patterns in politics, economics, health, popular culture and many other aspects of life. It provides an immensely rich corpus for scientific research, technological advancement, and innovative new businesses. It is crucial for our information-based society that the web be openly accessible to anyone who desires to utilize it.
We strive to be transparent in all of our operations and we support nofollow and robots.txt. For more information about the ccBot, please see FAQ. For more information on Common Crawl data and how to access it, please see Data. For access to our open source code, please see our GitHub repository.