Common Crawl Blog

IIPC General Assembly & Web Archiving Conference 2025




March/April 2025 Newsletter




Providing Authenticity & Data Provenance for Common Crawl Using Blockchain: Our Work with Constellation Network




Host- and Domain-Level Web Graphs January, February, and March 2025




March 2025 Crawl Archive Now Available




Introducing Common Crawl AI Agent by ReadyAI




Submission to the UK’s Copyright and AI Consultation




Host- and Domain-Level Web Graphs December 2024 and January/February 2025




February 2025 Crawl Archive Now Available




Opening the Gates to Online Safety




January/February 2025 Newsletter




Host- and Domain-Level Web Graphs November/December 2024 and January 2025




January 2025 Crawl Archive Now Available




Introducing cc-downloader




Host- and Domain-Level Web Graphs October, November, and December 2024




December 2024 Crawl Archive Now Available




Common Crawl Foundation at NeurIPS 2024: Expanding Horizons and Building Connections




Expanding the Language and Cultural Coverage of Common Crawl




October/November 2024 Newsletter




Host- and Domain-Level Web Graphs September, October, November 2024




November 2024 Crawl Archive Now Available




Reflections on Recent Talks at the Turing Institute and UCL




Introducing the Common Crawl Errata Page for Data Transparency




Host- and Domain-Level Web Graphs August, September, and October 2024




October 2024 Crawl Archive Now Available




White House Briefing on Open Data’s Role in Technology




IAB Workshop on AI-CONTROL




Host- and Domain-Level Web Graphs July, August, and September 2024




September 2024 Crawl Archive Now Available




August/September 2024 Newsletter




Host- and Domain-Level Web Graphs June, July, and August 2024




August 2024 Crawl Archive Now Available




The Increase of Common Crawl Citations in Academic Research




Host- and Domain-Level Web Graphs May, June, and July 2024




July 2024 Crawl Archive Now Available




Common Crawl Statistics Now Available on Hugging Face




The Environmental Impact of the Cloud - the Common Crawl Case Study




Host- and Domain-Level Web Graphs April, May, and June 2024




June 2024 Crawl Archive Now Available




Dialog and Discovery at AI_dev 2024




May/June 2024 Newsletter




Host- and Domain-Level Web Graphs February/March, April, and May 2024




May 2024 Crawl Archive Now Available




Host- and Domain-Level Web Graphs November/December 2023, February/March 2024, and April 2024




April 2024 Crawl Archive Now Available




March/April 2024 Newsletter




Host- and Domain-Level Web Graphs September/October, November/December 2023 and February/March 2024




February/March 2024 Crawl Archive Now Available




Web Archiving File Formats Explained




A Further Look Into the Prevalence of Various ML Opt–Out Protocols




Balancing Discovery and Privacy: A Look Into Opt–Out Protocols




Host- and Domain-Level Web Graphs May/Sep/Nov 2023




November/December 2023 Crawl Archive Now Available




Oct/Nov 2023 Performance Issues




Host- and Domain-Level Web Graphs Mar/May/Oct 2023




September/October 2023 crawl archive now available




Bridging Digital Exploration and Scientific Frontiers




May/June 2023 crawl archive now available




March/April 2023 crawl archive now available




Host- and Domain-Level Web Graphs September/October, November/December 2022 and January/February 2023




January/February 2023 crawl archive now available




September/October 2022 crawl archive now available




Host- and Domain-Level Web Graphs February/March, April and May 2021




November/December 2022 crawl archive now available




June/July 2022 crawl archive now available




Host- and Domain-Level Web Graphs May, June/July and August 2022




August 2022 crawl archive now available




July/August 2021 crawl archive now available




May 2022 crawl archive now available




Host- and Domain-Level Web Graphs October, November/December 2021 and January 2022




January 2022 crawl archive now available




Introducing CloudFront as a new way to access Common Crawl data as part of Amazon Web Services’ registry of open data




November/December 2021 crawl archive now available




October 2021 crawl archive now available




Host- and Domain-Level Web Graphs June, July/August and September 2021




September 2021 crawl archive now available




April 2021 crawl archive now available




June 2021 crawl archive now available




May 2021 crawl archive now available




February/March 2021 crawl archive now available




Host- and Domain-Level Web Graphs October, November/December 2020 and January 2021




Host- and Domain-Level Web Graphs Jul/Aug/Sep 2020




November/December 2020 crawl archive now available




January 2021 crawl archive now available




October 2020 crawl archive now available




Interactive Webgraph Statistics Notebook Released




September 2020 crawl archive now available




August 2020 crawl archive now available




July 2020 crawl archive now available




Host- and Domain-Level Web Graphs Feb/Mar/May 2020




May/June 2020 crawl archive now available




February 2020 crawl archive now available




March/April 2020 crawl archive now available




Host- and Domain-Level Web Graphs Nov/Dec/Jan 2019 – 2020




January 2020 crawl archive now available




December 2019 crawl archive now available




Host- and Domain-Level Web Graphs May/June/July 2019




November 2019 crawl archive now available




Host- and Domain-Level Web Graphs Aug/Sep/Oct 2019



