Search results

Common Crawl - Blog - February 2025 Crawl Archive Now Available

February 2025 Crawl Archive Now Available. The crawl archive for February 2025 is now available. The data was crawled between February 6th and February 20th, and contains 2.6 billion web pages (or 402 TiB of uncompressed content).…

Common Crawl - Blog - December 2024 Crawl Archive Now Available

December 2024 Crawl Archive Now Available. The crawl archive for December 2024 is now available. The data was crawled between December 1st and December 15th, and contains 2.64 billion web pages (or 394 TiB of uncompressed content).…

Common Crawl - Blog - March 2026 Crawl Archive Now Available

March 2026 Crawl Archive Now Available. We are pleased to announce the release of the March 2026 crawl, containing 1.97 billion web pages, or 344.64 TiB of uncompressed content.…

Common Crawl - Blog - Common Crawl Statistics Now Available on Hugging Face

Common Crawl Statistics Now Available on Hugging Face. We're excited to announce that Common Crawl’s statistics are now available on Hugging Face! Ford Heilizer. Ford is an emeritus member of the Common Crawl Foundation.…

Common Crawl - Blog - February 2019 crawl archive now available

February 2019 crawl archive now available. The crawl archive for February 2019 is now available! It contains 2.9 billion web pages or 225 TiB of uncompressed content, crawled between February 15th and 24th. Sebastian Nagel.…

Common Crawl - Blog - July 2024 Crawl Archive Now Available

July 2024 Crawl Archive Now Available. We are pleased to announce that the crawl archive for July 2024 is now available, containing 2.5 billion web pages, or 360 TiB of uncompressed content. Thom Vaughan.…

Common Crawl - Blog - April 2018 Crawl Archive Now Available

April 2018 Crawl Archive Now Available. The crawl archive for April 2018 is now available! The archive contains 3.1 billion web pages and 230 TiB of uncompressed content, crawled between April 19th and 27th. Sebastian Nagel.…

Common Crawl - Blog - November/December 2021 crawl archive now available

November/December 2021 crawl archive now available. The crawl archive for November/December 2021 is now available! The data was crawled Nov 26 – Dec 9 and contains 2.5 billion web pages or 280 TiB of uncompressed content.…

Common Crawl - FAQ

Common Crawl. General Questions. What is Common Crawl?…

Common Crawl - Erratum - ARC Format (Legacy) Crawls

ARC Format (Legacy) Crawls. Our early crawls were archived using the ARC (Archive) format, not the WARC (Web ARChive) format. The ARC format, which predates WARC, was the initial format used for storing web crawl data.…