
Latest Crawl - Archive Location & Download

The latest crawl is:

CC-MAIN-2024-42

To assist with exploring and using the dataset, we provide gzipped files which list all segments, WARC, WAT and WET files.

Prefixing each line with s3://commoncrawl/ or https://data.commoncrawl.org/ gives the S3 or HTTP path, respectively.
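As an illustration of the prefixing step, here is a minimal Python sketch. It assumes you have already downloaded one of the gzipped listing files and hold its contents as bytes. The function name expand_paths and the sample paths in the usage note are hypothetical and are not part of any Common Crawl tooling.

```python
import gzip
import io

# The two prefixes named in the text above.
S3_PREFIX = "s3://commoncrawl/"
HTTP_PREFIX = "https://data.commoncrawl.org/"


def expand_paths(gz_bytes, prefix=HTTP_PREFIX):
    """Decompress a gzipped path listing and prepend `prefix` to each line.

    `gz_bytes` is the raw content of a listing file (hypothetical input);
    blank lines are skipped.
    """
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as fh:
        return [prefix + line.strip() for line in fh if line.strip()]
```

For example, passing gzip.compress(b"crawl-data/EXAMPLE/a.warc.gz\n") (a made-up path, not a real file) returns a one-element list holding the HTTP URL for that path. Passing S3_PREFIX as the second argument yields the S3 form instead.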

Learn how to Get Started.