Common Crawl maintains a free,open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non–profit founded in 2007. We make wholesale extraction, transformation and analysis of open web data accessible to researchers.
We aim to enhance linguistic diversity in our dataset by inviting community contributions of non-English URLs and collaborating with MLCommons on a Language Identification campaign.
Pedro Ortiz Suarez
Pedro is a French-Colombian mathematician, computer scientist and researcher. He holds a PhD in computer science and Natural Language Processing from Sorbonne Université.