September 15, 2025

Trip Report: AI_dev (Linux Foundation) August 2025

Note: this post has been marked as obsolete.
Thom Vaughan
Thom is a Principal Engineer at the Common Crawl Foundation.

On the 28th and 29th of August 2025, Thom Vaughan, Pedro Ortiz Suarez, and Thijs Dalhuijsen attended the Linux Foundation’s AI_dev event in Amsterdam. We caught up with many familiar faces and made friends with plenty of new ones.

We talked with folks from OpenSearch, Neo4j, and LF AI & Data (whose booth we staffed for a few hours on Friday). The Neo4j developer relations team are very eager to help us with our initiative to use Neo4j with our Web Graphs.

A photo of Thom and Pedro with folks from Eventual.Inc’s team
Left-to-right: Thom Vaughan, Colin Ho, Sammy Sidhu, and Pedro Ortiz Suarez. “Oh wow, you’re from Common Crawl?!”

We heard from people working in the AI world who use our data regularly. Among them were Sammy Sidhu and Colin Ho from Eventual.Inc (the authors of Daft), who reached out to tell us that our dataset was among the most popular in their user base.

Lunch with Brewster and Stefano on day one of AI_dev, left-to-right: Thijs Dalhuijsen, Pedro Ortiz Suarez, Brewster Kahle, Stefano Maffulli, Thom Vaughan

We met once again with Stefano Maffulli of the Open Source Initiative and Brewster Kahle of the Internet Archive. On day two we were invited to the opening celebration of the Internet Archive’s new Amsterdam address, where we had a chance to hang out with Brewster, his wife Mary Austin, Stefano, Beatrice Murch, Ben Cerveny, and several others.

Opening of the Internet Archive’s European HQ in Amsterdam

We are grateful to our collaborators and friends across the open source and AI communities for their ongoing support and encouragement. Over the course of two days we shared ideas, compared notes on challenges, and discovered new ways our work is being used in research and industry.

A big thank you to the Linux Foundation for hosting an excellent event, and to everyone who stopped to talk with us, share feedback, or suggest ways to work together. We are already looking forward to next year’s edition.

This post was authored by:
Thom Vaughan
Thom is a Principal Engineer at the Common Crawl Foundation.
Thijs Dalhuijsen
Thijs Dalhuijsen is a Senior Software Engineer at Common Crawl. He works on backend systems, automation, and data infrastructure to power large-scale web access and analysis.

Erratum: Content is truncated

Some archived content is truncated due to fetch size limits imposed during crawling. This is necessary to handle infinite or exceptionally large data streams (e.g., radio streams). Prior to March 2025 (CC-MAIN-2025-13), the truncation threshold was 1 MiB. From the March 2025 crawl onwards, this limit has been increased to 5 MiB.

For more details, see our truncation analysis notebook.
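
As a rough illustration of the thresholds described above, the following Python sketch maps a crawl ID to the fetch size limit that would have applied. It is only a sketch under stated assumptions: the function names are hypothetical, it assumes crawl IDs of the form `CC-MAIN-YYYY-WW`, and it reads the erratum as meaning that CC-MAIN-2025-13 is the first crawl with the 5 MiB limit. Comparing a payload's length against the limit is a heuristic, not a definitive test for truncation.

```python
import re

MIB = 1024 * 1024

# Assumption: CC-MAIN-2025-13 (the March 2025 crawl) is the first crawl
# that uses the 5 MiB limit; everything before it used 1 MiB.
FIRST_CRAWL_WITH_5_MIB = (2025, 13)


def parse_crawl_id(crawl_id: str) -> tuple[int, int]:
    """Split an ID like 'CC-MAIN-2025-13' into (year, week)."""
    match = re.fullmatch(r"CC-MAIN-(\d{4})-(\d{2})", crawl_id)
    if match is None:
        raise ValueError(f"unrecognised crawl ID: {crawl_id!r}")
    return int(match.group(1)), int(match.group(2))


def truncation_limit_bytes(crawl_id: str) -> int:
    """Return the fetch size limit (in bytes) assumed for this crawl."""
    if parse_crawl_id(crawl_id) >= FIRST_CRAWL_WITH_5_MIB:
        return 5 * MIB
    return 1 * MIB


def may_be_truncated(crawl_id: str, payload_length: int) -> bool:
    """Heuristic: a payload that reaches the limit may have been cut off."""
    return payload_length >= truncation_limit_bytes(crawl_id)


if __name__ == "__main__":
    for cid in ("CC-MAIN-2025-08", "CC-MAIN-2025-13"):
        print(cid, truncation_limit_bytes(cid) // MIB, "MiB")
```

A payload of exactly 1 MiB would be flagged for a crawl before March 2025 but not for a later one, since the later limit is higher.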