From the 20th to the 25th of July I had the happy opportunity to attend IETF 123, held at the Meliã Castilla in Madrid. As ever, the event was packed full of discussions, new draft proposals, and connections from the Internet protocol community. Here are some favourites:

AIPREF Working Group
This group is very relevant to our activities at Common Crawl, and we’ve been active in it since before its charter. Following a productive Design Team meeting in London the previous week, the AIPREF Working Group session was constructive and quite encouraging. The group’s charter is to produce a set of “building blocks that allow for the expression of preferences about how content is collected and processed for Artificial Intelligence (AI) model development, deployment, and use”. The Vocabulary and Attachment documents remain under active development, and you can follow their progress on GitHub. Check out the session recording here.
WEBBOTAUTH (Emerging Working Group)
Though not yet officially chartered, the proposed WEBBOTAUTH group is (we think) one to watch. The aim of the group is to have “discussion of use cases, requirements, and proposed solutions for authenticating non-human users (‘bots’) to Web sites intended for humans (‘normal’ Web sites).”
So far, Thibault Meunier from Cloudflare has submitted some interesting initial drafts around web bot and agent identification, and he also gave a presentation on "Cloudflare Use Cases" in the session. Watch the session video here.

MAPRG (Measurement and Analysis for Protocols Research Group)
The MAPRG session included a standout presentation on crawler refusals by Mostafa Ansar, PhD Candidate at Radboud University; the accompanying paper is listed on the Featured Papers page of this website. There was also a presentation from Elisa Luo, PhD student at UC San Diego, titled "Somesite I Used To Crawl", which featured Common Crawl data, and more from Cloudflare’s Thibault Meunier, whose presentation "AI Crawlers Insights" included some very interesting analysis of which User-Agent strings are commonly found in robots.txt. We’re pleased to see the subject getting attention, and are looking forward to more good discussions coming from this Research Group. Watch the session for more.
Other Discussions and Future Work
It was great to catch up with our existing friends and make new ones at Meta, Ericsson, Google, Anthropic, and more. There were of course many other discussions and connections, but they are far too numerous to list here.
The next IETF meeting, IETF 124, will take place in Montréal, Canada from the 1st to the 7th of November 2025. Looking forward to more discussions there.

Erratum:
Content is truncated
Some archived content is truncated due to fetch size limits imposed during crawling. This is necessary to handle infinite or exceptionally large data streams (e.g., radio streams). Prior to March 2025 (CC-MAIN-2025-13), the truncation threshold was 1 MiB. From the March 2025 crawl onwards, this limit has been increased to 5 MiB.
For more details, see our truncation analysis notebook.
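As a rough illustration of the thresholds described above, the following Python sketch maps a crawl ID to the fetch size limit that would have applied. It is only a sketch: the function names are hypothetical, and it assumes that CC-MAIN-2025-13 is the first crawl using the 5 MiB limit and that crawl IDs follow the CC-MAIN-YYYY-WW pattern.

```python
import re

MIB = 1024 * 1024

# Assumption: CC-MAIN-2025-13 (the March 2025 crawl) is the first crawl
# that uses the larger 5 MiB limit; earlier crawls used 1 MiB.
LIMIT_CHANGE = (2025, 13)


def truncation_limit_bytes(crawl_id: str) -> int:
    """Return the fetch size limit in bytes for a crawl ID (hypothetical helper)."""
    match = re.fullmatch(r"CC-MAIN-(\d{4})-(\d{2})", crawl_id)
    if match is None:
        raise ValueError(f"unrecognised crawl ID: {crawl_id!r}")
    year, week = int(match.group(1)), int(match.group(2))
    # Tuples compare element by element, so this orders by year, then week.
    return 5 * MIB if (year, week) >= LIMIT_CHANGE else 1 * MIB


def may_be_truncated(payload_length: int, crawl_id: str) -> bool:
    """Heuristic only: a payload at or above the limit was probably cut off."""
    return payload_length >= truncation_limit_bytes(crawl_id)
```

For example, truncation_limit_bytes("CC-MAIN-2025-08") gives the 1 MiB limit, while truncation_limit_bytes("CC-MAIN-2025-13") gives the 5 MiB limit. A length check like this is only a heuristic; the WARC format also defines a WARC-Truncated header field, which is a more direct signal when it is present on a record.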