Over the past year I have been travelling and speaking to SEOs at conferences around the world, and the same question keeps coming up: why is a page that ranks well in Google still invisible to ChatGPT, Gemini, Claude, and Perplexity? I wrote this guide to answer it.
Today Common Crawl is publishing The AI Visibility Audit, a free field guide built for the SEOs and GEOs who are already doing this work and want a concrete framework rather than theory. It explains how AI systems actually discover content, why training-data inclusion behaves like a ranking factor, and how to run a repeatable five-check audit in about 90 minutes using only free tools.
The reason a high-ranking page can go missing sits one layer upstream of everything we as SEOs usually audit. Before on-page work, before technical SEO, before link building, a site has to be reachable by the crawlers that feed AI training data. If it is not, the model never learns it exists.
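As a rough illustration of what "reachable" means at the robots.txt level, here is a minimal Python sketch that asks whether a given robots.txt would let CCBot fetch a page. It is not taken from the guide: the function name `crawler_allowed`, the sample rules, and the example.com URL are all assumptions made for this example, and a real audit would also need to look at CDN and WAF behaviour, which robots.txt does not capture.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks CCBot but allows every other crawler.
SAMPLE_ROBOTS = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

ccbot_ok = crawler_allowed(SAMPLE_ROBOTS, "CCBot", "https://example.com/page")
googlebot_ok = crawler_allowed(SAMPLE_ROBOTS, "Googlebot", "https://example.com/page")
print("CCBot allowed:", ccbot_ok)
print("Googlebot allowed:", googlebot_ok)
```

With these sample rules the page stays open to search crawlers while CCBot is turned away, which is the kind of mismatch that can leave a well-ranked page out of the crawl.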
The guide walks through how CCBot crawls the open web to build the public archive that helps train modern LLMs, how harmonic centrality in the Common Crawl Web Graph sets crawl priority, and why CDN and WAF defaults now silently block AI and training-data crawlers. It also explains why AI still leans toward English, which makes up roughly 41 percent of the latest crawl.
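For readers who have not met harmonic centrality before, the toy Python sketch below shows the general idea: a host scores higher when many other hosts can reach it in few link hops. It uses the standard textbook definition (the sum of 1/distance from every other node that can reach the target) on a made-up five-host graph. The host names and the `harmonic_centrality` function are assumptions for illustration only, and this is not a description of how Common Crawl computes its Web Graph.

```python
from collections import deque


def harmonic_centrality(links: dict[str, list[str]]) -> dict[str, float]:
    """Score each node by summing 1/d over all nodes that reach it in d hops."""
    nodes = set(links) | {dst for targets in links.values() for dst in targets}

    # Reverse the edges so a breadth-first search from a target walks
    # backwards along the links that point towards it.
    reverse: dict[str, list[str]] = {node: [] for node in nodes}
    for src, targets in links.items():
        for dst in targets:
            reverse[dst].append(src)

    scores: dict[str, float] = {}
    for target in nodes:
        dist = {target: 0}
        queue = deque([target])
        while queue:
            current = queue.popleft()
            for prev in reverse[current]:
                if prev not in dist:
                    dist[prev] = dist[current] + 1
                    queue.append(prev)
        # Nodes that cannot reach the target never enter dist, so they add 0.
        scores[target] = sum(1.0 / d for node, d in dist.items() if node != target)
    return scores


# Hypothetical link graph: each key links out to the hosts in its list.
TOY_GRAPH = {
    "c.example": ["hub.example"],
    "hub.example": ["a.example", "b.example"],
    "a.example": ["b.example"],
    "b.example": [],
    "island.example": [],
}

scores = harmonic_centrality(TOY_GRAPH)
for host, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{host}: {score:.2f}")
```

In this toy graph the host with no inbound links scores zero, which is the intuition behind treating centrality as a visibility signal: a site that nothing links to has little to pull a crawler towards it.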
The five checks move from the most decisive to the most strategic, and the results package into a one-page scorecard most agencies do not yet offer.
The old world was index and rank. The new world is train and retrieve. If you are not in the crawl, you are not in the model.
Read the guide, run the checks, and open the door.


