Yesterday on the train, Brian R. Brown and I were chatting about orphaned pages, XML sitemaps and indexation without benefits. Brian referred to XML sitemaps as the “one hit wonder of SEO.” Brilliant! XML sitemaps, like Dexy’s Midnight Runners, are one hit wonders.

Dexy’s Midnight Runners, for those of you who missed the 80s, are famous for their one hit, “Come On Eileen.” XML sitemaps are famous for inviting the crawl. And just as Dexy’s Midnight Runners don’t have any other great songs, XML sitemaps really don’t provide anything other than a way to request that search engine spiders crawl your site. This comparison just begs for a Weird Al-style lyrics mod:

Come on Crawl Me,
I swear (well he means)
At this sitemap
You’ll find everything…

Actually Blondie’s “Call Me” was screaming for a “Crawl Me” spoof, but you can hardly call Blondie a one-hit wonder. Anyway, back to XML sitemaps.

For want of the crawl indexation was lost.
For want of indexation rankings were lost.
For want of rankings the visitors were lost.
For want of visitors the site was lost.
And all for the want of a crawl.

I’m taking a few liberties, but the premise is the same. No crawl, no organic search visitors. End of story. In this regard, XML sitemaps play a role in the initial discovery of your URLs.

The XML sitemap rolls out the red carpet and invites search engines to crawl and index the URLs you’ve so thoughtfully included. This, in turn, can increase indexation for large, complex sites that contain thousands of pages. On such sites it could take even a committed bot (like Googlebot) many visits to crawl the whole site, especially if it keeps encountering duplicate content. Less thorough bots (I’m looking at you, Bingbot) might take even longer to discover new content. A conscientiously updated and autodiscoverable XML sitemap helps bots find new URLs, which should shorten the time to indexation and rankings if the content is valuable.
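For the curious, here is a minimal sketch of what "conscientiously updated and autodiscoverable" can look like in practice. It builds a sitemap in the sitemaps.org format and produces the `Sitemap:` line that goes in robots.txt so bots can find the file on their own. The function names (`build_sitemap`, `robots_sitemap_line`) and the example.com URLs are my own placeholders, not part of any particular tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a sitemap XML string.

    entries: iterable of (url, lastmod) pairs, where lastmod is a
    W3C date string such as "2024-01-15", or None to omit it.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # special characters are escaped on output
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def robots_sitemap_line(sitemap_url):
    """Return the robots.txt line that makes the sitemap autodiscoverable."""
    return f"Sitemap: {sitemap_url}"


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    pages = [
        ("https://www.example.com/", "2024-01-15"),
        ("https://www.example.com/category/treats", None),
    ]
    print(build_sitemap(pages))
    print(robots_sitemap_line("https://www.example.com/sitemap.xml"))
```

Regenerate the file whenever URLs are added or retired, so the invitation you are extending always points at pages that actually exist.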

4 Comments on XML Sitemaps: Dexy’s Midnight Runners of SEO

Great article and perspective, as always. Question for you on XML site maps, redesigns, and Google Webmaster Tools. We recently redesigned our website. Traffic dipped for a while afterwards, and we realized we hadn’t deleted our former site’s XML site map from Webmaster Tools. So Google was looking for old-site URLs in its index that we hadn’t set up 301 redirects for. We have since deleted the old site map and Google has approved the new one. We have 301s to our cat and sub cat pages, and to products that ranked on page one in Google. With this in mind, why is Google still giving us 404 errors for URLs that aren’t in the new XML site map, and aren’t our top pages?

Hi Debra, Ah, redesigning = fun. But not really. Especially trying to get all those pesky 301s in place. In general, just because a URL has been removed from the XML sitemap doesn’t mean it will be deindexed. If the dead pages are returning a hard 404 error, the bots should be pretty quick to drop them out of the index, but it will take some amount of time for them to recrawl and encounter the 404 a couple of times. So if these dead pages were getting crawled once a month, for example, it could be a couple of months before the necessary recrawls happen on their own. You can go to Google Webmaster Tools and request deindexation of individual URLs if you’re concerned about users finding them in search results. Though if users are finding them in search results, they should be 301’d instead anyway.

Without knowing which URLs we’re talking about, I’m guessing it’s ones like this? http://www. good-doggie .com /servlet/the-1062/HipAction–dsh–Zukes-Peanut/Detail (URL dismembered to avoid linking to a dead URL, which would slightly exacerbate the problem.) They do return a hard 404 now, but I had to dig pretty deep to find one. This page was last cached on May 19, so it may be a good long while before it gets deindexed. If it doesn’t get any organic traffic, then it might not matter materially if it gets deindexed tomorrow or 3 months from now. The 404 will eventually do the trick. If it is getting organic traffic, 301 it. And if it’s not getting any organic traffic but it BUGS you knowing it’s out there, request deindexation in Webmaster Tools. Let me know if that didn’t answer the question.
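If it helps to see that rule of thumb written down, here is a rough sketch in Python. The function name, parameters, and return strings are all hypothetical; you would feed it the status code you observe for the URL and the organic visit count from your own analytics.

```python
def triage_dead_url(status_code, organic_visits, bugs_you=False):
    """Suggest what to do with a retired URL that is still in the index.

    status_code: HTTP status the URL currently returns.
    organic_visits: recent organic search visits to the URL (from analytics).
    bugs_you: True if you want it gone even though it gets no traffic.
    """
    if organic_visits > 0:
        # Searchers are still landing here, so send them somewhere useful.
        return "301 redirect to the closest live page"
    if status_code != 404:
        # Not a hard 404, so bots have no clear signal to drop the page.
        return "make it return a hard 404"
    if bugs_you:
        return "request deindexation in Webmaster Tools"
    # Hard 404 and no traffic: recrawls will eventually drop it.
    return "wait for the 404 to do the trick"


if __name__ == "__main__":
    print(triage_dead_url(404, organic_visits=0))
    print(triage_dead_url(404, organic_visits=12))
    print(triage_dead_url(404, organic_visits=0, bugs_you=True))
```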

Thanks for the long reply and detailed recommendations. Yeah, it bugs me to know there are URLs like this out there. More so, in Google Webmaster Central it makes our site look like we have a lot of errors. But it sounds like that is a temporary issue as long as those pages weren’t getting a lot of organic traffic. One thing that we’ve noticed is that Googlebot has crawled URLs that look like they were indexed from our search tool, like this one: http: // www. good-doggie. com/ servlet/ Detail?category=ALL&no=845&searchpath=40073. Needless to say, they were never in our old site map. Not sure why they are indexed now. If these are of no concern either, great. I’ve sped up the crawl rate in hopes of getting those URLs removed from the index more quickly.