Sitemaps and Internal Links – the 10000 Page Test

Just because someone with a huge following claims this or that isn’t good enough for me when it comes to deciding whether to implement some SEO strategy. I’m not a sheep to slaughter, no matter how many other sheep are around me, nor how “trusted” the sheep-herder. I want to find out for myself. So I did. In a BIG way!

To Sitemap Or Not?

One thing that some SEO people say is “You don’t need a sitemap.xml file.” Eventually the GoogleBot will find your pages and index them. According to this theory, sitemaps belong in the “boondoggle” category. But do they really? What about a site that has poor internal linking? How well do sitemaps do at overcoming that issue?

What About Internal Links?

Can a site with tens of thousands of pages be properly indexed without full HTML navigation? And when it comes to linking, is having a “funnel” really as important as has been billed in our community? Or do your really important links have to be at the top of pages? Or can they be in the footer?

The Client and the Scope

Late last year, one of my web development clients, WebSight Design, got a gig building out a directory of storage facilities across the United States. The site is StorageSeeker.com. As a directory with listings in tens of thousands of locations across the country, this particular site would need to compete on so many levels it could make your head spin.

National, regional, and multiple types of local search aspects all come into play. Because this site is an aggregate data site, and does not itself have local addresses in every location, any typical local search optimization methods would not be able to be fully implemented here. Some other method would be called for at the local level.

From the Top – Competitive Analysis

When I got this assignment, one of my first tasks was to do a detailed review of the competition. And let me tell you – in the self storage market, the competition is ugly. There are two very important factors – site depth and back links.

From the chart above, you can see what the competition is like in this field. The very high page count is due to the fact that every storage facility has its own page, or at the very least, every city in a database has its own page. And the competitors with far fewer pages have spent a lot of time getting links back. (The really high link counts usually come from a partnership with a complementary site that also has city based listings.)

Bigger Competitive Base

Some of the competition in this market is bleed-through, because some sites that come up are actually large moving companies, offering both moving and storage solutions. That’s a funny thing about the nature of our industry, and it explains why a client’s competition is often bigger than they think.

You’re Also Competing Against Google

Where it gets even more challenging is when you do a search and include a city name. Not only do you get all of the usual suspects in the competitive arena, you’re also now competing directly with Google’s Local listings.

Now, I understand why Google would want to present these results to someone doing a search. But it means there’s a much bigger problem for some companies trying to compete for the first page of Google results.

To business directory owners in markets like the storage comparison service, Google is now a direct competitor. I’ll leave it to you to decide whether that fits with the whole “Google is Evil” mantra or not. Instead, it just becomes one more factor for me to work into my analysis and action plan.

I don’t LIKE the fact that I have to work it into my analysis, but heck that’s what I get paid for. And as much as it pained me when I first realized this was an issue, I thought it would be exponentially more painful to the client. And it was. He was shell-shocked, because he never previously thought he’d be going up against Google, of all entities…

He’s getting into an arena that is about as entrenched as it gets. So even if he could match his top competitors page for page, he’d be in for a heck of a ride given the volume of back-links. And the potential for his site’s pages to be viewed as duplicate content because all those other sites have been around a lot longer…

But Wait – There’s More!

Not only are you competing against Google at the Local level, for some phrases, you’re even competing against Google on the national level!

That’s right – thanks to the fact that Google wants to provide what THEY consider the most relevant content to a search, for some keywords that you enter, even when you DON’T enter a geographic location, they ASSUME you may want to see that information.

In the screen shot, notice how there are three organic listings above Google’s Local results. That means you now HAVE to be in the top three to be sure you’re seen above the fold, before mass confusion sets in for the person doing the search when that map and all those local links get thrown into their eye space.

I’m sure many of you reading this article already knew this stuff. I’m just mentioning it here though because some readers may not have considered the implication or ramifications of this before.

And as it relates to my client’s site, and what has to be done to overcome such insanity, it’s completely relevant.

As you can see in the above screen-shot, my client is already in the top three for this particular search.

They’re not in such a stellar position for every non-city related phrase yet. But they are for a few. And they’re in the top five results for a few phrases targeting their prospective customers, the facility owners.

At the City Level

At the City level, for “find storage facilities Tucson”, they’re the first organic result that shows up just below the map and individual facilities that come from Google Local. That’s crucial, because if I were doing a search, and saw all those choices, I would surely get really annoyed really fast at having to click on every one of them to do a comparison on my own. Having an alternative to compare results would be really helpful.

A Word About Best Practices Page Titles

That’s magic when it comes to helping an end user who wants a quick way to compare all those results. Most of the time, we shoot for having an exact match in a page title so the person using Google sees the same phrase repeated, in bold.

In this case, it’s actually better having what I consider a more appropriate title given the need to differentiate from the Google Local listings confusion.

So what’s the key to my initial modest success?

Back Links – It’s Back Links I Tell You!

NOT!

It’s actually not back-links. The site currently has about 200 links coming back, mostly low level stuff so far. Compared to the tens of thousands of back links the competition has, that’s peanuts.

No, back links are not what’s done it. Sure, the next level of success will require acquiring more back links – but that’s chicken feed compared to what’s been accomplished already without giving a second’s thought to obtaining more links. And I’ve saved all that time and effort in quality back-link hunting…

The All Powerful Googlebot

Google claims it can navigate a site through forms, including search forms. The G’Bot takes clues from the site’s purpose to fill in text fields like you’d have in a search form, then crawls the results of that search. Some sites don’t play that game though, like my client’s. For the search form to work, you have to type in a city or zip code, which the G’Bot is probably not intelligent enough to figure out. And sure enough, without a sitemap or internal linking, Google never indexed a single city or zip code landing page. None. Not one.

Sitemaps – Why Bother?

Personally I have no desire to “wait around” for client sites to be found, “eventually”.

Do they work every single time at getting a site indexed faster? – well no. Some sites just have other issues to be addressed. So in that regard, I agree that it’s important to address those issues.

But for the most part, unless I’m doing a specific test, if my job is to get the client better results, then there’s no excuse.

And since most clients expect quality results, they have no problem paying the tiny little slice of the complete SEO budget for a sitemap to be created. And sitemap files, as far as I’m concerned, most definitely offer value, as you’re about to see…

Testing – Testing – 123 – Is This Mic On?

Just to really validate my experience though, and to see how effective a sitemap.xml file can be in the extreme, none of this site’s pages were linked from anywhere on the site initially. Other than the top level navigation (home, about, sign up, search, and disclaimer pages), there were ZERO internal links to pages that matter the most! So it was an ideal test as to whether a sitemap will really have an impact.

So in my spec, I had the development team set up a system so that the sitemap.xml files would be regenerated once a night – in case any new cities or zip codes are added by the site owner through the content management system. (Eventually we even took that further – including a function for the site owner to click a button and have the sitemap.xml files regenerated instantly).
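The nightly regeneration can be sketched in a few lines. To be clear, this is a minimal illustration, not the actual StorageSeeker implementation: the `build_sitemap` helper, the domain, and the city slugs are all assumptions, and in production the URL list would come from the CMS database via a scheduled job.

```python
from datetime import date
from xml.sax.saxutils import escape

def build_sitemap(urls):
    """Render an iterable of absolute URLs as a sitemap.xml string."""
    today = date.today().isoformat()
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc><lastmod>{today}</lastmod></url>"
        for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )

# Hypothetical city slugs; a nightly cron job (or the owner's "regenerate
# now" button) would pull the real list from the CMS and rewrite the file.
cities = ["tucson-az", "oakland-ca", "austin-tx"]
xml = build_sitemap(f"https://www.storageseeker.com/storage/{c}" for c in cities)
```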

As soon as the new system was in place and live, I manually submitted the files to Google. This site actually has three files because we want speedy review.

Within twenty four (24) hours, the site had over 400 pages indexed. Not stellar, considering there are tens of thousands of pages.

By the end of that week, there were over 1200 pages indexed at Google. Nothing else had been done to the site in the meantime. A week later, it was down to about 900 pages. Then it jumped again to 1,000. Apparently, Google’s first reaction is to trust at least some of a sitemap.xml file.

At Yahoo, we nailed 4,000 pages indexed, then 12,000. All by submitting a sitemap.

Even when there were NO internal links.

Next up – The Funnel Effect

Okay – so clearly, just by generating quality sitemap.xml files and submitting them to Google, you can get lots more pages indexed. But let’s be real. 1200 pages does not a mega-site make! At this point, the single most obvious next step is an internal site linking structure. And rolling out on-site links would help validate my belief that Google wants to have that second opinion before it indexes more pages it finds in the sitemap.

Well heck. Here we were, needing to get links to all of those pages on the site onto an actual HTML readable page or pages. On the one hand, I could have simply had an engineer create one massive page with links to all the others, in an old school “Site Map” page.

From a user experience standpoint, I know intuitively that would be a tragedy. Can you imagine finding that page? How long would it take to load? How pretty would it look? And of course, there’s the SEO industry argument about how many links you can get away with on a single page…

Okay – shift into usability mode…

Drill Baby Drill

One method that a lot of sites employ for drill-down is through drop-down menus. That was a consideration, because I wanted this funnel to begin high up on the site, remembering those who say how vital it is to have the most important links at the top of the page rather than in the footer.

Except it would have required a site redesign. And I am open to experimenting on this one. So the obvious choice for me was to create a STATES landing page. That one page is linked from the bottom of every page on the site.

READ THAT AGAIN IF YOU HAVE TO

It’s NOT a primary site link in the top navigation bar. It’s in the Footer. And there is not a link to every State landing page from every other page on the site. It’s a controlled funnel.

With a list of all 50 U.S. States on that page, in turn, there are links to State level landing pages. Each State’s landing page then links to a landing page for each of the cities in that State. And now we had a funnel drilling down to the City level – the same level as all the City pages included in the new sitemap.xml files. Note we did not initially do the same for zip codes. I needed to see how effective this one method would be before moving on.
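The funnel’s shape is easy to picture in code. Here’s a hypothetical sketch (the URL patterns and helper name are mine, not the site’s): one footer-linked states page, one page per state, each linking only down to its own cities.

```python
def build_funnel(states_to_cities):
    """Return {page_url: [links on that page]} for the three-level funnel:
    /states -> /states/<state> -> /storage/<state>/<city>."""
    pages = {"/states": [f"/states/{s}" for s in states_to_cities]}
    for state, cities in states_to_cities.items():
        pages[f"/states/{state}"] = [f"/storage/{state}/{c}" for c in cities]
    return pages

# Toy data: each state page links only to its own cities, so crawl paths
# flow down one controlled route instead of being splattered sitewide.
funnel = build_funnel({"az": ["tucson", "phoenix"], "ca": ["oakland"]})
```

The point of the design is that every page on the site is at most three clicks from any city landing page, without any page carrying thousands of links.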

Internal Links Really Do Matter – Duh

All of that work went live about two months ago. And within a week of launching that funnel, we watched the pages in Google’s results jump to about 4,000. Then we hit 9,000. We’re presently bouncing between 8,000 and 9,500 pages. And when I check the Google Webmaster Tools system, it shows between the three sitemap files, a total of 10,985 indexed URLs.

As far as the higher number of indexed pages compared to the site: results? Duplicate content. Many of the city pages have such little unique content that they’re probably flagged as duplicates. And (I’m guessing here) the Google indexing process for evaluating duplicate content is probably day to day, or maybe even real-time. Anyone who knows is encouraged to comment here and offer further insight of course.

Yahoo – a Funny Concept

So we’ve got over 10,000 pages indexed at Google. That’s good. But at Yahoo? A week ago, it was 68,000 pages, and today it’s over 85,000 pages!

WAIT.

How can there be such a massive discrepancy?

Yahoo is flaky?

Maybe.

But also, Google is essentially ignoring them because the only links to those pages are at the city level, most of them are very similar (duplicate content), and they’re not in the sitemap.xml files. Yet.

BA-DA-BING, BA-DA-BUST

We haven’t submitted the sitemap files to Bing yet. So not surprisingly, they’ve only indexed just over 100 pages.

Huh?

If we haven’t submitted the sitemap files, but we have those internal links, why wouldn’t Bing have indexed at least some of them?

Well they have. Without those links, there are only about 15 pages on the site. So clearly Bing has found SOME of them worthy of indexing. I just haven’t done any testing yet as to why, or what the cause of that mess is. But sometime in the next year I’m going to have to begrudgingly allocate some of my hours to getting Bing going, given the recent joint venture between Microsoft and Yahoo. And maybe all it will take is submitting the sitemap.xml file!

Why This All Matters

Okay, so we’ve seen a skyrocketing of pages indexed, and we’re showing up in the SERPs. So what?

Of course, there was almost no search traffic before I started opening the floodgate. But a 500% increase is pretty good. At least when it comes to showing how effective sitemaps and internal links can be.

Increased Traffic – the Double Dip Method

With all this work, we immediately saw smaller and medium size city landing pages jump to the first page of Google. But the bigger cities were stuck on the 2nd page. That’s when I pulled out the big guns.

I tasked the creation of a new CMS component that allows the site owner to create unique City centric pages filled with well optimized content (bolding, bullet points, photos, charts, links to individual facilities within that city, outbound links…). And as soon as he submits a new City centric page to go live on the front end, that page is built on the fly, AND includes all of the facility listings for that city page.

So we’ve essentially created a mechanism that ensures the really big and important cities have double coverage. Both from the standard City drill-down method and from the CMS created, detailed, content rich method. So far there’s 20 such pages. They get built at the client’s pace.

And they’re linked from the site’s footer to the “Featured Cities” landing page.

Within days of them being posted to Google through the sitemap.xml file, more than half of those pages started coming up on the first page of the SERPs, above the fold.

And as the Top Visited Pages report shows above, four of the top ten pages viewed were those new featured city pages.

(The “example” page referenced in that report is a page that facility owners can go to where they see what the site has to offer them. And YES, you see both “/” AND “/index” – so there are still duplicate content issues I’m working on getting resolved by the development team!)

Next Steps

An obvious next step will be to expand the funnel to now include links to Zip code landing pages.

We’re also working on inbound links.

Top Secret – Stay out

Should any of you get the idea that you’re going to run out to Storage Seeker’s competition and sell them on my methodology, be afraid. Very afraid…

Beyond the obvious, I have about a half dozen additional techniques I haven’t even used yet on this site. And that’s just the low hanging fruit. And since I am writing this article now, before those have been implemented, you can forget about me revealing them here.

Uh, if you’re gullible to believe that any method or technique I have up my sleeve is some super double top secret SEO trick that only I know about, I’d be happy to sell you this shiny ring with a genuine one of a kind gemstone that came from, uh, would you believe Mars?

So there you have it. Proof positive? Maybe not. Good enough for me? You betcha! Pardon me while I go find a sheep to shear…

Alan Bleiweiss has been an Internet professional since 1995. Just a few of his earliest clients included PCH.com, WeightWatchers.com and Starkist.com. Follow him on Twitter @AlanBleiweiss , read his blog at Search Marketing Wisdom.

Alan Bleiweiss is a Forensic SEO audit consultant with audit client sites consisting of upwards of 50 million pages and tens of millions of visitors a month. A noted industry speaker, author and blogger, his posts are quite often as much controversial as they are thought provoking.

20 thoughts on “Sitemaps and Internal Links – the 10000 Page Test”

Minor edit: in the article, in regard to individual facility pages, I state: “the sitemap.xml files don’t include any links to any of the individual facility pages. Yet. So Google is essentially ignoring them, even though they are in the sitemap.xml files, because there are no internal site links to support them, while Yahoo thinks they’re perfectly valid pages that deserve indexing.”

The first part of that should read: “So Google is essentially ignoring them because the only links to those pages are at the city level, most of them are very similar (duplicate content), and they’re not in the sitemap.xml files. Yet.”

(Confused statements happen when I make changes to my massive articles at 3AM. )

Great article, very interesting read…what’s the reason for not including individual facility links in the xml map right away? I’m doing something very similar but the site is much larger than any I’ve ever worked on so it’s presenting me with some challenges.

And about yahoo showing that many pages indexed. Y! shows yellow pages has 47 million pages indexed while site: on G shows 680 K – but they only have 137 xml sitemaps (which cover < 137*50K= 6.85 million links). I don’t know what the actual number of pages is for YP, but 6.85 mil does sound low, and 47 mil indexed sounds v high.

This site is a step by step see how it goes project. Very rare to have such a great opportunity. It’s the only way to see how effective each step is.

The whole Yahoo vs. Google showing X thing is a bit muddy. On the one hand, there could be X URLs in sitemap files. Google will then index all or some of them. They will then display all or some of THOSE in the results on any given day. It does go up and down when I check.

Yahoo seems to give more value to links on pages regardless of a duplicate content issue.

Then again, many times when I am reviewing client or competitor sites in the Yahoo link: view, I see pages that link back ONLY because of AdSense ads. Isn’t that sweet?

Honestly though, I don’t know enough about the exact cause of how exactly either handles indexing vs displaying in results vs link factoring to speak more accurately on the subject, and thus I invite SEJ readers who have done granular testing on those topics to chime in…

Very interesting post. I do wonder, however, how your test would have turned out if you hadn’t used a sitemap at all first – but had simply created the internal links first. If you’d done the state / city pages first, with the one sitewide footer link to the states page, without submitting a sitemap at all, would the results have basically been the same?

Personally, I like sitemaps. It gives me a warm, fuzzy feeling to have a feeling of some modicum of control over what gets indexed, even if it’s a bogus feeling. But I just wonder if the test would have shown the same results even without one.

Great article and good to see straight “best practice” SEO working out :). One thing, you mention that you haven’t submitted the sitemap to Bing, why not use robots.txt to link to a sitemap of the various sitemaps, effectively auto-submitting everything?

That’s a very good question. One reality is that it’s impossible to test on the exact same site, so I can only talk about other sites and how not having sitemap.xml files while still doing the funnel DOES work.

As relates to this article, I needed to test in a big way whether a sitemap.xml file can get at least some pages indexed, especially for those sites that we don’t have the luxury of doing things like controlling the HTML based links in that way. (Sites that use AJAX, JavaScript, or off-page CSS for links can cause the search engines serious problems).

@millerian, one of the sitemap files is referenced in the robots.txt file – I need to update that to include an index of sitemaps actually – (We’re rolling out a few others as well)…
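For readers who want the mechanics behind @millerian’s suggestion: per the sitemaps.org protocol, a single Sitemap: line in robots.txt can point at a sitemap index file that in turn lists every individual sitemap file, so protocol-aware engines discover them all without manual submission. The filenames below are illustrative, not the site’s actual ones.

```
# robots.txt (filenames illustrative)
User-agent: *
Disallow:

# One line advertises everything: the index file lists each individual
# sitemap, and engines that honor the protocol fetch them all from there.
Sitemap: https://www.storageseeker.com/sitemap-index.xml
```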

@db – I intentionally did not add the link to the site because I felt a direct link, even with a nofollow, would possibly imply ulterior motives on why I wrote the article.

This was outstanding. I’ve taken the easy way out with clients when talking about sitemaps. I’ll say “The bigger the site the more a sitemap matters”. But I’m seeing that even small sites, especially if they have poor nav links, can benefit as well. thank you for giving me a new easy way out, as this article is now required reading for my clients

I’m glad you find the article so helpful. I’m not so sure a non-SEO type would be able to understand some of it. One of my clients read it and said “I understand about as much of this as you probably do about my stitching (She’s a professional quilter). What’s SEO? What’s backlinks?…”

Thanks for the research and posting. It was especially helpful to hear about loading multiple sitemaps, as this is more rare for the sites I’ve worked with. For Yahoo, the engine seems to have a higher threshold for duplicated content on pages and also has trouble accounting for 301s. Some of this was fixed when we all saw some big drops in Yahoo backlinks a few months back, but it still reports some inflated numbers.

This was the first time I’ve needed to use multiple sitemaps. I’ve since needed to do so on other client projects as well. I can’t always control the stability of a client web server, so splitting the files mitigates problems when Google tries to fetch them. That’s why I stay ridiculously under the 50,000 URL limit.
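Staying under the limit is just a chunking exercise. A hedged sketch (the function and file names are mine, for illustration): split the URL list into slices of at most 50,000, write one sitemap per slice, and reference each file from a sitemap index.

```python
def chunk_urls(urls, limit=50_000):
    """Yield successive slices of at most `limit` URLs, per the sitemap
    protocol's 50,000-URL per-file cap."""
    for i in range(0, len(urls), limit):
        yield urls[i:i + limit]

# 120,000 hypothetical URLs -> three sitemap files (50k, 50k, 20k).
urls = [f"https://example.com/page/{n}" for n in range(120_000)]
files = {f"sitemap-{i}.xml": chunk
         for i, chunk in enumerate(chunk_urls(urls), start=1)}

# A sitemap index would then list one <sitemap><loc> entry per file.
index_entries = [f"https://example.com/{name}" for name in files]
```

In practice you’d want the per-file count well below the cap (as Alan notes) so a flaky server never has to stream an enormous file to the crawler.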

While it’s not the post I was originally looking for, it was worth the read, as you have highlighted several common issues and how to work through them with a focus on increased conversions/CTR, not just rankings.