The Moz Blog

How I Escaped Google's Supplemental Hell

This post was originally in YouMoz, and was promoted to the main blog because it provides great value and interest to our community. The author’s views are entirely his or her own and may not reflect the views of Moz.

I veered left, but it was too late. A wall of fire sprang up in front of me, blocking my path. I turned around and there he was: Googlebot, my old nemesis and lord of the search underworld. Even as the flames engulfed me, damning me to supplemental hell, his cold, metallic laugh froze me to my very soul…

OK, maybe I'm exaggerating just a little. It would probably be more accurate to compare Google's supplemental index to a virtual Purgatory, a place where morally ambiguous pages go to wander for eternity. I've personally been stuck in this purgatory for well over a year with an e-commerce client. They have a data-driven site that's been around since about 1999, and, admittedly, we've only recently started paying close attention to SEO best practices. Roughly six months ago, I realized that, of the 32,000 pages we had in Google's index, all but 7 were in supplemental. Now, before I offend Matt and Vanessa, I should add that I do believe them when they say that the supplemental index isn't a "penalty box." Unfortunately, when 99.98% of your pages are stuck in supplemental, there are, in my experience, very real consequences for your Google rankings.

Yesterday, I logged into Google Webmaster Tools and finally saw the magic words: "Results 1-10 of about 24,700." Translation: our content has finally made it to the show. So, in celebration, I’d like to share what I learned during this Dante-esque, six-month journey through hell/purgatory. First, a few details: the site is essentially a search engine of training events, powered by ColdFusion/SQL. Many of our problems were architectural; it's a good site with solid content, we've never used black-hat tactics, and I don't think Google was penalizing us in any way. We simply made a lot of small mistakes that created a very spider-unfriendly environment. What follows is a laundry list of just about everything I tried. This is not a list of suggestions; I'll try to explain what I think worked and didn't, but I thought walking through the whole process might be informative:

Created XML sitemap. Fresh from SES Chicago, I excitedly put a sampling of our main pages into a sitemaps.org style XML file. It didn't hurt, but the impact was negligible.
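For reference, a sitemaps.org-style file is just a short XML document. A minimal sketch (the example.com URL and dates are hypothetical, not the site's actual entries) looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/event/seomoz</loc>
    <lastmod>2007-06-01</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
```

Only `<loc>` is required per URL; `<lastmod>` and `<changefreq>` are optional hints.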

Added Custom Page TITLEs. By far, our biggest problem was use of a universal header/footer across the site, including META tags. Realizing the error of our ways, I started creating unique TITLE tags for the main search results and event details pages.

Added Custom META descriptions. When custom titles didn't do the trick, I started populating custom META description tags, starting with database-driven pages. It took about 1-2 months to roll out custom tags for the majority of the site.

Fixed 404 Headers. Another technological problem: our 404s were redirecting in such a way that Google saw them as legitimate pages (200s). I fixed this problem, which started culling bad pages from the index. The culling became noticeable within about two weeks. This was the first change with an impact I could directly verify.
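The problem here, error pages that answer with a 200 status, is often called a "soft 404." As a quick sanity check, you can flag responses that claim success but read like error pages. This is a sketch of the idea only, not the site's actual code, and the trigger phrases are assumptions you would tailor to your own error templates:

```python
def looks_like_soft_404(status_code, body):
    """Flag responses that return 200 but read like an error page."""
    if status_code != 200:
        return False  # a real 404/410 is already handled correctly
    error_phrases = ("page not found", "no longer exists", "event not found")
    text = body.lower()
    # Any error-ish phrase on a "successful" page is a red flag.
    return any(phrase in text for phrase in error_phrases)

print(looks_like_soft_404(404, "Page not found"))                       # False
print(looks_like_soft_404(200, "Upcoming training events in Chicago"))  # False
print(looks_like_soft_404(200, "Sorry, this event no longer exists."))  # True
```

Running a checker like this over a crawl of your own site is a cheap way to catch the exact misconfiguration described above.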

Created Data-not-found 404s. Although this is somewhat unique to our site, we have an error page for events that have passed or no longer exist. This is useless to have in the index, so I modified it to return a 404. The user experience was still unique (they got a specialized error and search options), but the spiders were able to disregard the page.

Re-created sitemap.xml. Reading about Google’s crackdown on search results that return search results, I rebuilt our sitemap file to contain direct links to all of our event brochures (the real "meat" of the site).

Added robots.txt. Yes, I didn’t have one before this, because, frankly, I didn't think I should be blocking anything. Unfortunately, due to the highly dynamic nature of the site, the index was carrying as many as 10 duplicates of some pages (e.g. same page, slightly different URL). I started by purging printable versions of pages (links with "?print=1", for example) and moved out from there. Results were noticeable within two weeks, much like the 404s.
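A robots.txt in the spirit of what's described might start like this. The paths are hypothetical, and note that wildcard patterns such as `*` in Disallow rules are a Googlebot extension rather than part of the original robots.txt standard:

```text
User-agent: *
# Block the printable duplicates of pages (same content, different URL).
Disallow: /*?print=1
# Block raw search-result URLs so only the canonical pages get crawled.
Disallow: /search.cfm
```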

Added NOODP, NOYDIR tags. This helped with our outdated description on Yahoo, but had no effect on Google, which wasn't using our Open Directory information anyway.

Created Shorter, Friendlier URLs. This was a biggie. Being a dynamic, ColdFusion site, we were using way too many URL parameters (e.g. "/event.cfm?search=seomoz&awesomeness=1000&whitehat=on"). I was avoiding the re-engineering, but decided to simplify the most important pages, the event brochures, to a format that looked like "/event/seomoz".
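The mapping itself is simple. Here is a sketch of the idea in Python terms (the real mapping lived in the site's rewrite layer, and the parameter names are taken from the example above):

```python
import re
from urllib.parse import urlparse, parse_qs

def friendly_event_url(url):
    """Map a parameter-heavy event URL to a short /event/<name> form."""
    parsed = urlparse(url)
    if parsed.path.endswith("event.cfm"):
        params = parse_qs(parsed.query)
        name = params.get("search", [None])[0]
        if name:
            # Keep only URL-safe characters in the slug.
            slug = re.sub(r"[^a-z0-9-]", "", name.lower())
            return "/event/" + slug
    return url  # leave anything unrecognized untouched

print(friendly_event_url("/event.cfm?search=seomoz&awesomeness=1000&whitehat=on"))
# -> /event/seomoz
```

The old parameterized URLs should then be 301-redirected to the new form so the engines consolidate them rather than indexing both.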

Revealed More Data to Spiders. One of my concerns was that the spiders were only seeing search results 10 at a time, and wouldn't visit very many "Next" links before giving up. I added specialized code to detect spiders and show them results in batches of 100+.
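In pseudocode terms, the detection amounts to a substring check on the User-Agent header. This is a sketch of the approach, not the site's ColdFusion code; the substrings cover Googlebot and Yahoo's Slurp, and 100 is the batch size mentioned above:

```python
def results_per_page(user_agent, user_choice=10):
    """Pick a results-page size, bumping known spiders to a larger batch.

    The content served is identical either way; only the pagination
    changes, which keeps this on the right side of the cloaking line.
    """
    known_bots = ("googlebot", "slurp")
    ua = (user_agent or "").lower()
    if any(bot in ua for bot in known_bots):
        return 100
    return user_choice

print(results_per_page("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # 100
print(results_per_page("Mozilla/4.0 (compatible; MSIE 7.0)", 25))   # 25
```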

Changed Home-page Title. Going through the index, it occurred to me that just about every major page started with the same word and then a preposition (e.g. "Events on", "Events by", etc.). I decided to flip some of the word-order on the home-page TITLE tag, just to shake things up.

Sorry, I realize this is getting a bit lengthy, but I felt there was some value in laying out the whole process. Steps 9-11 all happened soon before we escaped supplemental, so it's a bit hard to separate the impact, but it's my belief that #9 made a big difference. I also think that the culling of the bad data (both by #5 and #7) had a major effect. Ideally, instead of 32,000 indexed pages, our site would have something like 2,500. It sounds odd to be actively removing pages from the index, but giving Google better quality results and aggressively removing duplicates was, in my opinion, a large part of our success. We're down to about 24,000 pages in the index, and I plan to keep trimming.

Of course, the effects of escaping supplemental on our search rankings remain to be seen, but I'm optimistic. Ultimately, I think this process took so long (and was so monumentally frustrating) because I was undoing the damage we had done slowly in our inept spider diplomacy over the past 3-5 years. Now that we've dug out, I think we'll actually get ahead of the game, making our search results better for Google, end-users, and our bottom line. I hope this is informative and would love to hear from others who have gone through the same struggle.


When I worked for a publisher we had lots of content of this sort (including expired events); we opted for a 410, so that the engines don't even attempt to come back for it, and don't think that we have thousands of error pages....

One of the best YOUmoz submissions we've received - excellent work, Pete. I actually really like your strategies as well; I think that everyone should be paying careful attention to your advice - it's the same strategies we've used to escape that final layer of BVFARSRH... Poor Kelly, she'll never get out.

A white-hat method to avoid 'adding specialised code' - *snigger* - is to categorise your posts/pages and create a list of categories accessible for users and search engines from the top-level pages as an alternative navigation!

Actually, we do have category-based navigation as well, but the site lists over 2,000 events with over 30,000 active variations (cities, dates, etc.). A hierarchical structure that accommodated all or even most of those would be untenable. Users reach most of the events through a rich search function, which might as well be lead-shielded to Googlebot.

This code was actually just for the search results pages after the category-based navigation. Users have the option of viewing 10, 25, or 50 results at a time (default is 10). I'm just auto-defaulting Googlebot to 100. Yes, it is a modification just for spiders, but all of the content I'm feeding them is theoretically available to end-users, and the core path is the same.

You're fine with this particular method. Google even mentions that they like it when you make things easier for the search engines, and they have said that it's ok to do certain server-side operations (distinct from cloaking) that remove excessive query strings, etc. The key words are "intent to deceive". You're in no way intending to deceive Google so this wouldn't classify as cloaking.

You know, I honestly hadn't even thought of this as being a cloaking-related practice; I was really just looking for a way to save the spiders the extra time of cycling through links, as, unlike users, they don't necessarily mind a slightly longer page. I see your point, though.

There's an argument to be made that cloaking, while it has a terrible reputation and brings to mind blackhat porn spam, is very subjective. If you're "cloaking" in a way that's actually going to improve someone's search results and experience on the site, then is it wrong?

There's a big difference between what you've done and someone presenting a page about Sony headphones to an engine and then showing a user Viagra and knock-off Gucci purses. People yell "CLOAKING!" and then run off as though we were in grade school and they were yelling "you smell!"

I've actually found the discussion on the gray areas of cloaking really interesting, but it is funny that people focused in on what was ultimately a tiny part of what I did. I don't think it really even had any impact, frankly (although it may down the road).

It sounds as though you are also making the site in question a lot more usable, which everyone likes, not just Google.

Supplemental is one of the most frustrating things about SEO. I had a project once where everything seemed to fall into supplemental. Nothing seemed to be working (I did *everything*) until one day, our content was back in the regular index... and on the front page for our primary keywords, at that! This confused the hell out of me, but I put it down to hard work eventually paying off. It just sometimes takes longer than we'd like.

You know, I completely agree. In hindsight, all of the advice was out there; the trick was how to apply it, since, once you've gotten yourself in trouble, you so often don't know precisely why. Once I started really digging into the supplemental results and seeing what Google was seeing (repeated titles, massive content duplication, etc.), it all came together.

Of course, the other frustration is that, every time you try something, you have to sit on your hands and wait a week to see if it mattered. Ultimately, that's why this took so long. Still, it's like hitting yourself on the head with a hammer: it feels great when you stop :)

Pete, this post is a great reminder, and it's also a good example of why it's important to have a web designer/developer with an understanding of SEO. Yours is far from the only site that could benefit from the advice.

I think that was really the lesson for me. Being a newbie, SEO-wise, I kept thinking there was some trick that, if I could just sort it out, would solve the problem. Ultimately, most of what I ended up doing was good for the site and its SEO in general, not just for the supplemental-index problems.

After coming across your post, I realized I was in the exact same circumstances you were. My main keyword was not showing in the search results, while many of my other pages were displaying on the first pages of Google.

Here are the steps I took.

1) Stripped out all of my duplicate META descriptions from duplicate pages.

2) Modified my XML sitemap to accurately show my most important pages.

3) Added a robots.txt blocking Googlebot from accessing my database search strings, as there are an infinite number of them. My Webmaster Tools section shows over 4,000 pages blocked.

It took less than a week, and now I'm on page 1 of Google for my main search term.

Conclusions:

Google has an extremely aggressive duplicate filter. Always create one page for each of your most important terms; if you create more than one, Google will most likely catch it, and it's possible you'll lose both of those pages. Google will also catch TITLE tags that are similar but not identical. Kill those similar pages, too.

Use your robots.txt file to block any pages in a database where varying results can be produced based on entered criteria. I also block files that I don't particularly need to appear in Google. Less seems to be more.

Matt Cutts wrote a rebuttal to a Forbes article about someone who was in supplemental:

Andy Greenberg wrote an article for Forbes entitled “Condemned To Google Hell” about supplemental results. I was getting ready to go on vacation, so I didn’t have a chance to talk to Andy, and now I wish that I had. It’s easy to read the article and come away with the impression that Google’s supplemental results are some sort of search engine dungeon where bad pages go and sit in limbo forever, and that’s just not true.

Andy actually called me about this post, and I honestly got the feeling that he was pushing a bit too hard to find the horror stories. I think Supplemental really can feel like Hell to some people, and getting out can be very frustrating, but I don't believe that it's intended to be a penalty. For my own client, while it certainly frustrated our SEM efforts, I can't claim that they lost thousands of dollars; I'll probably never know for sure.

Hmm, I remember this article on Forbes... Maybe it's just me, but if my site ranks #1 for certain keywords and brings in business worth 3 million dollars, then on the day I find my site in the supplemental results, with no traffic/visitors/clients and therefore no revenue, I have certain doubts that I'll find being in supplemental a very good thing.

G. is taking itself a little too seriously (it's not a new thing), and that is not really good. Google needs to be reminded that they didn't actually invent the internet and that they don't own it (at least, not yet). What G. does is make use of the internet (and yes, our websites) to do its own business, a big business. G. makes zillions of dollars on advertising across a net of website resources, full stop.

We are talking about business; we are not talking about a new W3C (see Google's demands on affiliate link codes).

Another thing: Google apparently has many employees who like to do free PR for their employer. It's good to hear that people are excited about who they work for, but unfortunately, not everyone is a born PR man. It's a bore to see and read this never-ending net of contradictions, patches, and, most of all, opinions.

It just makes worse what was never good. G. has never been excellent at publishing clear guidelines explaining the way they do things... or, I should say, the way they "want" things done (by webmasters). There are a lot of documents out there, but they forgot to include a little meat.

It's about time that Yahoo threw in a simple and clean search page (and got rid of all the clutter they have today). I have the feeling they would attract a lot more search queries than they do today.

Give G. a cut in their air time, and G. will stop being evil in no time ;)

While I honestly believe that Google doesn't see supplemental as a penalty, I think they have to realize that, by developing such a complex, algorithmic system, it's bound to behave in ways that they didn't expect. Most of us did make some mistakes that got us into supplemental, but that doesn't mean that (1) our sites are inherently unfriendly or have bad content, or (2) we're not suffering financially for it. I think Google might be well served to acknowledge the gray areas a little bit more and help those of us who are well-intentioned build better sites for both our users and Google's.

Very true; sometimes it's due to ignorance, sometimes due to "all other sorts of evil things." IMHO, it's the stigma of the latter that the ignorant/laid-back (like myself) are concerned about when their pages suddenly fall into the supplemental area.

There is a reason for getting there, oh yes, but what "the reason was" is perhaps something webmasters would find very useful to know more about. It's easier to act on something concrete than to wander around and try to fix everything at the same time.

A Google Validator wouldn't be a bad idea(!). It could turn a very grey area into something crystal clear. A validator would let webmasters know whether their sites complied with G.'s guidelines, and therefore whether they'd be in or out of the index. The supplemental index could then well disappear.

Thanks for your post Pete. Very interesting and useful. I will try some of your suggestions.

I have found myself in a similar situation; supplemental results are such a headache. Our site was redesigned too quickly and sometimes not in the best manner (redundant descriptions, lack of attention to titles, no validation, etc.), so now I'm spending a little time getting what's wrong, right.

Sometimes it feels like walking through the desert (sometimes I wonder if the changes will actually have any effect)... but the results show a positive impact.

I also found the mini-discussion about cloaking links very interesting; maybe you can start a discussion on this particular topic? Cloaking links can be a usability issue...

Great article. I am currently looking at creating a new site and SEO is at the top of our importance list.

Everything you mentioned makes great sense, except that you skipped through the "URL shortening" part pretty fast. Exactly what method did you use to do this?

Our web server is IIS, and the only way I can think of to shorten the URL is to point the 404 error to a handler page that then directs the user to the correct page or displays the correct data in the page. The downside I have been reading about is the possibility that it might send a 404 code to the search engines, negating its indexability.

Unfortunately, it's a bit harder for IIS, but I use a plug-in called "ISAPI Rewrite" that mimics Apache's mod_rewrite. That works pretty well, and is free for one site, but I think there's a nominal fee ($50?) if you're running multiple sites on one server.

Using 404 handlers to fake the rewrites isn't a good bet; even done right, the search spiders may see the resulting page as a 404 and stop indexing it. Sometimes you can manually rewrite the response as a 301 (ColdFusion allows this), but it's tricky.
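For the curious, the kind of rule involved is short. This is a mod_rewrite-style sketch (ISAPI_Rewrite 3 mimics this syntax; the script name and parameter are taken from the URL examples earlier in the post, and the exact anchoring depends on whether the rule lives at server or directory level):

```apache
# Internally map the friendly URL to the real dynamic page.
# The visitor (and the spider) only ever sees /event/seomoz; no 404
# handler is involved, so the response comes back as a normal 200.
RewriteRule ^event/([a-z0-9-]+)$ /event.cfm?search=$1 [L]
```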

I've been blogging for a few months, since March. I have PR4. I've burned my feed and everything, but I hadn't installed the FeedBurner plugin in time, so I had a crazy feed link that was indexed by Google (one per post).

So now I have about 100 supplemental links, one for every post feed link indexed, as I think Google considers them duplicate content with the actual posts' URLs.

What I've done:

- I've killed the crazy feed, so now I won't get any more duplicate content indexed.

- I've created a robots.txt for the indexed content already visible in supplemental.

What should I do now?

- Should I complete the robots.txt with the indexed content that I know is duplicate but hasn't yet gone supplemental?

- Should I request URL removal from Webmaster Tools? If so:

- Should I request removal of the duplicate content that has not yet gone supplemental?

- Should I request URL removal for duplicate content that has gone supplemental?

- Should I request URL removal for "original" content that has already gone supplemental?

- Should I request URL removal for my whole site? (After all, I have a PR4 and some good incoming links.)

The humorous image made me laugh. The dramatic first paragraph made me laugh some more... and then convinced me to read the rest of the article, which contains proven SEO advice. I might try a few of these tips and tricks myself. Thanks for sharing this info!

I'm afraid that's a bit tricky, and is something that will probably require a programmer. Every HTTP request carries something called the "user agent", which is sometimes the browser (IE, Firefox, etc.) but can also be the name of a spider/bot. Your pages would need to pull that information and look for "Googlebot" or, for Yahoo, "Slurp".

As I haven't been actively link-building, I didn't pay terribly close attention, but Google registered about 150 link-backs near the beginning of the process. We're a few shy of that now, as we probably purged some old/bad links. Yahoo shows about 8,000 link-backs, which has fluctuated but remained relatively unchanged during the process. Since Google only shows a sampling, I'm never sure how to interpret those numbers.

Here's where my newbieness (newbishness, newbism?) becomes plain: I didn't even realize that the "Links" tab showed different data than the "link:" command. So, first off, thanks for that. Unfortunately, being clueless, I can only say that we currently show 1,200 links. I have no idea what the count looked like when I started the process. Just under half our inbound links are to the home-page, though, which I imagine would be unaffected.

I've noticed a trend for newer sites that are stuck in "supplemental hell" where a few good links will pop the pages right out into the main index. Of course, these are domains with under 50 links pointing to them to begin with.

The items on this list represent common practices any site could use to improve results within search engines, not just as an escape from supplemental hell. I agree that the things you mentioned help to get things more organized and accessible for bots, which may or may not result in fewer supplemental results, but for the most part they should be used on almost all large-scale sites (many pages, dynamically driven, categories and product pages, etc.).

Glad to see you got your pages back though, nice work figuring it out.

I agree they are common best practices, but I think what Pete discovered is that not doing them can cause a lot of problems and in some cases lead to going supplemental.

Pete, I think you also showed how a lot of seemingly little changes can have big results. My guess is you'll at the very least start to see an increase in long-tail traffic, and probably very quickly.

I had my own supplemental issue about 8 or 9 months ago. In my case it was due to Google indexing the feed of my blog. That was causing every page on my blog to be seen as duplicate content, and all pages were in the supplemental index. A few simple changes to robots.txt took care of things, and within a week the pages were back in the main index.

If anyone is interested you can read about my experiences and solution here.

Congrats, Pete, on figuring things out and getting out of the supplemental index.

I think you're exactly right; what I ultimately discovered was that this was a confluence of a lot of small things over a fairly long time. Catching up was the hard part, but I think staying ahead might be relatively easy now.

When I think about my own supplemental problems, I liken it to having built a dam that was blocking potential search traffic. Making the fixes was like opening up the flood gates and letting all that traffic that wanted to find me make it through.

It can be all too easy to get lost in link building. Links are obviously important, but I think building a search friendly site isn't given enough credit. It doesn't necessarily bring more traffic, but it helps make sure you get all the potential traffic you can from the rest of the seo you do.

I'd be really interested in what you see happening in regards to traffic. I do think you'll be seeing a lot of long tail traffic in a relatively short time. A follow up to this post might make for another good read in a month or so when you have a handle on what the changes are bringing.

Nice work on the premonition. Just a week into escaping supplemental, and we're showing strong long-tail results that jump directly to events. In many cases, we're popping up ahead of competitors and taking people directly to what they want, as opposed to everyone else, who's hitting their home-page or search results.

Could you elaborate a little on your #4? Did fixing this issue involve making a change at the server level or did you need to do anything else beyond that?

While working on putting up a custom 404 page yesterday, I ran into this problem. The initial status code returned for the 404 page was 200 *sad face*. Reading this article helped me identify the issue, and after changing the path for the 404 message in the IIS settings, all seems to be well. Sounds like I should read up on 410s though... Thanks for the great article!!

Happy to elaborate, although I'm not sure how helpful it will be if you're not using ColdFusion. CF has strange restrictions on its error handling pages, so I usually create a page that redirects to the page I want (via a CFLOCATION command). Unfortunately, CFLOCATION is actually a client-side redirect. So, the 404 page was actually redirecting to another page, which came back as successful/200.

Even if you're not using CF, any sort of redirection could be causing you the same problem. The other possibility, and this compounded my problem, is that you may be using a universal page header for your site, but then putting the 404 header information inside the actual page content. In other words, your universal header could be overriding your page header.
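The point generalizes beyond ColdFusion: a custom error page must set the 404 status itself rather than redirect to a friendly page, because the final response after the redirect is what the spider records. A minimal illustration in Python/WSGI terms (a hypothetical app, not the site's code):

```python
def not_found_app(environ, start_response):
    """Serve a friendly error page WITH a real 404 status.

    The broken pattern is to answer 302 and bounce to an error page
    whose own response is a 200 -- exactly the soft-404 trap above.
    """
    body = b"Sorry, that event has passed. Try our event search instead."
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [body]

# Invoke the app directly, as a WSGI server would, and capture the status.
captured = {}
def start_response(status, headers):
    captured["status"] = status

response_body = b"".join(not_found_app({}, start_response))
print(captured["status"])  # 404 Not Found
```

The user still gets a helpful page, but the header tells the spider the truth.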

I'll join in the agreement: good post, and it mirrors some of our own recent efforts at beefing up our non-supplemental numbers. As an aside, if you want to check your header status but don't understand all this crazy number talk going on here, there are a few free tools out there that can help you out:

Xenu's link checker is more about finding broken links, but it does have an option to report redirects as errors. This means you can check ALL of your various redirects and then export the results to Excel to make sure they're all working. Note: Scientologists may not appreciate the philosophy behind the name of this tool.

I think I used a different one, but I also found the header checkers very useful. It's difficult to get a good sense of how your pages are being seen by the outside world, especially at the header level.

I think you've also shown, from a holistic view of SEO, that if you keep in mind factors such as relevance and ease of use, and just try to keep things as simple and clear as possible, you get rewarded.

Good post. The majority of sites I have worked on which have been stuck in Supplemental Hell were due to duplicate content, namely the page TITLEs and Meta Descriptions. More and more people are catching onto the need for unique page TITLEs, but I still see a lot of websites with duplicate Meta Descriptions.

There's a lot of discussion on whether or not Meta Descriptions are even a factor in SEO anymore. It's been my experience that they are still important. However, if every page has the same Meta Description, the site will be damned to Supplemental Hell and you might as well not use them at all.

Blogging software like WordPress usually reuses the same Meta Description for each page automatically, especially with categories and tags. And on sites with thousands of pages, it will take forever to create unique Descriptions for each page, so it would be better to leave them off and let the spiders decide.

In conclusion, utilize the power of Meta Descriptions, but only if they are unique. If they are not unique, don't use them at all.

Great discussion everyone. I am especially interested in your comments, seotrickster. As I was reading all of these comments, I checked out a site that I'm working on and noticed that we, indeed, have an inordinate number of pages in "supplemental hell." These pages, for the most part, are part of a series of pages that act as a glossary of terms which I thought would be helpful to visitors. However, it seems that Google finds them unimportant enough to be relegated to the supps.

The reason I am replying directly to your comment is because you specifically mentioned that sites were in supplemental hell due to "duplicate content, namely the page TITLEs and Meta Descriptions."

It would appear that this is not the problem in my case as I DO have unique TITLEs and Descriptions. I made it so that each TITLE and Description is based upon the content that is present on the page. Now, I will admit they are not the most exciting TITLEs and Descriptions; but unique, nonetheless.

The only thing I can think may be the problem is simply LACK of what Google considers to be USEFUL content. Any suggestions?

Over on 2Dolphins, I do have unique titles on each of my Blogger-based blog post pages, but have yet to figure out how to do anything about the same Meta tags being used on each page. Any suggestions about how to overcome that particular issue?