6 Reasons Why Your Awesome Site Isn’t Indexed

One of the first things we do when a potential client approaches us with their tales of search engine ranking woe is to see how many of their pages the search engine has indexed. Let’s face it: if the search engines aren’t even indexing your site, rankings are the least of your concerns.

Typically, when we say "search engine", we’re most concerned with Google, and a really easy way to see how big Google thinks you are is to do a quick site: query (for example, site:example.com). This will give you a list of the pages Google has indexed on your site (the total it reports is an estimate) and will clue you in to your indexing ratio: the number of pages Google has indexed compared with the number of pages you actually have. Often when we run this test for new clients, we find that Google isn’t indexing their site at all because of a few common search engine optimization mistakes.

Here are some of our favorites:

You’ve disallowed the spiders in your robots.txt: This will always be my favorite reason sites aren’t indexed, simply because it’s a classic search engine optimization mistake. If you’ve set your robots.txt to disallow the search engines from entering your site, you can’t complain when they follow your command. If you’re finding that your site has 0 pages indexed, do yourself a favor and check your robots.txt file first to make sure you’re letting the spiders in. If it looks like the one below, you have a problem:

User-agent: *
Disallow: /
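If you’d rather test than eyeball, Python’s standard-library urllib.robotparser can answer the same question a spider asks: "am I allowed to fetch this URL?" Below is a minimal sketch, assuming you paste your robots.txt lines in directly. It is Python’s parser, not Google’s, so edge cases (wildcards, for instance) may be handled differently, and the example.com URL is just a placeholder.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_lines, url, user_agent="Googlebot"):
    """Return True if the given robots.txt lines let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse in-memory lines instead of fetching over HTTP
    return parser.can_fetch(user_agent, url)


blocked = ["User-agent: *", "Disallow: /"]  # the problem file shown above
open_door = ["User-agent: *", "Disallow:"]  # an empty Disallow blocks nothing

page = "https://www.example.com/products/widget.html"  # placeholder URL
print(can_crawl(blocked, page))    # False
print(can_crawl(open_door, page))  # True
print(can_crawl([], page))         # True: an empty file blocks nothing either
```

The difference between blocking your whole site and blocking nothing is a single slash, which is exactly why this mistake is so easy to make.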

Your server is too slow: Google’s not going to directly penalize you for running on the slowest server ever, but a slow server can still hurt you indirectly. If Googlebot notices that your site is having a hard time keeping up with its requests, they’re going to hand your server a cookie (the chocolate chip kind) and some juice and let it rest while they go spend time with someone else. This means they’ll stop crawling before they get through your entire site, which in turn means fewer of your pages in the index. You can’t fault Google. They don’t want to be responsible for crashing your site. So instead, they’ll just go on their merry Google way, leaving your site still standing but not fully spidered. They’ll pick up the rest of the information on your subject from your competitors’ sites.
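To get a rough feel for how quickly your server answers, you can time a few full page fetches yourself. Here’s a minimal Python sketch, assuming nothing beyond the standard library. The 2-second threshold and the example.com URLs are arbitrary placeholders, not numbers Google publishes, and a fetch from your own machine is only a rough proxy for what Googlebot experiences.

```python
import time
import urllib.request

SLOW_SECONDS = 2.0  # arbitrary illustrative threshold, not a figure Google publishes


def time_fetch(url, timeout=10.0):
    """Fetch url once and return (bytes received, seconds elapsed).

    The whole body is read so transfer time counts, not just the time until
    the headers arrive. urllib raises urllib.error.HTTPError for 4xx/5xx
    responses and urllib.error.URLError if the server can't be reached at all.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read()
    return len(body), time.perf_counter() - start


def report(urls, threshold=SLOW_SECONDS):
    """Print one line per URL, flagging any fetch slower than threshold."""
    for url in urls:
        size, seconds = time_fetch(url)
        flag = "SLOW" if seconds > threshold else "ok"
        print(f"{flag:4} {seconds:6.2f}s {size:9d} bytes  {url}")


# Usage (placeholder URLs; substitute pages from your own site):
# report(["https://www.example.com/", "https://www.example.com/about/"])
```

Run it a few times at different hours; one slow result may be noise, but consistently slow pages are worth a conversation with your host.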

They think you’re a spammer: If Google has decided that you’re engaging in some bad behavior and trying to deceive them or their users, they’re not going to index your Web site. And if you’re spamming and spending your days getting some color on that white hat of yours, you’re probably aware of what you’re doing. So stop it. Clean up your site and submit a reinclusion request to Google. They’ll take a look, and if they decide you’ve pulled a Todd Friesen, they’ll let you back into the index.

There’s another side to this. If you’re having trouble getting a domain you bought 3 months ago to rank, you could be feeling the wrath of someone else’s penalty. Take a spin through the Wayback Machine and find out what the site looked like before you took control of it. If it was touting the non-friendly variety of PPC (pills, porn and casinos), you may be in for a hard time.

Bad Navigation: Is your navigation designed in all Flash? Does it consist of 90 percent broken links? Yeah? Well, then the spiders probably aren’t going to be able to access it, let alone index it. Way to go, genius.
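Spiders discover pages by following ordinary HTML links, so one quick sanity check is to list the <a href> links that actually appear in the raw HTML of a page. Below is a minimal sketch using Python’s html.parser, assuming a simple crawler that doesn’t run plugins or scripts. It ignores anything inside a Flash object, which is the point: if your menu doesn’t show up in this list, a simple spider can’t see it either. The sample HTML and URLs are invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags in static HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip <a> tags with no usable href
                self.links.append(urljoin(self.base_url, href))


def crawlable_links(html, base_url):
    """Return the links a plain HTML crawler could follow from this page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


page = """
<nav>
  <object data="menu.swf"></object>  <!-- Flash menu: no <a> tags to follow -->
  <a href="/products/">Products</a>
  <a href="contact.html">Contact</a>
</nav>
"""
print(crawlable_links(page, "https://www.example.com/"))
# ['https://www.example.com/products/', 'https://www.example.com/contact.html']
```

Finding the links is only half the job; each one still has to resolve to a working page, or you’re back to the broken-link problem.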

Spider Traps Galore: Spider traps come in many different flavors and varieties. It could be that your JavaScript takes up the first 2,000 lines of code, that you require cookies or some other user-dependent action before anyone can get in, that you’re sporting some seriously crazy dynamic URLs, or that your home page redirects 7 times before finally landing somewhere. All of these are huge roadblocks for a hungry spider trying to get to your content. Remove them and give the search engines easy access. Otherwise, start putting your dollars back into ads in your Sunday circular, because that Web site isn’t going to do you a hell of a lot of good.
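Long redirect chains are easy to spot once you write each hop down. The sketch below follows a chain through a hypothetical map of URL to redirect target (in practice you might build it from your server configuration or from the Location headers you observe) and reports how many hops it takes, stopping on a loop. The URLs and the max_hops limit are illustrative assumptions, not a documented search engine limit.

```python
def follow_redirects(start, redirects, max_hops=10):
    """Follow start through a {url: target} map of redirects.

    Returns (final_url, hops). Raises ValueError on a redirect loop or when
    the chain is longer than max_hops (an illustrative cutoff).
    """
    seen = {start}
    url, hops = start, 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen:
            raise ValueError(f"redirect loop at {url} after {hops} hops")
        if hops > max_hops:
            raise ValueError(f"gave up after {max_hops} hops")
        seen.add(url)
    return url, hops


# A home page that bounces through 7 redirects before reaching real content.
chain = {f"/step{i}": f"/step{i + 1}" for i in range(7)}
print(follow_redirects("/step0", chain))  # ('/step7', 7)
print(follow_redirects("/", {"/": "/home/"}))  # ('/home/', 1)
```

Anything more than a hop or two is worth collapsing into a single redirect that points straight at the final URL.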

Site’s down/Too many 500 errors: If the search engines keep trying to visit your site to no avail, eventually they may stop trying. They don’t want to index a site that won’t load when users try to access it; returning those sites makes Google look like Yahoo’s confused cousin. Make sure your Web site is free of hosting issues and sits on a fast server. If you want to run a quick diagnostic on your site, I’d recommend the Check Server tool on our free SEO tools page. We have lots of great stuff on that page, but the Check Server tool will help you identify most of the indexing obstacles your site may be facing.
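Your own server logs are another way to see what the search engines run into. The sketch below assumes your logs use the common Apache/Nginx "combined" format; it counts requests whose user-agent string mentions Googlebot and tallies any 5xx responses they received. The sample lines are made up. Keep in mind that anyone can put "Googlebot" in a user-agent string; Google recommends a reverse DNS lookup if you need to confirm a visitor really is Googlebot.

```python
import re
from collections import Counter

# Combined Log Format: host ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_errors(lines):
    """Return (Googlebot requests seen, Counter of 5xx status codes served to it)."""
    total, errors = 0, Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match["agent"]:
            continue  # skip unparseable lines and other visitors
        total += 1
        status = int(match["status"])
        if 500 <= status <= 599:
            errors[status] += 1
    return total, errors


sample = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 -0700] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [10/Oct/2023:13:55:40 -0700] "GET /shop/ HTTP/1.1" 503 0 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:13:56:02 -0700] "GET / HTTP/1.1" 500 0 "-" "Mozilla/5.0"',
]
total, errors = googlebot_errors(sample)
print(total, dict(errors))  # 2 {503: 1}
```

If a meaningful share of the spider’s visits end in 5xx errors, that’s the hosting problem to fix before worrying about anything else on this list.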

Is it still necessary to use robots.txt on your pages for spiders to follow all of the content? What happens if this tag is not present? Does Google assume that your site is to be crawled by default?
thanks,
Michael

Sounds like you’re mixing up the Meta Robots tag with robots.txt. The robots.txt file resides in the root directory of your site and tells search engines what to do about various pages on your site before they ever request one from the server. The Meta Robots tag lives in the Head section of each page and only pertains to the links and content on that page. Here’s a quick primer on how to use robots.txt. If the file isn’t present, the search engine will receive a 404 error from the server when it requests it, which is why we suggest putting one in the root directory even if it is blank. And yes, by default the search engines will crawl your pages, follow your links and index your content.
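To see the page-level side of that distinction, here’s a minimal sketch that pulls the directives out of a page’s Meta Robots tag with Python’s html.parser. For simplicity it only looks at tags named "robots" (crawler-specific names such as "googlebot" are ignored), and the sample HTML is invented. When the tag is missing, the result is empty, which matches the default behavior: the page gets indexed and its links followed.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # html.parser already lowercases tag and attribute names
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def meta_robots(html):
    """Return the set of Meta Robots directives found in html (empty if none)."""
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.directives


print(sorted(meta_robots('<head><meta name="robots" content="noindex, follow"></head>')))
# ['follow', 'noindex']
print(meta_robots("<head><title>No robots tag here</title></head>"))
# set()
```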
