Pages Not Crawled

Six months ago, I had a problem with my website going over its bandwidth limit, and my hosting company blocked Google from my website (a really bad move) via the robots.txt file. I posted about it on the HR forum, and the solution was to let Google crawl and index all the pages again.
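For anyone wondering what that kind of block looks like, a host-imposed lockout and its fix are usually just a couple of lines in robots.txt (a sketch, not the poster's actual file):

```
# What the host likely added -- this tells Google not to crawl anything:
User-agent: Googlebot
Disallow: /

# The fix: an empty Disallow under the wildcard agent permits everything.
User-agent: *
Disallow:
```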

Six months later, I have only 545 pages indexed, which is about 15% of all the pages on my site. The other pages are unique content pages, and some are linked in the navigation on most of my pages.

I was wondering if making a sitemap would speed up the process, or if there is a way to tell Google to go index these pages.

No, it is not. I have checked everything carefully, and I am really at a loss. There is one page that I linked to from an outside site to get it indexed, and Google still does not crawl it. It is really weird.

Just wondering if the robots would have been slowed down because there was a noindex a while ago, and whether they are still unsure about whether they should index it.
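For context, a noindex directive is a per-page tag like the one below; once it is removed, the robots have to recrawl the page before they will trust that it may be indexed again (a generic example, not the poster's actual markup):

```html
<!-- Placed in the <head>; tells crawlers not to add this page to the index -->
<meta name="robots" content="noindex">
```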

People have reported inaccurate counts from Google's site: operator and the Webmaster Tools indexing reports. You can usually query sub-directories of a site to find more pages indexed than you'll see in an initial report.
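Those per-directory queries look like this (example.com and the path are placeholders):

```
site:example.com               <- headline count, often understated
site:example.com/provinces/    <- per-directory counts usually add up to more
```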

If you're still getting about the same amount of referrals from Google as before and you have not added much content, you should be okay.

If you have added additional pages and are getting more traffic from Google, you should be okay.

If you are receiving less traffic, then maybe something is wrong.

It could also be that some of your inbound links are no longer passing value to your site.

I think I have found the problem. A while ago, I had a problem with my URLs. I am using Joomla, and at one point I did something that was adding a segment to the URL, like "index.php/provinces..." So I ended up with some duplicate pages, and some pages also got indexed that way the first time.
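For anyone hitting the same Joomla issue, the usual fix besides removal is a permanent redirect that collapses the duplicate index.php URLs onto the clean ones, so the bot consolidates them instead of seeing two copies. A sketch for .htaccess (the pattern is an assumption about which URLs were duplicated):

```apache
# Send /index.php/anything to /anything with a 301 (permanent) redirect.
RewriteEngine On
RewriteCond %{THE_REQUEST} \s/index\.php/(\S*) [NC]
RewriteRule ^ /%1 [R=301,L]
```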

I fixed it and removed the pages with the removal tool in Webmaster Tools. I think the bot got pretty mixed up by these changes.

That was about 2 months ago, and I removed them about a month ago. Maybe it will take longer for the spider to figure it out.

Just wondering if at this point a sitemap would be a benefit or if I should just leave it like that.

A sitemap probably wouldn't hurt anything, and cleaning up your URL issues is definitely a good thing to do. Just to be thorough, have you checked the pages in a Lynx browser, to be sure there isn't something technical that is unexpectedly blocking them? It only takes a few seconds to look.

A sitemap isn't going to hurt, but I doubt it's going to help in this situation. A sitemap is just a list of URLs. Ostensibly, the message to Google is "please index all of these URLs". But what a sitemap isn't is an exclusive list of URLs. That is, there's nothing in a sitemap that tells Google "please index all of these URLs and don't index any URLs that aren't listed here."
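To make that concrete, a sitemap really is just a bare list of URLs in a thin XML wrapper, per the sitemaps.org protocol (the example.com URLs below are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/provinces/</loc>
  </url>
  <url>
    <loc>http://www.example.com/about/</loc>
  </url>
</urlset>
```

Nothing in that file says "and ignore everything else," which is why it can't stop Google from retrying the old duplicate URLs.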

If Google is trying to index URLs that don't exist, I don't think that adding a sitemap will change that. Assuming there are no bad links on the site that point to the nonexistent URLs, Google will check the URLs however many times it needs to until it's convinced that the 404 responses it's getting are always going to be there, and then it will stop checking them. Hopefully, once that process is completed, it will move on to finding legitimate links on your site to legitimate pages and crawling those URLs.
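You can verify those 404s yourself rather than waiting. A quick sketch using only Python's standard library (the function names and classification are my own, not any Google tool):

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError

def check_status(url):
    """Return the HTTP status code a crawler would see for a URL.

    Note: urlopen follows redirects, so this reports the final status;
    a 301 hop will not show up here unless you use a lower-level client.
    """
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        return urlopen(req, timeout=10).status
    except HTTPError as e:
        return e.code

def verdict(status):
    """Rough interpretation of a status code from an indexing standpoint."""
    if status == 200:
        return "ok"        # crawlable; should eventually be indexed
    if status in (404, 410):
        return "gone"      # Google will drop the URL after repeated checks
    if status in (301, 302):
        return "redirect"  # fine if it points at the canonical page
    return "check"         # 403, 500, etc. deserve a closer look
```

Run `verdict(check_status(url))` over the old duplicate URLs: if they all come back "gone", the cleanup worked and it really is just a matter of patience.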

Yes, you probably just need to be patient, but I wouldn't be patient until after running your site through the spider simulator as Scottie mentioned. Because if there's still something wrong then all the patience in the world isn't going to fix it.

A spider simulator is a tool that shows you a pretty close approximation of what a search engine spider sees when it crawls your page. There's the Fetch as Googlebot tool in Webmaster Central, or you can also use SEO Browser.

Another way to check your site for bad internal links is to use an actual spider program, like Xenu's Link Sleuth, but that requires you to download and install the program. It's very useful, though, not to mention free.
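If you'd rather not install anything, the core of what a tool like Link Sleuth does, collecting the internal links on a page so you can then test each one, fits in a few lines of standard-library Python (a sketch; class and function names are my own):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links like "/provinces/" to full URLs.
                    self.links.append(urljoin(self.base_url, value))

def internal_links(html, base_url):
    """Return only the links that stay on the same host as base_url."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    return [u for u in parser.links if urlparse(u).netloc == host]
```

Feed it the HTML of each page, then check every returned URL's status code; any internal link that doesn't come back 200 is a candidate for why the spider is getting stuck.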