Google SEO News and Discussion Forum

I have a strange issue with a site I run. It is a large forum and has almost never had any issues with Google. Around July 18th I noticed that Googlebot had stopped visiting the site. Normally Googlebot crawls about 3,000-15,000 pages per day (there are lots of pages, and the site has high PageRank).

I also have the site in Google Sitemaps so I went in there to see if there were any errors that might explain why Googlebot has stopped going to the site. There was indeed an error which says:

5xx error robots.txt unreachable

There is a detailed description of this error which says:

Before we crawled the pages of your site, we tried to check your robots.txt file to ensure we didn't crawl any pages that you had roboted out. However, your robots.txt file was unreachable. To make sure we didn't crawl any pages listed in that file, we postponed our crawl. When this happens, we return to your site later and crawl it once we can reach your robots.txt file. Note that this is different from a 404 response when looking for a robots.txt file. If we receive a 404, we assume that a robots.txt file does not exist and we continue the crawl.

So according to that error, since Googlebot cannot obtain the robots.txt file, it will not attempt to fetch any pages from the site until that is resolved. That would explain why Googlebot stopped visiting the site. However, knowing this doesn't resolve the problem.
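To make the behavior in that description concrete, here is a minimal sketch of the decision as I understand it. Only the 5xx and 404 branches come from the quoted text; the 2xx branch is my assumption about the normal case, the function name and return labels are made up for illustration, and this is not Googlebot's actual code.

```python
def crawl_decision(robots_status: int) -> str:
    """Sketch of what the quoted error description implies for a crawler."""
    if 500 <= robots_status <= 599:
        # From the description: robots.txt unreachable, so the crawl is postponed.
        return "postpone"
    if robots_status == 404:
        # From the description: no robots.txt exists, so the crawl continues.
        return "crawl_without_rules"
    if 200 <= robots_status <= 299:
        # Assumption: the file was fetched, so its rules are applied.
        return "crawl_with_rules"
    # Other status codes are not covered by the quoted description.
    return "unspecified"
```

The point is that a 5xx on robots.txt alone is enough to stop the crawl of the whole site, while a missing file is not.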

The odd thing is that if you go to mydomain.com/robots.txt it loads perfectly fine. I checked that the robots.txt file itself was still fine as well, and there are no syntax problems anywhere in the file. I haven't changed the file in months.

To me a 5xx error would indicate that my server is somehow not returning the page properly, such as an Internal Server Error or the like. I checked all of my access and error logs, and Googlebot hasn't even tried to visit the site, period. Google Sitemaps shows that it tried to retrieve robots.txt on July 18, 2006 and again on July 21, 2006, and that both attempts resulted in a 5xx error. I looked in my logs for those days and Googlebot shows up zero times. It never requested the file. In other words, if it did try, the request never actually reached the server.

At first I thought maybe Googlebot just caught my server at a bad time and that is why it wasn't able to reach it. However, my server has had no downtime, and according to Google Sitemaps it tried on two separate days with the same 5xx error both times.
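For anyone who wants to run the same log check, here is a rough sketch of it. It assumes an Apache-style combined log format, and the regex, function name, and parameters are my own for illustration. It also matches on the user-agent string only, which can be spoofed, so treat any hit as a hint rather than proof.

```python
import re

# Assumes Apache "combined" log format; adjust if your LogFormat differs.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)

def googlebot_hits(lines, days, path="/robots.txt"):
    """Return (ip, day, status) for Googlebot requests to `path` on `days`.

    `days` is a set of dates as they appear in the log, e.g. "18/Jul/2006".
    """
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        agent = m.group("agent") or ""
        if (m.group("day") in days
                and m.group("path") == path
                and "googlebot" in agent.lower()):
            hits.append((m.group("ip"), m.group("day"), int(m.group("status"))))
    return hits

# Usage (the log path is hypothetical):
#   with open("/var/log/httpd/access_log") as f:
#       print(googlebot_hits(f, {"18/Jul/2006", "21/Jul/2006"}))
```

An empty list for both dates is what I am seeing: no request from Googlebot for robots.txt at all.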

Since it's not even showing up in the httpd logs, I started thinking that my DNS might not be working properly. I verified that my site's DNS is indeed working correctly by checking it at dnsreport.
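A quick local sanity check is also possible from a script. This is only a sketch: the function name is mine, and it asks whatever resolver the machine running it uses, so it says nothing about delegation or authoritative servers (which is what dnsreport looks at) or about what Google's resolvers see.

```python
import socket

def resolve_addresses(hostname, port=80):
    """Return the sorted, de-duplicated IP addresses the local resolver gives."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})

# Usage: compare the result for your own domain with the server's real IP, e.g.
#   print(resolve_addresses("mydomain.com"))
```

If the addresses match the server, the problem is at least not a stale or wrong record on the resolver you tested from.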

Next I started to think that maybe I had inadvertently blocked Googlebot from reaching the server with my firewall. I checked the configuration and everything was exactly the same as it has always been. Still, I wasn't entirely sure (maybe something slipped past me), so I decided to run a test using another site on the same server. That site is very small and very new, so I submitted it to Google Sitemaps, and within a day Googlebot stopped by and grabbed its robots.txt file with no problem. The request shows up clearly in my logs too, and in Google Sitemaps the status for that site is OK. So that confirms my server is not blocking Googlebot.

Any ideas why Googlebot might have stopped coming? The site has been around since 2002 and has really never had any problems with Googlebot.

There are only two indications of the problem. The first is that Googlebot is no longer visiting, and the second is the one error I see in Google Sitemaps:

5xx error robots.txt unreachable

If anybody has any ideas, or has had this happen to them, I would love to hear how I might resolve the issue. I did email Google today, but a reply could take a few weeks, if one ever comes.