All bots banned

I've disallowed access to my whole server for search engine bots and anything else that obeys robots.txt. I'll leave it that way for a month to let cached copies clear out, then re-enable access (except for any image directories). I'll be better about this stuff in the future.

Why am I doing this? Maybe it'll clean things up in my logs, especially pages I probably don't want accessed anymore. Besides, I think the majority of my traffic comes from people visiting my blog directly. The forums are pretty much dead. The old site is still up because a few of its pages are still being accessed, but I'll probably just redirect everything to a single page stating that it was taken offline.
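A minimal sketch of that catch-all redirect, assuming the old site runs on Apache with mod_rewrite available (the page name /taken-offline.html is just a placeholder):

```
# .htaccess at the old site's root: send every request to one
# "this site was taken offline" page.
RewriteEngine On
# Don't redirect the notice page itself, or requests would loop forever.
RewriteCond %{REQUEST_URI} !=/taken-offline.html
RewriteRule ^ /taken-offline.html [R=302,L]
```

Using a 302 keeps it temporary; switching to R=301 later tells crawlers and browsers the move is permanent.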

So again, all bots are banned. I’ll rethink and redo everything a month from now.

If this is all jumbled and confusing, it’s because I’m kind of doing all this off the top of my head. If you want to take a look at my tiny robots.txt file, go ahead. Do I need to do anything different for subdomains? Do they need their own robots.txt file? Or will that single slash cover everything under the sun?
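For what it's worth, crawlers fetch robots.txt separately for each host, so a subdomain needs its own file at its own root; the single slash only covers the domain the file is served from. The disallow-all version really is tiny:

```
# robots.txt at the site root: ban everything for any crawler
# that honors it (the current, temporary state).
User-agent: *
Disallow: /

# A month from now, the same file could allow everything except
# images (the /images/ path here is just an example):
# User-agent: *
# Disallow: /images/
```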

Published by

Bryan Villarin

Bryan is a Community Guardian at Automattic. He's also a photographer, card magician, and cat whisperer.

Anyway, what's the harm in visitors landing on pages that don't exist? I suppose it matters much less in my case, because all my 404 pages are custom. I've even created custom ones for certain subdomains.

I check my Bad Behavior logs every day and have the plugin set to keep only entries up to two days old. Some days the log has just 20 entries; on days like yesterday, it can fill with over 600. Sometimes you'll see things that look like false positives, but if you send the developer the MySQL data for an entry, he'll look into it for you. So far, everything I thought was a false positive has turned out to be a bad bot.