Saturday, September 28 2013

As you might guess a guy like me would be, I'm absolutely
compulsive about avoiding data loss. I provide an off-site backup
service for my clients, and of course I take backups of my own stuff
to the same server on which I keep client data, and I keep another
external drive sync'd with the most important files from my workstation.
But that's not enough, because I need off-site backups for my own data,
too. I also need to keep a few devices synchronized. But I can't use
the most popular services, because they don't provide the level of
security I demand -- the same level I provide for my clients.

Enter Wuala. It's similar to Google Drive and Dropbox, but with one
vital distinction: it's encrypted end-to-end, not just in transit.
That means the good folks at Wuala can't get at my data. It's mine.
They won't be indexing my data to figure out which ads to serve me.
And anyone who gets hold of the files I've stored there won't be
reading them any time soon, because all that's stored is ciphertext.
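
To make the distinction concrete, here's a toy sketch of what
client-side encryption amounts to. This is not Wuala's actual
implementation -- theirs is their own, and the file names here are made
up -- but it illustrates the point: the key is generated and kept on my
machine, and only ciphertext ever leaves it. (Python, using the
third-party cryptography package.)

    # A toy illustration of end-to-end encryption, NOT Wuala's actual code.
    # The key stays on the client; the server only ever sees ciphertext.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()          # generated and kept locally
    cipher = Fernet(key)

    with open("notes.txt", "rb") as f:
        ciphertext = cipher.encrypt(f.read())

    with open("notes.txt.enc", "wb") as f:
        f.write(ciphertext)              # only this opaque blob gets uploaded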

The synchronization service ensures that when I hit the road, my
netbook carries up-to-date copies of the files I consider most
important on my workstation. Any work I do while away is already on my
workstation when I get back to the office -- no more fooling around
with USB sticks, no more rsync'ing to get everything caught up. I just
keep the Wuala client running, and the rest happens automagically.

Monday, August 06 2012

I periodically look through my web server logs to pick out things that
are not as they should be. You might recall from previous blog entries
that I operate spam traps and so on -- last night I noticed in my
server logs that some critter calling itself MJ12bot was going where
no legitimate bot belongs. But it's apparently trying to be a good
bot, because it leaves its calling card: a URL in its user-agent
string that points back to its operators.
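
If you're curious what that kind of log check looks like, here's a
minimal sketch along the lines of what I do by eye. It assumes Apache's
combined log format, and /trap/ is a hypothetical trap path, not my
real one.

    # Minimal sketch: report the user agent of anything requesting a trap path.
    # Assumes Apache combined log format; /trap/ is a hypothetical example.
    import re

    LINE = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+)[^"]*" \d+ \S+ '
                      r'"[^"]*" "(?P<agent>[^"]*)"')

    with open("access.log") as log:
        for line in log:
            m = LINE.search(line)
            if m and m.group("path").startswith("/trap/"):
                print(m.group("agent"), "requested", m.group("path"))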

So off I go to that URI, and find that the folks who run the thing have
said "If you have reason to believe that MJ12bot did NOT obey your
robots.txt commands, then please let us know via email..." And so I
did.
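
(For the record, the sort of directive at issue is nothing exotic. A
robots.txt that shuts this particular bot out entirely would look like
the following -- a generic example, not a copy of my actual file.)

    User-agent: MJ12bot
    Disallow: /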

We discussed the matter via email a bit, and it seems probable that
their bot encountered some kind of network error when it tried to grab
my robots.txt file -- not an error response from my server, but a
failure to even reach it. To my way of thinking, in a case like that a
properly designed bot will try again to get that file, and will not
crawl the site until it gets either the file or a verifiable 404 Not
Found. Not MJ12bot, though. The network failure is treated as if it
were a 404, and is taken to mean that the whole darn site is wide open
to them. Here's what their guy Alex said:

Sadly it's very difficult for us to diagnose this case - as
you can see from your logs our bot grabs robots.txt, so we are not
intentionally breaking your directives, it's just if bot could not get
robots.txt then it could not obey it :(

Huh? Your bot encounters a network error, and that gives you
license to crawl my site in violation of my terms of service? It seems
to me that if you know your crawler is broken in that way -- which you
do now -- and you continue to run it anyway, then what you're doing is
willful negligence, and that makes it intentional.
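
Here, in sketch form, is the conservative behavior I'm arguing for.
This is my notion of how a polite crawler ought to act -- an assumption
about good design, not anything MJ12bot actually does:

    # Sketch of conservative robots.txt handling: a network failure means
    # "don't crawl yet"; only a definitive answer settles the question.
    import time
    import urllib.error
    import urllib.request

    def may_crawl(site, retries=3, delay=60):
        for _ in range(retries):
            try:
                urllib.request.urlopen(site + "/robots.txt", timeout=10)
                return True          # got the file: now parse it and obey it
            except urllib.error.HTTPError as e:
                if e.code == 404:
                    return True      # verifiable 404: there are no rules
                return False         # any other HTTP error: stay out
            except OSError:
                time.sleep(delay)    # couldn't reach the server: back off, retry
        return False                 # never got an answer: do NOT crawl

The point is the last line: if you exhaust your retries without an
answer, the only defensible default is to stay away.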

No worries here. I've informed the folks behind the thing that their bot
is no longer welcome here and any connections it makes will be
considered trespass. The fun part? When their bot comes around it will
not see my web site. It will instead see a very, very long joke that
will be delivered very, very slowly. How slowly? From start to finish
takes anywhere from an hour and a half to more than six hours.
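
For the curious, the idea is simple enough to sketch in a few lines.
This is not my actual application, and very_long_joke.txt is a stand-in
for the payload, but it shows the trick: drip the response out a few
bytes at a time.

    # Tarpit sketch (not my actual application): stream a long text so slowly
    # that the bot's single request takes hours to complete.
    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    JOKE = open("very_long_joke.txt", "rb").read()   # hypothetical payload

    class Tarpit(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            for i in range(0, len(JOKE), 16):        # 16 bytes at a time...
                try:
                    self.wfile.write(JOKE[i:i + 16])
                    self.wfile.flush()
                except BrokenPipeError:
                    return                           # the bot gave up early
                time.sleep(2)                        # ...with a pause between sips

    HTTPServer(("", 8080), Tarpit).serve_forever()

At those example rates, a 100 KB "joke" takes roughly three and a half
hours to deliver, which lands squarely in the range I mentioned.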

If you've seen a bad bot in your logs and want to punish it in this way,
feel free to hit my contact form to inquire
about it. It's a freebie if all you need is the application itself and
very minimal installation/configuration instructions. After all: Bad
Bots Must Be Punished!