This gave me some useful baseline numbers. Over a 24h period, there were 482
downloads, 318 of which came from bots. (That’s 2/3!) Looking at the top user
agents by downloads, five out of six were bots. The one exception was the
Overcast podcast app.
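
For the curious, a tally like this can be sketched in a few lines of Python. This is a minimal sketch, not the script I actually ran: it assumes the user agent strings have already been pulled out of the access logs, and the `BOT_MARKERS` substrings, the `summarize` helper, and the sample data are all made up for illustration.

```python
from collections import Counter

# Hypothetical substrings used to flag a user agent as a bot (an assumption,
# not an authoritative list).
BOT_MARKERS = ("bot", "crawler", "spider")


def is_bot(user_agent: str) -> bool:
    """Crude check: does the user agent contain a known bot marker?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def summarize(user_agents):
    """Return (total downloads, bot downloads, Counter of downloads per agent)."""
    counts = Counter(user_agents)
    total = sum(counts.values())
    bots = sum(n for ua, n in counts.items() if is_bot(ua))
    return total, bots, counts


# Made-up sample data, only to show the shape of the result.
sample = ["Twitterbot/1.0", "Twitterbot/1.0", "Overcast/3.0", "Googlebot-Video/1.0"]
total, bots, counts = summarize(sample)
print(f"{bots}/{total} downloads from bots ({bots / total:.0%})")

# The real numbers from above: 318 of 482 downloads.
print(f"{318 / 482:.0%}")
```

Substring matching like this is blunt, so any real run would want a hand-checked list of the top user agents.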

(Side note: Googlebot-Video is polite and sends If-None-Match (with the ETag) or
If-Modified-Since when it refetches files. It sent 68 requests, but exactly half
of those resulted in an empty 304 response. Thanks Googlebot-Video!)
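
Here's roughly how a server decides between a full 200 and an empty 304 for a refetch like that. It's a simplified sketch, not S3's actual logic: `status_for` is a hypothetical helper, it compares ETags and dates as plain strings, and a real server would parse the dates and handle weak ETags.

```python
def status_for(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since when both are sent.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if "*" in candidates or etag in candidates else 200
    # Simplification: exact string match instead of real date comparison.
    if request_headers.get("If-Modified-Since") == last_modified:
        return 304
    return 200


# Made-up validator values for illustration.
ETAG = '"abc123"'
LAST_MODIFIED = "Wed, 01 Jan 2020 00:00:00 GMT"

print(status_for({"If-None-Match": ETAG}, ETAG, LAST_MODIFIED))
print(status_for({"If-Modified-Since": LAST_MODIFIED}, ETAG, LAST_MODIFIED))
print(status_for({}, ETAG, LAST_MODIFIED))
```

A 304 has no body, so each one skips the cost of sending the whole file again.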

I switched huffduff-video to use S3 URLs on the huffduff-video.s3.amazonaws.com
virtual host, added a robots.txt file
that blocks all bots, waited 24h, and then measured again. The vast majority of
huffduff-video links on Huffduffer are still on the
s3.amazonaws.com domain, which doesn’t serve my robots.txt, so I didn’t
expect a big difference…but I was wrong. Twitterbot had roughly the same
number, but the rest were way down:

This may be partly because my first measurement was Wed-Thurs and the second was
Fri-Sat, which are slower days for social media and link sharing.
Still, I’m hoping some of it was due to robots.txt. Fingers crossed the bots
will eventually go away altogether!
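
For reference, a robots.txt that blocks all bots usually looks like the two lines below, and Python's standard `urllib.robotparser` can confirm how a well-behaved crawler would read it. This is a sketch under assumptions: the rules shown are the conventional block-everything form, not necessarily the exact contents of my file, and the `example.mp3` path is made up.

```python
from urllib.robotparser import RobotFileParser

# Conventional "block everything" rules (assumed, not copied from the real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical file path on the virtual host, for illustration only.
url = "https://huffduff-video.s3.amazonaws.com/example.mp3"

for agent in ("Twitterbot", "Googlebot-Video"):
    # can_fetch() returns False when the rules disallow this agent and URL.
    print(agent, parser.can_fetch(agent, url))
```

Crawlers look up robots.txt per hostname, which is why links on the plain s3.amazonaws.com domain never see these rules.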