Hey!
Looking at the logs from my mirror (https://arch.jensgutermuth.de/) shows
that at least Google and Ahrefs are crawling it, which is obviously a
waste of resources for both sides. I'm thinking about blocking them (and
all other crawlers) using a robots.txt file like so (nginx config snippet):
location = /robots.txt {
    # Serve the rules inline, so no file has to exist in the mirror tree.
    return 200 "User-agent: *\nDisallow: /\n";
    allow all;
    access_log off;
}
Because nginx answers the request itself and no robots.txt file exists on
disk, it does not show up in directory listings and the sync script never
touches it, which circumvents all issues with the sync script.
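
To double-check that the response body really tells every crawler to stay
away, here is a minimal sketch using Python's standard urllib.robotparser.
The bot names and the path are only illustrative assumptions, not taken
from my logs. It only shows how a well-behaved parser reads the rules;
crawlers that ignore robots.txt are not affected by it.

```python
from urllib.robotparser import RobotFileParser

# Body returned by the nginx snippet, with "\n" expanded to real newlines.
ROBOTS_BODY = "User-agent: *\nDisallow: /\n"


def is_allowed(user_agent: str, path: str, body: str = ROBOTS_BODY) -> bool:
    """Return True if a compliant crawler may fetch `path` under `body`."""
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical user agents and path, chosen for illustration only.
    for agent in ("Googlebot", "AhrefsBot", "SomeOtherBot"):
        print(agent, is_allowed(agent, "/core/os/x86_64/"))
```

Every agent should come back as not allowed, since the single
"User-agent: *" group disallows the whole tree.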
I know modifying mirror contents is a very touchy subject, and rightfully
so. I therefore wanted to ask whether there is some kind of policy on this
and, if so, whether this would be allowed or could be granted as an exception.
Best regards
Jens Gutermuth