Hi validators,
I just noticed a strange coincidence in the logs of my web
server: whenever there were lots of search robot hits, validator.w3.org
also requested to validate my pages.
THE PROBLEM
===========
All the pages the web robots were visiting contain a link like
<http://validator.w3.org/check?uri=http://www.bawue.de/~uli/;weblint;pw>
(you could call this a "validating link"). So I suspect the
search robots followed that URL (repeatedly!). This behaviour does not
violate the robots.txt file on validator.w3.org, since every
directive in it is commented out:
-----8<------------------------------------------------------------
#
# robots.txt for validator.w3.org
#
# $Id: robots.txt,v 1.2 1998/07/24 22:11:35 gerald Exp $
#
# User-Agent: *
# Disallow:
-----8<------------------------------------------------------------
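To make the point concrete: because every line in that file is a comment, it contains no active rules, so a standards-compliant robot may fetch any URL on the host, including /check. A small sketch using Python's standard-library robotparser (used here purely to illustrate the rule semantics; the user-agent string "SearchRobot" and the example URI are made up):

```python
from urllib import robotparser

# The current robots.txt on validator.w3.org: every line is a comment,
# so the parser ends up with no rules at all.
rp = robotparser.RobotFileParser()
rp.parse([
    "# User-Agent: *",
    "# Disallow:",
])

# With no active rules, any URL may be crawled -- including /check.
print(rp.can_fetch("SearchRobot",
                   "http://validator.w3.org/check?uri=http://example.org/"))
```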
SOLUTION PROPOSAL
=================
I think using
User-Agent: *
Disallow: /check
would have the following advantages:
1. for validator.w3.org: less system load
2. for sites with "validating links": less system load, more accurate
access counters
3. for the robots: they won't index pages nobody wants to find using
a search engine
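The effect of the proposed rules can be checked the same way; again a sketch with Python's standard-library robotparser (the user-agent string and URIs are made-up examples):

```python
from urllib import robotparser

# The proposed robots.txt rules for validator.w3.org.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-Agent: *",
    "Disallow: /check",
])

# /check is now off-limits to robots (prefix match, so any query
# string after /check is covered too)...
print(rp.can_fetch("SearchRobot",
                   "http://validator.w3.org/check?uri=http://example.org/"))  # False
# ...while the rest of the site stays crawlable.
print(rp.can_fetch("SearchRobot", "http://validator.w3.org/"))  # True
```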
I can't think of any disadvantages for any party.
Critical comments and replies are welcome.
Regards,
Uli