Position Paper for the Distributed Indexing/Searching Workshop

At WebCrawler we are pleased to see W3C take an active interest in the
area of resource discovery. We believe that offering Web users an
effective search experience requires increasingly sophisticated
information exchange between information providers and indexing
systems. Practices to accomplish this will only gain critical mass if
they are standardised and backed by the industry as a whole, and W3C
could play a catalysing role.

The current generation of Web-wide indexing robots [1] all face the
same core issues, each of which would benefit from increased
communication between information providers, indexing services, and
end users:

Avoiding indexing "bad" documents

This is partly addressed by the Standard for Robots Exclusion (SRE)
[2]. The SRE has some shortcomings, for which we would like to suggest
solutions.
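
To make the SRE's role concrete, the sketch below shows how a robot
might apply a server's exclusion rules before fetching a page. It uses
Python's standard urllib.robotparser purely as an illustration; the
rules, paths, and agent name are hypothetical, not taken from any real
server.

```python
# Illustrative sketch: applying SRE (robots.txt) rules before fetching.
# The rule set, URLs, and agent name below are hypothetical examples.
from urllib import robotparser

rules = [
    "User-agent: *",        # these rules apply to all robots
    "Disallow: /private/",  # keep robots out of this subtree
    "Disallow: /cgi-bin/",  # and away from CGI scripts
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A well-behaved robot checks before each fetch:
print(rp.can_fetch("WebCrawler", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("WebCrawler", "http://example.com/welcome.html"))       # True
```

A robot would normally retrieve /robots.txt from each server once and
consult the parsed rules for every candidate URL on that server.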

Finding "good" documents to index

This can be addressed by simple extensions to the SRE or by other
server-centric mechanisms. At the document level, relationships
between documents need to be identified.
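
As a sketch of what identifying such relationships might look like, a
document could declare its place in a larger work with LINK tags in
its head; the REL values and file names here are illustrative, not a
proposed standard:

```html
<!-- Hypothetical chapter page declaring its relationships -->
<HEAD>
<TITLE>Chapter 2</TITLE>
<LINK REL="prev" HREF="chapter1.html">
<LINK REL="next" HREF="chapter3.html">
<LINK REL="contents" HREF="index.html">
</HEAD>
```

A robot encountering such declarations could index the work as a
whole rather than as unrelated pages.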

Describing documents

The suggestion of a small standard set of meta-data at the document
level (using META or LINK tags [3]) is an obvious and effective step
we would welcome. More elaborate rating schemes such as PICS could
even address group ratings of resources, but are not yet readily
deployed.
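
For instance, such a small standard set of META tags might look like
the following; the tag names and content shown are illustrative
assumptions, not an agreed vocabulary:

```html
<!-- Illustrative document-level meta-data using META tags -->
<HEAD>
<TITLE>WebCrawler Facts</TITLE>
<META NAME="description" CONTENT="Background information on the WebCrawler robot.">
<META NAME="keywords" CONTENT="robots, indexing, resource discovery">
</HEAD>
```

An indexing robot could then use the description in result listings
instead of guessing a summary from the body text.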

Users searching for documents

While differentiation is important in the marketplace, users would
benefit from standard search mechanisms, such as common query
language constructs.

Efficient indexing

Finally, mechanisms to aid the mechanics of indexing (such as Harvest)
would be beneficial, but are likely to be deployed slowly world-wide,
and warrant separate consideration from the issues above.

We look forward to discussing these and other issues further at the
workshop.