Problem Statement

Distributed searching tools have proliferated in the absence of any
standards for query syntax or resource discovery. As a result, users must
become familiar with both the contents and the query rules of each of the
search engines they use. While the general adoption of HTTP/HTML for
managing the interaction with the user provides some commonality, this
does not extend to the user interface itself.

URNs provide some of the framework for handling the resource identification
aspects of this problem, but no architecture now exists to implement URNs
across a broad spectrum of networked servers. A few projects have attempted
to address the problem of mapping a standard query syntax into multiple
information servers, but generally only within a single protocol (for example,
Willow and Z39.50). Work is under way at Berkeley, as part of the digital
library initiative there, to map queries generically into servers, but that
work is still in its infancy.

Proposed Solution

An architecture incorporating client query proxies can address many
of the problems inherent in a distributed network of search engines. The
query proxy can use HTTP/HTML to communicate with the user. Each user registers
with the proxy server and provides information on their query syntax
preferences. The preferences are stored on the query proxy and discarded
after a preset period of inactivity. When the user issues a query, it is
written in that user's preferred syntax and names the server to search.
The proxy then launches a process that contacts the
server, downloads (and optionally caches locally) that server's query syntax,
performs mapping from the user's syntax to the server's syntax, and issues
the query. Query results are returned to the proxy, and passed back to
the user. A block diagram of the architecture is available separately.
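
The flow described above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not a proposed implementation:
the class name QueryProxy, the representation of a syntax as a table mapping
logical operator names to tokens, and the fetch callback standing in for the
proxy-to-server download are all hypothetical. The sketch covers preference
storage with inactivity expiry, caching of server syntax, and token-level
mapping; actually issuing the query and returning results is left out.

```python
import time
from dataclasses import dataclass, field


@dataclass
class QueryProxy:
    """Toy in-memory query proxy; every name here is hypothetical."""
    idle_timeout: float = 1800.0                      # seconds of inactivity allowed
    prefs: dict = field(default_factory=dict)         # user -> (syntax table, last seen)
    syntax_cache: dict = field(default_factory=dict)  # server -> syntax table

    def register(self, user, syntax, now=None):
        """Store a user's preferred syntax, e.g. {"and": "&", "or": "|"}."""
        now = time.time() if now is None else now
        self.prefs[user] = (syntax, now)

    def expire(self, now=None):
        """Discard preferences of users idle longer than idle_timeout."""
        now = time.time() if now is None else now
        stale = [u for u, (_, seen) in self.prefs.items()
                 if now - seen > self.idle_timeout]
        for user in stale:
            del self.prefs[user]

    def server_syntax(self, server, fetch):
        """Download the server's syntax once, then reuse the cached copy."""
        if server not in self.syntax_cache:
            self.syntax_cache[server] = fetch(server)
        return self.syntax_cache[server]

    def translate(self, user, server, query, fetch, now=None):
        """Map a query from the user's syntax to the server's syntax."""
        now = time.time() if now is None else now
        self.expire(now)
        user_syntax, _ = self.prefs[user]      # KeyError if unknown or expired
        self.prefs[user] = (user_syntax, now)  # refresh the inactivity timer
        target = self.server_syntax(server, fetch)
        # user token -> logical operator name -> server token
        by_token = {token: name for name, token in user_syntax.items()}
        mapped = []
        for token in query.split():
            name = by_token.get(token)
            # Search terms, and operators the server lacks, pass through unchanged.
            mapped.append(target.get(name, token) if name is not None else token)
        return " ".join(mapped)
```

A real mapping would need a proper parser rather than whitespace splitting;
the token table is used here only to keep the sketch short.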

This approach requires the definition of a protocol for proxy-to-server
communication, and standards for the definition of preferred query syntax.
Ideally this would be handled similarly to Whois++, where the server can
be queried for its templates and help files. In the short term, however,
it should be possible to agree on a port on which each server dumps the
necessary configuration information in a predefined format in response to
a telnet connection from the query proxy.
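
As a rough sketch of the short-term approach, the proxy could open a plain
TCP connection to the agreed port, read until the server closes the
connection, and parse the dump. No port number or dump format has been
agreed; the "key: value" line format, the "#" comment convention, and the
function names below are assumptions made purely for illustration.

```python
import socket


def fetch_config(host, port, timeout=10.0):
    """Connect to the server's configuration port and read the dump until EOF."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:          # server closed the connection: dump complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_config_dump(text):
    """Parse an assumed 'key: value' dump into a dict, skipping noise lines."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # blank line or comment
            continue
        key, sep, value = line.partition(":")
        if not sep:                            # not a key/value line; ignore
            continue
        config[key.strip().lower()] = value.strip()
    return config
```

The parsed dictionary is the kind of table the proxy could cache locally as
that server's query syntax.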

To avoid hand-coding the query capabilities of each target server into
the client query proxy, it should be quite possible to add a function to
search servers that describes, in a common vocabulary, the search and query
capabilities of the server, so that machines can collect this information.
This dovetails nicely with the ideas described in our second position paper
on indexing proxies.
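
One way to picture a machine-readable capability description is a list of
terms drawn from a shared vocabulary, which the proxy checks before mapping
a query. The vocabulary terms and function names below are invented for
illustration; defining the actual vocabulary is exactly the standards work
this paper calls for.

```python
# Hypothetical common vocabulary of search capabilities.
COMMON_VOCABULARY = frozenset({
    "boolean-and", "boolean-or", "boolean-not",
    "phrase", "truncation", "field-search",
})


def parse_capabilities(advertised_terms):
    """Split a server's advertised terms into known and unknown vocabulary."""
    known, unknown = set(), set()
    for term in advertised_terms:
        term = term.strip().lower()
        if not term:
            continue
        (known if term in COMMON_VOCABULARY else unknown).add(term)
    return known, unknown


def unsupported_features(required, known):
    """Return the features a query needs that the server did not advertise."""
    return sorted(set(required) - set(known))
```

With such a check, the proxy can tell the user up front that, say, a
truncation query cannot be mapped to a given server, rather than sending a
query the server will misinterpret.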