Distributed Searching Session
Session IV, Track A
Breakout Session Chaired and Reported by Ray Denenberg (Library of Congress)
-----------------------------
We set some ground rules up front: we would not debate whether distributed
searching is "good" or not; although clearly there are strong positions on
both sides, that issue is out of scope, and for purposes of this session we
assume the existence of distributed search. We also limited the scope to
exclude discussion of indexing and negotiation, since these were the subjects
of the other two sessions. Another scope limitation: we tried to focus on
issues related specifically to "distributed" searching, as opposed to just
"searching".
Issues
------
In the first part of the session we developed a list of issues pertaining to
distributed searching. We made some attempt to rank these issues, but only got
as far as determining the two most important. They are:
(1) Model for merging and ranking, including de-duping.
(2) Semantic interoperability.
Other issues that we identified, in no particular order:
- Level of aggregation.
- Meta engines.
- Query syntax.
- Representation of quality/characteristics of an index.
- User control and transparency.
- Architecture.
- Discipline specific domains.
- Collection model; "clusters".
- Granularity; what constitutes a "hit"?
- Algorithmic modelling; e.g. looping, maximum length, time-to-live.
- Multi-national issues.
- "Advertisement" model.
- Search vs. browse.
- Navigation.
- Schema negotiation.
Short/medium/long Term Areas for Standardization
------------------------------------------------
In the short term it seems reasonable to expect that there can be
agreement on query syntax.
In analyzing the issue of ranking and merging, we see three levels of
agreement. In the short term, "add value" and "known-native" ranking/merging
are possible. For the latter, the intermediary merges results based on native
ranking performed by the servers, and is able to reflect the ranking
methodology (perhaps by an object identifier). "Add value" ranking/merging
means that the intermediary adds value to the native ranking of the servers.
In the medium term, "metadata"-based merging, and in the long term,
"homogeneous" ranking/merging, are the objectives. "Metadata"-based merging
means that the intermediary ranks and merges results based on metadata
provided by the servers. Homogeneous ranking means that the servers provide
ranked results with consistent ranking and the intermediary simply merges the
results.
On the subject of semantic interoperability, we defined three levels: (1)
basic ad hoc, (2) static, and (3) dynamic, for the short, medium, and long
terms respectively. Static vs. dynamic are understood in terms of the Z39.50
model, where static semantic interoperability is achieved via out-of-band
exchange (i.e. not via Z39.50) of the necessary definitions (attribute sets,
schemas, etc.), and dynamic semantic interoperability is achieved via the
use of the Z39.50 Explain facility.
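To make the "static" case concrete, one way an intermediary might apply
out-of-band definitions is a pre-exchanged table mapping abstract access
points to each server's attribute values. The table contents and field names
below are invented for illustration (the attribute numbers merely echo the
Bib-1 style of numeric use attributes); nothing here is mandated by Z39.50.

```python
# Hypothetical, statically (out-of-band) exchanged mapping tables:
# each server has published which numeric use attribute it expects
# for each abstract access point.  All values are illustrative.
STATIC_ATTRIBUTE_MAPS = {
    "server_a": {"title": 4, "author": 1003},
    "server_b": {"title": 4, "author": 1},
}

def translate_query(server, access_point, term):
    """Rewrite an abstract (access_point, term) pair into the
    server-specific attribute the static table prescribes."""
    attr = STATIC_ATTRIBUTE_MAPS[server].get(access_point)
    if attr is None:
        raise ValueError(f"{server} has no mapping for {access_point!r}")
    return {"use_attribute": attr, "term": term}
```

In the "dynamic" case, the intermediary would instead obtain the equivalent
of these tables at run time via Explain.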
For the collection model, we defined two levels, for the medium- and
long-term objectives (no short-term objective seems realistic). In the medium
term it is possible to define metadata at the "service" level as well as
"basic" collection-level metadata. In the long term, we can define "detailed"
collection-level metadata.
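The distinction between the two medium-term levels might look roughly like
the following. This is purely an illustrative sketch; the field names and
values are invented, not drawn from any agreed schema.

```python
# "Service"-level metadata: how to reach and talk to a search service.
# All fields and values are hypothetical examples.
service_metadata = {
    "host": "z3950.example.org",
    "port": 210,
    "protocol": "Z39.50",
    "databases": ["books", "serials"],
}

# "Basic" collection-level metadata: a coarse description of what one
# collection contains, enough for an intermediary to decide whether
# the collection is worth searching at all.
basic_collection_metadata = {
    "database": "books",
    "subject_domains": ["history", "law"],
    "record_count": 1200000,
    "languages": ["en", "fr"],
}
```

"Detailed" collection-level metadata (the long-term objective) would extend
the second record with finer-grained characterizations, e.g. per-index term
statistics.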
In summary:
Short term
----------
- Query Syntax
- "add value" and "known-native" ranking/merging
- Basic ad hoc semantic interoperability
Medium Term
-----------
- "Meta data" based ranking/merging
- "Service" level meta-data
- "basic" collection-level metadata
- "Static" semantic interoperability
Long Term
---------
- "Homogeneous" ranking/merging
- "Detailed collection" level metadata
- "Dynamic" semantic interoperability