Position Paper: DSTC Requirements for a Web Query Language

Background

The DSTC has a number of resource discovery products, including
metadata-aware search engines, metadata repositories and distributed
search services. All of these can be queried via the web, and some of
them in turn query other web databases. A standard web query
interface would allow better interoperability between our products and
other web-based metadata software.

This document outlines our requirements for a web query language for
these products.

Some Scenarios

We want to support structured boolean querying of metadata
repositories.

Example 1
Find all records with Creator equal to "J. Smith" and
Date after "January 1997".
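
As a minimal sketch of how Example 1 might be modelled, the Python below
represents a query as a tree of attribute comparisons joined by boolean
operators. The names Term, And, Or and matches, and the dictionary layout
of a record, are illustrative assumptions and not a proposed syntax.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical in-memory model of a boolean attribute query.
@dataclass(frozen=True)
class Term:
    field: str      # metadata element to search on
    relation: str   # "equals" or "after"
    value: object   # query value

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

def matches(query, record):
    """Return True if the record (a dict) satisfies the query tree."""
    if isinstance(query, And):
        return matches(query.left, record) and matches(query.right, record)
    if isinstance(query, Or):
        return matches(query.left, record) or matches(query.right, record)
    actual = record.get(query.field)
    if actual is None:
        return False
    if query.relation == "equals":
        return actual == query.value
    if query.relation == "after":
        return actual > query.value
    raise ValueError(f"unsupported relation: {query.relation}")

# Example 1. "After January 1997" is read here as "later than
# 31 January 1997", which is an assumption about the intended meaning.
example_1 = And(Term("Creator", "equals", "J. Smith"),
                Term("Date", "after", date(1997, 1, 31)))

# Made-up records for illustration only.
records = [
    {"Creator": "J. Smith", "Date": date(1997, 6, 1)},
    {"Creator": "J. Smith", "Date": date(1996, 12, 1)},
    {"Creator": "A. Jones", "Date": date(1998, 1, 1)},
]
hits = [r for r in records if matches(example_1, r)]
```

Only the first of the three sample records satisfies both terms.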

The metadata in the repositories will conform to a variety of
domain-specific metadata standards. We want to be able to state
explicitly the origin of the metadata element being queried and the
structure of our query values.

Example 2
Find records with VCARD Name equal to
"J. Smith" AND with ISO 8601 encoded Dublin
Core Date after "19970701".
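
One way to picture Example 2 is to attach to each query term the
attribute set its field comes from and the encoding of its value. The
sketch below assumes hypothetical names (QualifiedTerm, DECODERS, decode)
and treats "ISO8601" as meaning the basic YYYYMMDD calendar form used in
the example; a real attribute set would define these identifiers itself.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class QualifiedTerm:
    attribute_set: str        # origin of the element, e.g. "VCARD"
    field: str                # element within that attribute set
    relation: str             # e.g. "equals", "after"
    value: str                # query value as transmitted
    encoding: str = "string"  # how the server should interpret the value

# Hypothetical decoders keyed by encoding name.
DECODERS = {
    "string": lambda text: text,
    # ISO 8601 basic calendar date, YYYYMMDD (assumed form).
    "ISO8601": lambda text: datetime.strptime(text, "%Y%m%d").date(),
}

def decode(term):
    """Turn the transmitted value into a typed value using its encoding."""
    return DECODERS[term.encoding](term.value)

example_2 = [
    QualifiedTerm("VCARD", "Name", "equals", "J. Smith"),
    QualifiedTerm("DublinCore", "Date", "after", "19970701",
                  encoding="ISO8601"),
]
decoded = [decode(term) for term in example_2]
```

Because the encoding is named explicitly, the server can compare the
Date value as a date and not as an opaque string.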

We would also like to be able to specify the metadata fields to return
in results and the number of records returned.

Some metadata has nested structure. For example, metadata describing
a film may contain metadata describing sequences within the film.
The sequence metadata may contain metadata describing individual
scenes and so on. The query language should support queries on
metadata with nested structure:

Example 4
Return any Dublin Core Descriptions for the
first, second and third MPEG7 Scenes from
movies with Dublin Core Creator "Martin
Scorsese"
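
The following sketch shows one possible reading of Example 4 over nested
records. It assumes that each movie holds a list of sequences, that each
sequence holds a list of scenes, and that "first, second and third"
means the first three scenes of a movie in document order. The key names
("dc:creator", "dc:description", "sequences", "scenes") and the sample
data are invented for illustration.

```python
def first_scene_descriptions(movies, creator, limit=3):
    """Collect descriptions of the first `limit` scenes of matching movies."""
    results = []
    for movie in movies:
        if movie.get("dc:creator") != creator:
            continue
        # Flatten sequences into a single ordered list of scenes.
        scenes = [scene
                  for sequence in movie.get("sequences", [])
                  for scene in sequence.get("scenes", [])]
        for scene in scenes[:limit]:
            if "dc:description" in scene:
                results.append(scene["dc:description"])
    return results

# Placeholder data; the descriptions are not real metadata.
movies = [
    {"dc:creator": "Martin Scorsese",
     "sequences": [
         {"scenes": [{"dc:description": "scene 1"},
                     {"dc:description": "scene 2"}]},
         {"scenes": [{"dc:description": "scene 3"},
                     {"dc:description": "scene 4"}]},
     ]},
    {"dc:creator": "Someone Else",
     "sequences": [{"scenes": [{"dc:description": "other"}]}]},
]
descriptions = first_scene_descriptions(movies, "Martin Scorsese")
```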

Some information communities use distributed search engines to
simultaneously query existing heterogeneous information sources. Such
applications are enhanced if it is possible to dynamically discover
the schema of the underlying information sources.

Example 5
What query attributes does this repository support?

Requirements

Attribute based boolean query language. The query
language should be able to specify attribute based boolean queries.

Multiple attribute sets. Different communities will
require their own sets of attributes. For this reason, the query
language must be flexible enough to allow attributes from different
communities. It should be possible to develop the query language and
attribute sets separately. That is, the W3C should develop the query
infrastructure and information communities should develop the
attribute sets they require.

Sharing of Attributes.
Communities will not want to reinvent the wheel every time they need a
new attribute. It must be possible to share attributes between
communities. An important part of sharing is the identification of
the origin and definition of the attributes in a query.

Identifying the source of attributes also allows attributes from
different communities to be mapped. For example, an application can
know that Dublin Core Creator is the same as GILS Author
and map a Dublin Core query onto a GILS database.
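
A small sketch of such a mapping follows. The table MAPPINGS and the
function map_term are hypothetical; the only equivalence taken from the
text is Dublin Core Creator to GILS Author.

```python
# Hypothetical cross-community equivalence table:
# (source attribute set, field) -> {target attribute set: target field}
MAPPINGS = {
    ("DublinCore", "Creator"): {"GILS": "Author"},
}

def map_term(attribute_set, field, target_set):
    """Rewrite a field reference into the target attribute set."""
    if attribute_set == target_set:
        return (attribute_set, field)
    target_field = MAPPINGS.get((attribute_set, field), {}).get(target_set)
    if target_field is None:
        raise LookupError(
            f"no mapping from {attribute_set} {field} to {target_set}")
    return (target_set, target_field)

mapped = map_term("DublinCore", "Creator", "GILS")
```

The lookup only works because the query states which attribute set each
field belongs to; a bare "Creator" could not be mapped reliably.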

Attribute Categories.
Attributes tell the server how to interpret the values given in the
query. There are a number of categories of attributes that an
information community may wish to define. For example:

The field to search on (e.g. Dublin Core
Date)

The matching relationship between the field and the query value
(e.g. equals, after)

The encoding and type of the query value (e.g. ISO 8601
encoded, or 16-bit integer)

Interoperability and Extensibility.
A number of us have the dream that one day there will be a "Lowest
Common Denominator" or "Cross Domain" attribute set that every
metadata repository supports. This would allow a base level of
interoperability across metadata repositories.

Information communities should be allowed to extend this
base set of attributes for their private use.

Discovery of Attributes.
It should be possible to discover the attributes (and possibly
attribute definitions) being used by a metadata repository. This
enhances interoperability by allowing an information client to
configure itself to query newly discovered metadata repositories.
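
To illustrate, the sketch below shows a repository describing the fields
and relations it supports, and a client checking a planned query against
that description before sending it. The function names and the shape of
the description are assumptions made for this example.

```python
def describe_attributes(repository):
    """Answer 'what query attributes does this repository support?'"""
    return {
        "fields": sorted(repository["fields"]),
        "relations": sorted(repository["relations"]),
    }

def unsupported_terms(query_terms, description):
    """Return the (field, relation) pairs the repository cannot evaluate."""
    return [(field, relation)
            for field, relation in query_terms
            if field not in description["fields"]
            or relation not in description["relations"]]

# Hypothetical repository configuration.
repository = {
    "fields": {"Creator", "Date", "Title"},
    "relations": {"equals", "after"},
}
description = describe_attributes(repository)

planned = [("Creator", "equals"), ("Subject", "equals"), ("Date", "before")]
rejected = unsupported_terms(planned, description)
```

A client could drop or remap the rejected terms before querying a newly
discovered repository.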

Ease of Implementation.
It should be easy to implement a search engine supporting the query
language.

The DSTC recommends that the query language use HTTP as the transport
mechanism and that the syntax of returned metadata records should be
based on XML, possibly in RDF format.
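
As a rough sketch of this recommendation, the Python below encodes a
query as an HTTP GET URL and serialises result records as XML using only
the standard library. The parameter names ("term", "maxRecords"), the
"field|relation|value" packing, the element names and the example.org
URL are all invented for illustration; this paper does not define them,
and the output is plain XML, not RDF.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
import xml.etree.ElementTree as ET

def build_query_url(base_url, terms, max_records):
    """Encode (field, relation, value) terms as URL query parameters."""
    params = [("term", f"{field}|{relation}|{value}")
              for field, relation, value in terms]
    params.append(("maxRecords", str(max_records)))
    return base_url + "?" + urlencode(params)

def records_to_xml(records):
    """Serialise a list of flat dict records as a small XML document."""
    root = ET.Element("records")
    for record in records:
        node = ET.SubElement(root, "record")
        for name, value in record.items():
            ET.SubElement(node, name).text = value
    return ET.tostring(root, encoding="unicode")

url = build_query_url("http://example.org/search",
                      [("Creator", "equals", "J. Smith")], 10)
xml_text = records_to_xml([{"Creator": "J. Smith", "Date": "19970701"}])
```

Using a plain GET request keeps the server side within reach of an
ordinary web server, which is in keeping with the ease of implementation
requirement.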

Security/Authentication.
Some customers require secure or authenticated access to their data or
subsets of their data. The query infrastructure should support this.

Specification of Returned Results. The query language should allow
the client to specify the format and fields of the results, and the
size of the result set to be returned.
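
A minimal sketch of field selection and result-set limiting, assuming
records are flat dictionaries; the function name shape_results and its
parameters are hypothetical.

```python
def shape_results(records, fields, max_records):
    """Keep at most max_records records, restricted to the named fields."""
    return [{name: record[name] for name in fields if name in record}
            for record in records[:max_records]]

# Made-up records for illustration only.
records = [
    {"Creator": "J. Smith", "Date": "19970701", "Title": "First"},
    {"Creator": "J. Smith", "Date": "19980101", "Title": "Second"},
    {"Creator": "J. Smith", "Title": "Third"},
]
shaped = shape_results(records, fields=["Title", "Date"], max_records=2)
```

Fields a record does not carry are simply omitted from its result.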

Internationalisation. The query infrastructure should
support queries and results described using the Unicode character set.
Additionally, the query infrastructure should be able to identify the
language of the query values and returned records.

Other Work

Other groups have looked at the issues of web-based query languages
and information retrieval infrastructures. We should take care to
learn from their experience.