Abstract

The Protocol for Web Description Resources (POWDER) facilitates the publication of descriptions of multiple resources such as all those available from a Web site. This document describes how sets of resources may be defined, either for use in Description Resources or in other contexts. An OWL Class is to be interpreted as the Resource Set with its predicates and objects either defining the characteristics that elements of the set share, or directly listing its elements. Resources that are directly identified or that can be interpreted as being elements of the set can then be used as the subject of further RDF triples.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a First Public Working Draft, designed to aid discussion. In particular the group wishes to receive feedback on sections marked as TBD. The POWDER Use Cases and Requirements document [PUC] details the use cases and requirements that motivated the creation of this document.

This document was developed by the POWDER Working Group. The Working Group expects to advance this Working Draft to Recommendation Status.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

1 Introduction (Informative)

The Protocol for Web Description Resources (POWDER) facilitates the publication of descriptions of multiple resources such as all those available from a Web site. These descriptions are attributable to a named individual, organization or entity that may or may not be the creator of the described resources. This contrasts with more usual metadata that typically applies to a single resource, such as a specific document's title, which is usually provided by its author.

Description Resources (DRs) are described separately [WDR]. This document sets out how groups (i.e. sets) of
resources may be defined, either for use in DRs or in other contexts. Set theory has been used throughout as it provides a well-defined
framework that leads to unambiguous definitions. However, it is used solely to provide a formal version of what is written
in the natural language text. A companion document [VOC] describes the RDF/OWL vocabulary and XML data types that are derived from this and the
Description Resources document, setting out each term's domain, range and constraints. As each term is introduced in this document,
it is linked to its description in the vocabulary document. The POWDER vocabulary namespace is http://www.w3.org/2007/05/powder#
for which we use the Qname wdr.

1.1 Design Goals and Constraints

In designing a system to define sets of resources we have drawn on earlier work [Rabin] carried out in the
Web Content Label Incubator Activity [WCL-XG], and taken into account the following considerations.

It must be possible to define a set of resources, either by describing the characteristics of the resources in the set, or by simply listing its elements.

It must be possible to determine with certainty whether a given resource is or is not an element of the Resource Set.

The ease of creation of accurate and useful Resource Sets is important.

It should be possible to write concise Resource Set definitions.

Resource Set definitions must be easy to write, be comprehensible by humans and, as far as is possible, should avoid including or excluding resources unintentionally.

It must be possible to create software that implements Resource Set definitions primarily using standard and commonly available components and specifically must not require the creation of custom parsing components.

So far as is possible, use of processing resources should be minimized, especially by early detection of a match or failure to match.

1.2 Outline Methodology

Defining a Resource Set by specifying the characteristics that the resources in the set share is clearly an indirect approach, albeit a very useful one
in the real world. In a logical sense, the definition must be interpreted to arrive at the full set. The implicit constraint on the
resources in the set is that they exist. Newly created resources that match the set definition will become members of the Resource Set, even
though at the time the definition was created, they didn't exist. Despite this, as stated above, Resource Set definitions must be unambiguous so that an application can
always determine with certainty whether a specific resource is or is not within the defined set of resources.

More formally, a Resource Set definition D denotes a set of resources RS = D^I,
where D^I is the interpretation of D, i.e., the set of resources sharing the characteristics denoted by D.

We take this further and allow a set definition to be built up in stages.

A Resource Set RS is denoted by a set definition DRS in terms of one or more characteristics that
the elements of the set have in common. Each characteristic itself gives rise to a set definition D1, D2, …, Dn, so that the complete set definition DRS comprises D1, D2, …, Dn.

The Resource Set RS is the intersection of the sets denoted by the definitions in DRS.

Formally, RS = DRS^I = D1^I ∩ D2^I ∩ … ∩ Dn^I = (D1 ∧ D2 ∧ … ∧ Dn)^I.

For example, suppose that a resource set RS is denoted by the following definitions:

D1: “the resource is available from example.org”

D2: “the resource has a URI with a path component beginning with foo”

As already noted, there is a further definition here that is implicit, namely that the resources exist. Therefore, the complete set definition,
DRS, denotes those resources that exist AND that have the characteristics of being available from example.org AND that
have a URI with a path component beginning with foo.

We define an instance of an OWL class to take the place of the Resource Set and
the properties of that Class are the set definitions D1, D2, …, Dn. The example can therefore be written using the following pseudo triples:

RS  rdf:Type  Resource Set
RS  is_available_from  example.org
RS  has_a_URI_with_a_path_component_beginning_with  foo

Whether a specific resource R, known as the candidate resource, is a member of Resource Set RS or not, is determined by comparing its characteristics with those denoted by the set definitions used in DRS. It must be an element of the intersection of the sets defined by the interpretation of D1, D2, …, Dn to be an element of RS.

If a set definition is empty, that is, if the Resource Set Class has no properties, then the set is undefined and RS MUST be considered as the Empty Set. Formally:

Let RS be a resource set, and let DRS be the set of resource set definitions denoting the resources in RS: if DRS = ∅, then RS = ∅.
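The intersection semantics and the empty-definition rule can be sketched in Python as follows. This is a minimal illustration, not part of the POWDER vocabulary: each set definition is modelled as a predicate over a candidate URI, the function names are hypothetical, and treating sub-domains of example.org as "available from example.org" is an assumption.

```python
from urllib.parse import urlsplit

def available_from_example_org(uri: str) -> bool:
    # D1: assumption - sub-domains of example.org also count.
    host = urlsplit(uri).hostname or ""
    return host == "example.org" or host.endswith(".example.org")

def path_begins_with_foo(uri: str) -> bool:
    # D2: assumption - "beginning with foo" is tested after the leading "/".
    return urlsplit(uri).path.lstrip("/").startswith("foo")

def is_member(uri: str, definitions) -> bool:
    # An empty list of definitions denotes the empty set.
    if not definitions:
        return False
    # RS is the intersection: every definition must hold.
    return all(definition(uri) for definition in definitions)

D_RS = [available_from_example_org, path_begins_with_foo]
```

With these definitions, `is_member("http://example.org/foo/bar.html", D_RS)` holds, whereas a URI on another host, or with a different path, fails at the first definition that does not match.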

There are two ways in which a Resource Set may be defined:

by reference to the address (IRI, URI or IP address) of the resources that are members of the set, described in Section 2;

by reference to the properties of the resources that are members of the set. As described in Section 3, such properties may or may not be knowable from sources other than the resources themselves.

A Resource Set may be defined using any combination of these methods. Furthermore, each may be negated so that, for example, it is possible to define a set
as "all resources on example.com except those on video.example.com shot in widescreen format." This is
shown in Example 4-6.

2 Grouping by address

A Resource Set may be defined in terms of the IRIs, URIs or IP addresses of resources that are its members. Determining whether a candidate resource is or is not a member of the set can therefore be done by comparing its address with the data in the set definition. Importantly, if the set is defined solely in terms of IRIs or URIs, this can be done before deciding whether to fetch the candidate resource or perform a DNS lookup, thus maximizing processing efficiency in many environments.

We provide a range of methods to support set definition by address.

2.1 Grouping by IRI/URI component

The syntax of a URI, as defined in RFC3986 [URIS], provides a generic framework for identification schemes that goes
beyond what is demanded by the POWDER use cases [PUC]. We therefore limit our work to IRIs and URIs with the syntax:
scheme://authority/path?query#fragment, noting that Resource Sets may be defined using additional vocabularies as
set out in Section 6. The authority component is further divided
into user information, host and port. An example of a URI that uses all these components is shown below.
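As a minimal sketch of this decomposition, the following Python function splits a URI into the components named above using the standard library. The URI and the function name are hypothetical and chosen only so that every component is present.

```python
from urllib.parse import urlsplit

def split_components(uri: str) -> dict:
    parts = urlsplit(uri)
    # User information is whatever precedes "@" in the authority, if anything.
    userinfo = parts.netloc.rpartition("@")[0]
    return {
        "scheme": parts.scheme,
        "userinfo": userinfo,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }

# Hypothetical URI that uses all components.
EXAMPLE = "https://user:pw@www.example.org:8080/foo/bar.html?id=123#top"
```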

For each URI component we define a corresponding RDF property, the value of which is a white space-separated list of strings, any one
of which must match the relevant portion of the URI of the candidate resource.

Formally, we have a set definition D = URI component matches(?x, {string1 | string2 | … | stringn}), where ?x is a variable denoting the URI component under consideration, and {string1 | string2 | … | stringn} denotes a set consisting either of string string1, or string2, or … stringn.

Any number of set definitions D1, D2, …, Dn can
be declared and, as stated in Section 1.2, the overall Resource Set is the intersection of the sets that can be
interpreted from those definitions. However, with some exceptions, each particular RDF property can appear at most once, and some are mutually exclusive.
Greater detail on this is provided as terms are introduced and in Section 4.

Strings are matched according to one of four rules:

startsWith, meaning that the URI component starts with any of the strings listed in the value of the relevant property;

endsWith, meaning that the URI component ends with any of the strings listed in the value of the relevant property;

exact, meaning that there is an exact match between the candidate URI component and at least one of the strings listed in the value of the relevant RDF property;

contains, meaning that at least one of the strings listed in the value of the relevant RDF property appears somewhere in the URI component.
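The four matching rules can be sketched as a single Python function. The function name and the rule labels passed as strings are illustrative assumptions; the only behavior taken from the text is that the property value is a white space-separated list and that any one string matching is sufficient.

```python
def matches(component: str, values: str, rule: str) -> bool:
    """Test one URI component against a white space-separated list of strings."""
    candidates = values.split()
    if rule == "startsWith":
        return any(component.startswith(s) for s in candidates)
    if rule == "endsWith":
        return any(component.endswith(s) for s in candidates)
    if rule == "exact":
        return any(component == s for s in candidates)
    if rule == "contains":
        return any(s in component for s in candidates)
    raise ValueError(f"unknown matching rule: {rule!r}")
```

A 'negative' property would then simply require that `matches(...)` is false for the same component.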

Recognizing the great diversity of potential uses and set definition requirements, multiple properties are defined relating to the
path and query components. Furthermore, for each property there is a 'negative' property, that is, a property whose value is a list of
strings that must not be present in the relevant URI component.

As a quick example, the set of all resources on example.org, whether fetched using http
or https, where the path component of their URIs starts with foo, and where the path does
not end with .png is defined thus:

The semantics and constraints of each of the terms in Table 1 are further defined
in the POWDER Vocabulary document [VOC]. Precise details of how values for each term are combined
are discussed in Section 4 below. However, it is worth noting the points made in the following sub-sections.

2.1.1 Resource Set Definitions Referring to Ports

Ranges of Ports are defined as x-y, where x < y, that is, the lower and upper values in the range are separated by a hyphen. Multiple ranges can, of course, be listed using white space as the separator. Specific ports can be included or excluded using the includePorts and excludePorts properties so that the set of all resources on example.org via ports 3125 to 5236 excluding ports 4345 and 5000 can be expressed as in Example 2-2.
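A minimal Python sketch of this port handling follows. The function names are hypothetical, and treating both ends of a range as inclusive is an assumption, since the text does not say so explicitly.

```python
def parse_port_ranges(value: str) -> list:
    """Parse a white space-separated list of "x-y" ranges into (low, high) pairs."""
    ranges = []
    for item in value.split():
        low, high = (int(part) for part in item.split("-"))
        if not low < high:
            raise ValueError(f"range {item!r} must satisfy x < y")
        ranges.append((low, high))
    return ranges

def port_in_set(port: int, include_port_ranges: str, exclude_ports: str = "") -> bool:
    # Assumption: range boundaries are inclusive.
    excluded = {int(p) for p in exclude_ports.split()}
    in_range = any(low <= port <= high
                   for low, high in parse_port_ranges(include_port_ranges))
    return in_range and port not in excluded
```

For the set described above, `port_in_set(4000, "3125-5236", "4345 5000")` holds, while ports 4345 and 5000 are rejected.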

†includePorts and includePortRanges are mutually exclusive, that is, a Resource Set definition
may include 0 or 1 of these RDF properties but not both. This is because, as has been noted, a candidate resource must share all
of the characteristics defined in the Resource Set to be an element of it. Multiple definitions of port numbers would therefore require
the URI of a candidate resource to have multiple ports (which is impossible).

‡includePathContains and includeQueryContains may appear any number of times within a Resource Set definition so
that it is easy to create one in which multiple strings must be present in paths and/or queries. This is in contrast to all other
terms in Table 2 which can only occur 0 or 1 times since the URI of a candidate resource can only have one scheme, one host etc.

Query strings typically contain a series of name-value pairs separated by ampersands thus:

?name1=value1&name2=value2

These are usually acted on by the server to generate content in real time and the order of the name-value pairs is unimportant.
For practical purposes ?name1=value1&name2=value2 is equivalent
to ?name2=value2&name1=value1. Therefore, if the candidate
resource's URI includes a query string, and if the Resource Set definition refers to the query string then:

For each string in the list given as a value for any of includeQueryContains, excludeQueryContains,
includeExactQueries and excludeExactQueries, a POWDER processor must split the string
into its constituent pairs at the ampersand character*.

The name-value pairs in the query string in the candidate resource's URI MUST match the name-value pairs derived from the
Resource Set definition in any order.

* If a server is known to use a different delimiter then a different RDF property must be defined, see Section 6.
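Order-independent query matching can be sketched in Python by comparing sets of name-value pairs. The function names are hypothetical. Two points are assumptions rather than statements from the text: repeated pairs are collapsed, and "contains" is read as "every pair in the definition string is present in the candidate's query".

```python
def query_pairs(query: str) -> set:
    """Split a query string into its name-value pairs at the ampersand."""
    return {pair for pair in query.lstrip("?").split("&") if pair}

def exact_query_match(candidate_query: str, definition_query: str) -> bool:
    # Same pairs, in any order.
    return query_pairs(candidate_query) == query_pairs(definition_query)

def query_contains(candidate_query: str, definition_query: str) -> bool:
    # Every pair required by the definition appears in the candidate.
    return query_pairs(definition_query) <= query_pairs(candidate_query)
```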

N.B. If using the RDF properties relating to the query string of a URI then the real-time generation of content
should be taken into account. It may be difficult, if not impossible, to predict with certainty what the content of the resource
will be and therefore the Resource Set may not be fully defined. It follows that query string-based RDF properties should be used with caution.

2.1.3 IRI/URI Canonicalization

Before any IRI or URI matching can take place the following canonicalization steps should be applied to the candidate resource's IRI or URI.
These steps are consistent with RFC3986 [URIS], RFC3987 [IRIS] and URISpace [URISpace].

If not already so encoded, the IRI/URI character string is converted into a sequence of bytes using the UTF-8 encoding.

2.1.4 Data encoding

To complement the URI/IRI canonicalization steps described in the previous section, related processing steps must also be carried out on the
strings supplied as set defining data, that is, the values for the RDF properties listed in Table 2.

Bear in mind that if the data is serialized in XML, URI strings specified in the resource set definition will be escaped
according to the XML syntax using entity references for specific characters (escaping < with
&lt; and & with &amp; is mandatory, others may also be used). Moreover, since Resource Set
definition properties take a white space-separated list of URI strings as their value, whenever a URI string contains an unescaped white
space (i.e., a white space not encoded as %20), it will be substituted by %20.

The following steps should therefore be applied to each item in the list separately.

If not already so encoded, the strings are converted into a sequence of bytes using the UTF-8 encoding.

XML character entities are replaced by the characters they represent. For example, &amp; becomes &, etc.

Percent-encoded octets (character triplets such as %20) are converted into the characters they represent.

If the data relates to the scheme or host, it is normalized to lower case.

If the data relates to the host, trailing '.' characters are removed.

Any values given for the RDF properties includePathStartsWith, excludePathStartsWith, includeExactPaths or
excludeExactPaths must begin with the '/' character, which is prepended if absent.
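The steps above can be sketched in Python as follows. This is a hedged illustration: the function name and the `kind` parameter are hypothetical, the input is assumed to be the raw serialized property value (an XML parser would normally have resolved entities already, in which case that step should be skipped), and Python strings are already Unicode so the UTF-8 step is implicit in `unquote`.

```python
from urllib.parse import unquote
from xml.sax.saxutils import unescape

PATH_PROPERTIES = {
    "includePathStartsWith", "excludePathStartsWith",
    "includeExactPaths", "excludeExactPaths",
}

def normalize_values(prop: str, value: str, kind: str = "") -> list:
    """Normalize each item of a white space-separated property value.

    kind is "scheme", "host" or "" (hypothetical parameter naming the
    URI component the property refers to).
    """
    items = []
    for item in value.split():
        # Replace XML character entities with the characters they represent.
        item = unescape(item, {"&quot;": '"', "&apos;": "'"})
        # Convert percent-encoded octets (decoded as UTF-8 by default).
        item = unquote(item)
        if kind in ("scheme", "host"):
            item = item.lower()
        if kind == "host":
            item = item.rstrip(".")
        if prop in PATH_PROPERTIES and not item.startswith("/"):
            item = "/" + item
        items.append(item)
    return items
```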

If the set definition includes values related to the port then matching of the data against the candidate resource's URI/IRI must be carried out as follows:

If the set definition includes the RDF properties includePorts or includePortRanges then, when matching,
if the default port for the candidate resource's URI/IRI is present in the list of supplied values (or the specified ranges), but the
candidate resource's URI/IRI does not specify the port, the candidate resource IS an element of the set IF all other conditions are met.

If the set definition includes the RDF properties excludePorts or excludePortRanges then, when matching,
if the default port for the candidate resource's URI/IRI is present in the list of supplied values (or the specified ranges), but the candidate
resource's URI/IRI does not specify the port, the candidate resource is NOT an element of the Resource Set.
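The default-port rule can be sketched in Python by first computing an effective port for the candidate. The table of default ports and the function names are assumptions for illustration, and applying the same rule symmetrically to an exclusion list is likewise an assumption of this sketch.

```python
from urllib.parse import urlsplit

# Assumption: only these two schemes are considered here.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(uri: str):
    """Return the explicit port, or the scheme's default when none is given."""
    parts = urlsplit(uri)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

def port_allowed(uri: str, include_ports=(), exclude_ports=()) -> bool:
    port = effective_port(uri)
    if include_ports and port not in include_ports:
        return False
    # Assumed symmetric handling for excluded ports.
    return port not in exclude_ports
```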

2.2 Grouping by Regular Expression: The includeUriPattern and excludeUriPattern Properties

The RDF properties discussed above all take white space-separated lists of strings as their values. It is
believed that these properties will be easy to use and cover the overwhelming majority of cases. However, the use of strings
with fixed matching rules clearly presents a restriction on flexibility.

To support fully flexible set definition by URI, the includeUriPattern
and excludeUriPattern
properties take a Perl Regular Expression (RE) [PERL] and should be applied to the complete candidate URI
(after following the canonicalization steps above).

The WG notes the existence of other RE syntaxes, notably that offered by XML Schema. We are also aware of work being done in
the XQuery group on a new syntax. We are ready to adopt either of these, or another syntax, if there is a strong case for doing so. Perl has
been chosen for now because of a) its existing and widespread adoption; and b) its support for 'non-greedy' matching which
we believe will be useful. We are particularly keen to receive feedback on this issue and also how it affects the encoding of white space in the RS
definition where it is and is not used as a delimiter.

N.B. The value of the includeUriPattern and excludeUriPattern properties MUST be a single
Regular Expression, not a white space-separated list.

As an example, the set of all the resources hosted either by example.org or example.net, where the path component of their URIs starts either with foo or bar, can be defined thus:

Example 2-3: Set definition by regular expression (not including character escaping)
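For illustration only, a set of this kind can be tested in Python, whose `re` module follows a largely Perl-compatible syntax. The pattern below is an assumption made for this sketch and is not the RDF encoding of the example; in particular, allowing sub-domains and an optional port is a choice made here.

```python
import re

# Hypothetical pattern: http or https, example.org or example.net
# (optionally a sub-domain and a port), path starting with foo or bar.
URI_PATTERN = re.compile(
    r"^https?://([^/:@]+\.)?example\.(org|net)(:\d+)?/(foo|bar)"
)

def in_set(uri: str) -> bool:
    return URI_PATTERN.search(uri) is not None
```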

2.2.1 Safe Use of includeUriPattern

Example 2-4 uses a modified version of the RE given in Section 2.1, substituting individual portions with specific strings.
This is the safest method but is not, perhaps, the most natural way to proceed. If a less rigorous approach is taken it is
easy to make mistakes when specifying REs, and incorrect REs in set definitions will have one of two possible (and obvious) consequences: resources are included in the set unintentionally, or resources are excluded from it unintentionally.

The intention in the RE given in Example 2-5 is probably to say "all resources on example.org with a URI beginning with https." However, as the RE is not anchored at either end, what this actually means is "all resources on example.org where the URI includes https". Thus this Resource Set includes both:

https://www.example.org/page.html

http://www.example.org/why_we_use_https.html

Adding in anchors at the beginning and end of the RE can have equally undesirable consequences.

Example 2-6: A second example of a bad set definition by regular expression

In Example 2-6, the intention is, again probably, to define the set of "all resources on example.org fetched using https only". However, adding both the ^ and $ anchors at the beginning and end of the RE means that the whole URI must be https from start to finish — which can never be true so this Resource Set is equivalent to the empty set.
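Both pitfalls are easy to reproduce. The Python sketch below uses simplified stand-in patterns (assumptions for illustration, not the patterns of Examples 2-5 and 2-6, and with the host check omitted) and a hypothetical helper that lists which URIs a pattern selects.

```python
import re

URIS = [
    "https://www.example.org/page.html",
    "http://www.example.org/why_we_use_https.html",
]

def selected(pattern: str, uris=URIS) -> list:
    """Return the URIs in which the pattern is found."""
    return [uri for uri in uris if re.search(pattern, uri)]

UNANCHORED = r"https"        # found anywhere in the URI
OVER_ANCHORED = r"^https$"   # the whole URI would have to be "https"
START_ANCHORED = r"^https:"  # the URI begins with the https scheme
```

`selected(UNANCHORED)` returns both URIs, `selected(OVER_ANCHORED)` returns none, and `selected(START_ANCHORED)` returns only the first.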

Example 2-7 shows one possible way to encode the intended set definition.

Example 2-7: An example of a correct set definition by regular expression

Whilst Example 2-7 'works', the potential dangers of using REs mean that it is generally better to use component strings where possible. Example 2-7 is therefore better written as shown in Example 2-8 below.

Example 2-8: A re-write of Example 2-7 without using a regular expression

2.3 Grouping by IP Address

A set of resources can be defined in terms of the IP address(es) from which the resources are served. To support this we define two
RDF properties: includeIPs, which takes a white space-separated
list of single IP addresses, and includeIpRanges which takes
a white space separated list of CIDR blocks [CIDR]. Negative versions
of these RDF properties are also defined: excludeIPs and
excludeIpRanges respectively.

As with includePorts, and for similar reasons,
includeIPs and includeIpRanges are mutually exclusive, that is, a Resource
Set may include one or other, but not both of these RDF properties.

The includeIPs RDF property is simple enough: Example 2-9 defines the Resource Set as all resources available from IP address 123.123.123.123.

Example 2-9: A Resource Set definition using the includeIPs RDF property

The includeIpRanges RDF property allows the definition of a resource set based on a range of IP addresses, specified in a CIDR block. A CIDR block has the form <IP address>/x, where the CIDR prefix x is a number ranging from 1 to 32, denoting the leftmost x bits which a set of IP addresses shares. For instance, the CIDR block 123.234.245.254/8 denotes the range of IP addresses sharing the leftmost 8 bits, i.e., starting with 123.

As an example, suppose that a Resource Set definition should denote all the resources hosted by the machines with IP addresses 123.234.245.254 and 123.234.245.255. This can be expressed by the following Resource Set definition:

Example 2-10: A Resource Set definition using the includeIpRanges RDF property

2.3.1 Safe Usage of the includeIpRanges Property

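In order to use CIDR blocks correctly, it must be taken into account that a CIDR prefix refers to the binary representation of an IP address. For instance, the binary representation of IP address 123.234.245.254 corresponds to 01111011.11101010.11110101.11111110, which differs from that of 123.234.245.255 (01111011.11101010.11110101.11111111) only in the last bit, so the two addresses share their leftmost 31 bits.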

This also means that the CIDR block 123.234.245.255/31 is equivalent to 123.234.245.254/31.
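The binary view and the equivalence of the two /31 blocks can be checked with Python's standard `ipaddress` module. The helper name `to_binary` is hypothetical; `strict=False` tells `ip_network` to mask off host bits instead of rejecting the input.

```python
import ipaddress

def to_binary(ip: str) -> str:
    """Dotted binary representation of an IPv4 address."""
    octets = ipaddress.IPv4Address(ip).packed
    return ".".join(f"{octet:08b}" for octet in octets)

def same_block(block_a: str, block_b: str) -> bool:
    """True if two CIDR blocks denote the same range of addresses."""
    net_a = ipaddress.ip_network(block_a, strict=False)
    net_b = ipaddress.ip_network(block_b, strict=False)
    return net_a == net_b
```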

It is important to note that the number N of IP addresses denoted by a CIDR block corresponds to 2^(32−x).
Therefore, if x = 32, N = 2^0 = 1, if x = 31, N = 2^1 = 2, etc. Therefore,
it is possible to denote a range of IP addresses using wdr:includeIpRanges only when the number N of IP
addresses is a power of 2. Otherwise, it is necessary to provide a white space separated list of CIDR blocks or, alternatively, individual IP addresses.
For instance, the resources hosted by the machines with IP addresses 123.234.245.253, 123.234.245.254, and 123.234.245.255
can be expressed as shown in Example 2-11.
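The block-size arithmetic, and the way three addresses decompose into CIDR blocks, can be sketched with Python's `ipaddress` module. The function names are hypothetical; `collapse_addresses` is the standard-library call that merges adjacent networks where possible.

```python
import ipaddress

def block_size(prefix: int) -> int:
    """Number of addresses in an IPv4 CIDR block with the given prefix."""
    return 2 ** (32 - prefix)

def covering_blocks(addresses) -> list:
    """Smallest list of CIDR blocks covering exactly the given addresses."""
    networks = [ipaddress.ip_network(address) for address in addresses]
    return [str(net) for net in ipaddress.collapse_addresses(networks)]
```

For the three addresses above, `covering_blocks` yields one /32 block for 123.234.245.253 and one /31 block for the other two, which is why a single block cannot express the set.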

Incidentally, as already noted, includeIPs and includeIpRanges are mutually exclusive. It is perhaps
tempting to create a Resource Set definition like that shown in Example 2-12, however, this would require a candidate resource
to be available from both 123.234.245.253 AND either 123.234.245.254 OR 123.234.245.255 which is impossible
so that Example 2-12 is tantamount to the empty set.

Example 2-12: Erroneous Resource Set definition across several IP addresses

Defining Resource Sets by IP address puts a burden on the processor since it will often have to perform a DNS look up to determine whether a candidate resource is, or is not, a member of the Resource Set. Furthermore, it is particularly easy to include resources in the set by accident using such a broad-sweep approach. If a Web site is hosted on a shared server, for example, it is very likely that the set will include resources by mistake.

Defining a Resource Set by IP address would, however, be appropriate where a content provider operates a large network of servers, or where particular types of content to be described are hosted on servers that can easily be identified by their IP address.

2.4 Enumerating Elements of a Resource Set: the includeResources and excludeResources properties

It is useful to be able to include or exclude resources from sets by simple listing. The
includeResources and excludeResources RDF properties support this; both take white space separated
lists of IRIs and/or URIs. To give a simple example, the set
of all resources on example.org except its stylesheet and JavaScript library can be encoded as shown in Example 2-13.

As emphasized throughout this document, each RDF property and its value creates a set definition of its own and the full Resource Set is the
intersection of those sets. Thus an alternative way of looking at Example 2-13 is to say that a candidate resource is a member of the
Resource Set if it is on example.org AND does not have the URI http://www.example.org/stylesheet.css
AND does not have the URI http://www.example.org/jslib.js.

It is tempting to use includeResources in a similar fashion as shown in Example 2-14.

The intention in this example is to include the W3C's valid XHTML 1.0 icon in the set of resources on example.org. However, a
resource would have to be both on the example.org host AND have a URI that matched http://www.w3.org/Icons/valid-xhtml10 to
be an element of the set. Since this is impossible, such a definition is, again, tantamount to the empty set.

The solution is to use the OWL set operator owl:unionOf as shown in Example 2-15.

Here we have two discrete Resource Sets, each of which is made up of, in this case, a single RDF property and its value; and the overall Resource
Set comprises the union of those two sets. The use of the OWL set operators is discussed in detail in Section 4.
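The difference between the two readings can be sketched in Python with predicates. The function names are hypothetical, and counting sub-domains as "on example.org" is an assumption; the sketch models only the set logic, not the OWL encoding.

```python
from urllib.parse import urlsplit

ICON = "http://www.w3.org/Icons/valid-xhtml10"

def on_example_org(uri: str) -> bool:
    # Assumption: sub-domains of example.org are included.
    host = urlsplit(uri).hostname or ""
    return host == "example.org" or host.endswith(".example.org")

def is_listed_icon(uri: str) -> bool:
    return uri == ICON

def in_intersection(uri: str) -> bool:
    # Both properties in one Resource Set: no URI can satisfy both.
    return on_example_org(uri) and is_listed_icon(uri)

def in_union(uri: str) -> bool:
    # Two Resource Sets combined with a union: either is sufficient.
    return on_example_org(uri) or is_listed_icon(uri)
```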

2.5 Redirection: the includeRedirection property

If a Resource Set is defined in terms of the URIs of the resources that are elements of the set then resolving the URIs may
lead to redirection through 3xx HTTP status codes [HTTPCODE]. By default, such redirection MUST lead to the 'new'
resource itself being compared with the Resource Set definition. That is, if the resource identified by URI1 is
an element of the Resource Set but, when resolving it, the user agent is redirected via a 3xx HTTP response code to URI2,
then the resource identified by URI2 MUST itself be compared with the Resource Set definition to determine whether or not it
is an element of the set.

Recognizing that there may be circumstances where this default behavior may cause unnecessary latency, redirected resources MAY
be included by use of the includeRedirection property.
The range of this RDF property allows for any of HttpAnyRedirect,
HttpPermRedirect or
HttpTempRedirect to be given as its value.
These classes are all based on those defined in the HTTP in RDF vocabulary [HTTPRDF]. See the POWDER Vocabulary [VOC] for details.
As their names suggest, the HTTP redirection classes allow Resource Set definitions to allow any redirection, specifically permanent redirection
(i.e. HTTP response code 301) or any of the temporary redirection HTTP response codes (302, 303 and 307).

Example 2-16 encodes that if, when resolving any URI on the example.org domain (or its sub-domains), the user agent is
redirected through a 301 (permanent) HTTP response code then the target resources are elements of the Resource Set, even if those resources
are on a different domain. Resources resolved following other redirects would not be included unless they were also on the example.org domain.
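The decision logic for redirected resources can be sketched as follows. This is an interpretation for illustration, not normative behavior: the function name is hypothetical, the status codes for permanent and temporary redirection are those listed above, and reading HttpAnyRedirect as "any 3xx status" is an assumption.

```python
REDIRECT_CODES = {
    "HttpAnyRedirect": set(range(300, 400)),  # assumption: any 3xx status
    "HttpPermRedirect": {301},
    "HttpTempRedirect": {302, 303, 307},
}

def target_is_member(source_in_set: bool, status: int, target_in_set: bool,
                     include_redirection=None) -> bool:
    """Decide whether the target of a redirect is an element of the set.

    target_in_set is the result of comparing the target itself with the
    Resource Set definition, which is the default behavior.
    """
    if (include_redirection is not None and source_in_set
            and status in REDIRECT_CODES[include_redirection]):
        # Redirection of this type is explicitly included.
        return True
    return target_in_set
```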

3 Grouping by Resource Property

A set of resources can be defined in terms of the properties of its elements. There are no constraints on how such properties can be discovered: for instance, they may be extracted by analyzing the resources themselves, included in HTTP headers, or declared in a separate online data source or in other Description Resources.

3.1 Basic Resource Set Definition by Resource Property: the includeConditional and excludeConditional properties

We provide an RDF property that supports the definition of a Resource Set by any property of any resource.
Example 3-1 defines the set of resources whose language is French. In this case, the only method open to a processor to determine
whether a candidate resource is, or is not, an element of the Resource Set is to retrieve it and note the language used.

The RDF properties includeConditional and
excludeConditional
link to the base RDFS Class 'Resource' that represents the resources that are, or are not, elements of the set respectively. Any
characteristic of those resources can be defined in the usual way to confer membership of the Resource Set or exclude resources from it — the
prefix 'ex' standing for any vocabulary. Although, in common with most other set definition terms, includeConditional and
excludeConditional may each only occur once in a Resource Set definition, any number of predicates from any vocabularies may be
defined as RDF properties of the RDFS Class to which they link.

N.B. Inclusion of the includeConditional and excludeConditional RDF properties requires a
POWDER processor to fetch, parse and understand any resource. A generic processor might very well therefore have an output like
"The candidate resource is an element of the Resource Set if it is subsequently found to have the following characteristic... "
It therefore goes against the design goals which call for a closed world solution. The working group welcomes
any comment on this aspect and wishes to flag this as a feature at risk.

3.2 Resource Set Definition by Property Look Up

There are cases where the method outlined in the previous section may be the only available, or most
suitable one, for determining the relevant property of a candidate resource. For example, one might wish
to define a set of documents that contain a particular word or word pattern. In other cases,
it will be preferable to define a Resource Set by referring to an external data source. For
example, content providers may make data available about their resources through an API to their existing
database. To support this we define a generic look up mechanism.

In Example 3-2, the set definition comprises a simple property look up facility. If an HTTP HEAD request is
sent to the candidate resource's URI and the response contains 'Content-Language: fr' then the
candidate resource is an element of the Resource Set. Notice that the white space in the value of includeResponseContains
is percent encoded. This is because, in common with most set definition properties, this RDF property takes a white
space separated list of values. It follows that all white space characters, except those delimiting possible
values for the response, must be percent encoded within the includeResponseContains set definition,
and decoded before comparison with the response received from the look up URI.
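The decoding step can be sketched in Python: split the property value on white space first, then percent-decode each item before comparing it with the response. The function names and the sample response text are hypothetical.

```python
from urllib.parse import unquote

def response_values(value: str) -> list:
    """Split a white space-separated property value, then percent-decode."""
    return [unquote(item) for item in value.split()]

def response_contains(response_text: str, include_value: str) -> bool:
    # Any one of the listed values appearing in the response is sufficient.
    return any(v in response_text for v in response_values(include_value))
```

For example, the value `Content-Language:%20fr` decodes to `Content-Language: fr` before it is searched for in the response.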

The lookUpURI
RDF property takes a single URI as its value, not
a white space separated list. In other words, if a Resource Set definition includes a property look
up then this MUST be to a single URI, although there can be multiple responses to a request sent to
that URI that confer membership of the set on the candidate resource. More complex Resource Set
definitions are discussed in Section 4.

Notice that the look up URI property (line 4 of Example 3-2) supports a variable {cURI}. This is
the URI of the candidate resource and can appear on its own, as here, or as part of another URI as shown in
Example 3-3.
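Template expansion amounts to substituting the candidate's URI for the variable. In the Python sketch below, the function name, the API address and the optional `encode` flag are assumptions: the text does not state whether the inserted URI should be percent-encoded, so both behaviors are shown.

```python
from urllib.parse import quote

def expand_lookup_uri(template: str, candidate_uri: str, encode: bool = False) -> str:
    """Replace {cURI} in the look up URI template with the candidate's URI."""
    # Assumption: encoding the inserted URI is optional and off by default.
    value = quote(candidate_uri, safe="") if encode else candidate_uri
    return template.replace("{cURI}", value)
```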

The syntax chosen for the template URI is consistent with
recent work at the IETF. However, as the
RFC has now expired, we plan to treat this as an informative resource, not a normative reference, unless it becomes an RFC before work on this document is completed.

Example 3-3: Resource Set definition with property look up and template URI.

The look up URI is now a database API and the creator of the set definition has provided a template
into which the processor can insert the URI of the candidate resource. As in Example 3-2,
the httpMethod property in
line 5 of Example 3-3 uses the HTTP in RDF vocabulary [HTTPRDF] to define what type of HTTP
request should be sent to the look up URI (in this case HTTP GET) and, in line 6, what the response must contain
if the candidate is an element of the resource set. We define a small number of RDF properties that can be used
to describe the response:
includeResponseContains,
meaning that the given string must be present in the HTTP response for the candidate resource to be an element
of the Resource Set, and
includeExactResponse,
meaning that the HTTP response must be exactly as given. Negative versions of these properties are also defined:
excludeResponseContains
and excludeExactResponse.
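
One possible reading of the four response properties is sketched below. It assumes that each property holds a white space separated, percent-encoded list, that any listed value is enough to satisfy an include property, that no listed value may match for an exclude property, and that every property present must be satisfied. The function name is hypothetical.

```python
from urllib.parse import unquote


def _values(raw: str) -> list[str]:
    """Split a property value on white space and percent-decode each item."""
    return [unquote(v) for v in raw.split()]


def response_confers_membership(props: dict[str, str], response: str) -> bool:
    """Evaluate the response properties that are present in props."""
    checks = {
        "includeResponseContains": lambda vals: any(v in response for v in vals),
        "includeExactResponse": lambda vals: any(v == response for v in vals),
        "excludeResponseContains": lambda vals: not any(v in response for v in vals),
        "excludeExactResponse": lambda vals: not any(v == response for v in vals),
    }
    return all(
        check(_values(props[name])) for name, check in checks.items() if name in props
    )
```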

A valid instance of the Class PropLookUp MUST have both
lookUpURI and httpMethod properties as well as at least one of
includeResponseContains, includeExactResponse, excludeResponseContains or excludeExactResponse.
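
A validity check along these lines could be written as follows. This is a minimal sketch that assumes the PropLookUp instance has already been read into a dictionary keyed by property name; that representation and the function name are illustrative only.

```python
RESPONSE_PROPERTIES = {
    "includeResponseContains",
    "includeExactResponse",
    "excludeResponseContains",
    "excludeExactResponse",
}


def is_valid_prop_lookup(props: dict[str, str]) -> bool:
    """Check the mandatory properties of a PropLookUp instance."""
    if "lookUpURI" not in props or "httpMethod" not in props:
        return False
    # lookUpURI takes a single URI, not a white space separated list.
    if len(props["lookUpURI"].split()) != 1:
        return False
    # At least one of the four response properties must be present.
    return any(name in props for name in RESPONSE_PROPERTIES)
```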

The example here is a simple one that assumes that the HTTP Response headers will include exactly
Language: fr. Such an assumption can only be made if the server handling requests made to the given
URI is known to include such a response in such a format. It is NOT safe to make such assumptions without
detailed knowledge of the particular server in question since, for example, HTTP header field names are
case insensitive, there may be more than one white space character after the colon and so on. There is
no facility to encode (directly) instructions such as 'parse the HTTP Response headers and look for a
header that matches xyz.'
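
The fragility of a plain string comparison is easy to demonstrate. The sketch below assumes that the processor performs a simple substring test; the three header variants are invented, but all three would be equivalent as far as HTTP is concerned.

```python
def naive_contains(expected: str, response_headers: str) -> bool:
    """Plain substring test, with no knowledge of HTTP header syntax."""
    return expected in response_headers


variants = ["Language: fr", "language: fr", "Language:  fr"]
results = {v: naive_contains("Language: fr", v) for v in variants}
# Only the first variant matches, although all three carry the same header.
```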

In the more general case, it is noteworthy that the simple HTTP-based property look up facility is
designed to support queries made using any suitable protocol, examples of which include, but are not
limited to, SPARQL, SOAP and simple CGI or PHP-based APIs.

Should we also define a postContent property, i.e. an optional property whose value would be sent to the given URI
in an HTTP POST request? This might make support for SOAP more explicit.

4 Conjunction and disjunction

4.1 Combining Definitions Within a Resource Set

As set out briefly in Section 2.1 and referred to throughout this document,
Resource Sets are defined using RDF properties whose values are white space separated lists of possible values. The exceptions to this
are the includeUriPattern and excludeUriPattern properties which take a single Regular
Expression. Taken from the point of view of determining whether a candidate resource is or is not an element of the
Resource Set, the values of the include RDF properties are combined with logical OR.
In Example 4-1, the candidate resource is an element of the Resource Set if it is on example.org
OR example.com.

This is the only way to encode the set of resources on these two hosts (excepting the possibility of doing so using a Regular Expression).
A validation error SHOULD be raised if any set definition RDF property, other than includePathContains
or includeQueryContains, appears more than once in a given Resource Set. Example 4-2 is therefore invalid.
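
Such a check might be sketched as follows, assuming the processor has collected the names of the set definition properties in the order they appear in a single Resource Set. The function name is hypothetical.

```python
from collections import Counter

# The only properties that may appear more than once in a Resource Set.
REPEATABLE = {"includePathContains", "includeQueryContains"}


def duplicated_properties(property_names: list[str]) -> list[str]:
    """Return the properties that would trigger a validation error."""
    counts = Counter(property_names)
    return sorted(
        name for name, n in counts.items() if n > 1 and name not in REPEATABLE
    )
```

A non-empty result indicates that the definition should be reported as invalid.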

Example 4-2: An invalid Resource Set definition in which a single RDF property appears more than once

A candidate resource MUST satisfy ALL definitions in a given Resource Set. Therefore the set of all resources on example.org or
example.com that have a path starting with foo or bar is defined as shown in Example 4-3.

Expressed using set theory, each RDF property is a resource set definition intensionally denoting a set of resources.

Thus, given the following two resource set definitions:

D1 = includeHosts(?x, {example.com, example.org})

D2 = includePathStartsWith(?x, {foo, bar})

the Resource Set is the intersection of the extensions of these resource set definitions, where D^I denotes the extension of definition D:

RS = D1^I ∩ D2^I
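
The same logic can be expressed as a pair of predicates. The sketch below makes two simplifying assumptions that are not taken from this document: a host matches if it equals a listed host or is a subdomain of it, and a path "starts with foo" if the path, with its leading slash removed, begins with that string.

```python
from urllib.parse import urlsplit


def include_hosts(uri: str, hosts: set[str]) -> bool:
    """D1: true if the URI is on any of the listed hosts (logical OR)."""
    host = urlsplit(uri).hostname or ""
    return any(host == h or host.endswith("." + h) for h in hosts)


def include_path_starts_with(uri: str, prefixes: set[str]) -> bool:
    """D2: true if the path begins with any of the listed strings (logical OR)."""
    path = urlsplit(uri).path.lstrip("/")
    return any(path.startswith(p) for p in prefixes)


def in_resource_set(uri: str) -> bool:
    """RS: the candidate must satisfy both definitions (intersection)."""
    return include_hosts(uri, {"example.com", "example.org"}) and \
        include_path_starts_with(uri, {"foo", "bar"})
```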

The same is true for the exclude properties. In natural language, Example 4-4 says that a resource is a member of the set if it is on example.org and does not have a path beginning with foo or bar.
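
As an informal illustration of an exclude property, the set just described could be tested as below. The sketch assumes exact host matching and compares the path without its leading slash; neither detail is taken from this document, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def on_example_org_without_foo_bar(uri: str) -> bool:
    """On example.org AND NOT (path beginning with foo OR bar)."""
    parts = urlsplit(uri)
    on_host = parts.hostname == "example.org"
    path = parts.path.lstrip("/")
    excluded = path.startswith(("foo", "bar"))
    return on_host and not excluded
```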

4.2 Combining Multiple Resource Sets

It is believed that the RDF properties described in sections 2 and 3 provide sufficient flexibility to cover the majority
of uses for the grouping of resources. However, there is a clear limit on expressivity which needs to be addressed.
For example, it is impossible using the system described so far to express the set of resources on
example.org with a path beginning with foo and the resources on example.com that have
a path beginning with bar (again, that is, without using the includeUriPattern property
and a regular expression). Defining such a Resource Set requires the union of two discrete sets and this can be
achieved using the OWL set operators [OWLSO], as shown in Example 4-5.

Lines 3 - 6 and 7 - 10 of Example 4-5 are Resource Set definitions in their own right and the overall Resource Set is the union of these two. Formally we can write:

D1 = includeHosts(?x, {example.org})

D2 = includePathStartsWith(?x, {foo})

D3 = includeHosts(?x, {example.com})

D4 = includePathStartsWith(?x, {bar})

RS1 = D1^I ∩ D2^I

RS2 = D3^I ∩ D4^I

RS = RS1 ∪ RS2
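
The union of two intersections can be sketched in the same informal way. Host matching is assumed to be exact and paths are compared without their leading slash; both are simplifications made for this illustration only.

```python
from urllib.parse import urlsplit


def _host_and_path(uri: str) -> tuple[str, str]:
    parts = urlsplit(uri)
    return parts.hostname or "", parts.path.lstrip("/")


def rs1(uri: str) -> bool:
    """RS1: on example.org AND path starts with foo."""
    host, path = _host_and_path(uri)
    return host == "example.org" and path.startswith("foo")


def rs2(uri: str) -> bool:
    """RS2: on example.com AND path starts with bar."""
    host, path = _host_and_path(uri)
    return host == "example.com" and path.startswith("bar")


def rs(uri: str) -> bool:
    """RS: the union of RS1 and RS2."""
    return rs1(uri) or rs2(uri)
```

Note that a resource on example.org with a path beginning with bar is in neither set, which is exactly what a single Resource Set definition cannot express.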

OWL's intersectionOf set operator can also be used although it is anticipated that this will be rare since a Resource Set is
the intersection of the various sets defined within it. One scenario where it is appropriate to use owl:intersectionOf is where
Resource Sets are defined by reference to multiple external data sources using the property look up method described in
Section 3.2.

In theory, the OWL complementOf property can also be used. However, this can readily lead to significant logic
problems since it is an 'open world' definition. To give an example, in order to determine the elements of the set of movies that
have not received bad reviews, one would have to collect all movie reviews ever published and note the ones that were not
bad. Since it is a critical design goal that a processor MUST be able to determine with certainty whether
a candidate resource is or is not an element of a Resource Set, the OWL complementOf property SHOULD NOT be used.

A combination of the exclude RDF properties described in sections 2 and 3 and OWL's unionOf operator can be used to
create precise, that is, closed world, Resource Set definitions that exclude particular resources. For example, at the end of
Section 1.2 we claimed that it is possible to define the set of "all resources on example.com except those on video.example.com shot in widescreen format." Example 4-6 shows how this can
be done in relatively few lines.

Whilst Resource Set definitions like Example 4-7 are possible, their use will place a substantial burden on the processor and
SHOULD be avoided. The Resource Set it defines is the set of resources on example.org with a URI path starting with foo or ending
with either foo or bar, plus the resources on example.com that have a URI path starting with bar.

It is important to note that, when a set definition denotes resources by their address, we can obtain the same result by using
the includeUriPattern property, which would usually provide a more efficient solution. Example 4-7 can be
rewritten as shown in Example 4-8.

Example 4-8: A more efficient way of expressing the same Resource Set shown in Example 4-7
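
To show how a single pattern can cover the set described for Example 4-7, the following sketch tests one possible regular expression. The expression is an illustration written for this purpose, not the value used in Example 4-8; it assumes http or https URIs with no query string or fragment.

```python
import re

# example.org: path starts with foo, or ends with foo or bar;
# example.com: path starts with bar.
URI_PATTERN = re.compile(
    r"https?://(?:example\.org/(?:foo.*|.*(?:foo|bar))|example\.com/bar.*)"
)


def matches_pattern(uri: str) -> bool:
    """True if the whole URI matches the illustrative pattern."""
    return URI_PATTERN.fullmatch(uri) is not None
```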

5 Logical Inconsistency

It is recognized that a number of the design goals and constraints set out in section 1.1 are in tension with
each other, notably that Resource Set definitions must be easy to write, be comprehensible by
humans and, as far as is possible, should avoid including or excluding resources unintentionally.

To answer the call to make it easy to write Resource Set definitions, a wide variety of RDF properties have been defined that are, it is hoped, easy to use and comprehend by humans. It is anticipated that Example 5-1 will be typical.

Example 5-1: A simple Resource Set definition anticipated as being typical

This is analogous to the sort of resource grouping in a robots.txt file [ROBOTS] that invites crawlers to probe all parts of a Web site except the cgi-bin, the testing and private areas.

Now suppose that the content provider responsible for example.mobi sets up a service called 'Test Your IQ'. Realizing that the Resource Set
definition will exclude the testyouriq section of the Web site (as it begins with test), he/she adds a new line to the
Resource Set definition in an attempt specifically to include the new section thus:

This would not have the desired effect! The critical part of this definition now says that a candidate resource is a member of the Resource Set if it
has a path that begins with testyouriq AND does NOT have a path that begins with test. This can never be true and therefore
Example 5-2 is equivalent to the empty set.

This example serves to highlight an important point: that it is perfectly possible to create a set definition that includes logical inconsistencies. A POWDER processor MUST, indeed can only, treat such Resource Set definitions as the Empty Set.

The correct solution to the problem is not to specify a further property in the original Resource Set,
but to create an additional Resource Set definition and combine the two with an owl:unionOf operator thus:
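
The difference between the inconsistent definition and the corrected one can also be illustrated in code. The sketch below is not the RDF itself; it assumes exact host matching on example.mobi, paths compared without their leading slash, and excluded path prefixes of cgi-bin, test and private, as suggested by the description above.

```python
from urllib.parse import urlsplit


def _host_and_path(uri: str) -> tuple[str, str]:
    parts = urlsplit(uri)
    return parts.hostname or "", parts.path.lstrip("/")


def original(uri: str) -> bool:
    """On example.mobi, excluding paths that start with cgi-bin, test or private."""
    host, path = _host_and_path(uri)
    return host == "example.mobi" and not path.startswith(("cgi-bin", "test", "private"))


def inconsistent(uri: str) -> bool:
    """Extra include property in the same set: ANDed with the exclusion, never true."""
    _, path = _host_and_path(uri)
    return original(uri) and path.startswith("testyouriq")


def corrected(uri: str) -> bool:
    """Union of the original set and a second set for the new section."""
    host, path = _host_and_path(uri)
    return original(uri) or (host == "example.mobi" and path.startswith("testyouriq"))
```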

6 Extension Mechanism

In this document we have laid out just two methods to define a set of resources: one referring to resource addresses and the other
to resource properties. The address-based methods are clearly designed to be used with information resources available on the Web that
can be identified by matching things like host names, paths and IP addresses. There is no limit on the distinguishing characteristics that
can be used to define a set of resources, however, and so there should not be unnecessary constraints on how the protocol works.

The POWDER Vocabulary [VOC] uses pre-defined data types from XML Schema as well as other atomic data types, and
then derives list data types from them. As the following examples show, an analogous approach can be taken with any system used for identifying
resources so that little augmentation would be needed for a POWDER processor to be able to handle the data.

Importantly, if a Resource Set is defined using any term that the processor does not recognize then it MUST treat it as the empty set.
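
This rule might be implemented along the following lines. The sketch assumes a processor that keeps a table of the properties it understands and a Resource Set held as a dictionary of property names and values; only one property is implemented here, and myNewProperty is an invented name.

```python
from urllib.parse import urlsplit


def _include_hosts(uri: str, values: list[str]) -> bool:
    return (urlsplit(uri).hostname or "") in values


# Properties this hypothetical processor recognizes.
KNOWN_PROPERTIES = {"includeHosts": _include_hosts}


def is_member(definition: dict[str, str], uri: str) -> bool:
    """Treat any definition containing an unrecognized term as the empty set."""
    if any(name not in KNOWN_PROPERTIES for name in definition):
        return False
    return all(
        KNOWN_PROPERTIES[name](uri, value.split())
        for name, value in definition.items()
    )
```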

6.1 Extension Example 1: ISAN

The International Standard Audiovisual Number [ISAN1] is a voluntary numbering system for the identification of audiovisual works. Following ISO 15706, the numbers are written as 24 hexadecimal digits in the following format [ISAN2].

      -----root-----   episode     -version-
ISAN  1881-66C7-3420 -  0000   -7- 9F3A-0245 -U

The root of an ISAN number is assigned to a core work with the other numbers being used for things like episodes, different language versions, promotional trailers and so on.

A vocabulary can readily be defined to allow Resource Sets to be defined based on ISAN numbers. The terms might be along the lines of:

includeRoots: a white space separated list of hexadecimal digits and hyphens that would be matched against the first three blocks in the ISAN number.

includeEpisodes: a white space separated list of hexadecimal digits and hyphens that would be matched against the 4th block of 4 digits in the ISAN number.

includeVersions: a white space separated list of hexadecimal digits and hyphens that would be matched against the 5th and 6th blocks of 4 digits in the ISAN number.

includeIsanPattern: a regular expression that should be matched against the entire ISAN number.
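
A processor extension for such terms might split an ISAN into its parts as sketched below. The code assumes the full printed form shown above, with eight hyphen-separated blocks including the two check characters; the second root in the usage example is invented for illustration.

```python
def split_isan(isan: str) -> dict[str, str]:
    """Split a full ISAN into root, episode and version, ignoring check characters."""
    blocks = isan.split("-")
    if len(blocks) != 8:
        raise ValueError("expected 8 hyphen-separated blocks")
    return {
        "root": "-".join(blocks[0:3]),
        "episode": blocks[3],
        "version": "-".join(blocks[5:7]),
    }


def matches_include_roots(isan: str, include_roots: str) -> bool:
    """True if the ISAN root is one of the white space separated values."""
    return split_isan(isan)["root"] in include_roots.split()


parts = split_isan("1881-66C7-3420-0000-7-9F3A-0245-U")
```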

The set of all audiovisual resources that relate to two particular works might then be defined as shown in Example 6-1.

6.2 Extension Example 2: Custom URL Patterns

Developers may create their own URL patterns for use in specific services. For example, Google Custom Search Engine [Google]
uses wildcards so that www.example.org/* means "all
the resources on www.example.org." Such a system is easily used within a Resource Set, only requiring the definition
of a single RDF property myPattern as shown below.
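
How a processor might evaluate such a pattern is sketched below. The matching rules are assumptions made for this illustration and are not taken from any search engine's documentation: an asterisk matches any run of characters, and the pattern is compared against the URI with an optional http or https scheme removed.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Convert a simple wildcard pattern into a compiled regular expression."""
    escaped = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(r"(?:https?://)?" + escaped)


def matches_my_pattern(uri: str, pattern: str) -> bool:
    """True if the whole URI matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(uri) is not None
```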