xml:base vs. RFC 3986 Grudge Match

Abstract

The spirit of the xml:base
Recommendation and that of RFC 3986 are at odds with one another. The
solution? A very simple new member of the URI family of
specifications.

URIs are useful for identifying resources on the Web, but absolute URIs are
often unwieldy and verbose. Some URI schemes descend from hierarchical
filesystem concepts. With such schemes, users can form relative references,
which resolve to absolute URIs through a natural path-traversal mechanism.
These relative references are usually briefer and more manageable than the
corresponding absolute URIs.
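As a small illustration of that path traversal, the sketch below uses
urljoin from Python's standard urllib.parse module to resolve a few
relative references. The base URI and the references are made-up values
chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical base URI, chosen only for illustration.
BASE = "http://example.org/docs/guide/intro.html"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a relative reference against a base URI."""
    return urljoin(base, reference)


# Each short relative reference stands in for a longer absolute URI.
for ref in ["setup.html", "../api/index.html", "/news/"]:
    print(ref, "->", resolve(ref))
```

The three references are sibling, parent-directory, and root-relative
forms, mirroring the way paths are written in a hierarchical filesystem.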

The URI specification, RFC
3986, describes a mechanism for establishing the URI against which these
relative references are resolved. This URI is called the base URI for a
given context. RFC 3986 also describes what should happen when a relative
reference resolves to the same URI as the base URI, ignoring any fragment
component. Such a reference is known as a "Same-Document Reference", and
software that processes it should treat it as a reference to the current
document. As a result, a base URI should be the actual URI of the resource
in question.
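The Same-Document test can be sketched in a few lines of Python. This is a
simplified reading of the rule, not a complete implementation: it compares
the resolved URIs as plain strings after dropping the fragment, and performs
no normalization. The function name is_same_document is a hypothetical name
used here for illustration.

```python
from urllib.parse import urldefrag, urljoin


def is_same_document(base_uri: str, reference: str) -> bool:
    """Return True if reference resolves to base_uri, ignoring fragments.

    Simplification: plain string comparison, no URI normalization.
    """
    target = urljoin(base_uri, reference)
    return urldefrag(target).url == urldefrag(base_uri).url


base = "http://example.org/today/"
for ref in ["#pick2", "", "pick2.xml"]:
    print(repr(ref), is_same_document(base, ref))
```

A fragment-only or empty reference points back at the base URI, so it
counts as a Same-Document Reference; pick2.xml resolves elsewhere and does
not.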

[A] person is deliberately abusing the base URI by assigning it an
unrelated URI for the purpose of creating an artificial shorthand notation
for external references.

--Roy T. Fielding

In effect, the contract that RFC 3986 lays down is that the base URI
provided to an application is the URI of that context. This is not
unreasonable: if you change the base URI in a given context to that of an
external resource, then any reference that resolves to that URI becomes a
Same-Document Reference and so refers to the current document, not the
external resource. We can see this problem through close examination of
the example from
section 3 of the xml:base Recommendation, which I reproduce here for
discussion.

Let's assume that the URI for this document is
http://example.org/today/. In the example, the olist element sets xml:base
to /hotpicks/, and its items use the relative references pick1.xml,
pick2.xml, and pick3.xml. This use of xml:base tries to make it easy to
refer to the URIs http://example.org/hotpicks/pick1.xml,
http://example.org/hotpicks/pick2.xml, and
http://example.org/hotpicks/pick3.xml. If the xml:base on the olist
element were /hotpicks/pick2.xml instead of /hotpicks/, however, then the
relative reference in the second item would become a Same-Document
Reference according to RFC 3986. Even though it resolves to
http://example.org/hotpicks/pick2.xml, which is a separate resource from
http://example.org/today/, processors should treat
http://example.org/hotpicks/pick2.xml as the current document. A similar
problem occurs if you change the relative reference in the second item to
#pick2 instead of pick2.xml. Another good example and description of this
problem can be found in Sjoerd Visscher's How to use base URIs.

These two specifications differ in their understanding of identity.
Moreover, any other specification that wanted to use “an artificial
shorthand notation for external references” would come into conflict with
RFC 3986.
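The mismatch can be made concrete with a short Python sketch. It assumes, as
above, that the document was retrieved from http://example.org/today/, and
it compares what the declared base URI implies with what is actually true of
the document being read. The function check and its result keys are
hypothetical names used only for this illustration, and the comparison is a
plain string match with no normalization.

```python
from urllib.parse import urldefrag, urljoin

# Assumed retrieval URI of the document, as in the discussion above.
DOCUMENT_URI = "http://example.org/today/"


def strip_fragment(uri: str) -> str:
    """Drop the fragment component, if any."""
    return urldefrag(uri).url


def check(xml_base: str, href: str) -> dict:
    """Resolve href under an xml:base value and compare two notions of identity."""
    base_uri = urljoin(DOCUMENT_URI, xml_base)  # xml:base may itself be relative
    target = urljoin(base_uri, href)
    return {
        "target": target,
        # What the Same-Document rule concludes from the declared base URI.
        "same_document_per_base": strip_fragment(target) == strip_fragment(base_uri),
        # Whether the target really is the document being read.
        "really_this_document": strip_fragment(target) == strip_fragment(DOCUMENT_URI),
    }


cases = [
    ("/hotpicks/", "pick2.xml"),           # the example as published
    ("/hotpicks/pick2.xml", "pick2.xml"),  # altered xml:base
    ("/hotpicks/", "#pick2"),              # altered reference
]
for xml_base, href in cases:
    print(xml_base, href, check(xml_base, href))
```

In the second and third cases the two flags disagree: the reference
qualifies as a Same-Document Reference with respect to the declared base
URI, yet its target is not the document that contains it.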

So what do I propose as a solution to this problem? I believe that if
users want a shorthand for quickly specifying external references, they
should call it something different. A different name would put this
technology outside the scope of the mechanisms built into RFC 3986.
For example, I might call these references simply "shorthand references",
and specify that they resolve against application-specified
"grounding URIs". The resolution mechanism is unchanged, but we no longer
assume anything about base URIs because we explicitly don't use them.

As something of a compromise, we might even allow shorthand
references to resolve against a grounding URI if one is present, and against
a base URI otherwise. That way, users can choose references with either set
of properties. To fix the specific problem with xml:base and base URI abuse,
I would simply reword the xml:base specification so that it uses shorthand
references instead of relative references, and note that these shorthand
references ought to be resolved against grounding URIs as defined by the
xml:base attributes. If anyone is interested in working on such a
specification (minimal as it may be), I would be happy to draft one;
otherwise I'll just use these concepts internally in my own projects.
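A minimal sketch of how such resolution might look in Python follows. No
specification for shorthand references exists, so everything here is an
assumption: the name resolve_shorthand, its parameters, and in particular
the choice to resolve a relative grounding URI against the base URI first.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_shorthand(
    reference: str,
    base_uri: str,
    grounding_uri: Optional[str] = None,
) -> str:
    """Resolve a shorthand reference (hypothetical semantics).

    Uses the grounding URI if one is present, and the base URI otherwise.
    Assumption: a relative grounding URI is itself resolved against the
    base URI before use.
    """
    if grounding_uri is None:
        return urljoin(base_uri, reference)
    anchor = urljoin(base_uri, grounding_uri)
    return urljoin(anchor, reference)


base = "http://example.org/today/"  # stays the real URI of the document
print(resolve_shorthand("pick2.xml", base, "/hotpicks/"))
print(resolve_shorthand("pick2.xml", base))
```

Because the base URI is never reassigned, a Same-Document test made against
it continues to describe the document actually being read, while the
grounding URI carries the shorthand.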