To benefit from and increase the value of the World Wide Web, agents should provide URIs as identifiers for resources.

But existing Web technologies and specifications offer a wide range of
options to anyone embarking on a project which involves the creation and
management of Web-enabled names. As a result, for anyone attempting to follow the above advice, a question
immediately arises, namely "What
kind of URIs should agents provide?"

This document is intended to supplement AWWW by
addressing this question.

Following the precedent of AWWW, we proceed in terms of a
simple scenario which supplies motivating context for a subsequent more detailed exploration of
requirements for Web-enabled names and best practices for addressing those requirements.

Nadia and Dirk's work on film industry data (see AWWW section 4.2.2) has been successful up to a point, but their employers, a consortium of film studios called FSC, are not happy that they have used URIs from a widely-used public film database to identify films, actors, directors, etc. "There's no guarantee those URLs will always give the right information" says Robin, the project officer at FSC, "and anyway, FSC should control all of this. Also, we want names, not old-fashioned locations. Isn't there a way we can have a bunch of names that are clearly ours, and guarantee what they identify?"

At this point Dirk and Nadia both reply at once:

Dirk:
"Sure, yes, there's
Uniform Resource Names, just what you want, we could register a new URN
namespace, say, fii, for Film Industry Identifier. Then we would
have for example urn:fii:studio:megapictures."

Nadia:
"Sure, yes, there's always room for another URI scheme,
say, fii, for Film Industry Identifier. Then we would have for
example fii://studio/megapictures/."

"Well, sounds like there are some choices to make. You two get your
story straight, and bring me back a proposal with a rationale and some costings", says Robin.

So, who's right? What should the consensus proposal look like? Not
surprisingly, that depends on the requirements, which need to be articulated
in a lot more detail than is provided by Robin's off-the-cuff remarks. In what follows
we'll explore the requirements space and the solution space, and conclude that
in a large number of cases both Dirk and Nadia are on the wrong track, because
http-scheme URIs can satisfy Robin's requirements while looking
very attractive from a cost-benefit perspective.

What Robin says FSC wants for its web-enabled names is very close
to what many groups and projects have specified as requirements in similar
situations: they all want names which are identifiable, reliable and
stable. The technology behind the names should allow delegation of naming authority and provide uniform access to metadata.

It's important to be clear what each of these requirements really means,
so let's start with a more detailed list, making some key
distinctions along the way, and reflecting a common rough order of importance:

Dirk says "People must be able to tell by looking at
one of our URIs that it is one of ours". Nadia adds "Yes, and
browsers too: our URIs should be
syntactically distinguishable from all other URIs".

Hereafter we'll use the
phrase FSC URIs to refer to the set of URIs identifiable as names
of FSC resources, and the phrase access an FSC URI as shorthand for the
retrieval of a representation of the resource identified by such a URI.
Although the requirements in this section, and the analysis in the next, are
expressed in terms of FSC URIs, they are intended to be relevant
to any organized effort to use Web-enabled URIs as names, whether managed by an
individual, an organization, or a group.

"FSC URIs should work right away for everyone", says
Nadia. "Right", says Dirk, "in web pages, email, all the places people expect
to be able to put URIs and then click on them". That is, it should require
little or no effort on the part of ordinary users to access an FSC URI.

Nadia says "It's very important that
our users never see a 'not found' response (as long as they're actually online)". That is,
it should always be possible to get a positive response
(either a representation or other definite advice about the resource)
from an attempt to
access an FSC URI.

Dirk and Nadia have read the TAG finding on
opacity of URIs, and are agreed that FSC URIs should be explicitly
transparent, that is, it should be evident what each of their URIs is about just
by looking at it. They understand and accept that this means they will have to
document the nature of the mapping from URIs to resources as part of their public site description.

Dirk says "We have to be able to give our members
control of the URIs that concern them". Nadia says "Right, for instance, each
studio has to be responsible for their own pictures". That is, the owner of
a set of FSC URIs must be able to delegate naming authority (control over the
meaning of URIs) for some designated subset of FSC URIs.

Nadia says "People have to be able to rely on our
URIs". Dirk adds "Yes, the meaning of our URIs shouldn't change, no matter
what happens".
After further discussion, they distinguish three kinds of stability:

owner stability
"No-one can take our URIs away from us."
That is, ownership of a URI, and the authority over a URI's meaning which
follows from it, continues as long as the owner wants it to;

resource stability
"For at least some of our URIs, we won't
ever change what they are for". That is, in AWWW terms, what resource such a URI identifies shouldn't change;

representation stability
"For at least some of our URIs, we want to
guarantee that exactly the same page will always come up". That
is, the representation
retrieved from such URIs shouldn't change.

"How do we find out the status of
one of our URIs", asks Dirk, and goes on "Who its naming authority is, who wrote its
representation(s), whether it's meant to be stable or not—oh, lots of stuff"?
"There should be a standard place to put metadata, and a standard way to get at
it, for every one of our URIs", replies Nadia. That is, given an FSC URI it should be possible to retrieve metadata about
the URI and the resource it identifies independently of the representation
of that resource, if any.
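One concrete way an HTTP-based scheme could meet Nadia's "standard way to get at it" requirement is the Link response header of RFC 8288, using a relation such as "describedby" to point from a URI to its metadata record. The sketch below, in Python, shows the client side of that convention; the FSC/fii URIs in it are hypothetical, and the parser handles only the simple comma-separated header form (it would mis-split URIs that themselves contain commas).

```python
# Hedged sketch: extract rel="describedby" metadata links from an HTTP
# Link header value (RFC 8288). The header mechanism and relation name
# are standard; the example URIs are invented for illustration.

def describedby_targets(link_header: str) -> list[str]:
    """Return the target URIs of rel="describedby" links in a Link header."""
    targets = []
    for part in link_header.split(","):
        part = part.strip()
        if not part.startswith("<"):
            continue
        uri, _, params = part.partition(">")
        uri = uri.lstrip("<")
        if 'rel="describedby"' in params or "rel=describedby" in params:
            targets.append(uri)
    return targets

header = ('<http://fii.example.org/meta/studio/megapictures>; rel="describedby", '
          '<http://fii.example.org/style.css>; rel="stylesheet"')
print(describedby_targets(header))
```

A client would issue a HEAD request for the FSC URI, read its Link header, and then retrieve the metadata record separately from the representation itself, which is exactly the independence Nadia asks for.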

Nadia remembers security: "If we turn out to want to hold
or exchange information
which is private to some of our members or users, they have to know it's safe."
"Right," says Dirk, "We should be able to reliably identify our users and
members, and keep our interchanges with identified parties private, without
otherwise changing things."

Do we need this? Is it too obviously a floater for
https: to bang out of the park?

Half of this example is in detail false: W3C is not actually
committed to representation stability for dated URIs, only to using dated URIs for
non-time-dependent resources. Do we have a better example in the public domain?

If the distinction between resource stability and representation stability isn't clear, consider the difference between http://www.w3.org/ and http://www.w3.org/TR/1998/REC-xml-19980210. They each identify a resource, the W3C home page and the first edition of the XML language specification, respectively. The W3C observes the maxim "Cool URIs don't change", which is to say, it is committed to resource stability across the board. The specific consequence of that for the examples at hand is that the W3C is committed to maintaining the use of those two URIs to identify those two resources, in perpetuity. But the W3C is only committed to representation stability for the second of these two URIs. Indeed a significant contribution to the value of the first URI, which identifies the W3C home page, is that the representations which can be retrieved from it are not stable: they change on a regular basis, to provide up-to-date information about W3C activities. On the other hand, representation stability is important for the date-containing-URI names of W3C specifications: the W3C is committed to always providing exactly that original representation of the XML specification.

Still not great -- what I really want is something such as a product
key URI. . .

Should I say something to the effect that people are often (always?)
mistaken when they think they want this?

So, let's try a rewrite: If the distinction between resource stability and representation stability isn't clear, consider the difference between http://www.w3.org/ and http://web.archive.org/web/19990503021459/http://www13.w3.org/. They each identify a resource, the W3C home page as such and the W3C home page as it was at a particular date and time, respectively. The W3C observes the maxim "Cool URIs don't change", which is to say, it is committed to resource stability across the board. The specific consequence of that for the first example is that the W3C is committed to maintaining the use of that URI to identify the resource which is its home page in perpetuity. But the W3C is not committed to representation stability. Indeed a significant contribution to the value of that URI is that the representations which can be retrieved from it are not stable: they change on a regular basis, to provide up-to-date information about W3C activities. On the other hand, the value of the second URI depends on representation stability: that is, retrieving from that URI will always give you the same representation the Wayback Machine retrieved from the W3C site on May 3rd 1999.

After looking into their requirements more carefully, as tabulated
above, and investigating the space of possible solutions more thoroughly, Dirk
and Nadia come to the conclusion that a carefully designed approach using
http: URIs can come at least as close to satisfying all their
requirements as either of their original suggestions, or any other
non-http: scheme.

http: URIs aren't perfect, and in some cases there are
trade-offs: fully satisfying one requirement may require some compromises with
respect to another. The Challenges and
tradeoffs section below introduces some of these, but a complete
analysis is beyond the scope of this finding.

The desire for branding to be evident in URIs is both widespread and
understandable. URI identifiability is a form of advertising, where the
admittedly modest impact of a single use of an identifiable URI is potentially
magnified greatly by widespread replication. Identifiability is also a
cornerstone of trust: brand recognition and successful URI access are mutually reinforcing.

RFC 3986 [ref], the standard which governs all URIs,
provides for both a registry-based
authority segment and a local, typically hierarchical, path segment
in URIs, and recommends both, together with the use of the IANA Domain Name system [ref IANA] for
the authority segment, for any URI scheme that intends to be global in scope.
http: URIs do exactly that, clearly following the RFC
recommendation, and thus satisfy the identifiability requirement, since all the
participants in any joint approach to minting Web-enabled names can be assumed to already have
domain names which are readily identifiable, or can come to be readily
identifiable, as theirs.
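The RFC 3986 decomposition just described is easy to see in practice. The short Python sketch below splits an http: URI into its registry-based authority and its locally-managed hierarchical path; the domain and path are invented for illustration.

```python
# Illustrating the RFC 3986 split between the authority segment
# (delegated via the IANA-rooted Domain Name system) and the path
# segment (wholly under the domain owner's control). The fii.example.org
# name is hypothetical.
from urllib.parse import urlsplit

parts = urlsplit("http://megapictures.fii.example.org/films/2008/blockbuster")
print(parts.netloc)  # the authority: megapictures.fii.example.org
print(parts.path)    # the local hierarchical part: /films/2008/blockbuster
```

It is the authority segment that carries the branding Dirk and Nadia want: anyone who recognizes the domain name recognizes the name's owner.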

Pervasive support for http: URIs is the foundation of the
success of the Web today. A wide range of user agents, not only web browsers,
recognize http: URIs and know how to access them using widely
deployed software support for the DNS and HTTP protocols.

At the other end, again a wide range of server software is available,
both free and commercial, ranging from fully-integrated website and document
management systems with support for on-the-fly synthesis of documents to simple
lightweight filesystem-backed servers.

"Nothing ubicts like ubiquity" :-)

With the exception of the legacy ftp: scheme and the
non-Web file: scheme, no other URI scheme has anything like this
degree of ubiquity.

Existing Web infrastructure uses the HTTP protocol [ref
2616] to access URIs. The long-term usability of http: URIs does
not however depend on HTTP: if new protocols arise which
complement or replace HTTP, updating or augmenting the Web infrastructure to use
them for accessing http: URIs will certainly follow.

Why is this hard? Dirk and Nadia's requirements all seem sensible, and
we've had names for use on the Web for nearly 20 years now. What stands in the way of a naming scheme satisfying those requirements?

The stability of name ownership is at risk for at least two pretty much
incorrigible reasons: The instability of human institutions, and the contingent nature of name registration.

It is in the very nature of owning anything that the first
kind of risk inheres: the owner of a name may
sell it, or give it away, or lease it, or indeed go out of business, or sell
their business, or the relevant division. Any of these changes amount to or result in a
change of ownership.

The second kind of risk arises from the nature of names on the Web. Virtually all naming schemes used on the Web are based on a division of
names into a global part, managed by a global registry, and a local part,
typically involving some form of hierarchical decomposition. The syntax of
URIs is designed to support this decomposition. The RFC
which governs URIs [ref 3986] distinguishes between the authority segment
(registry-based) and the path segment (local, hierarchical) of a URI, and recommends the use of the IANA Domain
Name system [ref IANA] for the authority segment of any URI scheme that intends to be global in scope. Any
naming scheme that follows this recommendation, and thus equates URI ownership to Domain Name ownership,
such as the http: URI scheme, depends on the stability of ownership of
Domain Names for URI owner stability. But Domain Names are not really
owned, only leased, for fixed terms, with no guarantee of
renewability, with the possibility of expropriation and with the in-principle risk, however unlikely in practice, that the Domain Name
registration system itself may cease to function.

There is ultimately no way around this. In particular, there is no
point in proposing naming schemes that use their own registries and/or lookup mechanisms (not
involving IANA) solely in order to get around this, because the reasons IANA operates
Domain Name registration the way they do, and the vulnerabilities that the
Domain Name system has, are universal and inescapable, given the requirements
it must satisfy. See the appendix below for a discussion of why this is the case.

The owner of a URI has the right to determine what it means, that is,
what resource it identifies, and the responsibility to respond to requests to
access representations thereof. It follows that any change of ownership may (but need not)
mean a change here too. And even without a change in ownership, control over
resource identity and/or responsibility for reliability may change if the owner
delegates that control or responsibility, or
changes an existing delegation.

The most common threat to both reliability and resource stability in a
global plus local naming system is the single point of failure implicit in
registry-based ownership. The technical aspect of this isn't the problem: multiple servers, aliasing, failover, etc. are all
well-understood, widely and successfully deployed techniques. Rather it's the management aspect: even if no real or effective change in
ownership occurs, it is once again the
frailty of human institutions that causes trouble: a change in business focus,
or loss of interest in the relevant aspect of their business,
or just misconfiguration of a
DNS entry or server, may compromise reliability or resource stability. That is, at worst, the (new)
owner of some names in a scheme will stop
responding to requests for what they see as old and irrelevant URIs (a failure
of reliability), or, worse, will decide to re-use those 'old' URIs for
different resources (a failure of resource stability). Users of the
URIs will no longer be able to access such URIs in the most obvious way or will
not get what they expect when they do.

Somewhat perversely, the main challenge here is that it's actually
rarely if ever really what is wanted -- to tie a URI to a particular character
sequence to be interpreted as a particular media type is a very
strong constraint.

And if it really is what is wanted, an externally verifiable guarantee
is probably wanted as well, which in turn at least compromises transparency,
because it means that the URI for a representationally stable resource will
have to include both the intended media type and a hash or checksum of the intended
character sequence, as has for example become common practice in
peer-to-peer sharing of anime [ref http://animechecker.sourceforge.net/].
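The mechanics of such a self-verifying name are simple enough to sketch. In the hedged Python example below, a URI embeds the intended media type and a SHA-256 digest of the intended byte sequence, so any retrieved representation can be checked against the name itself. The URI layout is invented for illustration; only the hashing is standard.

```python
# Hedged sketch of an externally verifiable, representationally stable
# name: the URI carries the media type and a SHA-256 digest of the
# intended bytes. The fii.example.org authority and /stable/ path
# convention are hypothetical.
import hashlib

def mint_stable_uri(authority: str, body: bytes, media_type: str) -> str:
    """Mint a URI whose final path segment is the body's SHA-256 digest."""
    digest = hashlib.sha256(body).hexdigest()
    return f"http://{authority}/stable/{media_type.replace('/', '.')}/{digest}"

def verify(uri: str, body: bytes) -> bool:
    """Check retrieved bytes against the digest embedded in the URI."""
    return uri.rsplit("/", 1)[-1] == hashlib.sha256(body).hexdigest()

uri = mint_stable_uri("fii.example.org", b"<html>...</html>", "text/html")
print(uri)
print(verify(uri, b"<html>...</html>"))   # True: bytes match the name
print(verify(uri, b"<html>!</html>"))     # False: representation changed
```

The sketch also makes the transparency cost visible: the digest dominates the URI, and nothing about such a name is human-readable.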

The ultimate in delegation is a fully decentralised system, in which
anyone can mint FSC URIs. The minimum necessary to avoid collisions
is the use of a central registry such as the
Domain Name system for the authority part of the scheme. The challenge here of
course is that there is no place for any structure to ensure that minters of
scheme URIs respect whatever constraints the scheme owners have specified to
guarantee that other requirements on the scheme are satisfied.

Furthermore, the more entities actually mint scheme URIs, the more
likely it is that one of them will undergo one of the status changes mentioned
above under Challenges to owner stability.

So the fundamental challenge is to find the right point on the continuum
from fully centralised to fully decentralised which delivers on all the other requirements.


Identifiability seems to follow naturally from delegation at the highest
level: if different entities are free to mint FSC URIs, and Domain
Names have a place in them, then identifiability is provided. But the
previous section suggests that a fully decentralised approach is unlikely to
satisfy other requirements, so a place for identifiability in a less than fully
decentralised approach has to be found.

Many of the requirements listed above are not essentially technical in
nature. Rather they are social. That is, they impose conditions on the
management of the names, not their essential nature. We'll start
by looking at name management policy, then move on to specific mechanisms which can be deployed
to assist in name management, or to some extent protect against potential
breakdowns of name management policy.

As things stand all that anyone, or any group, can do is to put
carefully-designed mechanisms in place to ensure that all Domain Name
registrations are legitimate (that is, not vulnerable to expropriation for
cause), monitored for impending expiry, and renewed in a timely fashion.

Providing persistence and resource stability

Assuming owner stability, good will, and continuing commitment to
participation, these requirements are entirely in the hands of
the originators, operators, and participants in any approach to Web-enabled naming. They are
nonetheless among the hardest to address well. Restricting naming authority to
trusted participants whose corporate self-interest is evidently tied up with
their commitment to maintain their web presence and not change what their URIs
in the scheme mean is an obvious starting point, but deciding just how
commitments are to be phrased and what sanctions, if any, are to be available
to enforce those commitments is inevitably a difficult business.

Some degree of protection against this kind of failure of the scheme
can be provided by delegation and/or replication, see below.

The solution is partly management, i.e. the imposition of participation
requirements, and partly technical, e.g. including the representation's
checksum in the URI itself, or providing the checksum in a metadata
record. It's not a guarantee (nothing is), but it at least provides a way
to check, and if a mismatch is detected the site operator can be shamed
into fixing it (same as for checksum-in-the-URI). OBO [ref?] does this.
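The checksum-in-a-metadata-record variant amounts to a periodic audit: compare what a site currently serves for a representationally stable URI against the digest recorded when the commitment was made. A minimal Python sketch, with a hypothetical in-memory metadata record standing in for whatever registry the scheme actually uses:

```python
# Hedged sketch of auditing representation stability against a metadata
# record. The metadata mapping and the fii.example.org URI are invented;
# in practice the record would live in the scheme's metadata service.
import hashlib

metadata = {
    "http://fii.example.org/spec/2008":
        hashlib.sha256(b"original spec text").hexdigest(),
}

def audit(uri: str, served_body: bytes) -> str:
    """Compare the currently served bytes against the recorded digest."""
    expected = metadata[uri]
    actual = hashlib.sha256(served_body).hexdigest()
    return "ok" if actual == expected else "MISMATCH: representation changed"

print(audit("http://fii.example.org/spec/2008", b"original spec text"))
print(audit("http://fii.example.org/spec/2008", b"silently edited text"))
```

Unlike the checksum-in-the-URI approach, this keeps the names themselves clean, at the cost of requiring the metadata record to be at least as stable as the representations it vouches for.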

This section started out trying to be a framework within which all
existing schemes could be characterised. Perhaps that's the wrong goal. . .

In the simplest, and very common, situation with respect to URIs,
resources and representations, the owner of a Domain Name is the
implicit proprietor of a naming scheme, consisting of all URIs which use that
Domain Name (and its sub-domains) as their authority. The owner decides what
resources will be given names in that scheme, what those names will look like,
how representations will be stored and/or computed and provides the necessary
computing resources for storage, computation and servicing of access requests.

Once more than one party is involved, as is the case in the Dirk and
Nadia FII scenario we are considering, choices arise
with respect to each of the decisions and provisions just listed, and these
choices in turn have implications for the requirements placed on the scheme.
In what follows we consider which choices affect the stability and persistence requirements.

Name ownership

I'm not sure about using 'Domain Name' below—I keep going back
and forth between thinking about this document as if it's meant to cover _all_
URIs, or just URIs which follow the RFC advice and use Domain Names in their
'authority' part, or just http: URIs. . .

This section contains bits and pieces, some from other sources,
which may or may not find a place in the final document.

Once any of those decisions or provisions are placed in hands other
than the owner's, we have an instance of delegation. Almost any combination of
retention of some aspects of control/provision and delegation of the rest is
possible in principle—in practice we observe a small number of common
patterns, which we will explore below.

Some patterns of delegation can go a long way towards mitigating the
negative impact of institutional frailty on naming schemes. There are two
primary delegation patterns we will look at: centralisation (or delegation upwards, from the members of a group
to the group itself) and replication (or delegation downwards, from a group to
its members).

The framework is theoretical, which it has to be in order to catch
all the objections that are going to be thrown at it. The general
reader (scheme designer) won't be able to apply it without help, so it
will have to be augmented (as the existing draft finding was) with
examples.

Expropriation on appeal and repossession are not necessarily
the issues. One could assert that effective
permanence happens sometimes in practice (e.g. scheme names, chemical
element names) and could happen more often if we just figured out how,
so this document has to answer:

jar asks
When I create a new URI scheme or URN scheme and it gets accepted
and published by IETF, then the scheme name binding is permanent,
isn't it? No one has to renew the scheme name every year. So in what
sense is domain name registration renewal "inescapable"?

hst replies
Interesting example, but I think it
doesn't work. The IETF has rebound the http: scheme at least twice
since its original binding. There is no guarantee that it won't change again.
[RFC 4395] defines the scheme registration process, and the IANA scheme registry tabulates the registered schemes. [RFC 4395] makes explicit provision for bindings to be changed, and defines the process whereby such changes are made. Insofar, then, as your question means "Is the binding of a URI scheme name to an RFC which defines it permanent", the answer is clearly "no". And note that at a deeper level, the IETF is free to change the whole story, by issuing a new RFC which obsoletes (their term) [RFC 4395]. Indeed [RFC 4395] obsoletes a predecessor which told a different story.

jar asks
tag: URIs [ref?] are owned by their creator permanently—even if the
host or email address is recycled. This is accomplished by putting
timestamps in the URIs, much as publisher names are bound to
particular corporations by affixing the year of publication in
scholarly citations. Why can't we do the same thing for fii—maybe
even use tag: URIs with some special protocol?

hst replies
I think it's significant that
tag: URIs have no resolution mechanism. If they did,
wouldn't there have to be an appeals process, which would operate in cases
where one of the must nots of section 2.2 is alleged to have been violated?

hst muses a bit
The thing about the periodic table, or names
of craters on the moon, or of celestial objects, is that they all, to a pretty fair degree,
are grounded in objective (off-the-web, out-in-the-world) reality. The naming authority is centralised. And
indeed in a very real sense the names are all owned by the naming
authority. And I'd be surprised if there weren't counter-examples to the 'no
reassignment' principle at the
margins. . .

(JAR's naming scheme: some subspace of tag: URIs resolvable through
some cockamamie protocol... now convince JAR that he should use http:
URIs instead.)

However, I agree that the XRI folks are not putting immortality out as
a major issue for them, and my constituency may not be important
enough to spend column-inches on right now.

There are evidently stable, managed sets of names in existence: the
periodic table, the names of surface features of planets and satellites. What
is it about names for use on the Web that precludes true stability for them?
The combination of arbitrary, dereferenceable and identifiable seems to be the
source of the problem. These three together mean there is real value in
owning a name, and that there can, and therefore will, be disputes
about which legal entity gets to use which name. This in turn requires a dispute
resolution procedure for registered names, which in turn means expropriation must be possible.
Because supporting dereferencing requires resources, owning names incurs costs,
which means they will be abandoned, which in turn, along with the fact that
name ownership has real value, means that it makes sense to lease, rather than
sell, registration.

If we look at existing systems on the Web, that is URN namespaces and URI schemes, which do not rely (entirely) on IANA Domain
Names, we find broadly speaking three cases:

Some URI schemes, including doi (not
registered), and URN namespaces (such as uuid) use opaque strings,
typically numbers, either self-allocated (uuid) or via registries (doi).
Such approaches may involve outright ownership (uuid), or may not (doi, at
least from some registrars), and since they don't provide identifiability, need not provide for
expropriation, but they are nonetheless heir to the other vulnerabilities of
owned names.

Some schemes, including the info
and xri (not registered)
URI schemes, provide identifiability and operate their own registries, distinct
from the IANA Domain Name registry (although their current lookup mechanisms
do rely on the DNS system). The xri registry is
parallel to IANA's in all the aspects relevant to lack of stability discussed
above. The info registry is qualitatively quite different, as it
is restricted to names for the operators of large public
namespaces, and is clearly intended to operate in terms of dozens or at most
hundreds of registrations. No appeal or expropriation mechanisms are defined
for it, and since dereferencing is explicitly not required to be
supported, the impact of a registered info name owner going out of
business is not necessarily very great.

Some schemes, including the
tag URI scheme and the NEWSML URN namespace, combine
a Domain Name with a date, in an attempt to avoid the majority of the
vulnerabilities we've identified. However, tag URIs explicitly do
not support resolution, and NEWSML URN resolution is
left unspecified in principle, and in practice seems not to be supported.

Still need to say something about how the meaning of e.g.
http://www.w3.org/1999/xhtml might be stable even if W3C loses
ownership of w3.org. . .

In summary, a number of schemes exist whose vulnerability to the
challenges to ownership stability identified above is reduced, but they all achieve this at the expense of one or more of Dirk and Nadia's other requirements.

Delegation as centralisation

Or, "Put all your eggs in one basket, and watch that
basket!" In its simplest form, centralisation means that all the
participants in a common naming scheme agree that
there will be only one repository of representations, and one domain name used
for names. This is technically very simple, but has several major
constraints which make it less likely to be a satisfactory solution:

The owner of the domain name is still a single point of failure. If
it is a member of the community, complex contractual constraints will be needed
to reassure the rest of the community. If it's an entity created for the
purpose, ditto, and governance of that entity needs to be
negotiated as well.

The cost of maintaining the representation repository falls on one
entity, at least in the first instance.

In the administratively simplest approach, control over naming is
centralized too, and this is likely to mean loss of branding. That is, the
identity of the
creator of a representation is not necessarily obvious from its name.

It should be clear that it is straightforward to use http:
URIs to implement this variant of delegation.

Delegation as replication

Or, "Split up, one of us is bound to survive!"
Technical replication at the level of DNS cannot solve the problem
of domain name loss. Only methods that involve a second
domain name can do that.

In the same way that a web cache or proxy server provides an
alternative to a standard DNS lookup plus hierarchy,
a naming system may specify an arbitrary alternative algorithm for
looking up its URIs.
Such an algorithm may be invoked as an alternative to other methods
that might contain unreliable steps.

This seems too narrow to me (JR); I can think of a
zillion variants
(eg).
Concrete is good, but maybe this is too concrete. Nb what's
described here is effectively an alternative DNS...

Delegation here means two lookups plus hierarchy. The first
lookup gets you to a naming-system-specific naming authority, whose
name is known ahead of time to clients of the naming system.
This authority itself
implements the second lookup, which leads to the repository where the
hierarchical part can be interpreted.
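The two-lookup pattern can be made concrete with a short sketch. In the hedged Python example below, a first-level registry, known ahead of time to clients and replicable across several domains, maps second-level authority names to their current base URIs, after which the hierarchical part is interpreted by that authority's own server. All the names are invented for illustration.

```python
# Hedged sketch of delegation-as-replication: lookup 1 consults the
# naming system's own registry; lookup 2 is the ordinary DNS/HTTP access
# to the authority's current server. Registry contents are hypothetical.

FIRST_LEVEL_REGISTRY = {
    "megapictures": "http://films.megapictures.example.com",
    "bigstudio": "http://archive.bigstudio.example.org",
}

def resolve(name: str) -> str:
    """Resolve a name of the form authority/hier/archical/part to a URI."""
    authority, _, local = name.partition("/")
    base = FIRST_LEVEL_REGISTRY[authority]   # lookup 1: the scheme's registry
    return f"{base}/{local}"                 # lookup 2: the authority's server

print(resolve("megapictures/films/2008/blockbuster"))
```

The point of the indirection is visible in the table: if megapictures loses its domain, only the registry entry needs updating, and every name minted under that authority keeps resolving.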

The good news is that most of the drawbacks of the centralised approach
have been removed:

No more single point of failure: by design, multiple entities
owning multiple domains provide replicated second-level name lookup.

(Second-level) name authorities are back to providing for the storage/serving
of their own representations.

The hierarchical part of naming is in the hands of (second-level)
name authorities, and as long as Domain Names are used as second-level names,
branding is retained.

There are downsides of this approach too:

Replication of second-level name information across first-level name
authorities requires real work

Policies governing the circumstances under which control of
second-level names is taken away from their (original) authorities will be hard to
formulate and get agreement on.

Some replication of representations is necessary to support
persistence in case second-level name authorities abdicate responsibility—it's not
clear where responsibility for that will lie.

It might be thought to violate
a WebArch
Best Practice: "A URI owner [naming authority] SHOULD NOT associate arbitrarily different URIs with the same resource." But the only difference between the multiple URIs by which any resource named in this kind of scheme may be accessed is the first-level name, and so the URIs are not "arbitrarily different".

The ARK
naming system is a good example of a naming system along these lines using http: URIs.

Hybrid delegation: centralized naming, distributed storage

It should be clear that there is no necessary association between
centralisation of naming and centralisation of storage: a middle way that
centralises naming, but leaves storage in the hands of content
owners,
is clearly possible. Doing things this way could also accommodate preservation
of branding.