Erik Wilde on Services and APIs

Wednesday, July 08, 2009

The Last URI Scheme You'll Ever Need

there is a recurring pattern when it comes to the discussion of how to make information about classes of resources available on the web. web traditionalists often think of minting new URI schemes, because that was the idea of URI schemes: if there is a novel way of identifying (and probably interacting with) a class of resources, there should be a URI scheme for them.

this has changed since the semantic web has become popular, and even more so since the httpRange-14 issue has been, well, not exactly resolved, but at least addressed by some pattern of HTTP best practice called Cool URIs for the Semantic Web. while the original scope of httpRange-14 was HTTP (as indicated by its name), in many cases it now conveniently serves as a vehicle for establishing a simple discovery mechanism for descriptions about anything, not just HTTP resources. (i have been recently informed that this is the core idea of linked data, but i am still a bit unsure whether that's all there is to it.)

as a result, there is quite a bit of resistance to the idea of new URI schemes. this was always the case, and it certainly is true that new URI schemes should be minted with care: deployment is slow (browsers typically lack good plug-in architectures for URI schemes), and thus, only big and universally relevant classes of resources should get their own URI schemes. on the other hand, there are examples of long-neglected URI schemes suddenly becoming useful, such as the tel: scheme, which on an iPhone lets the phone simply dial the number when such a link is followed. this is useful and possible because of the explicit label that some identifier is a number as defined by the international telephone numbering system.

the reasoning for the semantic web line of thought is as follows: for well-known classes of resources, use some magic http: URI prefix instead of a new URI scheme. the advantage of this approach is usually explained by the fact that clients not knowing the special magic nature of the http: prefix can still access the resource and get a description about what it is (usually expected to be RDF), whereas clients knowing the magic prefix can behave in exactly the same way as they would for a new URI scheme.
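to make the "client not knowing the magic prefix" case a bit more concrete, here is a minimal python sketch of how a generic client might read the status code of a plain GET, following the wording of the httpRange-14 resolution (2xx, 303, 4xx). the function name interpret_response and the returned labels are made up for illustration; this is a sketch of the rule, not code from any actual semantic web client.

```python
def interpret_response(status, location=None):
    """Classify what the status of a GET response says about the requested URI.

    Returns a (label, description_uri) pair. The labels are illustrative
    assumptions, loosely following the httpRange-14 resolution.
    """
    if 200 <= status < 300:
        # 2xx: the URI identifies an information resource (a document).
        return ("information resource", None)
    if status == 303:
        # 303 See Other: the URI could identify anything (for example a
        # telephone); a description may be found at the Location target.
        return ("any resource", location)
    if 400 <= status < 500:
        # 4xx: the nature of the resource is unknown.
        return ("unknown", None)
    # Other status codes are not covered by the resolution.
    return ("not covered", None)
```

a client that receives a 303 would then fetch the returned description URI and hope to find RDF there that it can understand.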

from the semantic web point of view, this looks very convenient, because any semantic web client can safely live in an HTTP-only world and assume that descriptions can always be found via httpRange-14 (if they are available). but are there any disadvantages for the plain web, or any side effects that don't look great from the plain web architecture point of view? let's compare the concrete example of the tel: URI scheme:

in the plain web approach (which is the way it has been done), there is a tel: URI scheme (RFC 3966) which defines a new class of resources, in this case telephones, and an access method, the ability to call them. browsers not knowing this scheme will refuse to follow links (i.e., to resolve those URIs), but since they don't provide calling services, that's not too bad. links break on the web, and that's a feature, not a problem.

in the semantic web approach, there would be a magic http: prefix, for example http://itu.int/phone/, which would be the prefix for every telephone number URI. ITU would serve RDF via httpRange-14 to allow clients to get a description of the actual resource, and clients knowing the magic prefix could directly apply the same logic as in the plain web case and for example establish telephone calls.
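the two approaches can be put side by side in a small python sketch of a client's link handling. everything here is an assumption for illustration: the function name classify_link, the action labels, and the example number are invented, and http://itu.int/phone/ is only the hypothetical prefix from the paragraph above, not a real service.

```python
from urllib.parse import urlsplit

# Hypothetical magic prefix taken from the example above; not a real service.
MAGIC_PREFIX = "http://itu.int/phone/"


def classify_link(uri):
    """Decide what a client would do with a link.

    Returns an (action, value) pair with illustrative action labels.
    """
    parts = urlsplit(uri)
    if parts.scheme == "tel":
        # Plain web approach: the scheme itself labels a telephone number.
        return ("call", parts.path)
    if uri.startswith(MAGIC_PREFIX):
        # Semantic web approach: the client hardcodes the magic prefix.
        return ("call", uri[len(MAGIC_PREFIX):])
    if parts.scheme in ("http", "https"):
        # Any other HTTP URI is simply fetched.
        return ("fetch", uri)
    # Unknown scheme: the link breaks, which is acceptable on the web.
    return ("unsupported", uri)
```

for example, classify_link("tel:+1-201-555-0123") and classify_link("http://itu.int/phone/+1-201-555-0123") both lead to the same call action. in both branches the client needs built-in knowledge: of a scheme name in one case, of a prefix string owned by a single organization in the other.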

in terms of behavior, both approaches look similar, with the semantic web approach having the slight advantage of the RDF allowing semantic web enabled clients to learn/reason about the resource in question. however, this last assumption is only true if the RDF uses some ontology that universally describes a telephone number in a way that can be understood by a generic semantic web client; if the RDF served by the ITU uses some ITU-specific ontology to describe the concept of a telephone number, then nothing is won, because a client would need to understand that ontology (which is the exact same assumption as requiring that clients understand a new URI scheme such as tel:).

more interestingly, this pattern produces a single point of responsibility and failure: somebody has to be the owner of the magic URI prefix, has to make sure that RDF descriptions are served, and there has to be some guarantee that this entity is stable enough that it makes sense to hardcode URI prefixes into clients. it seems to me that this centralization is quite a price to pay for the ability to serve semantic web clients with descriptions (which, as explained above, may not even be useful for all clients because they are not necessarily using globally understandable ontologies).

maybe i am just too conservative, but i still cannot see how this design is a good trade-off for the web as a whole. while URI schemes are openly published agreements on how to identify resources and how to interact with them, the semantic web httpRange-14 perspective tries to move this whole mechanism into an HTTP-based (and increasingly HTTP-only) world, but still cannot make sure that things will just work, because even on the semantic web, agreement always needs buy-in into specific ontologies, and these are not necessarily universally accepted or even available. for example, is there some way i could describe a telephone number so that the majority of semantic web applications would understand the concept and act appropriately?

finally, while the idea of the semantic web approach is seductively simple, i am still waiting for a single relevant example where it was applied successfully. for small and isolated approaches, the idea works, and in these cases, few people would want to mint new URI schemes anyway. but for large and universal application scenarios, are there any examples where the magic URI prefix approach has been successfully applied? i would really appreciate any pointers! (well, maybe http://twitter.com/ morphs into such a magic prefix; are there any known clients which already implement special behavior when encountering such a URI?)