Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188

Wednesday, April 15, 2009

LSIDs, to proxy or not to proxy?

The existing TDWG recommendation that "5. All references to LSIDs within RDF documents should use the proxified form", basically states that LSID will never appear in any way other than bundled into an http URI - if we are also to publish data as RDF.

That sounds as if it means that those wanting to use LSID resolution will first have to extract the LSID part from the http URI which will now appear everywhere we would expect to find our unique identifier.

Donald [Hobern] has presented a strong case for unique identifiers conforming to the LSID specification, but we now have an equally strong case that in its http form our identifier must behave as a dereferenceable URI per W3C linked data recommendations.

My own view is that the RDF should always contain a canonical, un-proxied version of an identifier (whether LSID or DOI), because:

having only the proxied version assumes that there is only one suitable proxy (there may be multiple ones)

it assumes that the specified proxy will always exist (our track record in durable HTTP services is poor)

it assumes the specified proxy will always conform to current standards

it imposes an overhead on clients that want the canonical identifier (i.e., they have to strip away the proxy)
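
To make the last point concrete, here is a rough sketch of the work a client has to do to recover the canonical LSID from a proxied one. It assumes the usual urn:lsid:authority:namespace:object[:revision] layout; the function name extract_lsid, the proxy prefix, and the IPNI-style identifier are only illustrative, not a reference implementation.

```python
import re
from urllib.parse import unquote

# Assumed LSID layout: urn:lsid:authority:namespace:object[:revision]
LSID_PATTERN = re.compile(
    r"urn:lsid:[^:/?#\s]+:[^:/?#\s]+:[^:/?#\s]+(?::[^:/?#\s]+)?",
    re.IGNORECASE,
)


def extract_lsid(identifier):
    """Return the LSID embedded in a (possibly proxied) identifier, or None.

    The client cannot know in advance which proxy was used, so it searches
    for the LSID anywhere in the string rather than stripping a fixed prefix.
    """
    match = LSID_PATTERN.search(unquote(identifier))
    return match.group(0) if match else None


# Illustrative proxied form; the proxy prefix is an assumption.
proxied = "http://lsid.tdwg.org/urn:lsid:ipni.org:names:20012728-1"
canonical = extract_lsid(proxied)
```

Every consumer that wants the bare LSID ends up carrying some version of this, and it has to cope with whatever proxies publishers happen to have chosen.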

I predict that for any meaningful, successful (read "actually used") identifier there will be multiple services that will be capable of consuming that identifier, not just HTTP proxies. DOIs can be proxied (by several servers, including http://dx.doi.org/ and http://hdl.handle.net ), resolved using OpenURL resolvers, etc.
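
Going the other way is trivial, which is rather the point. As a minimal sketch, given a canonical DOI a client can build the proxied form for whichever resolver it prefers. The two base URLs are the ones named above; the RESOLVERS table and the proxied_forms function are hypothetical names for illustration, and the DOI is the PLoS ONE one quoted elsewhere in this post.

```python
from urllib.parse import quote

# The two HTTP proxies mentioned above; a client could add others.
RESOLVERS = {
    "doi": "http://dx.doi.org/",
    "handle": "http://hdl.handle.net/",
}


def proxied_forms(doi):
    """Map each known resolver name to the proxied URL for a canonical DOI."""
    # Accept either "doi:10.1234/abc" or a bare "10.1234/abc".
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return {name: base + quote(doi, safe="/") for name, base in RESOLVERS.items()}


links = proxied_forms("doi:10.1371/journal.pone.0001787")
```

If the metadata stored only one of these URLs, the choice of resolver would already have been made for the client.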

In order to play ball with Linked Data, there are several ways forward:

always refer to LSIDs in their proxied form (see above for reasons why this might not be a good idea)

ensure that at least one proxy exists which can resolve LSIDs in a linked data friendly way (see bioGUID as an example)

Imagine, for example, a publisher such as PLoS or Magnolia Press (publisher of Zootaxa), both of which have recently published taxonomic papers containing LSIDs (e.g., doi:10.1371/journal.pone.0001787). They might want to display LSIDs linked to their own LSID resolver that embellishes the metadata with information they have (e.g., they might wish to highlight links to other content that they host). In a sense this is much the same idea as supported by OpenURL COinS, where OpenURL-format metadata is embedded in an HTML document and the user chooses which resolver to use to resolve the links (including tools such as Zotero).

Having LSIDs prefixed with an HTTP proxy makes these tasks a little harder.