US Library of Congress takes a step towards PRESTO

Very pleasing to see that the US Library of Congress Thomas project is making user-friendly, structured URLs available as permanent aliases for its legislation.

So instead of URLs like

http://thomas.loc.gov/cgi-bin/query/z?c104:h.r.1234:

people can use URLs like

http://hdl.loc.gov/loc.uscongress/legislation.110sconres33

which brings up a page linking to all the information available about that piece of legislation: text, sponsors, costings, metadata and so on. Try it and see!

The advantage is that these names are formed using simple rules (congress number 110, bill type sconres for a Senate concurrent resolution, number 33), so if you know those facts you can work out the URL yourself: you don't need to search for it, and it won't go out of date.
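Because the names follow fixed rules, building one (or taking one apart) is trivial. Here is a minimal Python sketch, assuming that every bill-type code behaves like the sconres code in the example above; the helpers legislation_url and parse_legislation_url are my own hypothetical names, not part of any Library of Congress API:

```python
import re

# Prefix of the permanent names, copied from the example URL above.
HANDLE_PREFIX = "http://hdl.loc.gov/loc.uscongress/legislation."

# Congress number, lowercase bill-type code, bill number.
# Only "sconres" appears in the example; other codes are assumed to work the same way.
NAME_RE = re.compile(r"^(\d+)([a-z]+)(\d+)$")


def legislation_url(congress: int, bill_type: str, number: int) -> str:
    """Build the permanent URL from three facts a reader already knows."""
    return f"{HANDLE_PREFIX}{congress}{bill_type.lower()}{number}"


def parse_legislation_url(url: str) -> tuple[int, str, int]:
    """Recover (congress, bill_type, number) from a permanent URL."""
    if not url.startswith(HANDLE_PREFIX):
        raise ValueError(f"not a legislation URL: {url}")
    match = NAME_RE.match(url[len(HANDLE_PREFIX):])
    if match is None:
        raise ValueError(f"unrecognised name: {url}")
    congress, bill_type, number = match.groups()
    return int(congress), bill_type, int(number)


print(legislation_url(110, "sconres", 33))
print(parse_legislation_url("http://hdl.loc.gov/loc.uscongress/legislation.110sconres33"))
```

No lookup service is involved in either direction: the name is the information.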

There has been a welcome grassroots push to require this, for example from the Open House Project. Many European legislatures have also moved towards similar approaches.

I have been pushing a similar approach, but taking it further, under the name PRESTO.

How does the Thomas project measure up against the PRESTO approach? Big ticks for clear, hackable names, and for shielding the underlying implementation (the mapping is done entirely in the resolver). A big tick for names that apply to information whether or not it is available. A big tick for permanence. And a big tick for having a single resource that acts as the hub or index for information about its subresources.

Under the PRESTO approach, the next step would be to make each of these subresources available through its own permanent PRESTO URI as well. In the current implementation you can get to the top-level resource, but the resources it links to are back to using impenetrable query URLs.

For example, at the moment if you go in your browser to http://hdl.loc.gov/loc.uscongress/legislation.110sconres33 you can click on a link "Text of Legislation", which points to the obscure URL http://thomas.loc.gov/cgi-bin/query/z?c110:S.254:. The PRESTO approach would be to give that resource a permanent alias as a subresource: http://hdl.loc.gov/loc.uscongress/legislation.110sconres33/text

In turn, this page lists several versions of the text. For example, the link "To award posthumously a Congressional gold medal to Constantino Brumidi. (Introduced in Senate)" has the URL http://thomas.loc.gov/cgi-bin/query/D?c110:1:./temp/~c110c8dIgu::. The PRESTO approach would be to give it a permanent alias too, as a subresource: http://hdl.loc.gov/loc.uscongress/legislation.110sconres33/text/senate-version

In turn, this page gives you an HTML version, but it also offers other renderings, which again have obscure URLs. Under the PRESTO approach, the PDF version, for example, would become a subresource too. In this case we use the ; syntax that Tim Berners-Lee suggested for matrix URIs, rather than query parameters: http://hdl.loc.gov/loc.uscongress/legislation.110sconres33/text/senate-version;format:PDF
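To make the pattern concrete, here is a small Python sketch that assembles such URIs one level at a time. The helpers segment and presto_uri are hypothetical, and the sketch follows the ;format:PDF spelling used here (Berners-Lee's matrix URI note itself writes parameters as ;key=value):

```python
from urllib.parse import quote


def segment(name: str, **params: str) -> str:
    """One path segment plus optional matrix parameters, written as ;key:value."""
    # safe="" percent-encodes "/", ";" and ":" so names and values cannot
    # be confused with the separators.
    parts = [quote(name, safe="")]
    parts += [f"{quote(k, safe='')}:{quote(v, safe='')}" for k, v in params.items()]
    return ";".join(parts)


def presto_uri(base: str, *segments: str) -> str:
    """Append segments to a base URI, one per level of the resource hierarchy."""
    return "/".join([base.rstrip("/"), *segments])


BILL = "http://hdl.loc.gov/loc.uscongress/legislation.110sconres33"
print(presto_uri(BILL, segment("text")))
print(presto_uri(BILL, segment("text"), segment("senate-version", format="PDF")))
```

Each level of the hierarchy is just one more segment, so the permanent name for any rendering can be predicted from the name of its parent.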

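The reason for using the ; form rather than query syntax is so that we can keep going down into subresources. A query string (everything after the ?) can only come at the end of a URI and applies to it as a whole, whereas matrix parameters attach to a single path segment, so further segments can follow them. For example, the pages containing section 3, rendered as a JPEG, might be http://hdl.loc.gov/loc.uscongress/legislation.110sconres33/text/senate-version;format:PDF/section3;format:JPEG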
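Going the other way, a resolver has to split such a path back into segments, each carrying its own parameters. A minimal sketch, assuming / separates segments, ; introduces each parameter and : separates key from value (parse_presto_path is a hypothetical name):

```python
from urllib.parse import unquote, urlsplit


def parse_presto_path(url: str) -> list[tuple[str, dict[str, str]]]:
    """Split a URL path into (segment name, matrix parameters) pairs."""
    # urlsplit (unlike urlparse) leaves ";" parameters inside the path.
    path = urlsplit(url).path
    result = []
    for raw in path.strip("/").split("/"):
        name, *params = raw.split(";")
        pairs = {}
        for param in params:
            key, _, value = param.partition(":")
            pairs[unquote(key)] = unquote(value)
        result.append((unquote(name), pairs))
    return result


url = ("http://hdl.loc.gov/loc.uscongress/legislation.110sconres33"
       "/text/senate-version;format:PDF/section3;format:JPEG")
for name, params in parse_presto_path(url):
    print(name, params)
```

The two format parameters come back attached to different levels (senate-version and section3). With a query string there would be only one place to put them, after the whole path, and no way to say which level each applies to.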

Behind the scenes, of course, these URLs could be implemented simply by a smart resolver on your web server that rewrites each incoming URL to some system-dependent, impenetrable URL using queries or chained services. And if a resource had no representation, the server could return a 203 Partial Information (the name used in early HTTP drafts; HTTP/1.1 calls 203 Non-Authoritative Information), a 404 Not Found, or even a 501 Not Implemented, I suppose.
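A sketch of such a resolver, assuming a simple lookup table and an HTTP redirect rather than an internal rewrite. The table, the resolve function and the choice of status codes are my assumptions; only the backend URL is taken from the Thomas page described earlier:

```python
from typing import Optional

# Hypothetical routing table: permanent path -> current backend URL.
# None marks a name that is reserved but has no representation yet.
ROUTES: dict[str, Optional[str]] = {
    "/loc.uscongress/legislation.110sconres33/text":
        "http://thomas.loc.gov/cgi-bin/query/z?c110:S.254:",
    "/loc.uscongress/legislation.110sconres33/text/senate-version;format:PDF": None,
}


def resolve(path: str) -> tuple[int, Optional[str]]:
    """Return (HTTP status, Location header or None) for an incoming permanent path."""
    if path not in ROUTES:
        return 404, None  # Not Found: this resolver does not know the name
    target = ROUTES[path]
    if target is None:
        return 501, None  # Not Implemented: known name, no representation yet
    return 302, target    # Found: temporary redirect to the current backend URL


for p in ("/loc.uscongress/legislation.110sconres33/text",
          "/loc.uscongress/legislation.110sconres33/text/senate-version;format:PDF",
          "/loc.uscongress/legislation.999xyz1"):
    print(resolve(p), p)
```

A temporary (302) redirect fits because it is the backend URL, not the permanent name, that is allowed to change. A real deployment might instead proxy the request internally so the obscure URL is never exposed at all.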

The other change needed to really follow PRESTO would be to actually model how users think about the information they are retrieving: a use case. For a casual user, for example, it would make sense to offer alternative PRESTO URLs (more than one permanent URI!) such as http://hdl.loc.gov/loc.uscongress/common-name/no-child-left-behind-act or http://hdl.loc.gov/loc.uscongress/common-name/patriot-act
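Aliases like these are easy to layer on top of the canonical names. A minimal sketch, assuming a hypothetical alias table whose entries map a common-name prefix onto a canonical path, so that subresources such as /text come along for free. The target identifier below is a placeholder, not a real handle:

```python
# Hypothetical alias table: casual common-name paths -> canonical paths.
# "legislation.CANONICAL-ID" is a placeholder; the real handle is not given here.
ALIASES = {
    "/loc.uscongress/common-name/patriot-act":
        "/loc.uscongress/legislation.CANONICAL-ID",
}


def canonical_path(path: str, aliases: dict[str, str]) -> str:
    """Rewrite an aliased path (or any subresource of it) to its canonical path."""
    # Try longer aliases first so a more specific alias beats a shorter prefix.
    for alias in sorted(aliases, key=len, reverse=True):
        # Match the whole alias or the alias followed by "/" or ";",
        # so ".../patriot-act" does not also catch ".../patriot-actual".
        if path == alias or path.startswith((alias + "/", alias + ";")):
            return aliases[alias] + path[len(alias):]
    return path  # not an alias: already canonical (or unknown)


print(canonical_path("/loc.uscongress/common-name/patriot-act/text", ALIASES))
```

Every alias is itself a permanent name, but all of them lead to the one canonical hub, so the subresource scheme only has to be implemented once.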