Microformats and such

I hope you’ll forgive this brief diversion from my ongoing attempt to distinguish web developers from web designers, but it’s late, I’ve had a couple of beers and I’ve been tinkering a bit with some code. Regularly scheduled programming will return shortly.

I’ve prepared a document which encapsulates one approach to solving this problem. This document could be from an application which keeps track of upcoming events and provides users with information about them, and as such it describes an event which will take place in the future.

Your mission, should you choose to accept it, is to write a program which:

Retrieves this document from the above-listed URL.

Parses the document to obtain the date and time on which the event will take place.

Prints out that date and time in the standard format of your choice.

I can see at least two ways to do this, and I’ve already written a short Python program which implements one of them. It’s eleven lines of code, five of which are import statements and one of which retrieves the document, leaving only five lines to parse the document, retrieve the date/time and print it.

If you’d like to try this yourself before I explain how it works, stop scrolling down the page now. Come back and read the next section when you’re ready.

How it works

As I’ve mentioned, I can see at least two ways to get the necessary information out of the document. Both rely on a little-used feature of HTML: the scheme attribute of the meta element. The HTML 4.01 specification defines scheme as follows:

This attribute names a scheme to be used to interpret the property’s value (see the section on profiles for details).

The scheme attribute allows authors to provide user agents more context for the correct interpretation of meta data. At times, such additional information may be critical, as when meta data may be specified in different formats. For example, an author might specify a date in the (ambiguous) format “10-9-97”; does this mean 9 October 1997 or 10 September 1997? The scheme attribute value “Month-Day-Year” would disambiguate this date value.

At other times, the scheme attribute may provide helpful but non-critical information to user agents.

It then goes on to give another example (a value of “ISBN” for a meta element representing the ISBN of a book) as an illustration of the versatility of the scheme attribute, and delegates responsibility for defining schemes and their meanings to specific profiles used in HTML documents.
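To make the spec’s date example concrete, here is a rough Python sketch of how a consumer might let the scheme value pick the interpretation. The “Month-Day-Year” name comes from the spec’s example; the “Day-Month-Year” entry, the SCHEME_FORMATS table and the interpret_date function are my own inventions for illustration, not anything a profile actually defines.

```python
from datetime import datetime

# Maps a scheme name to the strptime format used to read the content value.
SCHEME_FORMATS = {
    "Month-Day-Year": "%m-%d-%y",  # the scheme named in the HTML 4.01 example
    "Day-Month-Year": "%d-%m-%y",  # assumed counterpart, for illustration only
}

def interpret_date(content, scheme):
    """Parse an ambiguous date string according to the declared scheme."""
    try:
        date_format = SCHEME_FORMATS[scheme]
    except KeyError:
        raise ValueError(f"unknown scheme: {scheme!r}") from None
    return datetime.strptime(content, date_format).date()

print(interpret_date("10-9-97", "Month-Day-Year"))  # 1997-10-09
print(interpret_date("10-9-97", "Day-Month-Year"))  # 1997-09-10
```

The same string yields two different dates, which is exactly the ambiguity the scheme attribute is meant to remove.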

And if you look in my sample event-description document, you’ll find a couple of meta elements making use of this:

The first of these is a fairly standard use of meta; generally, “date” refers to the date and time on which the document was authored. The second has a name which I chose largely at random, but which can be assumed to stand in for a value which could be specified by, say, a microformat.

Both of them make use of the scheme attribute, and specify a value of “RFC3339”. RFC 3339 is a document which defines a standard for representing timestamps on the Internet, based on the (broader in scope) ISO 8601 standard.

This suggests one way to solve the challenge above: look for a meta element with the name “event-datetime”, see that its scheme value is “RFC3339” and treat its content as an RFC 3339 timestamp. From there, any decent programming language will give you the necessary tools to parse the string “2008-07-06T12:00:00-05:00” into an object representing a date and time, and then reformat it however you like.
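A minimal sketch of this first approach, using only the Python standard library. It is not the eleven-line program mentioned earlier: the SAMPLE_HTML string, the MetaCollector class and the find_event_datetime function are all hypothetical, and the document is inlined here rather than fetched from a URL so the example can run on its own.

```python
from datetime import datetime
from html.parser import HTMLParser

# Hypothetical stand-in for the sample document. Only the event-datetime
# value is taken from the text; the surrounding markup is made up.
SAMPLE_HTML = """
<html><head>
<meta name="event-datetime" scheme="RFC3339" content="2008-07-06T12:00:00-05:00">
</head><body><p>An event.</p></body></html>
"""

class MetaCollector(HTMLParser):
    """Records (scheme, content) for every named meta element."""

    def __init__(self):
        super().__init__()
        self.metas = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "name" in attributes:
            self.metas[attributes["name"]] = (
                attributes.get("scheme"),
                attributes.get("content"),
            )

def find_event_datetime(html, name="event-datetime"):
    """Return the named meta timestamp as an aware datetime."""
    collector = MetaCollector()
    collector.feed(html)
    scheme, content = collector.metas.get(name, (None, None))
    if scheme != "RFC3339" or content is None:
        raise ValueError(f"no RFC3339 meta element named {name!r}")
    # fromisoformat accepts this particular shape (Python 3.7+), but it is
    # not a complete RFC 3339 parser; older versions reject a trailing "Z".
    return datetime.fromisoformat(content)

event = find_event_datetime(SAMPLE_HTML)
print(event.isoformat())  # 2008-07-06T12:00:00-05:00
```

Checking the scheme before parsing is the point of the exercise: the code only trusts the content as a timestamp because the document says how to read it.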

An alternative, and perhaps more interesting, way to handle this is to note that the document also contains this bit of HTML:

<span class="event-datetime">12:00 Sunday</span>

This is the “human-readable” version of the event date. Parsing this one is a bit trickier, but is still possible: you know, from the meta elements above, the date and time when the document was authored, and thus have a reliable anchor from which to calculate the relative date and time of “12:00 Sunday”. It’s not quite as simple as reading the “event-datetime” meta element, but hey, if Remember the Milk can figure out what I mean from nothing more than the word “Sunday”, then a programmer armed with a day of the week, a time and a full base timestamp with time zone should be able to work this one out.
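Here is one rough way the relative calculation could go in Python. The base timestamp below is an assumption (the actual authoring date isn’t reproduced here), and resolve_relative is a hypothetical helper which only understands the exact “HH:MM Weekday” shape and English day names. It also keeps the base timestamp’s fixed UTC offset and makes no attempt to handle daylight saving changes.

```python
from datetime import datetime, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def resolve_relative(text, base):
    """Resolve 'HH:MM Weekday' to the first such moment at or after base."""
    time_part, day_part = text.split()
    hour, minute = (int(piece) for piece in time_part.split(":"))
    target_weekday = WEEKDAYS.index(day_part.lower())
    # Days until the next matching weekday (0 if base falls on that day).
    days_ahead = (target_weekday - base.weekday()) % 7
    candidate = (base + timedelta(days=days_ahead)).replace(
        hour=hour, minute=minute, second=0, microsecond=0
    )
    # Same weekday but the time has already passed: roll over to next week.
    if candidate < base:
        candidate += timedelta(days=7)
    return candidate

# Assumed authoring time, standing in for the "date" meta element's value.
base = datetime.fromisoformat("2008-07-02T23:30:00-05:00")
event = resolve_relative("12:00 Sunday", base)
print(event.isoformat())  # 2008-07-06T12:00:00-05:00
```

With that assumed base, the result agrees with the “event-datetime” value, which is a handy sanity check that the two representations describe the same moment.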

A starting point

Of course, this isn’t a perfect solution, or anything approaching it; there are plenty of unanswered questions and unsolved problems lurking here (how do you handle multiple timestamps in the same document, for example?), and it probably isn’t a new idea. But it is the beginning of a possible solution, and it has some advantages over, say, using abbr or title to provide a “machine readable” version of a timestamp:

It doesn’t fuck with screen readers.

It uses a feature of HTML in a manner consistent with that feature’s documented purpose.

In simple cases, it’s really easy to deal with.

It opens up a way to gradually merge the ideas of “human readable” and “machine readable” timestamps (without forcing people to learn the machine format), because those don’t necessarily have to be different things.

There are probably lots of potential solutions which can do this with the same advantages; this WaSP article mentions a few. If you’ve got an idea for another, run it up the flagpole and see if anyone salutes.