“Distributed,” “Extensibility,” & Other Fancy Words

Diving In

There are over 100 elements in HTML5. Some are purely semantic, others are just containers for scripted APIs. Throughout the history of HTML, standards wonks have argued about which elements should be included in the language. Should HTML include a <figure> element? A <person> element? How about a <rant> element? Decisions are made, specs are written, authors author, implementors implement, and the web lurches ever forward.

Of course, HTML can’t please everyone. No standard can. Some ideas don’t make the cut. For example, there is no <person> element in HTML5. (There’s no <rant> element either!) There’s nothing stopping you from including a <person> element in a web page, but it won’t validate, it won’t work consistently across browsers, and it might conflict with future HTML specs if we want to add it later.

Right, so if making up your own elements isn’t the answer, what’s a semantically inclined web author to do? There have been attempts to extend previous versions of HTML. The most popular method is microformats, which uses the class and rel attributes in HTML. Another option is RDFa, which was originally designed to be used in XHTML but has been ported to HTML as well.

Microformats and RDFa each have their strengths and weaknesses. They take radically different approaches towards the same goal: extending web pages with additional semantics that are not part of the core HTML language. I don’t intend to turn this chapter into a format flamewar. (That would definitely require a <rant> element!) Instead, I want to focus on a third option developed using lessons learned from microformats and RDFa, and designed to be integrated into HTML5 itself: microdata.

What is Microdata?

Each word in the following sentence is important, so pay attention.

Professor Markup Says

Microdata annotates the DOM with scoped name/value pairs from custom vocabularies.

Now what does that mean? Let’s start from the end and work backwards. Microdata centers around custom vocabularies. Think of “the set of all HTML5 elements” as one vocabulary. This vocabulary includes elements to represent a section or an article, but it doesn’t include elements to represent a person or an event. If you want to represent a person on a web page, you’ll need to define your own vocabulary. Microdata lets you do this. Anyone can define a microdata vocabulary and start embedding custom properties in their own web pages.

The next thing to know about microdata is that it works with name/value pairs. Every microdata vocabulary defines a set of named properties. For example, a Person vocabulary could define properties like name and photo. To include a specific microdata property on your web page, you provide the property name in a specific place. Depending on where you declare the property name, microdata has rules about how to extract the property value. (More on this in the next section.)

Along with named properties, microdata relies heavily on the concept of “scoping.” The simplest way to think of microdata scoping is to think about the natural parent-child relationship of elements in the DOM. The <html> element usually contains two children, <head> and <body>. The <body> element usually contains multiple children, each of which may have child elements of their own. For example, your page might include an <h1> element within an <hgroup> element within a <header> element within the <body> element. A data table might contain <td> within <tr> within <table> (within <body>). Microdata re-uses the hierarchical structure of the DOM itself to provide a way to say “all the properties within this element are taken from this vocabulary.” This allows you to use more than one microdata vocabulary on the same page. You can even nest microdata vocabularies within other vocabularies, all by re-using the natural structure of the DOM. (I’ll show multiple examples of nested vocabularies throughout this chapter.)

Now, I’ve already touched on the DOM, but let me elaborate on that. Microdata is about applying additional semantics to data that’s already visible on your web page. Microdata is not designed to be a standalone data format. It’s a complement to HTML. As you’ll see in the next section, microdata works best when you’re already using HTML correctly, but the HTML vocabulary isn’t quite expressive enough. Microdata is great for fine-tuning the semantics of data that’s already in the DOM. If the data you’re semanti-fying isn’t in the DOM, you should step back and re-evaluate whether microdata is the right solution.

Does this sentence make more sense now? “Microdata annotates the DOM with scoped name/value pairs from custom vocabularies.” I hope so. Let’s see it in action.

The Microdata Data Model

Defining your own microdata vocabulary is easy. First, you need a namespace, which is just a URL. The namespace URL could actually point to a working web page, although that’s not strictly required. Let’s say I want to create a microdata vocabulary that describes a person. If I own the data-vocabulary.org domain, I’ll use the URL http://data-vocabulary.org/Person as the namespace for my microdata vocabulary. That’s an easy way to create a globally unique identifier: pick a URL on a domain that you control.

In this vocabulary, I need to define some named properties. Let’s start with three basic properties:

name (your full name)

photo (a link to a picture of you)

url (a link to a site associated with you, like a weblog or a Google profile)

Some of these properties are URLs, others are plain text. Each of them lends itself to a natural form of markup, even before you start thinking about microdata or vocabularies or whatnot. Imagine that you have a profile page or an “about” page. Your name is probably marked up as a heading, like an <h1> element. Your photo is probably an <img> element, since you want people to see it. And any URLs associated with your profile are probably already marked up as hyperlinks, because you want people to be able to click them. For the sake of discussion, let’s say your entire profile is also wrapped in a <section> element to separate it from the rest of the page content. Thus:
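A minimal sketch of such a profile, before any microdata is added. The name, photo URL, and link are the ones used in the rest of this section; the alt text is a placeholder I made up.

```html
<section>
  <h1>Mark Pilgrim</h1>
  <p><img src="http://www.example.com/photo.jpg" alt="[photo of Mark]"></p>
  <p><a href="http://diveintomark.org/">dive into mark</a></p>
</section>
```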

Microdata’s data model is name/value pairs. A microdata property name (like name or photo or url in this example) is always declared on an HTML element. The corresponding property value is then taken from the element’s DOM. For most HTML elements, the property value is simply the text content of the element. But there are a handful of exceptions.

Where Do Microdata Property Values Come From?

Element                                                 Value
<meta>                                                  content attribute
<audio>, <embed>, <iframe>, <img>, <source>, <video>    src attribute
<a>, <area>, <link>                                     href attribute
<object>                                                data attribute
<time>                                                  datetime attribute
all other elements                                      text content

“Adding microdata” to your page is a matter of adding a few attributes to the HTML elements you already have. The first thing you always do is declare which microdata vocabulary you’re using, by adding an itemtype attribute. The second thing you always do is declare the scope of the vocabulary, using an itemscope attribute. In this example, all the data we want to semanti-fy is in a <section> element, so we’ll declare the itemtype and itemscope attributes on the <section> element.

<section itemscope itemtype="http://data-vocabulary.org/Person">

Your name is the first bit of data within the <section> element. It’s wrapped in an <h1> element. The <h1> element doesn’t have any special processing in the HTML5 microdata data model, so it falls under the “all other elements” rule where the microdata property value is simply the text content of an element. (This would work equally well if your name was wrapped in a <p>, <div>, or <span> element.)

<h1 itemprop="name">Mark Pilgrim</h1>

In English, this says “here is the name property of the http://data-vocabulary.org/Person vocabulary, and the value of the property is Mark Pilgrim.”

Next up: the photo property. This is supposed to be a URL. According to the HTML5 microdata data model, the “value” of an <img> element is its src attribute. Hey look, the URL of your profile photo is already in an <img src> attribute. All you need to do is declare that the <img> element is the photo property.
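As a sketch, assuming the photo sits in its own paragraph (the alt text is a placeholder of mine), the only change is one new attribute:

```html
<!-- itemprop names the property; the value is taken from src -->
<p><img itemprop="photo" src="http://www.example.com/photo.jpg" alt="[photo of Mark]"></p>
```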

In English, this says “here is the photo property of the http://data-vocabulary.org/Person vocabulary, and the value of the property is http://www.example.com/photo.jpg.”

Finally, the url property is also a URL. According to the HTML5 microdata data model, the “value” of an <a> element is its href attribute. And once again, this fits perfectly with your existing markup. All you need to do is say that your existing <a> element is the url property:

<a itemprop="url" href="http://diveintomark.org/">dive into mark</a>

In English, this says “here is the url property of the http://data-vocabulary.org/Person vocabulary, and the value of the property is http://diveintomark.org/.”

Of course, if your markup looks a little different, that’s not a problem. You can add microdata properties and values to any HTML markup, even really gnarly 20th-century-era, tables-for-layout, Oh-God-why-did-I-agree-to-maintain-this markup. While I don’t recommend this kind of markup, it is still common, and you can still add microdata to it.

For marking up the name property, just add an itemprop attribute on the table cell that contains the name. Table cells have no special rules in the microdata property value table, so they get the default value, “the microdata property is the text content.”

<TR><TD>Name<TD itemprop="name">Mark Pilgrim

Adding the url property looks trickier. This markup doesn’t use the <a> element properly. Instead of putting the link target in the href attribute, it has nothing useful in the href attribute and uses Javascript in the onclick attribute to call a function (not shown) that extracts the URL and navigates to it. For extra “please stop doing that” bonus points, let’s pretend that the function also opens the link in a tiny popup window with no scroll bars. Wasn’t the internet fun last century?

Anyway, you can still convert this into a microdata property, you just need to be a little creative. Using the <a> element directly is out of the question. The link target isn’t in the href attribute, and there’s no way to override the rule that says “in an <a> element, look for the microdata property value in the href attribute.” But you can add a wrapper element around the entire mess, and use that to add the url microdata property.
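Here is a rough sketch of that wrapper in the same tables-for-layout style. The function name goExternalLink is hypothetical (the real function is not shown), and the “Link” row label is my assumption.

```html
<TABLE itemscope itemtype="http://data-vocabulary.org/Person">
  <TR><TD>Name<TD itemprop="name">Mark Pilgrim
  <TR><TD>Link<TD>
    <!-- the span, not the A element, carries the url property -->
    <span itemprop="url"><A href=# onclick=goExternalLink()>http://diveintomark.org/</A></span>
</TABLE>
```

The wrapper and the link are kept on one line so that the wrapper’s text content is just the URL, with no stray whitespace around it.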

Since the <span> element has no special processing, it uses the default rule, “the microdata property is the text content.” “Text content” doesn’t mean “all the markup inside this element” (like you would get with, say, the innerHTML DOM property). It means “just the text, ma’am.” In this case, that’s http://diveintomark.org/, the text content of the <a> element inside the <span> element.

To sum up: you can add microdata properties to any markup. If you’re using HTML correctly, you’ll find it easier to add microdata than if your HTML markup sucks, but it can always be done.

Marking Up People

By the way, the starter examples in the previous section weren’t completely made up. There really is a microdata vocabulary for marking up information about people, and it really is that easy. Let’s take a closer look.

The easiest way to integrate microdata into a personal website is on your “about” page. You do have an “about” page, don’t you? If not, you can follow along as I extend this sample “about” page with additional semantics. The final result is here: person-plus-microdata.html.

Let’s look at the raw markup first, before any microdata properties have been added:

The first thing you always need to do is declare the vocabulary you’re using, and the scope of the properties you want to add. You do this by adding the itemtype and itemscope attributes on the outermost element that contains the other elements that contain the actual data. In this case, that’s a <section> element.

Now you can start defining microdata properties from the http://data-vocabulary.org/Person vocabulary. But what are those properties? As it happens, you can see the list of properties by navigating to data-vocabulary.org/Person in your browser. The microdata specification does not require this, but I’d say it’s certainly a “best practice.” After all, if you want developers to actually use your microdata vocabulary, you need to document it. And where better to put your documentation than the vocabulary URL itself?

Person vocabulary

Property        Description
name            Name
nickname        Nickname
photo           An image link
title           The person’s title (for example, “Financial Manager”)
role            The person’s role (for example, “Accountant”)
url             Link to a web page, such as the person’s home page
affiliation     The name of an organization with which the person is associated (for example, an employer)
friend          Identifies a social relationship between the person described and another person
contact         Identifies a social relationship between the person described and another person
acquaintance    Identifies a social relationship between the person described and another person
address         The location of the person. Can have the subproperties street-address, locality, region, postal-code, and country-name.

The first thing in this sample “about” page is a picture of me. Naturally, it’s marked up with an <img> element. To declare that this <img> element is my profile picture, all we need to do is add itemprop="photo" to the <img> element.

Where’s the microdata property value? It’s already there, in the src attribute. If you recall from the HTML5 microdata data model, the “value” of an <img> element is its src attribute. Every <img> element has a src attribute — otherwise it would just be a broken image — and the src is always a URL. See? If you’re using HTML correctly, microdata is easy.

Furthermore, this <img> element isn’t alone on the page. It’s a child element of the <section> element, the one we just declared with the itemscope attribute. Microdata reuses the parent-child relationship of elements on the page to define the scoping of microdata properties. In plain English, we’re saying, “This <section> element represents a person. Any microdata properties you might find on the children of the <section> element are properties of that person.” If it helps, you can think of the <section> element as the subject of a sentence. The itemprop attribute represents the verb of the sentence, something like “is pictured at.” The microdata property value represents the object of the sentence.

The subject only needs to be defined once, by putting itemscope and itemtype attributes on the outermost <section> element. The verb is defined by putting the itemprop="photo" attribute on the <img> element. The object of the sentence doesn’t need any special markup at all, because the HTML5 microdata data model says that the property value of an <img> element is its src attribute.
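Putting that sentence together as a sketch. The image URL and alt text are placeholders, since the sample page’s actual src isn’t shown here.

```html
<!-- subject: the Person item declared on <section> -->
<section itemscope itemtype="http://data-vocabulary.org/Person">
  <!-- verb: itemprop="photo"; object: whatever URL is in src -->
  <img itemprop="photo" src="http://www.example.com/photo.jpg" alt="[photo of Mark]">
</section>
```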

Moving on to the next bit of markup, we see an <h1> header and the beginnings of a <dl> list. Neither the <h1> nor the <dl> need to be marked up with microdata. Not every piece of HTML needs to be a microdata property. Microdata is about the properties themselves, not the markup or headers surrounding the properties. This <h1> isn’t a property; it’s just a header. Similarly, the <dt> that says “Name” isn’t a property; it’s just a label.

<h1>Contact Information</h1>  <!-- boring -->
<dl>
<dt>Name</dt>  <!-- boring -->
<dd>Mark Pilgrim</dd>

So where is the real information? It’s in the <dd> element, so that’s where we need to put the itemprop attribute. Which property is it? It’s the name property. Where is the property value? It’s the text within the <dd> element. Does that need to be marked up? The HTML5 microdata data model says no, <dd> elements have no special processing, so the property value is just the text within the element.
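In markup, that comes down to a single attribute on the <dd>. A minimal sketch:

```html
<dt>Name</dt>
<!-- the label above stays untouched; only the value gets itemprop -->
<dd itemprop="name">Mark Pilgrim</dd>
```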

What did we just say, in English? “This person’s name is Mark Pilgrim.” Well OK then. Onward.

The next two properties are a little tricky. This is the markup, pre-microdata:

<dt>Position</dt>
<dd>Developer advocate for Google, Inc.</dd>

If you look at the definition of the Person vocabulary, the text “Developer advocate for Google, Inc.” actually encompasses two properties: title (“Developer advocate”) and affiliation (“Google, Inc.”). How can you express that in microdata? The short answer is, you can’t. Microdata doesn’t have a way to break up runs of text into separate properties. You can’t say “the first 18 characters of this text are one microdata property, and the last 12 characters of this text are another microdata property.”

But all is not lost. Imagine that you wanted to style the text “Developer advocate” in a different font from the text “Google, Inc.” CSS can’t do that either. So what would you do? You would first need to wrap the different bits of text in dummy elements, like <span>, then apply different CSS rules to each <span> element.

This technique is also useful for microdata. There are two distinct pieces of information here: a title and an affiliation. If you wrap each piece in a dummy <span> element, you can declare that each <span> is a separate microdata property.
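A sketch of the wrapped version, using the same text as the pre-microdata markup. The word “for” stays outside both spans because it belongs to neither property.

```html
<dt>Position</dt>
<dd><span itemprop="title">Developer advocate</span> for
    <span itemprop="affiliation">Google, Inc.</span></dd>
```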

Tada! “This person’s title is 'Developer advocate.' This person is employed by Google, Inc.” Two sentences, two microdata properties. A little more markup, but a worthwhile tradeoff.

The same technique is useful for marking up street addresses. The Person vocabulary defines an address property, which itself is a microdata item. That means the address has its own vocabulary (http://data-vocabulary.org/Address) and defines its own properties. The Address vocabulary defines 5 properties: street-address, locality, region, postal-code, and country-name.

If you’re a programmer, you are probably familiar with dot notation to define objects and their properties. Think of the relationship like this:

Person

Person.address

Person.address.street-address

Person.address.locality

Person.address.region

Person.address.postal-code

Person.address.country-name

In this example, the entire street address is contained in a single <dd> element. (Once again, the <dt> element is just a label, so it plays no role in adding semantics with microdata.) Notating the address property is easy. Just add an itemprop attribute on the <dd> element.
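Because the address is itself a microdata item with its own vocabulary, the <dd> presumably needs itemscope and itemtype alongside itemprop. A sketch of just the container, assuming a “Mailing address” label (my placeholder):

```html
<dt>Mailing address</dt>
<dd itemprop="address" itemscope
    itemtype="http://data-vocabulary.org/Address">
  <!-- everything in here is scoped to the Address vocabulary -->
</dd>
```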

We’ve seen all of this before, but only for top-level items. A <section> element defines itemtype and itemscope, and all the elements within the <section> element that define microdata properties are “scoped” within that specific vocabulary. But this is the first time we’ve seen nested scopes: defining a new itemtype and itemscope (on the <dd> element) within an existing one (on the <section> element). This nested scope works exactly like the HTML DOM. The <dd> element has a certain number of child elements, all of which are scoped to the vocabulary defined on the <dd> element. Once the <dd> element is closed with a corresponding </dd> tag, the scope reverts to the vocabulary defined by the parent element (<section>, in this case).

The properties of the Address suffer the same problem we encountered with the title and affiliation properties. There’s just one long run of text, but we want to break it up into five separate microdata properties. The solution is the same: wrap each distinct piece of information in a dummy <span> element, then declare microdata properties on each <span> element.
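A sketch of the finished address, using the values from this sample page. The <br> line breaks and the comma are my guess at how the visible text is laid out.

```html
<dd itemprop="address" itemscope
    itemtype="http://data-vocabulary.org/Address">
  <span itemprop="street-address">100 Main Street</span><br>
  <span itemprop="locality">Anytown</span>,
  <span itemprop="region">PA</span>
  <span itemprop="postal-code">19999</span><br>
  <span itemprop="country-name">USA</span>
</dd>
```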

In English: “This person has a mailing address. The street address part of the mailing address is '100 Main Street.' The locality part is 'Anytown.' The region is 'PA.' The postal code is '19999.' The country name is 'USA.'” Easy peasy.

Ask Professor Markup

Q: Is this mailing address format US-specific?
A: No. The properties of the Address vocabulary are generic enough that they can describe most mailing addresses in the world. Not all addresses will have values for every property, but that’s OK. Some addresses might require fitting more than one “line” into a single property, but that’s OK too. For example, if your mailing address has a street address and a suite number, they would both go into the street-address subproperty:
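For instance, something like this, where the suite number is made up for illustration:

```html
<!-- both "lines" of the street address live in one property -->
<span itemprop="street-address">100 Main Street, Suite 415</span>
```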

There’s one more thing on this sample “about” page: a list of URLs. The Person vocabulary has a property for this, called url. A url property can be anything, really. (Well, it has to be a URL, but you probably guessed that.) What I mean is that the url property is loosely defined. The property can be any sort of URL that you want to associate with a Person: a weblog, a photo gallery, or a profile on another site like Facebook or Twitter.

The other important thing to note here is that a single Person can have multiple url properties. Technically, any property can appear more than once, but until now, we haven’t taken advantage of that. For example, you could have two photo properties, each pointing to a different image URL. Here, I want to list four different URLs: my weblog, my Google profile page, my user profile on Reddit, and my Twitter account. In HTML, that’s a list of links: four <a> elements, each in their own <li> element. In microdata, each <a> element gets an itemprop="url" attribute.
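A sketch of that list. The four URLs are the weblog, Google profile, Reddit, and Twitter addresses for this sample page; the link text is my placeholder.

```html
<ul>
  <li><a itemprop="url" href="http://diveintomark.org/">weblog</a></li>
  <li><a itemprop="url" href="http://www.google.com/profiles/pilgrim">Google profile</a></li>
  <li><a itemprop="url" href="http://www.reddit.com/user/MarkPilgrim">Reddit profile</a></li>
  <li><a itemprop="url" href="http://www.twitter.com/diveintomark">Twitter</a></li>
</ul>
```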

According to the HTML5 microdata data model, <a> elements have special processing. The microdata property value is the href attribute, not the child text content. The text of each link is actually ignored by a microdata processor. Thus, in English, this says “This person has a URL at http://diveintomark.org/. This person has another URL at http://www.google.com/profiles/pilgrim. This person has another URL at http://www.reddit.com/user/MarkPilgrim. This person has another URL at http://www.twitter.com/diveintomark.”

Introducing Google Rich Snippets

I want to step back for just a moment and ask, “Why are we doing this?” Are we adding semantics just for the sake of adding semantics? Don’t get me wrong; I enjoy fiddling with angle brackets as much as the next webhead. But why microdata? Why bother?

There are two major classes of applications that consume HTML, and by extension, HTML5 microdata:

Web browsers

Search engines

For browsers, HTML5 defines a set of DOM APIs for extracting microdata items, properties, and property values from a web page. At time of writing (February 2011), no browser supports this API. Not a single one. So that’s… kind of a dead end, at least until browsers catch up and implement the client-side APIs.

The other major consumer of HTML is search engines. What could a search engine do with microdata properties about a person? Imagine this: instead of simply displaying the page title and an excerpt of text, the search engine could integrate some of that structured information and display it. Full name, job title, employer, address, maybe even a little thumbnail of a profile photo. Would that catch your attention? It would catch mine.

It’s all there: the photo property from the <img src> attribute, all four URLs from the list of <a href> attributes, even the address object (listed as “Item 1”) and all five of its subproperties.

And how does Google use all of this information? That depends. There are no hard and fast rules about how microdata properties should be displayed, which ones should be displayed, or whether they should be displayed at all. If someone searches for “Mark Pilgrim,” and Google determines that this “about” page should rank in the results, and Google decides that the microdata properties it originally found on that page are worth displaying, then the search result listing might look something like this:

About Mark Pilgrim
Anytown PA - Developer advocate - Google, Inc.
Excerpt from the page will show up here.
Excerpt from the page will show up here.
diveintohtml5.info/examples/person-plus-microdata.html - Cached - Similar pages

The first line, “About Mark Pilgrim,” is actually the title of the page, given in the <title> element. That’s not terribly exciting; Google does that for every page. But the second line is full of information taken directly from the microdata annotations we added to the page. “Anytown PA” was part of the mailing address, marked up with the http://data-vocabulary.org/Address vocabulary. “Developer advocate” and “Google, Inc.” were two properties from the http://data-vocabulary.org/Person vocabulary (title and affiliation, respectively).

This is really quite amazing. You don’t need to be a large corporation making special deals with search engine vendors to customize your search result listings. Just take ten minutes and add a couple of HTML attributes to annotate the data you were already publishing anyway.

Ask Professor Markup

Q: I did everything you said, but my Google search result listing doesn’t look any different. What gives?
A: “Google does not guarantee that markup on any given page or site will be used in search results.” But even if Google decides not to use your microdata annotations, another search engine might. Like the rest of HTML5, microdata is an open standard that anyone can implement. It’s your job to provide as much data as possible. Let the rest of the world decide what to do with it. They might surprise you!

Marking Up Organizations

Microdata isn’t limited to a single vocabulary. “About” pages are nice, but you probably only have one of them. Still hungry for more? Let’s learn how to mark up organizations and businesses.

Short and sweet. All the information about the organization is contained within the <article> element, so let’s start there.

<article itemscope itemtype="http://data-vocabulary.org/Organization">

As with marking up people, you need to set the itemscope and itemtype attributes on the outermost element. In this case, the outermost element is an <article> element. The itemtype attribute declares the microdata vocabulary you’re using (in this case, http://data-vocabulary.org/Organization), and the itemscope attribute declares that all of the properties you set on child elements relate to this vocabulary.

So what’s in the Organization vocabulary? It’s simple and straightforward. In fact, some of it should already look familiar.

Organization vocabulary

Property    Description
name        The name of the organization (for example, “Initech”)
url         Link to the organization’s home page
address     The location of the organization. Can contain the subproperties street-address, locality, region, postal-code, and country-name.
tel         The telephone number of the organization
geo         Specifies the geographical coordinates of the location. Always contains two subproperties, latitude and longitude.

The first bit of markup within the outermost <article> element is an <h1>. This <h1> element contains the name of a business, so we’ll put an itemprop="name" attribute directly on the <h1> element.
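As a one-line sketch, using the business name from this example:

```html
<h1 itemprop="name">Google, Inc.</h1>
```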

According to the HTML5 microdata data model, <h1> elements don’t need any special processing. The microdata property value is simply the text content of the <h1> element. In English, we just said “the name of the Organization is 'Google, Inc.'”

Next up is a street address. Marking up the address of an Organization works exactly the same way as marking up the address of a Person. First, add an itemprop="address" attribute to the outermost element of the street address (in this case, a <p> element). That states that this is the address property of the Organization. But what about the properties of the address itself? We also need to define the itemtype and itemscope attributes to say that this is an Address item that has its own properties.

Finally, we need to wrap each distinct piece of information in a dummy <span> element so we can add the appropriate microdata property name (street-address, locality, region, postal-code, and country-name) on each <span> element.
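A sketch of the result, using this example’s address. The <br> line breaks and the comma are my assumption about the visible layout.

```html
<p itemprop="address" itemscope
   itemtype="http://data-vocabulary.org/Address">
  <span itemprop="street-address">1600 Amphitheatre Parkway</span><br>
  <span itemprop="locality">Mountain View</span>,
  <span itemprop="region">CA</span>
  <span itemprop="postal-code">94043</span><br>
  <span itemprop="country-name">USA</span>
</p>
```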

In English, we just said “This organization has an address. The street address part is '1600 Amphitheatre Parkway'. The locality is 'Mountain View'. The region part is 'CA'. The postal code is '94043'. The name of the country is 'USA'.”

Next up: a telephone number for the Organization. Telephone numbers are notoriously tricky, and the exact syntax is country-specific. (And if you want to call another country, it’s even worse.) In this example, we have a United States telephone number, in a format suitable for calling from elsewhere in the United States.

(Hey, in case you didn’t notice, the Address vocabulary went out of scope when its <p> element was closed. Now we’re back to defining properties in the Organization vocabulary.)

If you want to list more than one telephone number — maybe one for United States customers and one for international customers — you can do that. Any microdata property can be repeated. Just make sure each telephone number is in its own HTML element, separate from any label you may give it.
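A sketch with two numbers. Both telephone numbers and both labels are fictional placeholders, not the ones from the sample page.

```html
<p>
  <!-- labels sit outside the spans so they don't end up in the value -->
  US customers: <span itemprop="tel">650-555-0100</span><br>
  International customers: <span itemprop="tel">+1 650 555 0199</span>
</p>
```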

According to the HTML5 microdata data model, neither the <p> element nor the <span> element have special processing. The value of the microdata tel property is simply the text content. The Organization microdata vocabulary makes no attempt to subdivide the different parts of a telephone number. The entire tel property is just free-form text. If you want to put the area code in parentheses, or use spaces instead of dashes to separate the numbers, you can do that. If a microdata-consuming client wants to parse the telephone number, that’s entirely up to them.

Next, we have another familiar property: url. Just like associating a URL with a Person, you can associate a URL with an Organization. This could be the company’s home page, a contact page, product page, or anything else. If it’s a URL about, from, or belonging to the Organization, mark it up with an itemprop="url" attribute.
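A sketch of the link, using this example’s home page URL and link text. Wrapping it in a <p> is my assumption.

```html
<p><a itemprop="url" href="http://www.google.com/">Google.com</a></p>
```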

According to the HTML5 microdata data model, the <a> element has special processing. The microdata property value is the value of the href attribute, not the link text. In English, this says “this organization is associated with the URL http://www.google.com/.” It doesn’t say anything more specific about the association, and it doesn’t include the link text “Google.com.”

Finally, I want to talk about geolocation. No, not the W3C Geolocation API. This is about how to mark up the physical location for an Organization, using microdata.

To date, all of our examples have focused on marking up visible data. That is, you have an <h1> with a company name, so you add an itemprop attribute to the <h1> element to declare that the (visible) header text is, in fact, the name of an Organization. Or you have an <img> element that points to a photo, so you add an itemprop attribute to the <img> element to declare that the (visible) image is a photo of a Person.

In this example, geolocation information isn’t like that. There is no visible text that gives the exact latitude and longitude (to four decimal places!) of the Organization. In fact, the organization.html example (without microdata) has no geolocation information at all. It has a link to Google Maps, but even the URL of that link does not contain latitude and longitude coordinates. (It contains similar information in a Google-specific format.) But even if we had a link to a hypothetical online mapping service that did take latitude and longitude coordinates as URL parameters, microdata has no way of separating out the different parts of a URL. You can’t declare that the first URL query parameter is the latitude and the second URL query parameter is the longitude and the rest of the query parameters are irrelevant.

To handle edge cases like this, HTML5 provides a way to annotate invisible data. This technique should only be used as a last resort. If there is a way to display or render the data you care about, you should do so. Invisible data that only machines can read tends to “go stale” quickly. That is, someone will come along later and update the visible text but forget to update the invisible data. This happens more often than you think, and it will happen to you too.

Still, there are cases where invisible data is unavoidable. Perhaps your boss really wants machine-readable geolocation information but doesn’t want to clutter up the interface with pairs of incomprehensible six-digit numbers. Invisible data is the only option. The only saving grace here is that you can put the invisible data immediately after the visible text that it describes, which may help remind the person who comes along later and updates the visible text that they need to update the invisible data right after it.

In this example, we can create a dummy <span> element within the same <article> element as all the other Organization properties, then put the invisible geolocation data inside the <span> element.

itemprop="geo" says that this element represents the geo property of the surrounding Organization

itemtype="http://data-vocabulary.org/Geo" says which microdata vocabulary this element’s properties conform to

itemscope says that this element is the enclosing element for a microdata item with its own vocabulary (given in the itemtype attribute). All the properties within this element are properties of http://data-vocabulary.org/Geo, not the surrounding http://data-vocabulary.org/Organization.

The next big question that this example answers is, “How do you annotate invisible data?” You use the <meta> element. In previous versions of HTML, you could only use the <meta> element within the <head> of your page. In HTML5, you can use the <meta> element anywhere. And that’s exactly what we’re doing here.

According to the HTML5 microdata data model, the <meta> element has special processing. The microdata property value is the content attribute. Since this attribute is never visibly displayed, we have the perfect setup for unlimited quantities of invisible data. With great power comes great responsibility. In this case, the responsibility is on you to ensure that this invisible data stays in sync with the visible text around it.
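Here is a minimal sketch of what that could look like. The organization name and the coordinates are placeholders I made up for illustration, and the surrounding <article> is trimmed down to the bare minimum.

```html
<article itemscope itemtype="http://data-vocabulary.org/Organization">
  <h1 itemprop="name">Example Organization</h1>
  <!-- Invisible data: a nested Geo item, placed right after the visible
       text it describes. Both coordinates are placeholder values. -->
  <span itemprop="geo" itemscope itemtype="http://data-vocabulary.org/Geo">
    <!-- For a <meta> element, the property value is the content attribute -->
    <meta itemprop="latitude" content="12.3456">
    <meta itemprop="longitude" content="-65.4321">
  </span>
</article>
```

Nothing inside the <span> is rendered, so a visitor sees only the heading, while a microdata parser sees an Organization with a geo property that has its own latitude and longitude.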

There is no direct support for the Organization vocabulary in Google Rich Snippets, so I don’t have any pretty sample search result listings to show you. But organizations feature heavily in the next two case studies: events and reviews, and those are supported by Google Rich Snippets.

Marking Up Events

Things happen. Some things happen at pre-determined times. Wouldn’t it be nice if you could tell search engines exactly when something was about to happen? There’s an angle bracket for that.

The category of the event (for example, “Concert” or “Lecture”). This is a freeform string, not an enumerated attribute.

geo

Specifies the geographical coordinates of the location. Always contains two subproperties, latitude and longitude.

photo

A link to a photo or image related to the event

The event’s name is in an <h1> element. According to the HTML5 microdata data model, <h1> elements have no special processing. The microdata property value is simply the text content of the <h1> element. All we need to do is add the itemprop attribute to declare that this <h1> element contains the name of the event.

In English, this says, “The name of this event is Google Developer Day 2009.”
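As a rough sketch, the heading might be marked up like this. Two things here are my assumptions rather than anything stated above: that the enclosing item is declared with the http://data-vocabulary.org/Event type (following the pattern of the Organization and Geo vocabularies), and that the vocabulary calls the event's name property summary.

```html
<!-- Assumed: the enclosing element declares the Event vocabulary -->
<article itemscope itemtype="http://data-vocabulary.org/Event">
  <!-- <h1> has no special processing, so the property value is its text.
       The property name "summary" is an assumption about the vocabulary. -->
  <h1 itemprop="summary">Google Developer Day 2009</h1>
</article>
```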

This event listing has a photo, which can be marked up with the photo property. As you would expect, the photo is already marked up with an <img> element. Like the photo property in the Person vocabulary, an Event photo is a URL. Since the HTML5 microdata data model says that the property value of an <img> element is its src attribute, the only thing we need to do is add the itemprop attribute to the <img> element.

In English, this says, “The photo for this event is at http://diveintohtml5.info/examples/gdd-2009-prague-pilgrim.jpg.”
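A sketch of the corresponding markup, assuming the image already sits somewhere inside the Event item. The alt text is a placeholder of my own.

```html
<!-- For an <img> element, the microdata property value is the src attribute -->
<img itemprop="photo"
     alt="[placeholder description of the photo]"
     src="http://diveintohtml5.info/examples/gdd-2009-prague-pilgrim.jpg">
```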

Next up is a longer description of the event, which is just a paragraph of freeform text.

<p itemprop="description">Google Developer Days are a chance to
learn about Google developer products from the engineers who built
them. This one-day conference includes seminars and “office
hours” on web technologies like Google Maps, OpenSocial,
Android, AJAX APIs, Chrome, and Google Web Toolkit.</p>

The next bit is something new. Events generally occur on specific dates and start and end at specific times. In HTML5, dates and times should be marked up with the <time> element, and we are already doing that here. So the question becomes, how do we add microdata properties to these <time> elements? Looking back at the HTML5 microdata data model, we see that the <time> element has special processing. The value of a microdata property on a <time> element is the value of the datetime attribute. And hey, the startDate and endDate properties of the Event vocabulary take an ISO-style date, just like the datetime attribute of a <time> element. Once again, the semantics of the core HTML vocabulary dovetail nicely with the semantics of our custom microdata vocabulary. Marking up start and end dates with microdata is as simple as

Using HTML correctly in the first place (using <time> elements to mark up dates and times), and

Adding a single itemprop attribute to each <time> element

In English, this says, “This event starts on November 6, 2009, at 8:30 in the morning, and goes until November 6, 2009, at 20:30 (times local to Prague, GMT+1).”
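Something along these lines would do it. The datetime values are the ones used on the sample page, but the visible text inside each <time> element is just my guess at a reasonable rendering.

```html
<p>
  <!-- For a <time> element, the property value is the datetime attribute,
       not the human-readable text inside the element -->
  <time itemprop="startDate" datetime="2009-11-06T08:30+01:00">2009 November 6, 8:30</time>
  to
  <time itemprop="endDate" datetime="2009-11-06T20:30+01:00">20:30</time>
</p>
```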

Next up is the location property. The definition of the Event vocabulary says that this can be either an Organization or an Address. In this case, the event is being held at a venue that specializes in conferences, the Congress Center in Prague. Marking it up as an Organization allows us to include the name of the venue as well as its address.

First, let’s declare that the <p> element that contains the address is the location property of the Event, and that this element is also its own microdata item that conforms to the http://data-vocabulary.org/Organization vocabulary.

Due to the microdata scoping rules, this itemprop="name" is defining a property in the Organization vocabulary, not the Event vocabulary. The <p> element defined the beginning of the scope of the Organization properties, and that <p> element hasn’t yet been closed with an </p> tag. Any microdata properties we define here are properties of the most-recently-scoped vocabulary. Nested vocabularies are like a stack. We haven’t yet popped the stack, so we’re still talking about properties of the Organization.

In fact, we’re going to add a third vocabulary onto the stack: an Address for the Organization for the Event.

There are no more properties of the Address, so we close the <span> element that started the Address scope, and pop the stack.

</span>

There are no more properties of the Organization, so we close the <p> element that started the Organization scope, and pop the stack again.

</p>
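Seen all in one piece, the three-level stack might look something like the sketch below. Treat it as an illustration rather than the exact markup of the sample page: the street address is a placeholder, and I am assuming the Organization vocabulary's property for a nested Address is called address.

```html
<article itemscope itemtype="http://data-vocabulary.org/Event">
  <!-- Push: the location property is itself an Organization item -->
  <p itemprop="location" itemscope
     itemtype="http://data-vocabulary.org/Organization">
    <!-- "name" here belongs to the Organization, not the Event -->
    <span itemprop="name">Congress Center</span><br>
    <!-- Push again: the Organization's address is an Address item.
         The property name "address" is an assumption. -->
    <span itemprop="address" itemscope
          itemtype="http://data-vocabulary.org/Address">
      <span itemprop="street-address">1 Placeholder Street</span><br>
      <span itemprop="locality">Prague</span><br>
      <span itemprop="country-name">Czech Republic</span>
    </span> <!-- pop: back to the Organization -->
  </p> <!-- pop: back to the Event -->
</article>
```

Each itemscope pushes a new item onto the stack, and each matching closing tag pops it. That is why indentation helps so much when you read nested microdata.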

Now we’re back to defining properties on the Event. The next property is geo, to represent the physical location of the Event. This uses the same Geo vocabulary that we used to mark up the physical location of an Organization in the previous section. We need a <span> element to act as the container; it gets the itemtype and itemscope attributes. Within that <span> element, we need two <meta> elements, one for the latitude property and one for the longitude property.

And we’ve closed the <span> that contained the Geo properties, so we’re back to defining properties on the Event. The last property is the url property, which should look familiar. Associating a URL with an Event works the same way as associating a URL with a Person and associating a URL with an Organization. If you’re using HTML correctly (marking up hyperlinks with <a href>), then declaring that the hyperlink is a microdata url property is simply a matter of adding the itemprop attribute.
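Here is a sketch of those last two Event properties side by side. The coordinates and the link target are placeholders (example.com is not the real event page), and the Event itemtype on the wrapper is assumed as before.

```html
<article itemscope itemtype="http://data-vocabulary.org/Event">
  <!-- geo: a container element plus two invisible <meta> properties.
       The coordinate values are placeholders. -->
  <span itemprop="geo" itemscope itemtype="http://data-vocabulary.org/Geo">
    <meta itemprop="latitude" content="12.3456">
    <meta itemprop="longitude" content="65.4321">
  </span>
  <!-- url: for an <a> element, the property value is the href attribute.
       This href is a hypothetical address. -->
  <p><a itemprop="url" href="http://example.com/events/gdd-2009">Event home page</a></p>
</article>
```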

The sample event page also lists a second event, my speaking engagement at the ConFoo conference in Montréal. For brevity, I’m not going to go through that markup line by line. It’s essentially the same as the event in Prague: an Event item with nested Geo and Address items. I just mention it in passing to reiterate that a single page can have multiple events, each marked up with microdata.

As you can see, all the information we added in microdata is there. Properties that are separate microdata items are given internal IDs (Item(__1), Item(__2) and so on). This is not part of the microdata specification. It’s just a convention that Google’s testing tool uses to linearize the sample output and show you the grouping of nested items and their properties.

Here is how Google might choose to represent this sample page in its search results. (Again, I have to preface this with the disclaimer that this is just an example. Google may change the format of their search results at any time, and there is no guarantee that Google will even pay attention to your microdata markup. Sorry to sound like a broken record, but our lawyers make me say these things.)

After the page title and auto-generated excerpt text, Google starts using the microdata markup we added to the page to display a little table of events. Note the date format: “Fri, Nov 6.” That is not a string that appeared anywhere in our HTML or microdata markup. We used two fully qualified ISO-formatted strings, 2009-11-06T08:30+01:00 and 2009-11-06T20:30+01:00. Google took those two dates, figured out that they were on the same day, and decided to display a single date in a more friendly format.

Now look at the physical addresses. Google chose to display just the venue name + locality + country, not the exact street address. This is made possible by the fact that we split up the address into five subproperties — name, street-address, region, locality, and country-name — and marked up each part of the address as a different microdata property. Google takes advantage of that to show an abbreviated address. Other consumers of the same microdata markup might make different choices about what to display or how to display it. There’s no right or wrong choice here. It’s up to you to provide as much data as possible, as accurately as possible. It’s up to the rest of the world to interpret it.

Marking Up Reviews

Here’s another example of making the web (and possibly search result listings) better through markup: business and product reviews.

This is a short review I wrote of my favorite pizza place near my house. (This is a real restaurant, by the way. If you’re ever in Apex, NC, I highly recommend it.) Let’s look at the original markup:

<article>
<h1>Anna’s Pizzeria</h1>
<p>★★★★☆ (4 stars out of 5)</p>
<p>New York-style pizza right in historic downtown Apex</p>
<p>
Food is top-notch. Atmosphere is just right for a “neighborhood
pizza joint.” The restaurant itself is a bit cramped; if you’re
overweight, you may have difficulty getting in and out of your
seat and navigating between other tables. Used to give free
garlic knots when you sat down; now they give you plain bread
and you have to pay for the good stuff. Overall, it’s a winner.
</p>
<p>
100 North Salem Street<br>
Apex, NC 27502<br>
USA
</p>
<p>— reviewed by Mark Pilgrim, last updated March 31, 2010</p>
</article>

I’m going to skip over the actual rating and come back to that at the end.

The next two properties are also straightforward. The summary property is a short description of what you’re reviewing, and the description property is the body of the review.

<p itemprop="summary">New York-style pizza right in historic downtown Apex</p>
<p itemprop="description">
Food is top-notch. Atmosphere is just right for a “neighborhood
pizza joint.” The restaurant itself is a bit cramped; if you’re
overweight, you may have difficulty getting in and out of your
seat and navigating between other tables. Used to give free
garlic knots when you sat down; now they give you plain bread
and you have to pay for the good stuff. Overall, it’s a winner.
</p>

The final line presents a familiar problem: it contains two bits of information in one element. The name of the reviewer is Mark Pilgrim, and the review date is March 31, 2010. How do we mark up these two distinct properties? Wrap them in their own elements and put an itemprop attribute on each element. In fact, the date in this example should have been marked up with a <time> element in the first place, so that provides a natural hook on which to hang our itemprop attribute. The reviewer name can just be wrapped in a dummy <span> element.
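A sketch of that final line with both properties split out. I am assuming the Review vocabulary names these properties reviewer and dtreviewed, and that a date-only datetime value is acceptable for the review date.

```html
<!-- Property names "reviewer" and "dtreviewed" are assumptions about
     the Review vocabulary -->
<p>— reviewed by <span itemprop="reviewer">Mark Pilgrim</span>, last updated
  <time itemprop="dtreviewed" datetime="2010-03-31">March 31, 2010</time>
</p>
```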

OK, let’s talk ratings. The trickiest part of marking up a review is the rating. By default, ratings in the Review vocabulary are on a scale of 1–5, 1 being “terrible” and 5 being “awesome.” If you want to use a different scale, you can definitely do that. But let’s talk about the default scale first.

If you’re using the default 1–5 scale, the only property you need to mark up is the rating itself (4, in this case). But what if you want to use a different scale? You can do that; you just need to declare the limits of the scale you’re using. For example, if you wanted to use a 0–10 point scale, you would still declare the itemprop="rating" property, but instead of giving the rating value directly, you would use a nested vocabulary of http://data-vocabulary.org/Rating to declare the worst and best values in your custom scale and the actual rating value within that scale.
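To make the difference concrete, here is a sketch of both approaches. The first paragraph uses the default scale from the review above. The second uses a made-up 9-out-of-10 rating, and it assumes the Rating vocabulary names its properties value, worst, and best.

```html
<!-- Default 1-5 scale: mark up only the rating itself -->
<p>★★★★☆ (<span itemprop="rating">4</span> stars out of 5)</p>

<!-- Custom 0-10 scale: the rating becomes a nested Rating item.
     The property names value/worst/best are assumptions, and the
     rating of 9 is a hypothetical example. -->
<p itemprop="rating" itemscope
   itemtype="http://data-vocabulary.org/Rating">
  ★★★★★★★★★☆
  (<span itemprop="value">9</span> on a scale of
  <span itemprop="worst">0</span> to
  <span itemprop="best">10</span>)
</p>
```

You would use one or the other inside a single Review item, not both.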

And here (modulo the whims of Google, the phase of the moon, and so on and so forth) is what my review might look like in a search result listing:

Anna’s Pizzeria: review
★★★★☆ Review by Mark Pilgrim - Mar 31, 2010
Excerpt from the page will show up here.
Excerpt from the page will show up here.
diveintohtml5.info/examples/review-plus-microdata.html - Cached - Similar pages

This has been ‘“Distributed,” “Extensibility,” & Other Fancy Words.’ The full table of contents has more if you’d like to keep reading.

Did You Know?

In association with Google Press, O’Reilly is distributing this book in a variety of formats, including paper, ePub, Mobi, and DRM-free PDF. The paid edition is called “HTML5: Up & Running,” and it is available now. This chapter is included in the paid edition.