Document Outlines

Document outlines have changed a bit in HTML5. For a start, they’re actually in the spec (and have been for years, since 2008). The HTML5 Doctor is here to explain what document outlines are, how to make good ones, and why you should care.

The document outline is the structure of a document, generated by the document’s headings, form titles, table titles, and any other appropriate landmarks to map out the document. The user agent can apply this information to generate a table of contents, for example. This table of contents could then be used by assistive technology to help the user, or be parsed by a machine like a search engine to improve search results.
The outlining algorithm has been clearly defined in the HTML5 spec, so once all browsers and assistive technologies play ball, there will be some major accessibility wins (more on support later). Before we take a look at how this new algorithm works, it’s time for a quick walk down memory lane.

Creating document outlines prior to HTML5 was simple. You had six heading elements, <h1> through <h6>. Lower-numbered headings ranked higher than higher-numbered ones, i.e. <h1> was ranked higher than <h2>:

<h1>My fantastic site</h1>
<h2>About me</h2>
<p>I am a man who lives a fascinating life. Oh the stories I could tell you...</p>
<h3>What I do for a living</h3>
<p>I sell enterprise-managed ant farms.</p>
<h2>Contact</h2>
<p>Shout my name and I will come to you.</p>

This example would produce the following outline:

1. My fantastic site
   1. About me
      1. What I do for a living
   2. Contact

The <h2> titles are children of the <h1>, and the “About me” content has a further sub-heading using an <h3>. It’s simple but restrictive, as you have to ensure the heading levels are appropriate for the intended structure, and you’re limited to six levels. The latter restriction is usually not such a problem, but it still exists for all you heading fanatics (oh you guys!).
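
The rank-only behaviour described above can be sketched in a few lines of Python. This is a minimal illustration rather than the algorithm from any specification: `rank_outline` is a hypothetical helper, and its regular expression assumes headings with no attributes and no nested markup.

```python
import re

# Assumption: headings carry no attributes and contain plain text only.
HEADING = re.compile(r"<h([1-6])>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)


def rank_outline(html):
    """Return (depth, text) pairs for a rank-only outline."""
    outline = []
    open_ranks = []  # ranks of the headings whose sections are still open
    for match in HEADING.finditer(html):
        rank, text = int(match.group(1)), match.group(2).strip()
        # A heading closes every open section of equal or lower rank
        # (an equal or larger number), then nests under what remains.
        while open_ranks and open_ranks[-1] >= rank:
            open_ranks.pop()
        outline.append((len(open_ranks), text))
        open_ranks.append(rank)
    return outline


page = """
<h1>My fantastic site</h1>
<h2>About me</h2>
<h3>What I do for a living</h3>
<h2>Contact</h2>
"""

for depth, text in rank_outline(page):
    print("  " * depth + text)
```

Run on the headings from the example, it prints "About me" and "Contact" one level below "My fantastic site", with "What I do for a living" one level deeper still.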
HTML5 does this as well. The above example would produce the same outline, but it can be taken even further using the new sectioning elements.

Warning! In practical terms, the HTML5 document outline is theoretical only, as it has not been implemented in user agents. People who make use of heading semantics get the heading level as per the h1-h6 elements (the HTML 4 outline), i.e. the sectioning level is ignored.

The concepts behind HTML5 document outlines are actually older than you might think! Tim Berners-Lee posted to the www-talk mailing list back in 1991 (props to Dr Oli for digging that up), suggesting something quite close to what is demonstrated in this article.
The sectioning elements <section>, <article>, <aside> and <nav> can all help to create a more logical structure in the document outline. Let’s go crazy and rewrite our previous example using only <h1> elements for headings:

<h1>My fantastic site</h1>
<h1>About me</h1>
<p>I am a man who lives a fascinating life. Oh the stories I could tell you...</p>
<h1>What I do for a living</h1>
<p>I sell enterprise-managed ant farms.</p>
<h1>Contact</h1>
<p>Shout my name and I will come to you.</p>

The outline would now look like this:

1. My fantastic site
2. About me
3. What I do for a living
4. Contact

Clearly, that’s no good — we’ve lost our structure! With sectioning elements, we can make it look like before without changing those headings. In this particular example, I think <section> is most appropriate:

<h1>My fantastic site</h1>
<section>
  <h1>About me</h1>
  <p>I am a man who lives a fascinating life. Oh the stories I could tell you...</p>
  <section>
    <h1>What I do for a living</h1>
    <p>I sell enterprise-managed ant farms.</p>
  </section>
</section>
<section>
  <h1>Contact</h1>
  <p>Shout my name and I will come to you.</p>
</section>

Run it through the outliner and we’re back to normal:

1. My fantastic site
   1. About me
      1. What I do for a living
   2. Contact

But why? The sectioning elements act quite literally as their name suggests: they define sections of the parent element. These sections can be thought of as child nodes whose headings fall under their parent heading, regardless of their rank. The following example illustrates this further:

<h2>HTML5 Doctor articles</h2>
<article>
  <h1>The section element</h1>
  <p>We doctors are a bunch of chums using HTML5 and writing about how we do it...</p>
</article>
<article>
  <h1>The article element</h1>
  <p>We’ve discussed a lot of new elements here at HTML5Doctor...</p>
</article>

Even though the articles contain <h1>s, this produces the following outline:

1. HTML5 Doctor articles
   1. The section element
   2. The article element

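Equally, owing to how the outliner works, other combinations of heading ranks (while probably not the best use of headings) produce exactly the same outline: for example, an <h1> for the listing title with <h2>s inside the articles, or <h3>s throughout.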
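
As a rough check of this behaviour, here is a small Python sketch that takes nesting from sectioning elements alone and ignores heading rank. It is a deliberate simplification, not the spec's algorithm: it assumes well-formed markup and at most one heading per section. `SectionOutliner`, `outline` and `listing` are hypothetical names, and the second and third rank combinations are assumed variants rather than markup from the article.

```python
from html.parser import HTMLParser

SECTIONING = {"section", "article", "aside", "nav"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionOutliner(HTMLParser):
    """Simplified outliner: nesting comes from sectioning elements only."""

    def __init__(self):
        super().__init__()
        self.depth = 0           # number of open sectioning elements
        self.in_heading = False
        self.outline = []        # [depth, heading text] entries

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self.depth += 1
        elif tag in HEADINGS:
            self.in_heading = True
            self.outline.append([self.depth, ""])

    def handle_endtag(self, tag):
        if tag in SECTIONING:
            self.depth -= 1
        elif tag in HEADINGS:
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.outline[-1][1] += data


def outline(html):
    parser = SectionOutliner()
    parser.feed(html)
    parser.close()
    return [(depth, text.strip()) for depth, text in parser.outline]


def listing(outer, inner):
    """Build the article listing with the given heading elements."""
    return (
        f"<{outer}>HTML5 Doctor articles</{outer}>"
        f"<article><{inner}>The section element</{inner}></article>"
        f"<article><{inner}>The article element</{inner}></article>"
    )


# The first pair is the article's markup; the other two are assumed variants.
pairs = [("h2", "h1"), ("h1", "h2"), ("h3", "h3")]
results = [outline(listing(outer, inner)) for outer, inner in pairs]
for depth, text in results[0]:
    print("  " * depth + text)
print(all(result == results[0] for result in results))
```

All three variants give the same nesting because, in this model, an <article>'s heading sits one level below the listing title whatever its rank.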

When choosing which heading to use in your documents, the spec has recommendations:

Sections may contain headings of any rank, and authors are strongly encouraged to use headings of the appropriate rank for the section’s nesting level.
— HTML 5.1 specification

Note: due to the lack of support in browsers for the document outline and negative prognosis for future support, strengthening of the current advice to a normative requirement is currently under discussion (January 2014).

You should also make sure you’re aware of how differently ranked headings work when used as direct children of a sectioning element. It’s how it worked prior to HTML5:

The first element of heading content in an element of sectioning content represents the heading for that section. Subsequent headings of equal or higher rank start new (implied) sections, headings of lower rank start implied subsections that are part of the previous one. In both cases, the element represents the heading of the implied section.
— HTML 5.1 specification

When a sectioning element has no heading of its own, the outliner takes the liberty of flagging it as untitled, to act as a warning and to preserve a logical structure. For accessibility reasons, we recommend each sectioning element have a heading, even <aside> and <nav>. If you don’t want these headings to be visible, you can always hide them with CSS.

How does <hgroup> affect the outline?

As Dr Richard Clark said in our <hgroup> article, <hgroup> is all about the document outline. The outliner will disregard all headings within <hgroup> except the one with the highest ranking. For example, if an <hgroup> contains an <h2>, an <h3> and an <h4>, only the <h2>’s text will be used as the section title in the outline.
At the time of writing, <hgroup>’s future is a little uncertain. It was recently removed and then returned to the HTML5 spec, and there are proposals for its removal or replacement with an alternative. We’ll be sure to keep HTML5 Doctor up-to-date with any changes as they unfold.
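
The <hgroup> rule described above, where only the highest-ranked heading reaches the outline, can be sketched as follows. This is an illustrative sketch under assumptions: `HgroupTitles` is a hypothetical class, the heading text is made up, and the markup is assumed to be well-formed.

```python
from html.parser import HTMLParser

RANKS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HgroupTitles(HTMLParser):
    """Collect one outline title per <hgroup>: its highest-ranked heading."""

    def __init__(self):
        super().__init__()
        self.titles = []         # one title per <hgroup>, in document order
        self.group = None        # [rank, text] entries of the open <hgroup>
        self.in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "hgroup":
            self.group = []
        elif tag in RANKS and self.group is not None:
            self.group.append([RANKS[tag], ""])
            self.in_heading = True

    def handle_data(self, data):
        if self.in_heading:
            self.group[-1][1] += data

    def handle_endtag(self, tag):
        if tag in RANKS:
            self.in_heading = False
        elif tag == "hgroup" and self.group is not None:
            if self.group:
                # The lowest number is the highest rank; min() keeps the
                # first heading when several share that rank.
                best = min(self.group, key=lambda heading: heading[0])
                self.titles.append(best[1].strip())
            self.group = None


parser = HgroupTitles()
parser.feed("<hgroup><h2>Title</h2><h3>Subtitle</h3><h4>Tagline</h4></hgroup>")
parser.close()
print(parser.titles)  # ['Title']
```

The <h3> and <h4> text is still part of the page; it simply never becomes a section title in the outline.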

Sectioning roots, introduced in HTML5, isolate certain parts of a document in their own separate outlines. Headings within these elements will not show up in the main outline, whose own sectioning root is the <body> element.
The other sectioning root elements are <blockquote>, <figure>, <details>, <fieldset>, and <td>. Each one of these elements is a descendant of the <body> element, but its headings are removed from the top-level outline, instead starting its own isolated outline.
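
A short Python sketch of this isolation: headings found inside a sectioning root other than <body> are left out of the main outline. `MainOutlineHeadings` is a hypothetical name, the sample markup is invented for illustration, and the parser assumes explicit end tags (so a <td> without a closing tag would not be handled).

```python
from html.parser import HTMLParser

# Sectioning roots other than <body>, as listed in the article.
ISOLATED_ROOTS = {"blockquote", "figure", "details", "fieldset", "td"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MainOutlineHeadings(HTMLParser):
    """Collect only the headings that belong to the main outline."""

    def __init__(self):
        super().__init__()
        self.root_depth = 0      # open sectioning roots below <body>
        self.in_heading = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in ISOLATED_ROOTS:
            self.root_depth += 1
        elif tag in HEADINGS and self.root_depth == 0:
            self.in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag in ISOLATED_ROOTS:
            self.root_depth = max(self.root_depth - 1, 0)
        elif tag in HEADINGS:
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.headings[-1] += data


page = """
<h1>My fantastic site</h1>
<blockquote>
  <h1>A quoted heading</h1>
  <p>Quoted text.</p>
</blockquote>
<h2>Contact</h2>
"""

parser = MainOutlineHeadings()
parser.feed(page)
parser.close()
print(parser.headings)  # ['My fantastic site', 'Contact']
```

The quoted heading would still appear in the <blockquote>'s own isolated outline; this sketch only shows what the main outline sees.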

Unfortunately, there is little support for the new outlining algorithms right now. Search engines may be experimenting with it in their crawling algorithms as you read this, but as far as we know, headings are treated just as they were before. You won’t be penalised for using them, even if you use multiple <h1>s (which have always been okay as far as the spec is concerned). Check out our HTML5 and Search Engine Optimisation article for more on search engines and HTML5.
At the time of writing, browsers and screen readers do not support these new outlines, so if you do use multiple <h1>s in your documents, it may confuse your users. It’s best if you use logical heading levels — <h1>–<h6> — at least until the new outlines are more widely supported.
As for browsers, recent releases of both Firefox and Chrome have user agent styles that support HTML5 document outlines. Try this bare-bones example in the latest Chrome or Firefox.

Update 21/01/2014

There is still no implementation of the document outline semantics in browsers apart from CSS styling. Refer to this recent article about The HTML5 Document Outline.

Despite the spotty support, it’s definitely worth thinking carefully about your document outlines so you’re prepared for the future, and tune in here for news of improved support.
Get to grips with the sectioning elements and sectioning roots and how each affects the outline. When marking up a new site, consider how you could take advantage of the new document outline algorithm. As user agent support strengthens, pages you made with your new-found knowledge of document outlines will be more accessible. Let us know what you think in the comments below!

This article was written by Mike Robinson. A developer at Lift in Reading, England, you can catch him on Twitter or occasionally blogging on his own site, akamike. Beyond the web, Mike is usually gaming or listening to progressive rock.

Thanks for the article Mike. Interesting stuff. I’m trying to write a little outliner at the moment (maybe it’d be useful as a bookmarklet or something one day). I’ve got it parsing all the samples in the article correctly (I think), but I’m a little unsure of what happens when there’s a mix of HTML4-style outlining and HTML5’s.

For example, I’ve added a heading to a previous example:

<section>
<h1>About me</h1>
<p>I am a man who lives a fascinating life. Oh the stories I could tell you...</p>
<h2>And other stuff</h2>
<p>Well, I like to surf.</p>
<section>
<h1>What I do for a living</h1>
<p>I sell enterprise-managed ant farms.</p>
</section>
</section>

The spec says:

Each section can have one heading associated with it, and can contain any number of further nested sections.

So, I take it the h2 wouldn’t become another heading associated with the <section>’s section.

Later it says:

If the element being entered has a rank lower than the rank of the heading of the candidate section, then create a new section, and append it to candidate section.

From this I’m thinking the h2 would have its own outline generated and appended as a new section to the <section>’s section. However, since the h2 isn’t a sectioning content or sectioning root element, its outline would consist of its next siblings.
What happens to the <section> that follows the h2? Is its outline added as a child section of this newly generated section for the h2, or as a sibling of the h2’s section (a child of the top <section>’s section)?

I’ve almost given myself a headache reading that algorithm in the spec. I’m sorry if it has rubbed off and caused me to write drivel above!

@Neil – I believe you are right. The root node is probably the body tag. I think this is a fault in the outliner tool, since the spec makes clear that DOM subtrees can be outlined without having to be founded on a sectioning root element, so there is no justification for adding in the body element.

The mozilla article also seems wrong. <header> and <footer> are not sectioning elements (though this is a common misunderstanding). I recommend reading either the HTML5 spec, or the excellent HTML5 Doctor articles on the two elements.

@Neil — hopefully I’ve corrected your comment with the intended code. Gsnedder’s outliner is correct, just not as informative as it could be. Adding the code sample to a new HTML document and checking with the h5o outliner bookmarklet I get:

One thing I’ve been confused about is why the initial H1 wouldn’t be part of the section…

The first example shows:

My fantastic site

Would this:

My fantastic site

Be wrong? Would that result in something different?

I guess I’m being thrown off because I would presume that the heading is FOR that section… but it seems like HTML5 outlines automatically associate the heading with the section/article that follows it?

Very nice post indeed. Still crunching in my head how the sections work, but it’s pretty straightforward. People need to write that down and work with it a few times and they’ll get the point.

Above you said this: “If you don’t want these headings to be visible, you can always hide them with CSS.” in regards to aside. Isn’t hiding text via CSS seen as cloaking? That would hurt more than it would help.

If we look at the code and go step by step: if I remove the element, the parent becomes Untitled nav. If I leave the element in the source I get an error about incorrectly ordered headings, but when I change it to I overuse the element. I am really confused here about what the ideal step would be.

Rant:
I was hoping to find something more inspiring regarding the uptake of this pattern. You’d think people would be more excited to use it. I can understand why they are not – legacy support is a big issue for many of the larger organisations involved; the semantics are confusing when you first come across them, and the work-arounds like .h1 {}, .h2 {} actually confuse things (despite making it easier to write CSS).

What still surprises me is that one obvious potential use-case of this mechanism is still not being talked about or used much: syndication. The outlining algorithm could successfully solve the problem of porting one document (or “) between different contexts (different sections of a site, different sites and tools etc). With the proliferation of aggregators and multiple output channels (websites, native apps) etc, you’d hope people would be pushing harder for wider adoption and better support for the new outlining.
:/Rant

Always using <h1> in a section heading presents somewhat of a problem from a visual CSS perspective, making it an unrealistic option.

If a main section has an <h1> tag, with a specific font-weight and size, generally, each child section would have a visual representation that it’s a sibling – normally a smaller font size and/or have less weight.

Therefore it makes a great deal more sense to continue to use the heading tags h1 through to h6 appropriately.

I want to use html5 elements but to get the outline correct it seems I have to remove NAV elements and replace with DIV. The “role” of the html5 elements should also affect the outline, right? It seems for example that NAV role “presentation” should remove it from outline?

Is it possible to get an example for an entire site? Simplified of course, but with all elements that are on a page including the html and body tags, navigation, main content, sidebar, comments, footer, that also generates a correct outline that only shows relevant info.

Using <p> for a subheading is not semantic at all. It is important that a subheading be linked to the heading in some way. Plus, this use of <p> does not even conform to the standard’s definition of the tag. “A paragraph is typically a run of phrasing content that forms a block of text with one or more sentences…”

Hi Kenneth,
The full text puts the lie to your statement about the p element:

A paragraph is typically a run of phrasing content that forms a block of text with one or more sentences that discuss a particular topic, as in typography, but can also be used for more general thematic grouping. For instance, an address is also a paragraph, as is a part of a form, a byline, or a stanza in a poem.[emphasis mine].

There is a catch for me with the document outline. Say, I have a nav as a sectioning element. On the one hand, I want the nav to be titled, on the other, I don’t need the title (h1, for example) to be shown in the browser. You’ve suggested to hide it (“If you don’t want these headings to be visible, you can always hide them with CSS”), and I would be happy to do it. However, if I hide it with “display:none” it would be “inaccessible” by screen readers. That’s a paradox – it’s visible by an outliner, but not visible by web assistance means. Am I right? So, to avoid any confusion, how do I hide a title of a titled section, to both display it in outliner and not be visible in the browser.

Thank you!

P.S. “visibility: hidden” and “text-indent: -xxxx” won’t work, because they leave traces of “reserved” space for those “hidden” titles.

Why is the element <title> not considered in the document outline? Surely this should be the root title? I’ve gotten into the habit of having a <h1> at the start of my page to include the page title in the document hierarchy, but surely this shouldn’t be necessary given we have <title>?

Hi Lucio, you need to use character codes for angle brackets in code examples. Anyway I tested chrome canary 42 and found no support for the outline algorithm apart from CSS sizing of headings, which has been implemented for years. Refer to Using only h1 elements in a HTML document for details of support issues.