There Is No Document Outline Algorithm

I figured I would state the entire argument in the title. After all, as of this writing and the last seven-plus years, the statement is accurate as far as the browsers are concerned.

I am penning this as sort of a follow-up to my post from 2013, The Truth about “The Truth About Multiple H1 Tags”. Even after that post helped kick off an update to the W3C HTML5 specification, it is not reflected in current tutorials and informative pieces.

Bad Information Persists

The appeal of ignoring heading levels for developers and authors is pretty compelling when you do not know how or where that content will appear (or it appears in many different locations). In particular, the all-<h1> approach appeals to many for its simplicity.

It makes sense why a developer might see this advice, or hear about it through misinformed articles, and never look back. This advice is a free pass. Many content authors don’t even know where their content will appear, making an all-<h1> approach feel like the safe approach.

Unfortunately, despite all the activity in the standards world along with the lack of activity on the part of the browsers, many developers continue to be unaware that this imparts no benefits to users and even harms many of those users. I run into this repeatedly when I answer questions on Stack Overflow, when I talk to developers in real life, and even from generally trusted outlets:

This was part of the spec, and it was “revoked”, which is not a nice thing to do. And it was revoked not because they considered that it was a bad idea, but because of screen readers not implementing it correctly.

Disregarding the fact that it was never part of the final W3C spec, that the spec had a warning for three years, that nobody considered the algorithm a bad idea, that screen readers had nothing to do with it, and that browsers not implementing it is different from correctly implementing it, there is one statement that belies the issue at hand.

Not a nice thing to do is a value judgment. It presumes that the specification’s primary benefactors are developers when in reality it is about users. It also presumes that it is acceptable to give developers advice (that harms some users) that has never been supported.

Like it or not, browsers are not moving on this feature and citing the purely theoretical document outline does nothing to move it forward. We as developers need to resolve this while still making it easy for content authors.

Update: February 13, 2017

There is a new issue opened against the W3C specification to try to understand how the outline algorithm is supposed to work so a polyfill can be created. This is sometimes a first step to getting support built into browsers. Read more at the issue, Update outline algorithm #794.

One Alternative (in Two Parts)

The average web developer would rather not have to think about mapping the appropriate heading level for every potential re-use of content. Authors should not have to think about it at all.

Server Side

Way back in the early oughts (actually, 1999–2000) I wrote a CMS (Content Management System) based on delimited text files. It was a lark. I wanted to teach myself some programming skills and my brother needed a mini-CMS while he was overseas.

I quickly ran into the heading issue that HTML5 tried to solve — sometimes his content would be re-used elsewhere in the layout, and the headings would not make sense anymore. But I solved it. I solved it without any fancy frameworks or libraries or HTML5 retooling.

Every content container carried a variable (this was all server-side code). That variable was a number reflecting its nesting level on the page. That number was then used to replace the number in any <h#> levels in the content (the content was chunked enough that there was not more than one heading).

I carried that technique forward into projects on much beefier CMSes and never had to worry about training authors how to manage chunked content on their home pages (and similar chunked pages). The move to HTML5 never made me consider an all-<h1> solution, partly because I knew the outline was not supported.

Client Side

Since so much of the content on modern sites comes in via client-side scripting, the code would simply need to be updated to run in the browser — this assumes you don’t mind offloading simple processing to thousands of users across uncontrollable run-times. But then if you are relying on client-side scripts to render a page you have already made your decision.

The following embedded code shows an HTML document that uses an all-<h1> structure. With a (not production-ready) chunk of JavaScript and some custom data- attributes on the sectioning containers, I re-write the <h1>s to reflect a document outline appropriate for this content. I use some CSS generated content to include the heading level after the text of each heading so you can easily see which is which. If the script does not work, you will see black headings sans parentheses for all.

Conceivably you can let your authors continue an all-<h1> approach, while your templates just tweak the structure based on attributes you embed in the layout.

Another Alternative

We can work to get browsers to support another new element, the <h> element (proposed in April, which was probably based on Gez Lemon’s 2004 suggestion). Browsers would still need to implement some sort of document outline algorithm, but in this case a new element means no need to rewrite existing <h#> logic.

That will require the developer community to come together as it did for the <picture> element. It can be done, it just requires some effort.

Update: January 18, 2017

Minutiae

The statement in the title of this post is not new. It has been discussed for at least three years in standards bodies. It has been ignored by browsers for longer (more than seven years), though the browser bugs linked at the end are only a couple years old. Anyone who claims this is a recent change has not confirmed that with the W3C specification in two years.

The following links are just evidence I have needed to provide repeatedly to demonstrate these points. I guess they are more for me to easily reference from future Stack Overflow answers.

In October 2013, Steve Faulkner (an editor of the HTML5 specification) wrote a post, The HTML5 Document Outline, explaining that it does not exist outside of the spec.

To recap, the Document Outline Algorithm was never a recommendation in a final W3C spec. There was a warning explicitly against authors relying on it, though the outline language was retained for browsers to understand how to implement support (eventually).

Regardless of whether you like the idea of the document outline algorithm, it does not reflect reality since no user agent supports it.

Update: January 23, 2018

WHATWG has maintained the fiction of the document outline despite no implementations. An issue against the spec was opened in 2015 to rectify this, where it has languished.

Until today. New conversation has started, with any eye toward accessibility. So there is promise we will either see a more workable proposal for browsers (to ignore?) and/or acknowledgment that the current WHATWG definition needs to be scrapped.

In the interest of generating a solid outline for screen readers, I was wondering if it was okay practice to blend explicit els with role="heading" + aria-level="x", and then hide it for non-screenreaders; e.g.

Main Site Navigation

The idea being to have presentational structures that don’t translate to screen readers without excluding their users from understanding. (also hopefully google doesn’t ding us)

Yet longer answer: If you can already put the appropriate heading level into an aria-level attribute, then you can use the correct <h#>, so this seems like a lot of extra effort for a potentially confusing (and SEO-risky) approach.

Dude, you’re a bozo. Plain and simple. Even if there’s “no such thing as document outline”, just by using a document outline can help tell you if the site is complete shit. You can actually tell by just visiting the site you want to outline. 99% of the time, it completely matches the shittiness that is displayed. Almost as bad as your writing. Almost.

Alf, other than your points about me being a bozo and my poor writing, I am not sure I understand what you are saying. On the first two points I agree, though. So kudos on being right in your opening and close!

Hi Adrian,
I just recently found your blog and the information here is really helpful. Thank you so much.
In short, does it mean that there is always only one H1, which is the title, in a web page?
Chris

Search iconUsed in the search form as a button.Search iconUsed in the search form as a button.Information iconLower-case 'i' in a circle.Checkmark iconSymbol showing a checkmark.Alert iconExclamation mark within a triangle.