Midway through this article I was all ready to grumble about how, by separating “italics used for emphasis” and “italics used for other reasons,” HTML was intruding too much into marking up language itself. We could have a <sentence> tag and a <clause> tag and a <verb> tag, and those would be semantic for some definition of “semantic,” but it would be a huge burden on authors for almost no practical benefit. But I’m convinced; I do buy the idea that using two different tags for italics could be a boon for screen readers. I don’t see a graceful solution within the context of Markdown, though, and that’s become the de facto way to write HTML without writing HTML.

(I’ve actually gotten into the insane habit, while writing Markdown, of using underscores for book titles and other <cite>-type italics and using asterisks for other italicized things. Of course, no Markdown processor I’m aware of treats the two any differently, and this still doesn’t allow me to distinguish between the italics used for emphasizing a single word and those used for, say, marking something that a character is thinking.)

I do have to take strong issue with the author’s advice not to “reset” the italics in the case of nesting. The author is not a native English speaker and so is probably unaware that this is a solid convention in English typography; using bold text instead will be intelligible but it’ll immediately stick out to most English readers.

I don’t think your habit is all that insane. You might want to write a tool (say, a bibliography extractor) that can make finer distinctions than the Markdown spec does, and your local convention would make its parsing much easier and more reliable. Even if you never do that, maybe it’s useful to be able to visually distinguish the different cases in your source text. Programmers do this kind of thing all the time; for example, indenting blocks in brace-delimited languages.
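To make that concrete, here is a minimal sketch of what such a tool might look like. It assumes the local convention described above (underscores for titles of works, asterisks for everything else), which is not part of any Markdown spec. The names `extract_titles` and `to_html` are made up for illustration, and the regular expressions only handle simple single-line spans.

```python
from __future__ import annotations

import re

# Assumed local convention (not part of any Markdown spec):
#   _underscores_ mark titles of works, *asterisks* mark other italics.
# The lookarounds skip intraword underscores (snake_case) and **bold**.
TITLE = re.compile(r"(?<!\w)_([^_\n]+)_(?!\w)")
EMPHASIS = re.compile(r"(?<![\w*])\*([^*\n]+)\*(?![\w*])")


def extract_titles(markdown: str) -> list[str]:
    """Return distinct underscore-delimited spans in order of first appearance."""
    seen: dict[str, None] = {}
    for match in TITLE.finditer(markdown):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


def to_html(markdown: str) -> str:
    """Render underscore spans as <cite> and asterisk spans as <em>."""
    html = TITLE.sub(r"<cite>\1</cite>", markdown)
    return EMPHASIS.sub(r"<em>\1</em>", html)


if __name__ == "__main__":
    note = "I *really* liked _The Wind in the Willows_, more than _Dune_."
    print(extract_titles(note))
    print(to_html(note))
```

A real version would need to cope with nesting, escapes, and code spans, so it would probably be better built on an actual Markdown parser than on regular expressions.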

I don’t see a graceful solution within the context of Markdown, though, and that’s become the de facto way to write HTML without writing HTML.

I have very recently gotten into the habit of writing HTML instead of Markdown for personal notes, drafts of my blog entries, etc. I guess I am just a person who favours HTML tags over the micro-syntax of Markdown, AsciiDoc, et al. Using tags like <abbr>, <dfn>, etc. may be a bit like “stamp collecting” in the sense that they probably add very little to the document itself, but for some reason I like it very much. One definite benefit is that they make you think more about what you are actually writing at the moment.

Midway through this article I was all ready to grumble about how, by separating “italics used for emphasis” and “italics used for other reasons,” HTML was intruding too much into marking up language itself. We could have a <sentence> tag and a <clause> tag and a <verb> tag, and those would be semantic for some definition of “semantic,” but it would be a huge burden on authors for almost no practical benefit.

Isn’t that the main use case for XML these days? Marking up a document so it’s machine-readable for a given context.

Be sure the text in question is not actually more appropriate for another element.
Use <em> to indicate stress emphasis.
Use <strong> to indicate stronger importance.
Use <mark> to indicate relevance.
Use <cite> to mark the name of a work, such as a book, play, or song.
Use <dfn> to mark the defining instance of a term.

That was my thinking as well. If everybody is using a system incorrectly, it’s not the people’s fault. It’s the system. I don’t understand why HTML cares about “stress emphasis”. It seems like a thing that’s particular to English text, and only certain types of English text (long form, literary) at that. Why does the standard care about how something is emphasized and what the various distinctions of emphasis are?

I think it is truly relevant in the context of the semantic web, where we attribute different meaning to tags like <article>, <header>, etc. After all, those could all be plain <div> elements and the browser would render them the same.

However, in the context of the web as “a place where everyone can put whatever they want and browsers will almost always render it somewhat correctly”, then it might be considered ridiculous. Most people are not writers by trade, and even most writers have editors that take care of these wrinkles to ensure the finished product looks good.

I do agree with the author that everyone should strive to use it correctly, and I think sites that follow these rules along with a focus on a great reading experience are miles ahead of the general population. There is a reason most browsers and devices now have a “Reader”-mode, but they can only go so far in changing the text to convey meaning - in the end the original author must do his part as well.

More generally, any “late-binding” abstraction that isn’t explicitly authored-for and tested-in multiple contexts is at best an abbreviation, not a parameterization. Said another way: Replacing something (code, behavior, data, etc) with a name is not sufficient to enable successful replacement of that code/behavior/data either statically or dynamically. It merely creates an affordance for you to later do so.

Concretely, just using a “semantic” tag or css class name isn’t enough to support alternate semantics. You need to actually consider and test contexts with alternate interpretations. For example, anyone who has ever attempted a theme-able UI knows that you need to have constants for both concrete colors and for particular usage roles and that there will never be enough distinct roles in the latter category. Once you start testing multiple themes, you’ll keep needing more and more new role names to get something that actually looks good. This problem is far worse when behavior changes, not just colors.
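As a rough illustration of the theming example, here is a small sketch that keeps concrete colors and usage roles in separate tables. Every name and hex value in it is hypothetical. The point is the check at the end: once a second theme exists, each new role has to be bound and looked at in every theme, which is the testing work that a role name alone does not do for you.

```python
# Concrete colors: named swatches with hypothetical hex values.
PALETTE = {
    "ink": "#1a1a1a",
    "paper": "#fdfdf8",
    "night": "#101418",
    "fog": "#d8dee9",
    "amber": "#ffb000",
}

# Usage roles: what a color is for. This list tends to keep growing.
ROLES = ["text", "background", "accent"]

# Each theme binds roles to palette entries.
THEMES = {
    "light": {"text": "ink", "background": "paper", "accent": "amber"},
    "dark": {"text": "fog", "background": "night", "accent": "amber"},
}


def resolve(theme: str, role: str) -> str:
    """Look up the concrete color a theme uses for a role."""
    return PALETTE[THEMES[theme][role]]


def unbound_roles(roles, themes):
    """Map each theme name to the roles it does not define."""
    return {
        name: [role for role in roles if role not in binding]
        for name, binding in themes.items()
    }


if __name__ == "__main__":
    print(resolve("dark", "text"))
    # Introducing a new role leaves every existing theme incomplete.
    print(unbound_roles(ROLES + ["code_background"], THEMES))
```

A check like this only catches missing bindings. Whether the result actually looks good still takes a human looking at each theme.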

Is the root of this issue that “bold” and “italic” settings are the controls you get in virtually all rich text editors, including MS Word and Apple Pages? They give you control over how fonts look, but there are no controls for “stress” or “importance”. Everyone is trained (and has been for decades, really) to use these controls. How much web content is really written in raw HTML? How could we possibly get to using <em> correctly when:

You don’t get those controls in widely-used (any?) editors

You have to keep “how would this be said by someone who was reading this out loud for someone else?” in mind when writing a document where, mentally, your intention is that it will be read visually.

You are very much trained (whether explicitly, or implicitly by the tools or how the text looks when it’s rendered) to think in terms of “bold” and “italic”.

Using bold and italic is still something people want for visual reasons, possibly separately from giving them spoken meaning.

I don’t disagree with the author, and I learned a lot from the post, I just don’t think this is going to be something that changes. It requires both a tooling and a mindset change for the (very many) people creating written content that goes on the web.

Edit: I do think we might also get clearer written communication if we could get people to think about the different kinds of stress and emphasis, and if we had better ways of differentiating them visually. But given that a lot of this is stuff that native speakers just intuit, rather than thinking about, I think that is unfortunately unlikely. But we’ve all experienced the context loss in email and chat messages, so I guess I can hope.

I’m not sure what I was expecting, but whatever it was, the article was much better than that! Would really recommend people read it.

It reminds me a lot of a philosophy of language class I once took, where one of the exercises was studying how the conveyed meaning of “Brutus is an honorable man” changed as you changed which word was emphasized. For example, emphasizing the “is” conveys that he is still that way, as opposed to “was, but no longer”, while emphasizing the “man” could convey that he’s mortal and not divine.

These days I try to avoid emphasis and prefer to rewrite sentences to “naturally” emphasize certain ideas. I also find that bold is best used for definitions, since it sticks out more in most text.

“We’ve covered how the obvious improvements are all worse, so let’s talk about an obvious regression that’s actually much better: cascade foobazzing.”

I never really thought about it much, but I have always used the <em> tag in the “correct” way that the author describes. In fact I was probably using the <i> tag incorrectly for a long time before I learned of <em>. It never occurred to me that there were legit use cases for italicized text outside of stressing part of a sentence, other than possibly book titles and blockquotes, which I never liked italicizing anyway.

Can technical terms not be emphasized this way? “Today’s post is all about <em>malloc</em> and dynamic memory.” If I were reading that aloud, I might put a little pause around malloc so everybody hears it.

I think <i> would be better for technical terms: <i>malloc</i>. You are pronouncing “malloc” differently, but not because you are stressing it. Your change of voice is similar to the change in voice you might use when quoting a book title, which should also be marked up with <i>: “But what about <i>The Wind in the Willows</i>?”.

Good find with the <dfn> tag, but I’m not sure that <cite> could replace <i> for the book title. From the description of <cite> it sounds like it could, but the HTML demo on that MDN page includes a book title written with <i>, with <cite> wrapped around it, as in “check out <cite><i>The Wind in the Willows</i> by Kenneth Grahame</cite>”. Making the situation more confusing, the page says that the W3C and the WHATWG specs conflict on whether <cite> can include more than the book title.

Good catch. To make things even more confusing, the MDN page also says,

Typically, browsers style the contents of a <cite> element in italics by default.

So maybe it’s best just to stay away from that one, or at least to treat it as a semantic-only tag and handle the formatting yourself. (Some of those example uses would traditionally be formatted in quotes, not italics, so maybe that’s just as well.)