Exploration

Last week, I asked “Should You Hyphenate?” This week, I’m going to assume that you decided to answer in the affirmative and talk about some good practices (I don’t know if they’re best practices just yet). This post was actually triggered by a comment from Kevin Hamilton on last week’s post. He said, in part:

You may want to exclude hyphenation on <code> tags within your blog. For both readability purposes (since many CSS tags already make heavy use of hyphens) and to avoid introducing some confusing/misleading references… Is it re-peating-linear-gradient? Or perhaps repeating-lin-ear-gradient?

He’s absolutely right, of course. If you’re going to blog about technical topics, or even if you’re just writing a style sheet that you expect to release into the wild for use by anyone, there are some elements that you should avoid hyphenating. And since hyphens is an inherited property, it isn’t sufficient to set it for a limited number of elements and assume you’re done. You have to make sure you’ve turned it off for the elements that shouldn’t be hyphenated.
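As a rough sketch of that opt-out pattern: the rules below assume hyphenation was switched on for the whole page via body, and the selector list is only an illustrative set of code-like and abbreviation elements, not a definitive one. The vendor-prefixed declarations are there on the assumption that some browsers still need them.

```css
/* Assumption: hyphenation is enabled once on body and inherited everywhere. */
body {
  -webkit-hyphens: auto;
  -moz-hyphens: auto;
  -ms-hyphens: auto;
  hyphens: auto;
}

/* Illustrative opt-out list; trim or extend it to match the markup you use. */
code, var, kbd, samp, tt, abbr, acronym {
  -webkit-hyphens: none;
  -moz-hyphens: none;
  -ms-hyphens: none;
  hyphens: none;
}
```

Automatic hyphenation generally depends on the document declaring its language (for example, lang="en" on the html element), so the first rule may appear to do nothing without it.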

Yes, most of those are old and obscure and in some cases (massively) deprecated, but they’re all elements that could be hanging around on a web site and by their nature shouldn’t have their content hyphenated. I mean, I would hope that a browser would recognize not to hyphenate an acronym or abbreviation element, but who knows? Maybe ZOMGWTFBBQROFLMFAOCOPTER has enough word-like strings to qualify for hyphenation in some hyphenation dictionaries. (Or not.)

“So what about pre?” you ask. A very good question. I rate that as a solid “maybe”. For most uses of pre, the content won’t line-wrap anyway thanks to white-space: pre, so it’s a moot point. However, if a pre has been set to white-space of pre-wrap, pre-line, or even normal, then hyphenation may well kick in.

At that point, the question is what kind of content the pre contains. It apparently is no longer meant to be rigidly preformatted, as the element name would imply, so what is it? If it’s a code block, there should already be a code element present within the pre, so suppressing hyphenation for code will be sufficient. Ditto if it’s an example of user input (kbd), program output (samp), and so on. This is why semantic markup matters. It’s why, if you’ve been using it all along, you can make fine-grained choices here.

Of course, lots of people weren’t as forward-looking as you and anyway nobody’s perfect, so it’s probably a good idea to switch off hyphenation for pre, just in case the more semantic elements were left out.
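A minimal sketch of that safety net follows. The class name wrapped is hypothetical and only stands in for whatever makes a pre wrap on a given site; prefixed properties are left out for brevity.

```css
/* Fallback for pre blocks that lack a code, kbd, or samp child. */
pre {
  hyphens: none;
}

/* Hypothetical class: a pre that is allowed to wrap, which is the case
   where hyphenation could otherwise kick in. */
pre.wrapped {
  white-space: pre-wrap;
}
```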

There are similar questions to confront regarding q and blockquote. If you’re quoting someone, almost certainly something that someone wrote, is it advisable to hyphenate that text when they didn’t? I’m honestly not sure if it matters or not. I’ve personally suppressed hyphenation in those cases, but I did that purely on instinct and I’d love to know what content and typography specialists think of that question. (Be polite, please. We’re all learning here.)

For the last interesting question, what about auto-linked URLs? If we suppress hyphenation for all links, then that solves one problem only to introduce another. What I have noticed is that if you drag-select CSS-hyphenated text, the auto-generated hyphen(s) and line break(s) are ignored when you copy the text. You just get the original. That’s why I don’t think it’s really necessary to suppress hyphenation on the a element, though I’m willing to change my mind in the presence of new evidence.

Update 18 Dec 12: I should make it more clear that this post is intended to be a starting point, not the final word. I’m not proposing that these are all the elements on which one should ever suppress hyphenation, full stop, end of discussion. There may well be others, like form labels and textareas and text inputs and so forth, that should also be excluded. (Though I kind of enjoy watching my text input get auto-hyphenated as I type. It’s a little surreal.) Hopefully, this post will get people thinking about exactly how authors should handle hyphenation if they do choose to put it in place, and eventually help us figure out some solid best practices.
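If form controls do end up on the opt-out list, the rule might look something like the sketch below. The choice of selectors is an assumption drawn from the elements named above, not a tested recommendation.

```css
/* Assumed set of form-related elements to exclude from hyphenation. */
label,
textarea,
input[type="text"] {
  hyphens: none;
}
```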

Kevin Hamilton wrote in to say...

Hi Eric – I’ve got one more suggestion for you, although I would understand if you decided not to implement it…

@media (min-width: 1500px) {
  body { hyphens: none; }
}

I feel that at certain line lengths, the tracking time for my eye to get from the end of the line back to the beginning of the next to complete a word feels disruptive. Of course, this might just be because I’m not used to hyphens on the web or because I am thinking about it too consciously.

But if you try it and you agree, then I think on your site a min-width somewhere between 1200px and 1500px seems like a reasonable cutoff to disable hyphenation.

One thing to note is that hyphens don’t exist in the DOM, and when the user copies something, the hyphens are not copied along (except if you manually insert soft-hyphens). In principle at least (and I’ve never fully tested that…).

Now, I don’t disagree that having e.g. code blocks display words with hyphens when none exist might look silly (‘back-ground’, uh). OTOH, if the code element is inline, as in this sentence, and the word contains a hyphen, it will split anyway (e.g. linear-gradient) if it happens to be at the end of the line, a hyphen being a line-break opportunity in most browsers.

BTW, Eric, you may want to disable hyphenation in form controls (textarea, input[type=text]).

No, hyphens cannot be opt-out. It has to be opt-in. The opt-out approach does not take into account future elements, each with their own UI requirements. What about selects? What about checkbox labels? Should they all have hyphens? What if there is a new User Interface for Tabs? We cannot always be on our toes to find out which newly introduced browser feature will break our carefully laid out design.

Hyphens should only be used when the developer is sure there is going to be content and nothing else. Typically that would be a paragraph element, list element or similar.

Phillippe, I kind of like the hyphenation in the textarea. It’s an interesting effect. It’s true that most people probably want to do as you suggest, though. I think the concern with code hyphenation is that something like repeating-linear-gradient could end up displayed as re-peating-linear-gradient and thus be unnecessarily confusing. And imagine the results if extra hyphens strayed into a UNIX command line example! Sure, that would likely be no problem if the user selects-and-copies, as both of us pointed out, but if someone retypes it…yikes.

As long as hyphens is an inherited property, I’m pretty sure there has to be opt-out, Divya. Even if one wrote a selector to hyphenate only paragraphs and list items, there could be phrase element descendants that should be opted out. For that matter, forms often contain paragraphs or are constructed from lists. It’s certainly worthwhile to think about opting out form elements like labels and so forth. That might be a best practice. I think of my post as a starting point, not the final word; perhaps I need to make that more clear.
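To make the inheritance point concrete, here is a small sketch of an opt-in rule that still needs an opt-out. The selectors are assumptions chosen for illustration.

```css
/* Opt in: hyphenate only paragraphs and list items. */
p, li {
  hyphens: auto;
}

/* Because hyphens is inherited, a code or kbd element inside one of those
   paragraphs or list items would be hyphenated too unless it is opted out. */
p code, p kbd, p samp,
li code, li kbd, li samp {
  hyphens: none;
}
```

Without the second rule, a property name quoted inside a hyphenated paragraph would inherit hyphens: auto from its parent.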

Yeah, I did not think it was, but having body {hyphens: auto;} suggests it as an opt-out.

I would rather choose manually which elements to apply hyphenation on and make the scope smaller for the dictionaries to operate on. I also think there is value in specifying universal opt-outs, but it will be a significantly smaller set of selectors.

Regarding your question, “If you’re quoting someone, almost certainly something that someone wrote, is it advisable to hyphenate that text when they didn’t?”, here’s some insight you could find useful (or not).

Hyphenating is part of having “editorial etiquette” if you will.

The principle of using hyphenation is so that the text being read is legible in the container it is being displayed in. Words ‘break’, and that’s part of their nature; hyphens are the physical evidence of that nature.

Hyphenating needs to happen whenever the words require it, regardless of whether the words are in someone’s quote, or a product description, etc.

Ricardo Zea beat me to it, but I agree: there shouldn’t be any problem with hyphenating a quote from someone else, especially since you can at times make editorial edits to quotes to get the meaning across (say, substituting “him” with [John]). There is one exception, though: if what you’re quoting has a fixed right margin (or left in an RTL setting), i.e. you’re quoting a poem, you can’t introduce hyphenation.

I strongly disagree with Kevin Hamilton’s suggestion of turning hyphenation off for large displays. It seems he’s implying that hyphens are less readable when the line length is long; however, the real problem is that the measure (characters per line) is too high. Instead of adding a quick fix and removing the hyphens when the lines are too long, fix the actual problem and prevent the lines from growing too long. Traditionally, 50-70ish characters per line has been considered a good measure, and there’s no reason the web shouldn’t conform to the same.
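One way to cap the measure rather than toggle hyphenation by viewport width is sketched below. The class name entry is hypothetical, and 65ch is only an approximation of the 50-70 character range, since the ch unit is based on the width of the “0” glyph in the current font.

```css
/* Hypothetical wrapper class around the article text. */
.entry {
  max-width: 65ch; /* roughly 65 characters per line, font-dependent */
  hyphens: auto;
}
```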

When it comes to hyphenation and languages, it is true that each language would need its own dictionary/ruleset. Each language has different typographic and orthographic rules which, among other things, define how and when words can be broken onto a new line.


One of the biggest wins of hyphenation from my (purely subjective) point of view is that it enables the copy to be fully justified (left and right).

I have never liked having ragged right margins. I find it visually jarring and, to my eyes, it makes the copy look somewhat unfinished.

I know that H&J has traditionally been frowned upon on the web, not least because of the unsightly rivers of white-space that can end up appearing in the text, but with hyphenation and a good rendering engine there’s no reason why it should not be used.

I’ve found today’s WebKit-based browsers handle hyphenated and justified text very well, with no obviously over-large blobs of white-space in the content.
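For anyone who wants to try the combination, a minimal sketch might be the following. The selector is an assumption, and results will vary with the rendering engine and the line length.

```css
/* Justified body copy, with hyphenation to help limit rivers of white-space. */
p {
  text-align: justify;
  hyphens: auto;
}
```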
