Given HTML’s roots in the academic world, it should be no surprise that quoting is well-accommodated in the elements <blockquote> and <q>, with their optional cite attribute. In addition, there’s the <cite> element, which over the last nine years went from ‘semantic orphan element made good’ to one of the more contentious elements in HTML5. Let’s power up the endoscope and examine the scarring, starting with <blockquote>.

Easy peasy, right? Nothing has really changed. Remember that, as <blockquote> is a ‘block-level element’ (flow content), we can put almost anything in it, including headings, images and tables, in addition to the usual paragraphs of text.
There are a couple of slight differences in HTML5 though. <blockquote> is a sectioning root, meaning that any <h1>-<h6> elements it contains don’t become part of the document’s outline. Also, adding a single paragraph of text with no enclosing <p> tags is now completely kosher.
Here are some simple <blockquote> examples (apologies for the fake content):

<blockquote>This is a short block quote — look Ma, no paragraph tags!</blockquote>

<blockquote><h1>OMG a heading!</h1>
<ul><li>Block quotes can contain more than just paragraphs…</li></ul>
</blockquote>

OMG a heading!

Block quotes can contain more than just paragraphs…

Historically, adding the source of a <blockquote> was a semantic conundrum. If you add it as content of the <blockquote>, then semantically it would become part of the quote, right? <blockquote> (and <q>) have a cite attribute for the URL of the quote’s source, to provide context. That’s hidden data, however, and despite the potential for exposing the cite attribute via CSS and/or JS, that’s not as useful as a visible link.
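As a rough sketch of the CSS route, generated content can print the cite URL after the quote. The "Source:" label and the font size are arbitrary choices for illustration. The result is plain text, not a link anyone can follow, which is exactly why a visible link in the content is more useful:

```css
/* Print a blockquote's cite URL as plain text after the quote.
   Generated content is not a real link, so readers can't follow it. */
blockquote[cite]:after {
  content: "Source: " attr(cite);
  display: block;
  font-size: 0.85em;
}
```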

(Non-conforming means that while it will validate, adding a <footer> that isn’t in the quote’s source goes against the spec.)

2011-07-11: It seems our long-running convention at HTML5 Doctor of using <footer> for attribution inside a <blockquote> is actually non-conforming. However, the phrase in the spec that prevents it also prevents other common block-quoting patterns, so the spec will probably change. Read my article “<blockquote> problems and solutions”, and submit feedback via the WHATWG email list, the comments here, or to me via Twitter (@boblet). Your feedback will influence how the spec changes!
I’ll update this article after the change, but until then be aware that <footer> for attribution in a <blockquote> isn’t strictly valid, and may not become valid in the future either. The spec currently recommends including attribution in content surrounding the <blockquote>.

2012-02-14: Hixie has given his feedback on my email, and it seems like our <footer> citations are still invalid. The official recommendation is to put the blockquote in a <figure> and add attribution in <figcaption>. Read the whole thread, as there are some interesting comments. I’ll wait for the dust to settle a little yet…

HTML5 comes to our rescue with the <footer> element, allowing us to add semantically separate information about the quote. For example:

<blockquote>
<p>You know the golden rule, don’t you boy? Those who have the gold make the rules.</p>
<footer>— Crazy hunch-backed old guy from the movie Aladdin</footer>
</blockquote>

You know the golden rule, don’t you boy? Those who have the gold make the rules.
— Crazy hunch-backed old guy from the movie Aladdin

Because of this semantically sound way to show the quote’s source, if you’re going to add a cite attribute on <blockquote>, only do so in addition to visible attribution.

<blockquote cite="http://www.imdb.com/character/ch0000672/quotes">
<p>You know the golden rule, don’t you boy? Those who have the gold make the rules.</p>
<footer>— <a href="http://www.imdb.com/character/ch0000672/quotes">Crazy hunch-backed old guy in Aladdin</a></footer>
</blockquote>

This means we can’t use <q> for sarcasm or other non-quotation uses of quote marks (“”). In those cases, add punctuation manually.
The spec continues:

Quotation punctuation (such as quotation marks) that is quoting the contents of the element must not appear immediately before, after, or inside q elements; they will be inserted into the rendering by the user agent.
— W3C HTML5 specification

As with <blockquote>, you can also add a cite attribute with a URL for the quotation’s source (subject to the above caveats against hidden data). If you’re not using these extra features though, it’s a toss-up as to whether <q> is any better than just adding punctuation characters like “” as you type.
Okay, let’s see some specimens:
Nested quotations:

<p>Luke continued, <q>And then she called him a <q>scruffy-looking nerf-herder</q>! I think I’ve got a chance!</q> The poor naive fool…</p>

Luke continued, And then she called him a scruffy-looking nerf-herder! I think I’ve got a chance! The poor naive fool…

A quotation using the cite attribute. Note that I’ve also included the cite attribute’s link in content so it’s accessible:

<p><a href="http://www.imdb.com/character/ch0000672/quotes">The Aladdin character Jafar</a> presents an eloquent treatise on the recent global economic meltdown when he states <q cite="http://www.imdb.com/character/ch0000672/quotes">You know the golden rule, don’t you boy? Those who have the gold make the rules.</q></p>

The Aladdin character Jafar presents an eloquent treatise on the recent global economic meltdown when he states You know the golden rule, don’t you boy? Those who have the gold make the rules.

Let’s examine how to style these elements next.

Historically, browser support has been patchy for controlling the punctuation used by <q>. Things have settled down now, so we can define nested, language-specific and even author-defined punctuation via CSS.

If you’re using the charset UTF-8 (and you should be), we recommend you use the actual characters if possible, rather than the Unicode escapes in CSS or the entities in HTML. You can enter most of these using the keyboard — e.g. “ is Opt-[ on Mac, Alt + 0147 on Windows, and AltGr + V on Linux. Avoid using ", ' or ` in place of “” and ‘’. The “narrow no-break space” is used inside French guillemets.
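For example, here is a minimal sketch of language-specific punctuation. It assumes « » for the outer level and “ ” for nested quotes in French, but conventions vary, so check your style guide. The narrow no-break space is invisible, so this is one case where the escape \202F is clearer than typing the literal character:

```css
/* French quotations: guillemets with a narrow no-break space (U+202F)
   inside them, then curly double quotes for the second nesting level.
   \202F is used because the literal character is invisible in source. */
q:lang(fr) {
  quotes: "«\202F" "\202F»" "“" "”";
}
```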

Most languages alternate between two kinds of punctuation as quotes are nested, such as “” and ‘’ in English. To specify nested quote pairs in CSS, we would write this:
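A minimal version of that rule might look like the following. The :before and :after declarations restate what browser default stylesheets typically already do for <q>, so in current browsers they are probably optional:

```css
/* Outer quotes use “ ”, the first nested level uses ‘ ’ */
q {
  quotes: "“" "”" "‘" "’";
}
q:before { content: open-quote; }
q:after  { content: close-quote; }
```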

Unfortunately, browsers use the last quote pair in the quotes property for more deeply nested quotations. In addition, Opera will use the wrong quote characters if you have more nested <q> than your quotes property defines quoting levels for (Opera quotes bug test case). Make sure you have enough levels by repeating quote pairs as necessary:
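For instance, repeating the English pairs gives each of six nesting levels an explicit entry. Six is an arbitrary choice here; add more pairs if your content nests deeper:

```css
/* Alternate “ ” and ‘ ’ for six nesting levels, so deeper quotes
   don't all fall back to the last pair in the list */
q {
  quotes: "“" "”" "‘" "’" "“" "”" "‘" "’" "“" "”" "‘" "’";
}
```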

WebKit had "" and '' hard-coded in the browser stylesheet until Safari 5.1 and Chrome 11, which prevented q:before {content: open-quote;} and q:after {content: close-quote;} from working. The workaround is to define opening and closing punctuation manually, then override with open-quote and close-quote. While it’s a little more involved, that’s why we use this CSS on HTML5 Doctor:

A more traditional English <blockquote> style uses an opening quote character before each paragraph of the quotation and a closing quote character on the last paragraph. You can do that with this CSS, but you’ll need to use <p> for the <blockquote>’s content.
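One way to do this follows the example in the CSS 2.1 specification’s generated content chapter. Each paragraph opens a quote, no-close-quote rebalances the nesting level without printing anything, and only the final paragraph gets a closing quote. This sketch swaps the spec’s .last class for :last-of-type, which is an assumption here and needs a browser with CSS3 selector support. The child combinator keeps paragraphs inside a nested <footer> (or any other non-paragraph child) unaffected:

```css
/* Traditional English style: an opening quote before every paragraph,
   a closing quote only after the last one. */
blockquote {
  quotes: "“" "”" "‘" "’";
}
blockquote > p:before {
  content: open-quote;
}
blockquote > p:after {
  content: no-close-quote; /* prints nothing, but decrements the nesting level */
}
blockquote > p:last-of-type:after {
  content: close-quote; /* higher specificity, so it wins on the last paragraph */
}
```

Because each paragraph’s nesting level is reset by no-close-quote, any <q> inside a paragraph sits one level deeper and gets ‘ ’, as you would expect for a quote within a quote.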

As <CITE>Harry S. Truman</CITE> said…
More information can be found in <CITE>[ISO-0000]</CITE>

Sadly, an example of an academic-style citation wasn’t included.
Some standardistas enthusiastically adopted <cite> for its semantics, with the high point being Mark Pilgrim’s epic “Posts by citation” (the results of which are now sadly 404’ed). In those heady days, <cite> was used in three main ways:

To refer to a person, generally in connection with some reference or as the source of a quote:

“Wow, what an all-rounder!” I hear you say. “Is there anything <cite> can’t do?” The dirty secret of all this is the <cite> element has historically been semantics for the sake of semantics. So far, the only non-site-specific application of <cite> is browser default stylesheets, which format it with font-style: italic;. This is not a bad thing, as using <cite> consistently on your own site allows you to do all kinds of fun stuff (as Pilgrim demonstrated). But in the past, it’s been used to refer to three related but quite different types of data: titles, full citations, and names. This makes web-wide use, such as by a search engine, tricky.

So, in HTML5 this semantic over-achiever has ended up with a more … prosaic definition:

The cite element represents the title of a work (e.g. a book, a paper, an essay, a poem, a score, a song, a script, a film, a TV show, a game, a sculpture, a painting, a theatre production, a play, an opera, a musical, an exhibition, a legal case report, etc). This can be a work that is being quoted or referenced in detail (i.e. a citation), or it can just be a work that is mentioned in passing.
A person's name is not the title of a work — even if people call that person a piece of work — and the element must therefore not be used to mark up people's names.
— W3C HTML5 specification

(A piece of work, heh, that’s good.)

This restriction has been somewhat … unpopular. Arguments for using <cite> for names (now summarised on the WHATWG wiki) were addressed by Ian Hickson, who decided that historical use wasn’t enough to justify the wooly definition. Jeremy Keith’s 24 Ways article “Incite A Riot” called for civil disobedience and HTML 4.01-style <cite>-ing, but the HTML5 spec has not changed.
The in<cite>rs are irate that there are two use cases that <cite>’s new definition leaves semantically unfilled — to mark up speakers in a transcript or dialog, and to indicate the speaker or author of an inline quote (<q>). The HTML5 spec adds semantic insult to injury by saying:

In some cases, the <b> element might be appropriate for names; e.g. in a gossip article … In other cases, if an element is really needed, the <span> element can be used.

By better defining <cite>, we increase the odds of getting usable data from it, though we now need different methods to cover these other uses. For now, it seems that these use cases aren’t specific enough to warrant new elements.
Note that <cite> was never a general-purpose element for marking up a person. The stillborn HTML 3.0 did try to introduce the <person> element, but if you’ve ever used hCard to semantically mark up a person’s name, you’ll know that we’d need way more than just one element to do names justice. The POSH way of marking up a name is to use hCard (in microformats, microdata or RDFa), or just a plain old link.

A Game of Thrones, by George R. R. Martin

In this example, the author and book title are only connected by proximity. You could connect them more explicitly using the hProduct microformat, RDFa’s GoodRelations, or, to really bleed on the edge, even Schema.org.
Note that you can’t use the now-Google-approved rel="author" attribute here, as George R. R. Martin is being referred to and isn’t writing the article. If you just wanted to style the author’s name, you could use <b class="author"> (gossip column style) or <span class="author"> with whatever CSS you like.

Okay, let’s start mixing things up on the operating table and show some examples of <cite> with <blockquote> and <q>:
A movie <blockquote> with <cite>:

<blockquote>
<p>You know the golden rule, don’t you boy? Those who have the gold make the rules.</p>
<footer>— Crazy hunch-backed old guy in <cite><a href="http://en.wikipedia.org/wiki/Aladdin_(1992_Disney_film)">Aladdin</a></cite></footer>
</blockquote>

You know the golden rule, don’t you boy? Those who have the gold make the rules.
— Crazy hunch-backed old guy in Aladdin

Adding the cite attribute to a <blockquote> (and its <footer>):

<blockquote cite="http://www.imdb.com/character/ch0000672/quotes">
<p>You know the golden rule, don’t you boy? Those who have the gold make the rules.</p>
<footer>— <a href="http://www.imdb.com/character/ch0000672/quotes">Crazy hunch-backed old guy</a> in <cite><a href="http://en.wikipedia.org/wiki/Aladdin_(1992_Disney_film)">Aladdin</a></cite></footer>
</blockquote>

<p>I wonder if feedback on <code>&lt;cite&gt;</code> prompted this:</p>
<blockquote><p>A person's name is not the title of a work — even if people call that person a piece of work</p>
<footer><cite><a href="http://developers.whatwg.org/text-level-semantics.html#the-cite-element">HTML5 for Web Developers</a></cite></footer>
</blockquote>

The information capacity of the human motor system in controlling the amplitude of movement, Paul M. Fitts (1954). Journal of Experimental Psychology, volume 47, number 6, June 1954, pp. 381–391

An academic-style book citation:

<blockquote>
<p>Citations … all include the following: author (or editor, compiler, or translator standing in place of the author), title (and usually subtitle), and date of publication.</p>
<footer><cite><a href="http://www.chicagomanualofstyle.org/">The Chicago Manual of Style</a></cite>, 15th Edition (Chicago: University of Chicago Press, 2003), 596</footer>
</blockquote>

Citations … all include the following: author (or editor, compiler, or translator standing in place of the author), title (and usually subtitle), and date of publication.
— The Chicago Manual of Style, 15th Edition (Chicago: University of Chicago Press, 2003), 596

If you’ve made it this far, congratulations! You’ve now learned more about citing and quoting in HTML5 than you wanted to know ;) But don’t keep the knowledge to yourself — let us know in the comments what you think. We’d also love to hear how you’re using <blockquote>, <q>, and <cite> in HTML5. If you share your code snippets, remember to escape them!

2011-06-29: It seems our long-running convention at HTML5 Doctor of using <footer> for attribution inside a <blockquote> is in keeping with the <footer> part of the spec, but not with the <blockquote> part. We’re investigating…

2011-07-03: Hixie confirmed that our use of <footer> is currently non-conforming — <footer> can currently only be included in <blockquote> if it’s quoted content. However, the phrase “content inside a blockquote must be quoted from another source” also forbids other common changes and additions to block quotes, so I’m going to see if it can be changed.

Great article. One thing though: the second example in the list of historical cite uses seems a bit odd. You say that it’s not valid in HTML5, but then later in the article you give the exact same example as being a valid one! Having read the example again and again, I’m pretty convinced it is a valid one…

I just noticed some clever use of scoped style elements in the article. Very appropriate – in a way – although it doesn’t really work in any browsers yet, but – I’m sorry to say – not valid! At least that’s what the W3C validator says (among other things).

Glad to see a solution to this. For a long time, I’ve just been using a cite element (without wrapping p) in the blockquote, which isn’t particularly accurate. I can update ReMarkable to use this method instead.

For some personal reflection on practical use of the abbr, dfn and cite elements (which all quickly fall into the “semantics for the sake of semantics” problem you describe), see my article “Me, Myself and I — or: Abbreviations, Definitions & Citations Revisited” http://camendesign.com/abbr_redux

@Zev & Bertil — I must be slow today, have edited that example six times so far >_>

@Bertil — I think <style scoped> is in WebKit nightlies now (if not close)

@Bertil & Kroc — see the addition at the top of the article. I need to email the list about it as IRC feedback was a little divided. There’s a decent argument for this pattern, so now I have to make it :) Also, good article Kroc. I read it while writing this.

The suggestions in my article came directly from writing and editing a few megs worth of raw text used on my website, which brought up lots of edge cases and curious questions about semantics; so whilst I wouldn’t say my choices would suit everybody, they have at least been trialled in a background of text.

My complaint about the ABBR article you published here on HTML5Doctor was essentially that you weren’t following your own advice, as I know that I practically went insane trying to use those rules on megs of text before I came up with my own to take back control of my sanity.

But, I will definitely say that cite still remains the weaker out of the three and I appreciate this article for being far more square.

If you would like, my article could be further adapted with feedback from the doctors to better suit a broader audience. I strongly believe that a key part of learning HTML5 is learning HTML4 *properly* and eschewing spans and divs for semantics where possible.

Because HTML5 “forbids” us (which I have never respected) to use cite for authors, how do we declare the author of something? I never understood why we didn’t create a “person” or “author” element at the same time cite was “clarified”.

@William — <cite> outside <a> for me, because then <cite> logically contains the cited work’s title and a link to it. This’d give you better info if you e.g. make a script scraping for <cite>d works. However Charles’ take on it is also good.

While I applaud the noble efforts to make the web more semantic I am sorry to break the bad news: only standardistas who blog will ever correctly use the blockquote, q and cite elements. The rest of the world has trouble with the correct usage of even the most basic of HTML elements.

And even then the standardista community can’t agree what is the correct usage and what’s not. This is similar to the header and footer elements that are so generic people will use them for all kinds of purposes, thus obliterating any real-world usage (e.g. when writing screen-reading software).

Karl, the problem is that if the markup pattern is not widely used, writing software that depends on it is useless. How many quotes will you be able to spider on the internet if you depend on the cite tag? Not many. All of your examples rely on correct usage.

Of course this is a chicken and egg problem. If HTML would be taught properly in schools in the future etc. etc.

@Wolf — I agree with you that the problem is education, so I’m puzzled by your cynicism in your comments above. That’s what we’re trying to do on HTML5 Doctor, after all! :) Also, you missed two very large and important groups: software and tool makers (e.g. Dreamweaver, or editing toolbars in CMSs), and service providers (e.g. academic journal tools). I try to follow the advice a wise sage once gave me, just do the best you can. In some small way I hope people reading this article can benefit from it!

@All — I’ve made some changes to the article regarding the ongoing <footer> in <blockquote> saga. It looks like the spec will change, so please chime in with your feedback and ideas!

Cite is simply one of the most problematic tags in HTML5 at the moment. I get all the others, but this one always makes me think way too much to be used naturally. I’m currently working out a system to semantically mark up works cited in a few of my articles, but it seems so damn complicated with all those changes and “cite wars” … duh.

Also, I noticed recently that Wikipedia uses plain ol/li for its references, as they often just link to sources without naming them.

Oli,
Just grabbed the styling you’ve placed above and had all sorts of issues with special characters. Rather than have anyone else run into this, I thought I could repay your help in a small way and offer my edits:

@Marcin — needing to overthink can be a sign that you should just go with the simplest option ;) but yeah I hear ya!

@Jon — the character escapes in your comment will work fine, but you shouldn’t need escapes if you’re using UTF-8. Check your page’s encoding and confirm the browser is getting UTF-8. Check your text editor is saving UTF-8. Finally I always use:

@charset "utf-8";

as the first line of my CSS files. If you have a 100% UTF-8 workflow you should only need character escapes for characters with special meaning in CSS, which Mathias Bynens covers in CSS character escape sequences.

Hixie has given his feedback on my email, and it seems like our <footer> citations are still invalid. The official recommendation is to put the blockquote in a figure and add attribution in <figcaption>. Read the whole thread as there are some interesting comments.

I’ll wait for the dust to settle a little before updating this article (and possibly every spec quote on HTML5 Doctor :| )

Oli,
I’m a bit confused, as I’m using the H5BP template framework and the CSS file included with it. I can see the meta charset is set to utf-8 there, and in both the web.config and .htaccess. However, this comment is in the web.config; should this be changed?
<!-- use utf-8 encoding for anything served text/plain or text/html -->
<remove fileExtension=".css"/>
<mimeMap fileExtension=".css" mimeType="text/css"/>

@Jon — The settings in your text editor and .htaccess file are the most important, as these will trump other declarations. Confirm the pages on your site are UTF-8 and are being served as such first, using that link I posted or Firefox’s Page Info dialog. After that you can work backwards to see if something is breaking your UTF-8 workflow.

That academic-style journal citation is all kinds of trouble. How are you supposed to distinguish between:
— <cite> for the name of the article being cited (which should—in my reference style of choice—be recte and “in double quotes”);
— <cite> for book names (which should be italicised and not quoted);
— <cite> for journals (of which only the journal name itself should be italicised and quoted, with issue number, etc., being recte and not quoted); and
— <cite> for monograph series (which should be recte and not quoted)?

What a jungle!

In your example here, only the name of the journal is <cite>d. That means the article referred to is not marked up as a cited work at all; and also that identifying information such as volume number (which is necessary to identify the source) is put outside the citation.

If HTML5 wants to cater to academic citations, it needs to get its working hat on and add something like a type attribute (<cite type="article/monograph/journal/series/etc.">) that we can style our CSS by. And it needs to come up with an <author> and/or <source> tag, too. :-]

@Bertil:

But I’m not sure about your ”nyan”-example:

[…]

Is that really a proper use of the q-element? What’s the source of the “quotation”? The Japanese language?

(Ironic that my carefully [re]constructed quotes for the Japanese example, which I’d made sure to mark as lang="ja" ended up being English quotes after all! Is lang stripped/not allowed from comments?)

That means the article referred to is not marked up as a cited work at all; and also that identifying information such as volume number (which is necessary to identify the source) is put outside the citation.

What would you expect to happen if this information was marked up? What is the use case, and what benefits would you get?

If HTML5 wants to cater to academic citations

I suspect academic citations of the type you’d like would be outside HTML5’s scope, but they can be addressed with microdata or RDFa.

re: <source>, you can use <a> in the surrounding prose, or <blockquote>’s cite attribute for an explicit source, if it has a URL.

Regarding that accursed cat example, I think I’ve corrected the mistake you spotted and fixed your comment (let me know if not), and will add lang to the allowed attributes list. Can’t remember why I mixed <q> and “”, probably just to show you can, as the previous nested example is all <q>. damn uuu neko-chan! :)

The class attribute for styling, microdata or RDFa for semantics.
[…]
What would you expect to happen if this information was marked up? What is the use case, and what benefits would you get?

Well, what’s the purpose of marking something up with cite to begin with? Styling is one thing, of course, but semantically, in the given example, what’s being cited is the article, not the book. The book is what you need if you want to read the article.

I guess the use case would mostly be for something like extracting HTML5 markup to XML or something similar, where only the citations themselves would be kept. I admit, it’s a bit far-fetched, and probably outside the scope of HTML5. But for scholarly publishing online, it would be great if it were there.

If ever I do need to venture in that direction, it looks like I’ll have some reading to do, catching up (read: learning the basics) on RDFa, hCard microformats, microcards, etc.—all technologies that I am woefully ignorant of.

(As for poor 猫ちゃん, the mistake I spotted was simply that the word[s] “日本語に” was/were missing in the kanji version of the examples, but present in the rōmaji and English versions.)

Indeed :) There are things we could do with formatted academic citations, but currently I’m unaware of anyone actually doing anything, which makes <cite> merely a semantic and medium-independent way of indicating the work being cited. Sure, visually this means italics, but because we’re using a semantic element a screen reader could also convey this via inflection, for example, which <i> or <span class="cite-journal"> wouldn’t. Once people start actually doing stuff (= real world use cases), there’ll be some cowpaths for WHATWG to consider paving.

I don’t get it. Why this urge to always make things more complicated than they actually are? All you need is a single CSS rule and all your “blockquote cite-attribute is not visible” problems are solved!


Somewhat off topic, or maybe not. I’ve seen it argued that the citation should be placed *outside* the <blockquote> inasmuch as the citation isn’t strictly part of the material being quoted. Fussy semantics.

This is probably a dumb question, and forgive me if it’s been asked in the comments already (I figure the difficulty of ignoring me is a lot less than the difficulty of reading the entire comment thread), but why not suggest that the HTML5 spec define <cite> to be use case 3 rather than use case 2, with the subparts of the name of the work and the name of the author each optional but another pair of tags for them? E.g. >cite<>hypothetical-author-tag<John Doe>/hypothetical-author-tag<, >hypothetical-title-tag<Reconciling Purity and Practicality>/hypothetical-title-tag<>/cite< or >q<…you can fool some of the people all of the time…>/q<>cite<gt;hypothetical-author-tag<Abe Lincoln>/hypothetical-author-tag<>/cite< or >cite<>hypothetical-title-tag<An Example of the Original Purpose of the Cite Tag>/hypothetical-title-tag<>/cite< Would something like this not satisfy both the purist specifiers who object to cite being used for three different things and the practical authors who object to effectively being told that citations other than the name of a work are semantically irrelevant? Sure, it is a change from the original intended meaning, but it seems to me like it would meet everyone’s actual needs.

Oops, not used to writing < and > as &lt; and &gt;, managed to get most of them backwards… let’s try again…

<cite><hypothetical-author-tag>John Doe</hypothetical-author-tag>, <hypothetical-title-tag>Reconciling Purity and Practicality</hypothetical-title-tag></cite> or <q>…you can fool some of the people all of the time…</q><cite><hypothetical-author-tag>Abe Lincoln</hypothetical-author-tag></cite> or <cite><hypothetical-title-tag>An Example of the Original Purpose of the Cite Tag</hypothetical-title-tag></cite>

Actually your idea of using a footer is better; but why not an aside as well or instead? Too much clutter/confusion?

Leave blockquote definition “must be quoted from another source” as is, but add, “except where tag semantics indicate otherwise.”

The clutter/confusion will inevitably be expressed somewhere.
The current spec just means: not in blockquote! ’Cause it’s saying that if you’re going to use blockquote so it looks nice and gives meta-information, you want to enclose it in a figure tag and use figcaption for the stuff about the blockquote.
Basically the spec is inventing a compound tag with this suggestion, which will lead to clutter/confusion, and questions like: why can’t I put all this stuff in blockquote?

When a quotation is inside an article, it is reasonable to use figure as the outer wrapper and figcaption as the attribution wrapper.

But how about a list of quotations which are not inside an article? They don’t serve the purpose of illustrating/annotating the context (because there is no context), so figure and figcaption can’t be the wrappers. How do we mark up a list of quotations in such a case?

I agree with Matěj Cepl about the academic citation. “The information capacity of the human motor system in controlling the amplitude of movement” should be in <cite>, since that is the work being cited. You wouldn’t, in general, cite an entire journal. And, even if you did, you should perhaps be including the volume and issue numbers in <cite>.

I don’t know if you noticed or not, but as of this writing, the ‘blockquote + figcaption inside figure’ style is given as an example on the HTML5 blockquote spec page. Guess this is the best option we’ve got for now.

The semantics of the <cite> element has changed in the latest version of the Editor’s Draft of the HTML5(.1) specification. Now it says

The cite element represents a reference to a creative work. It must include the title of the work or the name of the author (person, people or organization) or an URL reference, which may be in an abbreviated form as per the conventions used for the addition of citation metadata.

and the following example is given:

<p>In the words of <cite>Charles Bukowski</cite> -
<q>An intellectual says a simple thing in a hard way. An artist says a hard thing in a simple way.</q></p>

Hence, now it is again allowed to use <cite> to mark up names, at least when referring to an author of a ‘creative work’.

Why has this been changed? Will the change survive? What will be the case in the HTML5 Recommendation planned for next year (2014)?

I think I prefer the old meaning of the <cite> element (the one only permitting titles). The reason is that I am really fond of the exactness, cleanness, simplicity, and ease-of-use that version brought to the HTML5 specification.

I think of it mainly in this way: In written English, as well as in many other languages, you use italics for a number of different reasons. In hypertext documents, you can embed information about the precise reason why a phrase is in italics:

* <em> for stress emphasis, possibly the most common case. Often alters (or clarifies) the meaning of the text.
* <cite> for titles (books, articles, films, …).
* <dfn> for the defining instance of a term.
* <i> (with class attribute) for phrases in languages different from the language of the surrounding text.
* <i> (with class attribute) for Latin names of species.
* a few other cases.

I think this is wonderful.

In this case, it is extremely easy to use <cite> right — it’s used for titles, nothing else. It also makes perfect sense to have a specific element for this purpose, since you usually want to display titles in a different way visually (typically in italics), so you certainly need some markup here. And why not make it semantic, and make the wonderful list above come true?

As noted in the article (to which this is a comment), HTML 2.0 gives exactly this meaning to the <cite> element. Unfortunately, the text in other HTML specifications and proposals is more diffuse: In HTML+, an academic citation is given as the only example. In HTML 3.0, the text talks about ‘citations’ and ‘italics’, and the only example is a book title. In HTML 3.2, ‘citations or references’ is used to describe the element. The same applies to HTML 4.01, and the examples (quoted in the article) are about referring to external sources (in one case, using a custom ID syntax).

Personally, I feel that the use of <cite> in “my” way (the HTML 2.0 and the original HTML5 way) is semantically different from giving ‘references’ to external sources. (It also happens to be two different things typographically: titles are usually in italics, while ‘references’ might be given in various formats.) So I don’t really like the idea of allowing both of these uses: it would make the element essentially meaningless from a semantic point of view. (And even if we ignore the semantics, using the element in actual web pages would not be painless when it comes to default formatting, since book titles should be in italics while names of people generally shouldn’t.)

Marking up titles is important, generally. For instance, the text

“Violence is good.”

doesn’t mean the same thing as

“<cite>Violence</cite> (=a book or an article, perhaps?) is good.”

It’s a feature of written human language, essentially, that titles should be marked up in some way, in the same vein as stress emphasis and defining instances. Giving references in an academic paper, or referring to a person or URL, doesn’t feel like the same thing (and generally requires different formatting). At least not exactly.

Additionally, the current HTML5 text apparently considers it to be a ‘creative work’ if a person mutters something to himself.

I guess my main points are the following:

The restrictive version of the definition

* is much easier to apply right;
* has a very precise semantics; and
* is very useful in practice, since you do need some markup to affect the formatting.

The new version of the definition

* is used for many different cases, and it is not as easy to tell if a borderline application is appropriate or not;
* can mean a number of similar, but not identical, things; and
* can be useful in practice, but you might need different classes to make only some of the instances be rendered in italics.

Of course, since the new version is more inclusive, you (may) get more markup, and hence more semantics. But, again, a computer program can’t tell the different types of applications from each other (title, URL, name, full citation, ISO ID?). Well, they are all references, in some sense, so I certainly agree you do gain something, but we will “never” be able to mark up everything anyway. Do we really need an element for marking up every possible ‘citation’ or ‘reference’ in a broad sense? Well, certainly not for class-less formatting, at least.

I think I prefer the old meaning of the element (the one only permitting titles). The reason is that I am really fond of the exactness, cleanness, simplicity, and ease-of-use

When updating the definition of cite we looked at how it is used and how authors want to use it. I understand the allure of a definition that restricts cite to titles of works, but it has not been, and is not being, used for this purpose in the majority of instances. The theoretical purity of the restriction does not translate into real-world usage, so it is of little use to potential consumers of the semantics.

you usually want to display titles in a different way visually (typically in italics), so you certainly need some markup here.

The visual presentation should not affect the semantics of the element. If you look at how search engines such as Google (used for URLs in search results, not italicized) or Bing (used for URLs in search results, not italicized) use cite, they override the default visual presentation. Does this mean it’s no longer a citation?

* is used for many different cases, and it is not as easy to tell if a borderline application is appropriate or not;
* can mean a number of similar, but not identical, things; and
* can be useful in practice, but you might need different classes to make only some of the instances be rendered in italics.

The range of things that count as a reference is broader, but that breadth means it actually reflects real-world usage.
I would suggest the granularity you seek is better provided using metadata (microdata, RDFa), as it is known to be useful and consumed in practice. Also, using classes to provide such granularity is fine and encouraged if it’s useful for the author:

Authors can use the class attribute to extend elements, effectively creating their own elements, while using the most applicable existing “real” HTML element, so that browsers and other tools that don’t know of the extension can still support it somewhat well. This is the tack used by microformats, for example. (HTML 5.1, 2.2.3 Extensibility)

I do understand your arguments and they certainly make sense. I suppose, to some extent, this is a choice one has to make: either you make the standard theoretically simple, beautiful, and logical, or you make the best you can without ‘breaking’ millions of existing hypertext documents.

If I understand this correctly (yes, for some reason I do find the new text harder to understand), the new version is strictly a ‘superset’ of the old one, in the sense that every valid use of <cite> according to the last version is also valid according to the new version? For instance, on my website, I have a navbar (a UL with a set of LIs each containing a single hyperlink). One of the links is <cite>Ändlös längtan</cite>; this link will take you to the homepage of my (Swedish-language) book Ändlös längtan. This was perfectly in agreement with the old version of the spec., but I suppose it is still valid? Although the navbar doesn’t contain a quote from the book, this LI certainly is “a reference to a creative work [that, in addition,] include[s] the title of the work”.

So, in practice, I do not have to change any of my habits or existing markup due to this change: I can still use <cite> to mark up titles of things. I don’t have to use it to mark up names next to quotes, etc. (although I certainly could start doing that now).

According to that text, my habit of marking up titles would be wrong. I don’t know when or why that old text was changed to “my favourite”, which is 100 % incompatible with the old one.

In terms of valid types of applications, it seems like the new version (maybe we should call it “the third one”?) includes both the old one (the one from 2008: “the source, or reference, for a quote or statement made in the document”) and my favourite (titles of stuff), as well as other kinds of ‘references’ to ‘creative works’ (including sounds someone mutters to himself). Is it correct to say that the new version is the most inclusive one ever in an HTML specification?

(yes, for some reason I do find the new text harder to understand), the new version is strictly a ‘superset’ of the old one, in the sense that every valid use of <cite> according to the last version is also valid according to the new version?

Correct. If the text is unclear, please do file a bug on the HTML spec; it’s an editor’s draft and is there for people to comment on and help improve.

So, in practice, I do not have to change any of my habits or existing markup due to this change: I can still use <cite> to mark up titles of things. I don’t have to use it to mark up names next to quotes, etc. (although I certainly could start doing that now).

Correct.

There is an example in the spec:

<p>Who is your favorite doctor (in <cite>Doctor Who</cite>)?</p>

Is it correct to say that the new version is the most inclusive one ever in an HTML specification?

Well, I would like to think its inclusiveness is better defined and explained than in previous versions, which were somewhat murky.

Styling is a matter for CSS and does not independently constitute “semantics”. However, it is advisable to style HTML elements according to their semantic differentiation within the document. Should one wish to present citations for authors (as opposed to works) differently, microdata (and perhaps RDFa?) has itemprop=author. This is equivalent to the rel=author link relation but not restricted to hyperlinks.

The CSS attribute selector is as follows:

[itemprop="author"] {}

Or, for citations that include links to author pages:

cite [rel="author"] {}

As Steve demonstrates, uses of the cite element are many and varied. This is in accordance with its broad remit in the English language, which includes works and authors.

<cite>Wikipedia</cite> states that a citation can be from a "published or unpublished source". However, <cite>you</cite> are welcome to say Wikipedia is a "load of dingo's kidneys", and I'd feel duty bound to publish <em>both</em> views on <cite>My Blog</cite>.

I’ve always used <cite> as a source or a reference or to give credit for a quotation, never simply to mark up a title unless it was a source. So to me this, “<cite>Aladdin</cite> is a great movie, even after 73 viewings. Aren’t kids great?”, makes no sense. If <cite> was strictly for titles, why not use <title> instead? Isn’t that more semantic? Also, it would have been nice if <cite> had some useful attributes, like <cite source="author" href="" …>, where source could be title, book, movie, play, url, article, post, etc.

@Rich – Why is it “surely” unnecessary? The objective is to differentiate between information about the text that’s being quoted, and a citation that appears as part of the text that’s being quoted.

Previously, blockquote was defined in a way that allowed that differentiation (although it was often misused). HTML 5.1 changes the blockquote definition in such a way that it is no longer even possible.

@Steve – No I don’t. It’s just a principle. My guess is that it’s pretty rare.

I guess I don’t really understand what the problem is that the change is trying to solve. If it’s just for the sake of paving a cow path then I doubt its utility, but I’m not particularly opposed to it.

I suspect (again, no data) that the blockquote indentation rule tends to encourage authors to put the cite inside the blockquote, so I wonder whether an HTML5-endorsed way of permitting that might be sufficient to effect a change in authoring practice. Using the footer element inside the blockquote to contain metadata about the quote, including its citation, seems both natural and pretty harmless, since there should be no need to use footer in the quote itself. Accordingly, my preference would be that the blockquote change was limited to permitting that.

Regarding the use of a reference inside a blockquote element, I think contextual information should be outside, preferably as part of the introduction, such as in ‘Regarding the blockquote element, the W3C HTML5 specification states:’ before the blockquote for your first example.

I think this context-after approach is part of the overall inside-out approach to information presentation inherited from print, where pages consisted of large tracts of text interspersed with pictures and tables placed wherever was convenient for layout, not necessarily near the text they related to, and often not even on the same page.

To me, the more sensible way of presenting figures, tables and lists is to introduce them, giving their context, and optionally what the reader should look for in them. This is because if the figure or table is part of the narrative, then it should be presented right at the appropriate point in that narrative.

It would also keep the reader focussed on why the figure or table is being presented. A picture may be worth a thousand words, but unless you guide a reader in what to look for, they may well pick a thousand words that don’t fit with your purpose for the element.

Also, people don’t want to waste their time; they decide very quickly whether what is being presented needs to be read or viewed, and introducing its context helps them make that decision.

Some examples of introductions are:
– ‘A young red pine, with visible roots due to soil erosion:’
– ‘To identify yourself, bring any two of:’
– ‘The damage to the building, as viewed from the south-east corner, is:’
– ‘The sales for the last five years, with year-on-year downturns shown in red, are:’.

I know we have been trained to look first, then read the caption, but I think introducing context BEFORE a mass of visual information serves readers better. The reader will then be more aware of the context that any text after the element focuses on.

The other consequence of the more granular ‘chunkification’ of information in web content these days is that there is no need for a figure or table number when there is a heading just above it. How many websites provide a list of figures or tables that would make use of that number?