native wrapper: because rb belongs only in the ruby format, tools can auto-insert it together with the parent ruby element, which is often more practical than manually adding a wrapper as an afterthought. By contrast, the current HTML5 alternative will never offer the same convenience, since:

the alternative wrapper (e.g. span) is not part of the format, and thus one would have to add it manually (by contrast, when the editor oXygen inserts the ruby element in an XHTML 1.1 document, the rb element is added automatically so that the author can start typing inside it, much as many editors, when inserting a dl or a table, also insert the required child elements);

when deleting (with a browser-based WYSIWYG tool) the content of a non-native wrapper, the wrapper itself will often be deleted too (in order to prevent littering the code with stray, empty elements; XStandard does this, see the heading “Empty Tags”), whereas an empty element that is part of the format itself could just remain in the code, ready to be refilled with content.

CSS2 selector: the rb element makes a cross-browser CSS2 selector (rb{}) that, unlike e.g. span{}, selects only ruby base text. In order to offer a similar feature without including rb, HTML5's editor has proposed extending the old and largely unimplemented CSS3 Generated and Replaced Content Module with, quote, “a pseudo-element that can style certain spans of descendants; the flip side of ::outside”. His proposed “flip side of ::outside” has, however (according to CanIuse.com), zero implementations. And the fact that the editor also questions the need to select the ruby base text at all does not make this option any more credible. Note: For simple ruby, it is not uncommon (examples: one, two (two a)) to use ruby{display:inline-table} in combination with rb{display:table-row-group; /* or similar, from CSS tables */}. It might be that removing the rb makes the styling less robust, but other than that, it seems to work without rb as well.
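As a rough sketch of the CSS 2.1 table-based styling mentioned in the note, the following page uses rb as a selector. The rt rule (display:table-header-group), the alignment properties and the font size are assumptions added here to complete the illustration; they are not taken from the linked examples.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>rb as a CSS2 selector (sketch)</title>
<style>
  /* Lay the ruby out as a small inline table. */
  ruby { display: inline-table; text-align: center; vertical-align: bottom; }
  /* rb selects only the ruby base text, which span would not. */
  rb   { display: table-row-group; }
  /* Assumption: a header group places the annotation above the base. */
  rt   { display: table-header-group; font-size: 60%; }
  /* The fallback parentheses are not wanted when the table layout applies. */
  rp   { display: none; }
</style>
</head>
<body>
<p><ruby><rb>WWW</rb><rp> (</rp><rt>World Wide Web</rt><rp>) </rp></ruby></p>
</body>
</html>
```

In a UA with built-in ruby rendering these rules replace the native display values, so an author would probably want to scope them to the UAs that need them.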

CSS2 backup styling: HTML5 prescribes the style rules ruby{display:ruby;}rt{display:ruby-text;}, which stem from the CSS3 ruby module, but which none of the common browsers fully supports yet (not even Webkit or IE, even though they have some ruby support). Thus, if one wants it to look like ruby in Opera and Firefox, one must hack up some backup CSS that works, and a wrapper element for the ruby base text may then come in handy, even if it is not impossible to make it work without it: demo of styling that works in Opera + Firefox.

HTML5's break from the source order found in XHTML 1.1 creates problems for every parser/reader that needs to detect words by more than visual means. Problem examples: A user trying a find-in-page search in the browser for 'WWW' when the above code is used will not locate the 'WWW' that the user can see. For the same reason, it creates problems in screen readers, in online translation services like Google Translate, in copy-and-paste and selections, and so on and so forth. It is even hard to author, since the user cannot type the letters/words that belong together in one chunk: the author must instead type one ruby base letter and then a ruby text letter/word etc., which is cumbersome and prone to error. It is comparable to a table model where the author has to work with cells on two rows simultaneously.

The content model of this change proposal requires that the above example be written like this:

<ruby><rb>W</rb><rb>W</rb><rb>W</rb><rt>World</rt><rt>Wide</rt><rt>Web</rt></ruby>

or this (NOTE: rtc is not made conforming, as of yet, due to legacy parser problems):

Rationale

The inclusion of rb addresses the problem that the alternative — exclusion of rb — encourages ad-hoc solutions with regard to marking up or styling the ruby base text.

The inclusion of rb allows thought-free, direct transition to/from e.g. XHTML 1.1 and HTML5. (Thought-free = automatic: The author, or the authoring tool, does not have to ponder which element to use.)

A dedicated wrapper element offers thought-free simplicity with regard to adding CSS, ARIA, a language tag or semantic meta data to the ruby base word — and without effects on <ruby>, <rt> or <rp>.

The change of the content model addresses the need for words/compounds to appear as words/compounds in the source as well, in order to be compatible with non-visual parsing (screen readers, find-in-page, translation services etc.), which overall needs to work with text that is just as logical in the source as in the display.

The inclusion of rbc allows advanced ruby to be styled more simply.

Preparing the HTML5 parser to handle rtc allows the rtc element to be introduced in HTML6, thereby allowing double-sided ruby.

With the exception of the ruby element itself and the rtc element, let the parser auto-close the current element (be it rb, rt, span or whatever) when the parser sees an rp or an rt element. This is almost what Gecko and Webkit currently do, with the exception that they also auto-close rtc.

As authoring requirements:

Say that rb SHOULD be manually closed by the author, in order to accommodate legacy UAs (that do not auto-close it).
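The parsing rule and the authoring requirement can be illustrated with two fragments. The comments are a reading of the rule stated above, not output captured from any particular parser.

```html
<!-- Relies on the proposed auto-closing: the rt start tag closes the
     open rb, and the ruby end tag closes the open rt. -->
<ruby><rb>WWW<rt>World Wide Web</ruby>

<!-- Recommended authoring form: rb is closed by hand, so that legacy UAs
     which do not auto-close rb still see rb and rt as siblings. -->
<ruby><rb>WWW</rb><rp> (</rp><rt>World Wide Web</rt><rp>) </rp></ruby>
```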

Impact

Positive Effects

Offers simple transition from existing (simple) ruby mark-up to HTML5. E.g. it is simple to define cross-language 'microformats' that include ruby if HTML5 and the other language include the same elements. And it is simple to make tools, and build on existing tools, that work in HTML5 as well as in XHTML 1.1.

Most authors and authoring tools will continue to use rb - no need to learn something new.

Instead of forbidding rb, with difficult-to-explain reasons as justification and yet with effects on authors and authoring tools, allowing rb benefits those authors, parsers and authoring tools that already use or implement it, and avoids bothering them needlessly.

Authors don't have to wait for new CSS features to be invented (and deployed) before they get a CSS selector that is dedicated to styling ruby base text — and only ruby base text!

Change of content model ensures that text is meaningful both in source and in display, thus simplifying treatment of ruby in AT, translation services, find-in-page features, spell-checkers etc.

Simpler styling and more efficient metadata tagging of the ruby base, due to inclusion of rbc (tag one element instead of all the rb elements).

Preparedness for the future, due to the change in the HTML5 parser so that it doesn't auto-close the rtc element.
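To illustrate the point about tagging one element instead of all the rb elements, the sketch below states a class and a lang attribute once on an rbc wrapper. The exact nesting (rbc containing the rb elements, followed by the rt elements) is an assumption modelled on the ruby base container of XHTML 1.1; the class name is hypothetical.

```html
<style>
  /* One rule, matching whichever element carries the class. */
  .acronym { font-weight: bold; }
</style>

<!-- Without rbc: every rb repeats the same attributes. -->
<ruby><rb lang="en" class="acronym">W</rb><rb lang="en" class="acronym">W</rb><rb lang="en" class="acronym">W</rb><rt>World</rt><rt>Wide</rt><rt>Web</rt></ruby>

<!-- With rbc: the attributes are stated once, on the container. -->
<ruby><rbc lang="en" class="acronym"><rb>W</rb><rb>W</rb><rb>W</rb></rbc><rt>World</rt><rt>Wide</rt><rt>Web</rt></ruby>
```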

Negative Effects

rb is not well supported in all legacy HTML parsers (the Trident parser), hence authors have to be aware of the need to use helping scripts (Modernizr, HTML5shiv etc) and helping CSS in order to get good styling.

Counter argument: When this causes problems, authors have the option of either dropping the rb (if it helps) and/or adding an additional wrapper, such as span.

Counter argument: If — as in legacy IE versions — the rb renders as an empty element, then no harm happens. Effectively, it means that the ruby base text is rendered without a wrapper - equivalent to dropping the rb.

Counter argument: Why would the lacking support in legacy HTML parsers be any more important to consider when it comes to rb compared to other, new HTML elements?

Counter argument: Since there actually is some support of rb in legacy UAs (at least, it is treated as span), it could be argued that it is simpler to include rb compared to the inclusion of many completely new HTML elements in HTML5.

IE since IE9 supports rb natively.

Because there is little visual difference between rb being supported and not being supported, authors may think that rb works when in reality it does not.

Counter argument: This can be said about legacy parsers with regard to many new elements. And since it has not been held against any other new element in HTML5, it seems unjustified to hold it against rb.

Some pages that are authored according to the current content model will become invalid due to this CP's requirement that no more than a single adjacent pair of rb and rt occur in the same ruby element.

The benefit of fixing the page is more important than this slight annoyance.

Change of content model affects how one browser (Webkit) with partial support for ruby displays the ruby.

This is true. However, the source order is so important that it is worth it.

Use of the rbc element affects how one browser (Webkit) with partial support for ruby displays the ruby.

Counter argument: This is not true. It is the change of the content model that can cause this. The inclusion of rbc instead allows authors to style the ruby so that it looks fine in Webkit as well.

Counter argument: Actually, in Trident, the <rbc> does no harm.

Conformance Classes Changes

It becomes conforming to use the rb element inside ruby

The HTML5 parser must auto-close the rb element, and any other element except ruby and rtc, when it sees rt or rp

Until UAs offer broad support for auto-closing, it becomes RECOMMENDED to manually close the rb element.

The rbc element becomes valid

Only a single adjacent pair of rb and rt is conforming
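A short illustration of how these changes would classify markup. The interleaved fragment is assumed to be the kind of markup that the current content model permits; the classification follows the list above.

```html
<!-- Conforming under this proposal: all base text first, then all
     annotations, so "WWW" is contiguous in the source. -->
<ruby><rb>W</rb><rb>W</rb><rb>W</rb><rt>World</rt><rt>Wide</rt><rt>Web</rt></ruby>

<!-- Would become non-conforming: base and annotation are interleaved,
     so the source text reads "W World W Wide W Web". -->
<ruby>W<rt>World</rt>W<rt>Wide</rt>W<rt>Web</rt></ruby>
```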

Risks

If the author fails to manually close the rb element, then legacy UAs may place the rest of the ruby content inside the rb element, causing the mark-up to malfunction in legacy parsers.

Counter argument: In existing usage, the rb seems to almost always be closed.

Counter argument: This is a temporary problem: UAs will update. (Currently released versions of Gecko and Webkit plus IE10 do it.)

Authors who do close the rb element, might think that closing the rb will guarantee that it works in legacy UAs.

Counter argument: Why? It is already well known that legacy versions of e.g. Firefox and IE do not handle unknown elements well, and that one must use various tricks in order to make new HTML5 elements work in legacy UAs.

If authors are required to use span instead, then they know that span does not get auto-closed, whereas for rb, they have to learn that while it is intended to auto-close, it so far does not get auto-closed.

Counter argument: Actually, this is not true since, in Gecko, Webkit and IE10, any element gets auto-closed when the parser sees rp or rt.

Counter argument: To the degree that it is true (see above), it is a temporary problem: UAs will update. Also: Because pre-HTML5 ruby is part of XHTML 1.1, the bulk of legacy code does close the rb element, so it does not seem to be much of a problem. This CP does however suggest that authors SHOULD close the rb until UAs catch up.

Not making the rb element obligatory (as in XHTML 1.1) creates the risk that authors omit the rb element, just because they can, and because they think there is a benefit in doing so.

Comment: Agreed. I lean towards making the rb element obligatory, as I don't see it as particularly healthy that one can omit the rb element. From my perspective, allowing the rb to be omitted is just a compromise position (and I don't rule out that my argumentation in favour of including rb would have been more successful if I had asked for it to be obligatory). There is primarily only one benefit, namely, that it fits slightly better with the fact that IE6-8 does not by default recognize this element. But this does not seem to be much of an argument since, in an HTML5 parser, any unknown element would be handled in a defined way. Thus, while <rb>word</rb> might fail to work in an un-prepped copy of IE6/IE7/IE8 (and maybe in Firefox 2), it would nevertheless work in an HTML5 parser (as well as in e.g. an IE6/7/8 browser prepped with an HTML5 helper script).

Comment: Indeed. There would be no more benefit in omitting the rb element than there would be for the same author in dropping the html, head or body element. E.g. if the author drops the html element, then he/she also drops the opportunity to add a language tag, semantic metadata and so on for the entire document. Actually, when html is dropped, the element is still auto-generated by the HTML5 parser, which means that it is readily available for scripting and styling. In contrast, when the rb element is dropped, there is no automatic generation of the element. This means that when an author omits the rb element, he/she also takes away the direct opportunity to apply e.g. CSS to the element. (Fortunately, however, by including the rb in HTML5, the author (or the authoring tool) has a very simple recipe for fixing such situations.)

References

Relevant tests

rb versus span

for (more or less) HTML5-capable parsers (Firefox 9 and Opera 11.5), use of span does not work any better than using rb: in either case, the author, if he/she wants to use ruby, must use alternative CSS styling, due to the lack of support for the Ruby CSS module.

for UAs that have some kind of built-in ruby support (IE, Safari/Chrome), span and rb work equally well.

if one adds the HTML5 shiv, <script>document.createElement("rb")</script>, to Richard's tests (see demo), then rb works as well/badly as span in legacy IE versions.
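A minimal page showing where such a shiv would sit. It is a sketch and not a copy of the tests referred to above; the underline rule is an arbitrary choice, used only to make it visible whether rb is styled.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>rb with a createElement shiv (sketch)</title>
<script>
  // Run before the parser meets the first <rb>, so that legacy IE
  // treats rb as a known element when it builds the tree.
  document.createElement("rb");
</script>
<style>
  rb { text-decoration: underline; } /* arbitrary rule, to see whether rb is styled */
</style>
</head>
<body>
<p><ruby><rb>WWW</rb><rp> (</rp><rt>World Wide Web</rt><rp>) </rp></ruby></p>
</body>
</html>
```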

auto-closing of the rb element

Parser does not recognize <rb> as an element (even if it recognizes <ruby> and <rt>): IE5-8. In IE5-8, this has two effects: a) rb styling does not work; b) it might make it seem as if auto-closing does work.

Use cases for rb

Use cases for rb amount to documenting that there are use cases for the addition of language tags, CSS, metadata (via Microformats, Microdata or RDFa) and ARIA to ruby base text. In our view, it does not make sense to accept that it has to be documented, via use cases, that ruby base text needs language tagging, styling, metadata or ARIA, since these are features which are generally accepted as needed anywhere on any element. Nevertheless, we will mention some such examples:

Language tagging:

The very idea behind ruby markup is to express a translation (in the widest sense of the word) of a ruby base text in the form of a ruby (annotation) text. Often the difference between base and text is only a difference in script; thus the language is the same while the script differs. Other times, the language differs. In either case, the difference in script or language can be expressed via language tags on either rb or rt, or on both. We can conclude that language tagging (to the degree that it has any relevance at all) is more frequently needed for ruby mark-up than for any other HTML construct.

The language is inherited from a parent element, e.g. from p or html. And since the ruby text (rt) is supposed to explain the ruby base (rb), one must conclude that it is the ruby base that most frequently will need to be language tagged. (The rt will just inherit the language tagging value from the parent element.) Without the rb element, one would have to first add a language tag on the ruby element, and then add a language tag on the rt element, in order to cancel the (inherited) effect of setting the language on the ruby element. This is quite ad hoc and impractical, in our view.
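The difference can be sketched as follows. The Japanese base text and its English annotation are illustrative values chosen here, not taken from any of the cited tests.

```html
<p lang="en">The capital is
  <!-- With rb: only the base is tagged; rt inherits lang="en" from p. -->
  <ruby><rb lang="ja">東京</rb><rp> (</rp><rt>Tokyo</rt><rp>) </rp></ruby>.
</p>

<p lang="en">The capital is
  <!-- Without rb: ruby is tagged, and rt must undo the inherited value. -->
  <ruby lang="ja">東京<rp> (</rp><rt lang="en">Tokyo</rt><rp>) </rp></ruby>.
</p>
```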

ARIA, CSS

In the WebAIM forum, it was recently asked how to convey to an AT user that a Roman numeral (such as IV) stood for 4 and was not to be read as the letters I plus V. I provided an answer where one of the options was to use ruby markup. With the help of <rb aria-hidden="true">, this was the only method that worked even in the Mac OS X screen reader (VoiceOver). In the demo code, I also added styling of the rb element in the form of rb{text-decoration:underline}, to hint to sighted users as well that hovering above this text would provide information. Thus I was able to, by default, hide the ruby text for everyone except screen reader users (see demo).
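A sketch of how such markup might look. Only the rb attribute and the underline rule come from the description above; the rules that hide the rt visually and reveal it on hover are assumptions made for this illustration and are not the code of the demo.

```html
<style>
  /* From the description: hint to sighted users that the base carries extra information. */
  rb { text-decoration: underline; }
  /* Assumption: hide rt visually while leaving it available to screen readers. */
  rt { position: absolute; clip: rect(0 0 0 0); width: 1px; height: 1px; overflow: hidden; }
  /* Assumption: reveal the annotation when the pointer hovers over the ruby. */
  ruby:hover rt { position: static; clip: auto; width: auto; height: auto; overflow: visible; }
</style>

<!-- aria-hidden on rb keeps "IV" from being read as letters; rt supplies "4". -->
<p>Chapter <ruby><rb aria-hidden="true">IV</rb><rt>4</rt></ruby></p>
```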

CSS selectors for ruby base

Koji Ishii suggests treating rb just like tbody, so that the selector would work even if the author skips actually typing the element. A good idea. This proposal does not, however, make that suggestion, as it would mean changing the HTML5 parser so that it auto-generates the element. No such change is on the horizon. Alternatively, one could add a pseudo-selector in CSS, e.g. ruby:base{}. But no such selector is on the horizon either. The fact remains that it is necessary to be able to select the ruby base, and that the simplest way is to use rb.