This document is a Note made available by
the World Wide Web Consortium (W3C) for your information only.
Publication of this Note by W3C does not imply endorsement by W3C, including the Team and
Membership, nor that the W3C has, is, or will be allocating any resources to
the issues addressed by this Note.

This Note just describes the author's personal experiments.
Implementations found in this document are informative only, and
may contain errors.
This is a draft document and may or may not be updated, replaced, or
obsoleted by other documents at any time.
It is inappropriate to use this Note as reference material or to cite this
as other than "personal experiments".

While the author welcomes comments on this document, the author does not
guarantee a reply or any further action. Comments on this document
should be sent to the author.

Ruby Annotation specification [Ruby]
defines an abstract definition of ruby annotation markup, and provides a Ruby
DTD Module for
XHTML
which is used in
XHTML 1.1
[XHTML11] as a module implementation.
But a DTD is just
one implementation method, and other implementations are also possible.

In this Note, module implementations of the Ruby abstract definition
in several schemas are explored —
DTD,
RELAX
[RELAX],
TREX
[TREX] and
the XML Schema
[XMLSchema].

The following is the abstract definition of ruby markup,
found in section 2.1
of Ruby Annotation [Ruby].
This abstract definition is consistent with
the XHTML
Modularization framework [XHTMLMOD].
Further definitions of
XHTML
abstract modules can be found in Modularization of
XHTML
[XHTMLMOD].

Despite its structural complexity, integrating this Ruby
DTD Module into
markup languages is fairly easy.
There are four important parameter entities in this Ruby Module
that markup language designers should care about, which are defined in
the Ruby Module as follows:

%Ruby.complex;

This parameter entity controls whether
complex ruby markup is included.
This is set to 'INCLUDE' by default.
By setting it to 'IGNORE',
markup language designers can exclude the complex ruby markup.
In this case, the rbspan attribute of the rt
element is not included, as it's useless for simple ruby.

%NoRuby.content;

This parameter entity provides an easy way to redefine the
content models of the rb and the rt elements.
It's defined as '( #PCDATA )' by default.
Basically, these elements have the same content model,
which is intended to allow other inline-level elements of the host
markup language, but it should not include ruby descendant elements.
By using this parameter entity, markup language designers can redefine
both content models at once, without redefining them separately.

%Ruby.common.attlists;

This parameter entity provides an easy way to define
common attributes for ruby elements.
This is set to 'IGNORE' by default.
Basically, ruby elements are intended to have the common attributes of
the host markup language.
By setting this parameter entity to 'INCLUDE',
markup language designers can define common attributes for
ruby elements at once using the parameter entity
%Ruby.common.attrib;, without defining them separately.

%Ruby.common.attrib;

This parameter entity defines common attributes for ruby elements,
and is empty by default. In the case of
XHTML 1.1
[XHTML11], it is defined as follows:

<!ENTITY % Ruby.common.attrib "%Common.attrib;" >

%Ruby.fallback;

This optional parameter entity controls whether the
rp fallback mechanism is included.
This is set to 'INCLUDE' by default.
By setting it to 'IGNORE' (e.g. in the internal subset),
authors can exclude this fallback mechanism.

Note that setting it to 'IGNORE' further restricts
the minimal content model of the ruby element. Defining a markup
language that way is nonconformant to [Ruby], although
the resultant document still conforms to [Ruby].
This option is provided only for authors' convenience, should they really
wish to exclude rp fallback from their content.
Markup language designers should not alter this parameter entity to
define their markup languages.

%Ruby.fallback.mandatory;

This optional parameter entity controls whether the
rp fallback mechanism is enforced.
This is set to 'IGNORE' by default.
By setting it to 'INCLUDE' (e.g. in the internal subset),
authors can enforce this fallback mechanism.

This option is provided only for authors' convenience, should they really
wish to enforce rp fallback in their content.
Markup language designers should not alter this parameter entity to
define their markup languages.
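
As an illustration, an author's instance might enforce the fallback as in the following sketch. It assumes XHTML 1.1 as the host document type, and that the Ruby Module it includes offers this parameter entity; the document content is made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [
  <!-- Author-level override, read before the external DTD:
       intended to make the rp fallback mandatory in simple ruby -->
  <!ENTITY % Ruby.fallback.mandatory "INCLUDE" >
]>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Ruby with enforced fallback (hypothetical)</title></head>
  <body>
    <p>
      <ruby>
        <rb>W3C</rb>
        <rp>(</rp><rt>World Wide Web Consortium</rt><rp>)</rp>
      </ruby>
    </p>
  </body>
</html>
```

Because declarations in the internal subset are processed first, the default declaration in the module is then ignored.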

In addition to redefining the above parameter entities (if necessary),
markup language designers have to add the ruby element to
some elements' content models where inline-level elements may appear.

In any case, markup language designers would have to redefine
%NoRuby.content;, %Ruby.common.attlists;
and %Ruby.common.attrib; as appropriate, when
integrating the Ruby Module into their markup languages.
There's no magic here.
But %Ruby.complex; (and %Ruby.fallback;)
can be used to choose the preferred structure of ruby markup.
The following examples illustrate how markup language designers
and authors can choose the preferred structure by just modifying
%Ruby.complex; (and %Ruby.fallback;).

Case 1: using the full ruby markup

When markup language designers want to use the full ruby markup,
namely, with support for both simple ruby and complex ruby,
they don't have to redefine %Ruby.complex;.

In this case, the structure of ruby markup is the same as the one used in
XHTML 1.1
[XHTML11].
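
For illustration, complex ruby markup under this structure could look like the following sketch. The annotated text is hypothetical; the element and attribute names (rbc, rtc, rb, rt, rbspan) are those defined in [Ruby].

```xml
<ruby xmlns="http://www.w3.org/1999/xhtml">
  <!-- base container: three base texts -->
  <rbc>
    <rb>10</rb><rb>31</rb><rb>2002</rb>
  </rbc>
  <!-- first ruby text container: one rt per rb -->
  <rtc>
    <rt>Month</rt><rt>Day</rt><rt>Year</rt>
  </rtc>
  <!-- second ruby text container: one rt spanning all three bases -->
  <rtc>
    <rt rbspan="3">Expiration Date</rt>
  </rtc>
</ruby>
```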

Case 2: using the simple ruby markup

When markup language designers want to use the simple ruby markup,
namely, without support for complex ruby, they can redefine
%Ruby.complex; as follows:

<!ENTITY % Ruby.complex "IGNORE" >

In this case, the structure of ruby markup is the same as the one
currently supported in Microsoft Internet Explorer 5.0 and later.
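
A simple ruby instance under this structure might look like the following sketch; the annotated text is hypothetical. The rp elements carry the fallback parentheses shown by user agents that do not know ruby.

```xml
<ruby xmlns="http://www.w3.org/1999/xhtml">
  <rb>WWW</rb>
  <rp>(</rp><rt>World Wide Web</rt><rp>)</rp>
</ruby>
```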

Case 3: using the full ruby markup, without fallback

Note. Though the resultant document still conforms to
[Ruby],
using this structure for defining a markup language
is nonconforming to [Ruby].

When authors want to use the full ruby markup, namely with support
for both simple ruby and complex ruby, but don't want to use the
rp fallback mechanism, they can redefine
%Ruby.fallback; (e.g. in the internal subset) as follows:

<!ENTITY % Ruby.fallback "IGNORE" >

In this case, the structure of ruby markup is the same as the one
developed for JIS
X 4052:2000 "Exchange format for Japanese documents with
composition markup" [JISX4052].

Case 4: using the simple ruby markup, without fallback

Note. Though the resultant document still conforms to
[Ruby],
using this structure for defining a markup language
is nonconforming to [Ruby].

When authors want to use the simple ruby markup, namely without
support for complex ruby and the rp fallback mechanism,
they can redefine %Ruby.complex; and %Ruby.fallback;
(e.g. in the internal subset) as follows:
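
A sketch of such a redefinition, placed in the internal subset of an instance, is shown here. It assumes XHTML 1.1 as the host document type; the document content is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [
  <!-- exclude complex ruby markup -->
  <!ENTITY % Ruby.complex "IGNORE" >
  <!-- exclude the rp fallback mechanism -->
  <!ENTITY % Ruby.fallback "IGNORE" >
]>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Simple ruby without fallback (hypothetical)</title></head>
  <body>
    <p>
      <!-- only rb followed by rt remains allowed -->
      <ruby><rb>WWW</rb><rt>World Wide Web</rt></ruby>
    </p>
  </body>
</html>
```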

As shown above, the Ruby
DTD Module can be
readily integrated into [XHTML11].
It's natural that it works, as the Ruby Module is primarily designed
for that purpose.

But it also works with other
DTDs,
so long as those are properly parameterized. The following is an example
DTD driver to
integrate the Ruby Module into
the XML Specification
DTD
[XMLspec].
Thanks to its fine parameterization,
the XML Specification
DTD can easily
integrate the Ruby Module with minimum effort.
Of course, by controlling %Ruby.complex;, markup language
designers can choose the preferred structure.

Note. At the time of writing, the latest version of
[XMLspec] is Version 2.1, but here Version 2.0
is used because a rewrite in
RELAX
is based on Version 2.0, so it is more consistent to use Version 2.0 for
comparison purposes.

SVG
(Scalable Vector Graphics) [SVG] is a language
for describing two-dimensional vector and mixed vector/raster graphics
in XML,
with a defined namespace. The Ruby Module is part of the
XHTML
namespace, so adding ruby annotation markup to
SVG
concerns mixing multiple markup vocabularies from different namespaces.

Modularization of XHTML [XHTMLMOD] provides a way
to deal with namespaces, using the
XHTMLQname (Qualified Name) Module.
The following DTD
driver defines a hybrid document type which allows ruby annotation markup
inside the desc, title, text,
tspan, textPath and a elements
in SVG, in
a namespace-aware way.
See [XHTMLMOD] for more details.

Note. In the above example, the id and xml:lang
attributes are defined for ruby elements.
%NoRuby.content; is NOT
redefined so the content model of the rb and the rt
elements is '( #PCDATA )'.

By using this DTD
driver, by default, ruby elements can be used like this:

<ruby xmlns="http://www.w3.org/1999/xhtml">...</ruby>

If prefixing ruby elements is preferred, one can set the
%NS.prefixed; parameter entity to "INCLUDE",
and define an arbitrary namespace prefix with the %XHTML.prefix;
parameter entity, inside the internal subset of an instance.
An example instance would be something
like this:
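
The following is only a sketch of such an instance. The system identifier "svg-ruby.dtd", the prefix "xhtml" and the text content are hypothetical, and whether the instance is valid depends on the driver actually used (for example, on whether it declares the xmlns:xhtml attribute for the svg element).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg SYSTEM "svg-ruby.dtd" [
  <!-- switch the XHTML-namespace elements to prefixed names -->
  <!ENTITY % NS.prefixed "INCLUDE" >
  <!-- the prefix to use for them -->
  <!ENTITY % XHTML.prefix "xhtml" >
]>
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xhtml="http://www.w3.org/1999/xhtml"
     width="200" height="60">
  <text x="10" y="40">
    <xhtml:ruby>
      <xhtml:rb>WWW</xhtml:rb>
      <xhtml:rp>(</xhtml:rp><xhtml:rt>World Wide Web</xhtml:rt><xhtml:rp>)</xhtml:rp>
    </xhtml:ruby>
  </text>
</svg>
```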

RELAX
Core [RELAX] allows defining
RELAX grammars
as XML instances,
and provides the same datatypes as
the XML Schema Part 2
[XMLSchema] (with a few extensions —
none and emptyString).
In addition, it allows describing constraints not expressible in
other schema languages such as
DTD and
XML Schema,
e.g. that the rbspan attribute of the rt element
should not be allowed in simple ruby markup.
Each RELAX
Core module defines a single namespace grammar.

RELAX
Namespace [RELAX-NS] allows mixing grammars
from different namespaces. RELAX Namespace
divides a mixed-namespace instance into "islands", where each island
consists of a single namespace, and each island can be validated by
the RELAX
Core processor. A module defining a single namespace may use schema languages
other than RELAX Core, such as
TREX
[TREX].

In section 4.1.1, an implementation of the Ruby
Module in RELAX
is defined. This module defines all the necessary grammars for the Ruby
Module, and allows simple ruby markup by default.
In section 4.1.2,
another RELAX
module that allows full ruby markup is defined, by including and extending
the Ruby RELAX
Module defined in section 4.1.1.

In RELAX,
multiple hedgeRules may share the same label.
A hedgeRef element referencing such a label
is expanded by the following procedure:

Locate all hedgeRules for this label.

Group hedge models of these hedgeRules with
a choice element.

Copy the occurs attribute of the hedgeRef
to this choice element.

Replace the hedgeRef with this choice element.
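
The steps above can be sketched with a small fragment. The label "Example.content" and the element labels "em" and "strong" are hypothetical, chosen only to show the expansion.

```xml
<!-- Fragment of a RELAX Core module. The label "Example.content" and the
     element labels "em" and "strong" are hypothetical. -->

<!-- Two hedgeRules sharing one label -->
<hedgeRule label="Example.content">
  <ref label="em"/>
</hedgeRule>
<hedgeRule label="Example.content">
  <ref label="strong"/>
</hedgeRule>

<!-- A reference to that label, as written in some content model -->
<hedgeRef label="Example.content" occurs="*"/>

<!-- What the reference is expanded to: the hedge models of both rules
     grouped in a choice that carries the occurs attribute of the hedgeRef -->
<choice occurs="*">
  <ref label="em"/>
  <ref label="strong"/>
</choice>
```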

This feature may be used to extend the content model.
An example of using this feature is the Ruby RELAX Module shown in
section 4.1.2, which extends the hedgeRule
"Ruby.content" to allow complex ruby markup.
Namely, the following hedgeRule:

Similarly, the following hedgeRule is used to
define the content models of the rb and the rt
elements.

<hedgeRule label="NoRuby.content">
<empty/>
</hedgeRule>

This hedgeRule is referenced inside elementRules
for rb and rt as:

<mixed>
<hedgeRef label="NoRuby.content" occurs="*"/>
</mixed>

so by default the content models of the rb and
the rt elements are effectively the same as:

<mixed>
<empty/>
</mixed>

which approximates '( #PCDATA )' in
DTD.
When integrating the Ruby RELAX Module,
further hedgeRules that share the same label may be
defined to allow inline-level elements inside the rb and
the rt elements. For example, if em and
strong are allowed as inline-level elements,
the following hedgeRule could be defined:
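
A sketch of such a rule is given here; it assumes that the host language defines elementRules labelled "em" and "strong".

```xml
<!-- Added by the integrating module; "em" and "strong" are assumed to be
     labels of elementRules defined by the host language. -->
<hedgeRule label="NoRuby.content">
  <choice>
    <ref label="em"/>
    <ref label="strong"/>
  </choice>
</hedgeRule>
```

Combined with the default rule, the reference inside rb and rt would then expand to a choice between empty content and these two elements, repeated any number of times within mixed content.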

Note. RELAX
has the ability to disallow the ruby element from appearing as a direct
or indirect subordinate of the rb or the rt elements.
However, while it's possible, writing a RELAX grammar to
prohibit indirect nesting of ruby is not easy.

In order to integrate the Ruby RELAX Module,
an attPool "Ruby.common.attrib" MUST also be
defined. This attPool will define common attributes for
ruby-related elements. For example, if common attributes are defined in
an attPool "Common.attrib",
"Ruby.common.attrib" could be defined like this:

Note that strictly speaking, [XMLspec] and
[Ruby] are in different namespaces
([XMLspec] doesn't belong to any namespace,
while [Ruby] belongs to the XHTML namespace),
so the above example will not work. The above example is only to illustrate
the basic idea, in comparison with the DTD-based integration.

To mix vocabularies from different namespaces, RELAX Namespace
[RELAX-NS] can be used.
In order to allow the ruby element in
[XMLspec] using RELAX Namespace,
the namespace attribute can be added to the ref
element:
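
A sketch of such a reference follows. The label "local.inline" is a hypothetical name for the host language's inline-level content; only the namespace attribute on ref is the point of the example.

```xml
<!-- Fragment of a RELAX Core module for the host language.
     "local.inline" is a hypothetical label for its inline-level content. -->
<hedgeRule label="local.inline">
  <!-- ruby lives in the XHTML namespace, so the reference is qualified -->
  <ref label="ruby" namespace="http://www.w3.org/1999/xhtml"/>
</hedgeRule>
```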

Like RELAX,
TREX
[TREX] allows defining TREX patterns as
XML instances,
and a TREX
pattern specifies a pattern for the structure and content of an
XML document.
Among other things, TREX makes it easier to describe exclusion-like constraints,
such as disallowing the nesting of the ruby element.

TREX
does not have built-in datatypes, and may be used with
datatyping vocabularies such as XML
Schema Part 2 [XMLSchema].
Note that the following example uses [XMLSchema]
for datatyping, but TREX implementations may differ in the datatyping vocabularies
they support, e.g. an implementation might support an older version of
[XMLSchema] identified by the namespace
URI http://www.w3.org/2000/10/XMLSchema, or might support another
datatyping vocabulary, or might not support datatyping at all.

In section 5.1.1, an implementation of the Ruby
TREX Module is
defined. This module defines all the necessary patterns for the Ruby Module,
and allows simple ruby markup by default.
In section 5.1.2, another Ruby TREX Module that allows
full ruby markup is defined, by including and extending the Ruby TREX Module defined in
section 5.1.1.

Note that the following modules include annotations using facilities
from XML Schema, but
elements and attributes from namespaces other than the TREX namespace will
be ignored, so you can use arbitrary elements and attributes from
a separate namespace for annotations.

The modules defined in section 5.1.1 and
section 5.1.2 are primarily designed
to be used with other XHTML modules,
to build XHTML
Family document types. Unlike
DTD-based modularization,
the modules take care of redefining the content models appropriately, so
you don't have to define the content model for a collection of modules.
Such an example is shown in section 5.2.2.

When you integrate the Ruby Module into other markup languages,
the patterns "Core.attrib", "Inline.model" and
"Inline.class" have to be defined in other module(s).
"Core.attrib" would include patterns of common attributes
that may be used on ruby-related elements. Such attributes may be taken
from a module implementing the XHTML Attribute
Collections.

"Inline.model" would be defined like the following, where
"Inline.class" defines a list of inline-level elements.
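
A definition along the following lines would fit that description. It is a sketch rather than the text of any published module, and it assumes that "Inline.class" is defined elsewhere as a choice of the host language's inline-level elements.

```xml
<!-- Fragment of a TREX grammar; "Inline.class" is assumed to be defined
     elsewhere as a choice of the host language's inline-level elements. -->
<define name="Inline.model">
  <!-- character data mixed with any number of inline-level elements -->
  <mixed>
    <zeroOrMore>
      <ref name="Inline.class"/>
    </zeroOrMore>
  </mixed>
</define>
```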

This corresponds to '( PCDATA | Inline )*' in
the abstract definition. The "Ruby.concur" pattern
defines a pattern to exclude a ruby at any depth,
so the following pattern "NoRuby.content" corresponds
to '( PCDATA | Inline -ruby )*' in the abstract definition.

Note that in TREX, the order in which modules are included matters.
For example, if two duplicated definitions come from different modules,
the latter definition can be combined with the former, and if the latter
specifies the combine attribute with the value
replace, then the former definition will be replaced by
the latter. This is different from DTD, where the former wins,
or RELAX,
where order of inclusion is irrelevant.

The following is an example of adding simple ruby markup into
the TREX pattern for XHTML Basic
[XHTMLBasic], written by
James Clark.
If you would like to use full ruby markup, just replace
xhtml-ruby-1.trex with xhtml-full-ruby-1.trex.

Note. This experiment is based on an XML Schema module implementation of
Ruby in "Modularization of XHTML in
XML Schema"
[XHTMLMODSchema],
although the experiment itself predated the publication of
[XHTMLMODSchema].
Sample XML Schema module
implementations shown below are NOT intended to supersede that one,
but rather to explore how simple and complex ruby module implementations
should be structured, so that they can be properly modularized in the future.
It is expected that a future draft of "Modularization of XHTML in
XML Schema" will
address this issue.

Although the primary purpose of this Note is to explore module
implementations of ruby, it is of course preferable that instances
that use ruby could be rendered in some way. This section briefly
explores how ruby could be rendered using style sheets.

Note. The above example XHTML 1.1 document
is served as "text/xml", and rendering is controlled by
CSS. Some
XML user agents might
not be able to render this document.

Note that as shown in Figure 1, the above style sheets render
ruby text as inline annotation, rather than interlinear annotation
(i.e. ruby annotation).
Formatting properties for styling ruby are under development,
see "CSS3
Module: Ruby" [CSS3-Ruby] for more details.
Ruby may also be used in vertical layout, see
"CSS3 module:
text" [CSS3-Text] (work in progress)
for relevant properties.

Although the rendering of ruby should be controlled by style sheets,
when appropriate style information is not provided, user agents could
still try to render ruby in some meaningful way.
The following examples illustrate how a user agent could support ruby.

W3C's
Amaya browser/editor implements
ruby as part of XHTML 1.1 support.
Figure 2 is an example rendering of
sample XHTML 1.1 document
in its main view, and Figure 3 is another
example rendering of the same document in alternate view.
It is of course possible to apply style properties like
"color" to ruby-related elements.

Figure 2: Example ruby rendering in Amaya
(main view)

Figure 3: Example ruby rendering in Amaya
(alternate view)

Note. At the time of this publication, ruby support is only available
from the CVS base.
It will be included in the next release.

JIS TR X 0029:2000,
Japanese Standards Association
"RELAX Core"
was submitted to ISO as
"ISO/IEC DTR 22250-1:2000, Document
Description and Processing Languages -- Regular Language Description for
XML (RELAX) -- Part 1:
RELAX
Core" in October 2000, and was approved as an ISO/IEC TR in May 2001 (to be published).
Information about
RELAX
is available at:
http://www.xml.gr.jp/relax/