Geeky songs and stuff from Daniel Davis

Menu

Monthly Archives: April 2010

Opera colleague Bruce Lawson thought it might be spiffing if the description of the <ruby> element that appears in the HTML5 spec was clarified a bit, so here’s my attempt. I’m using Japanese as an example although it applies to Chinese and possibly other languages as well. Please note my definition of one syllable may differ from yours.

Step 1: The Japanese Writing System

In Japanese there are three alphabets—one semantic (thousands of characters) and two phonetic (roughly 50 each).

The semantic alphabet is called kanji, based on traditional Chinese characters, and each character has a meaning (although sometimes quite abstract). When you read words written with kanji you may understand their meaning but there are no clues as to their pronunciation.

日 = sun
本 = origin
日本 = land of the rising sun = Japan

The first phonetic alphabet is hiragana. This comes from highly stylised Chinese characters developed around the 5th century when only men were educated enough (or deemed intelligent enough) to use Chinese characters (kanji). Literate ladies of the ruling class used the easy-to-write hiragana, each representing one syllable and having no meaning, to write letters, poetry and novels (the original chick lit). When you read words written with hiragana you can pronounce them but you may not know their meaning.

に = ni
ほ = ho
ん = n
にほん = nihon = Japan

The second phonetic alphabet is katakana. This apparently was developed by monks also using Chinese characters as a basis for a highly simplified alphabet. Whereas hiragana characters are more rounded, katakana characters are more sharp and angular. They are used primarily for foreign words that don’t have a Japanese translation (e.g. “browser”) or for making a word stand out or appear modern. Like hiragana, when you read words written with katakana you can pronounce them but you may not know their meaning.

二 = ni
ホ = ho
ン = n
二ホン = nihon = Japan

Any piece of Japanese text (banner ad, article, legal doc, etc.) uses a combination of kanji, hiragana and katakana. It is sometimes the case that people reading the text can’t read the kanji, especially because kanji characters can have more than one pronunciation. People and place names are one example of kanji having numerous or irregular pronunciations.

日 = can be pronounced "nichi", "hi" or "ka"
本 = can be pronounced "hon" or "moto"
日本 = can be pronounced "nihon" or "nippon" = Japan

To help the reader, sometimes the pronunciation is written above the kanji using the hiragana alphabet. This is called furigana in Japanese and ruby in English (from the name of small type with a height of 5.5 points). It is often used in newspapers and books but not so much on websites, due to the difficulty of squeezing miniature text above larger text on a single line. The <ruby> element aims to solve this.

Step 3: Tell us how it works

According to the current HTML5 spec, the <ruby> element is an inline element and is placed around the word or character you’d like to clarify, like so:

<ruby>日本</ruby>

By itself this does nothing, so we add the pronunciation either for each character or, as in this case and my personal preference, for the word as a whole. For this, we use the <rt> tag, meaning ruby text.

<ruby>日本<rt>にほん</rt></ruby>

We could leave it like that and supporting browsers would show the hiragana pronunciation above the kanji text, but non-supporting browsers would ignore the tags and show both the text and its pronunciation side-by-side. To solve this, the masters of the HTML5 universe have given us another tag, <rp> meaning ruby parentheses, which cleverly hides characters (namely parentheses) in supporting browsers. This means we can write the pronunciation in parentheses which non-supporting browsers will show, and supporting browsers will continue to show the pronunciation without parentheses above the main text.

<ruby>日本<rp>(</rp><rt>にほん</rt><rp>)</rp></ruby>

Step 4: Say what?

Supporting browsers → ruby text is shown above main text

Non-supporting browsers → ruby text is shown next to main text but in parentheses.

Overall, the <ruby> element is a great help not just for learners of Japanese (or Chinese, etc.) but also for when uncommon characters are used in a piece of text. It could also be used in any language simply for clarifying a term or unfamiliar concept in an unobtrusive and accessible manner.

For reference:

Also, see this page for an example of how hiragana.jp uses the <ruby> element (and CSS for non-supporting browsers) to provide ruby text above all kanji words. Perfect for people learning to read Japanese. Note that they also use the <rb> tag to markup the kanji words but at the time of writing this is not officially part of HTML5.