Text Terminology

Before we investigate the primary method of adding text, the <text> element, we should define some terms you'll see if you read the SVG specification or if you work with text in any graphic environment:

Character

A character, as far as an XML document is concerned, is a byte or bytes with a numeric value according to the Unicode standard. For example, what we call the letter "g" is the character with Unicode value 103.

Glyph

A glyph is the visible representation of a character or characters. A single character can have many different glyphs to represent it. Figure 8-1 shows the word "glyphs" written with two different sets of glyphs — look particularly at the initial "g" — it's the same character, but the glyphs are markedly different.

Figure 8-1. Two sets of glyphs

Multiple characters can reduce to a single glyph; some fonts have separate glyphs for the letter combinations "fl" and "ff" to make their spacing look better (these are called ligatures). Other times, a single character can be composed of multiple glyphs; a print program might create the character é (which has Unicode value 233) by combining the "e" glyph with a non-spacing accent mark "´".

Font

A collection of glyphs representing a certain set of characters. All the glyphs in a font will normally have the following characteristics in common:

Baseline, ascent, and descent

All the glyphs in a font line up on the baseline. The distance from the baseline to the top of the character is the ascent; the distance from the baseline to the bottom of the character is the descent. The total height of the character, ascent plus descent, is called the em-height. The em-box is a square whose sides are one em-height long.

The upper dotted line in Figure 8-2 is used to determine the cap-height, which is the height of a capital letter above the baseline. The lower dotted line is used to determine the ex-height, which, logically enough, is the distance from the baseline to the top of a lower case letter "x."

Figure 8-2. Glyph measurements
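
To make the distinction between characters and glyphs concrete, here is a minimal sketch of our own (it is not one of the numbered examples). It writes characters as numeric references to the Unicode values mentioned above; the sample words, sizes, and positions are arbitrary assumptions. Whether the two accented words look identical depends on the glyphs in the viewer's font.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="220" height="80"
  viewBox="0 0 220 80">
  <!-- &#103; is the character "g" (Unicode value 103) -->
  <text x="10" y="30" style="font-size: 18pt;">&#103;lyphs</text>
  <!-- a single precomposed character, Unicode value 233 -->
  <text x="10" y="65" style="font-size: 18pt;">caf&#233;</text>
  <!-- two characters: "e" plus the combining acute accent (value 769) -->
  <text x="110" y="65" style="font-size: 18pt;">cafe&#769;</text>
</svg>
```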

Simple Attributes and Properties of the text Element

The simplest form of the <text> element requires only two attributes, x and y, which define the point where the baseline of the first character of the element's content is placed. The default style for text, as with all objects, is to have a fill color of black and no outline. This, as it turns out, is precisely what you want for text. If you set the outline as well as the fill, the text looks uncomfortably thick. If you set only the outline, you can get a fairly pleasant set of outlined glyphs, especially if you lower the stroke width. Example 8-1 uses the placement and stroke/fill characteristics for <text>; the result is Figure 8-3.
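
The following is a minimal sketch of the three fill and stroke combinations just described; the coordinates, strings, and guide lines are illustrative choices, not necessarily those of the numbered example.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120"
  viewBox="0 0 200 120">
  <!-- guide lines: one vertical at x=20, one horizontal per baseline -->
  <path d="M 20 10 V 110 M 10 30 H 190 M 10 70 H 190 M 10 110 H 190"
    style="stroke: gray; fill: none;"/>
  <!-- default style: black fill, no outline -->
  <text x="20" y="30">Simplest Text</text>
  <!-- outline plus fill: looks heavy -->
  <text x="20" y="70" style="stroke: black;">Outline/Fill</text>
  <!-- thin outline only -->
  <text x="20" y="110"
    style="stroke: black; stroke-width: 0.5; fill: none;">Outline Only</text>
</svg>
```

In each case the x and y values locate the left end of the baseline, which is why the text sits on the horizontal guide lines.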

Many of the other properties that apply to text are the same as they are in the Cascading Style Sheets standard. The following is a list of the CSS properties and values that are implemented in the Apache Batik viewer version 1.0:

font-family

The value is a comma-separated list of font family names or generic family names. The generic family names are serif, sans-serif, and monospace. Serif fonts have little "hooks" at the ends of the strokes; sans-serif fonts don't. In Figure 8-1, the word at the left is in a serif font and the word on the right is in a sans-serif font. Both serif and sans-serif fonts are proportional; the width of a capital M is not the same as the width of a capital I. A monospace font, which may or may not have serifs, is one where all the glyphs have the same width, like the letters of a typewriter.

font-size

The value is the baseline-to-baseline distance of glyphs if you were to have more than one line of text. (In SVG, you can't have multi-line <text> content, so the concept is somewhat abstract.) If you use units on this attribute, as in style="font-size: 18pt", the eighteen-point size will be converted to user units before being rendered, so it can be affected by transformations.

font-weight

The two most commonly used values of this property are bold and normal. You need the normal value in case you want to place non-bold text in a group that has been set to style="font-weight: bold".

font-style

The two most commonly used values of this property are italic and normal.

text-decoration

Possible values of this property are none, underline, overline, and line-through.

word-spacing

The value of this property is a length, either in explicit units such as pt or in user units. Make this a positive number to increase the space between words, set it to normal to keep normal space, or make it negative to tighten up the space between words. The length you specify is added to the normal spacing.

letter-spacing

The value of this property is a length, either in explicit units such as pt or in user units. Make this a positive number to increase the space between individual letters, set it to normal to keep normal space, or make it negative to tighten up the space between letters. The length you specify is added to the normal spacing.
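
As a rough sketch of how the properties in this list look in practice, the following document applies each one to a short string. The family names, sizes, and positions are assumptions chosen for illustration, and how each line renders depends on the fonts your viewer has available.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="320" height="230"
  viewBox="0 0 320 230">
  <g style="font-size: 14pt;">
    <!-- comma-separated list; the generic family is the fallback -->
    <text x="20" y="30"
      style="font-family: Helvetica, Arial, sans-serif;">sans-serif</text>
    <text x="170" y="30" style="font-family: monospace;">monospace</text>
    <!-- font-weight: normal undoes the bold set on the group -->
    <g style="font-weight: bold;">
      <text x="20" y="65">bold</text>
      <text x="170" y="65" style="font-weight: normal;">normal</text>
    </g>
    <text x="20" y="100" style="font-style: italic;">italic</text>
    <text x="20" y="135" style="text-decoration: underline;">under</text>
    <text x="110" y="135" style="text-decoration: overline;">over</text>
    <text x="200" y="135"
      style="text-decoration: line-through;">through</text>
    <!-- the lengths are added to the normal spacing -->
    <text x="20" y="170" style="word-spacing: 10pt;">more word space</text>
    <text x="20" y="205" style="letter-spacing: 5pt;">wide letters</text>
  </g>
</svg>
```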

Text Alignment

The <text> element lets you specify the starting point, but you don't know, a priori, its ending point. This would make it difficult to center or right-align text, were it not for the text-anchor property. You set it to a value of start, middle, or end. For fonts that are drawn left-to-right, these are equivalent to left, center, and right alignment. For fonts that are drawn in other directions (see Section 8.7) these have a different effect. Example 8-3 shows three text strings, all starting at an x-location of 100, but with differing values of text-anchor. A guide line is drawn to show the effect more clearly in the result, Figure 8-5.
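
A minimal sketch of the three text-anchor values follows; the strings and font size are arbitrary, and the gray line marks x = 100 so you can see where each string is anchored.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120"
  viewBox="0 0 200 120">
  <g style="font-size: 14pt;">
    <!-- guide line at the shared x-location -->
    <path d="M 100 10 V 110" style="stroke: gray; fill: none;"/>
    <text x="100" y="30" style="text-anchor: start;">Start</text>
    <text x="100" y="65" style="text-anchor: middle;">Middle</text>
    <text x="100" y="100" style="text-anchor: end;">End</text>
  </g>
</svg>
```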

The tspan element

Another consequence of not knowing a text string's length in advance is that it is difficult to construct a string with varying text attributes, such as this sentence, which switches among italic, normal, and bold text. If you had only the <text> element, you'd need to experiment to find where each differently styled segment of text ended in order to space them properly. To solve this problem, SVG provides the <tspan>, or text span element. Analogous to the XHTML <span> element, <tspan> is a tabula rasa that may be embedded in text content, and upon which you may impose style changes. The <tspan> remembers the text position, so you don't have to. Example 8-4 takes advantage of this to produce the display in Figure 8-6.
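
Here is a minimal sketch of styled spans inside one <text> element; the wording and sizes are our own illustration. Notice that no span needs its own x or y.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="350" height="50"
  viewBox="0 0 350 50">
  <text x="10" y="30" style="font-size: 12pt;">
    Switch among
    <tspan style="font-style: italic;">italic</tspan>, normal,
    and <tspan style="font-weight: bold;">bold</tspan> text.
  </text>
</svg>
```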

In addition to changing presentation properties such as font size, color, weight, etc., you can also use attributes with <tspan> to change the positioning of individual letters or sets of letters. If, for example, you want superscripts or subscripts, you can use the dy attribute to offset characters within a span. The value you assign to this attribute is added to the vertical position of the characters, and continues to affect text even outside the span. Negative values are allowed. A similar attribute, dx, offsets characters horizontally. Example 8-5 uses vertical offsets to create the "falling letters" in Figure 8-7.
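
The following sketch shows one way to use dy for a "falling letters" effect; the offsets are illustrative assumptions. Because each dy is applied relative to the position left by the preceding character, the letters step progressively lower.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="120" height="80"
  viewBox="0 0 120 80">
  <text x="10" y="30" style="font-size: 12pt;">
    F <tspan dy="4">a</tspan>
    <tspan dy="8">l</tspan>
    <tspan dy="12">l</tspan>
  </text>
</svg>
```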

If you wish to express the offsets in absolute terms rather than relative terms, you use the x and y attributes. This is handy for doing multi-line runs of text. In fact, you must do it this way, since, as you will see in Section 8.9, SVG never displays newline characters in text. (The lack of a newline will be remedied in SVG 1.1.) If your SVG viewer allows text selection, putting multiple lines into a single <text> element, as we have done in Example 8-6, will allow the selection to include all the lines. You should always use <tspan>s within a <text> element to group related lines, not only to allow them to be selected as a unit, but also because it adds structure to your document.

Example 8-6. Use of absolute positioning with tspan

<text x="10" y="30" style="font-size:12pt;">
They dined on mince, and slices of quince,
<tspan x="20" y="50">Which they ate with a
runcible spoon;</tspan>
<tspan x="10" y="70">And hand in hand, on the edge
of the sand,</tspan>
<tspan x="20" y="90">They danced by the light of the moon.</tspan>
</text>

There's no visual evidence in Figure 8-8 that all the text is in one <text> element, but trust us — they're all connected.

Figure 8-8. Absolutely positioned poetry

You may also rotate a letter or series of letters within a <tspan> by using the rotate attribute, whose value is an angle in degrees.

If you have to modify the positions of several characters, you can do it easily by specifying a series of numbers for any of the x, y, dx, dy, and rotate attributes. The numbers you specify will be applied, one after another, to the characters within the <tspan>. This is shown in Example 8-7.
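
As a sketch of this technique, the span below supplies six values for each of dx, dy, and rotate, one per letter of the word "shaken". The particular numbers are assumptions picked to give an uneven look.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="260" height="80"
  viewBox="0 0 260 80">
  <text x="30" y="40" style="font-size: 14pt;">
    It's
    <tspan dx="0 4 -3 5 -4 6" dy="0 -3 7 3 -2 -8"
      rotate="5 10 -5 -20 0 15">shaken</tspan>,
    not stirred.
  </text>
</svg>
```

The dy values in this sketch add up to -3, so the text that follows the span is drawn three user units above the original baseline.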

Although Figure 8-9 doesn't show it, the effects of dx and dy persist after the <tspan> ends. If more text were placed after the closing </tspan>, it would be at the same offsets as the letter n. It would not return to the baseline established by the first capital S.

Figure 8-9. Multiple horizontal and vertical offsets

Warning

If you have nested <tspan> elements, the x, y, dx, dy, and rotate attribute values are not inherited by the inner elements.

Although you can use the dy attribute to produce superscripts and subscripts, it's easier to use the baseline-shift style, as we have done in Example 8-8. This style property has values of super and sub. You may also specify a length, such as 0.5em, or a percentage, which is calculated in terms of the font size. baseline-shift's effects are restricted to the span in which it occurs.
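
A minimal sketch of baseline-shift for a chemical formula and an exponent follows. The strings are our own, and support for this property varies between viewers, so treat the result as something to verify in your own environment.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="300" height="90"
  viewBox="0 0 300 90">
  <text x="20" y="30" style="font-size: 12pt;">
    C<tspan style="baseline-shift: sub;">12</tspan>H<tspan
      style="baseline-shift: sub;">22</tspan>O<tspan
      style="baseline-shift: sub;">11</tspan> (sugar)
  </text>
  <text x="20" y="70" style="font-size: 12pt;">
    6.02 x 10<tspan style="baseline-shift: super;">23</tspan>
    (Avogadro's number)
  </text>
</svg>
```

Text after each closing </tspan> returns to the original baseline, which is the difference from dy noted above.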

In Figure 8-10, the subscripted numbers appear too large. In an ideal case we'd set the font-size as well, but we wanted this example to concentrate on only one concept.

Figure 8-10. Subscripts and superscripts

Setting textLength

Although we said that there's no a priori way to determine the endpoint of a segment of text, you can explicitly specify the length of text as the value of the textLength attribute. SVG will then fit the text into the given space, either by adjusting the space between glyphs and leaving the glyphs themselves untouched, or by adjusting both the spacing and the glyph size. If you want to adjust space only, set the lengthAdjust attribute to spacing (this is the default). If you want SVG to fit the words into a given length by adjusting both spacing and glyph size, set lengthAdjust to spacingAndGlyphs. Example 8-9 uses these attributes to achieve the results of Figure 8-11.
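
The sketch below fits the same string into 200 user units with each lengthAdjust value, and shows it at its natural length for comparison. The string, length, and guide lines are illustrative assumptions.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="260" height="130"
  viewBox="0 0 260 130">
  <g style="font-size: 14pt;">
    <!-- guide lines at x=20 and x=220, 200 user units apart -->
    <path d="M 20 10 V 120 M 220 10 V 120"
      style="stroke: gray; fill: none;"/>
    <!-- only the space between glyphs changes -->
    <text x="20" y="35" textLength="200"
      lengthAdjust="spacing">Two words</text>
    <!-- the glyphs themselves are stretched as well -->
    <text x="20" y="70" textLength="200"
      lengthAdjust="spacingAndGlyphs">Two words</text>
    <!-- natural length -->
    <text x="20" y="105">Two words</text>
  </g>
</svg>
```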

Vertical Text

When you use SVG to create charts, graphs, or tables, you will often want labels running down the vertical axes. One way to achieve vertically-oriented text is to use a transformation to rotate the text 90 degrees. Another way to achieve the same effect is to change the value of the writing-mode style property to the value tb (meaning top to bottom).

Sometimes, though, you want the letters to appear in a vertical column with no rotation. Example 8-10 does this by setting the glyph-orientation-vertical property with a value of zero. (Its default value is 90, which is what rotates top-to-bottom text 90 degrees.) In Figure 8-12, this setting tends to display the inter-letter spacing as unnaturally large. Setting a small negative value for letter-spacing solves this problem.
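
Here is a sketch of the three approaches side by side, using the property values named above. The strings and positions are assumptions, and viewer support for writing-mode: tb and glyph-orientation-vertical varies, so the result is worth checking in the viewer you target.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="160" height="220"
  viewBox="0 0 160 220">
  <!-- rotate the whole element 90 degrees about its start point -->
  <text x="20" y="20" transform="rotate(90, 20, 20)">Rotated 90</text>
  <!-- top-to-bottom writing mode -->
  <text x="70" y="20" style="writing-mode: tb;">Writing Mode tb</text>
  <!-- upright letters in a column, tightened with negative spacing -->
  <text x="120" y="20"
    style="writing-mode: tb; glyph-orientation-vertical: 0;
      letter-spacing: -3pt;">Vertical zero</text>
</svg>
```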

Internationalization and Text

Unicode and Bidirectionality

XML is based on the Unicode standard (fully documented at the Unicode Consortium's web site, http://www.unicode.org). This lets text display in any language that the underlying viewer software can display, as you can see in Figure 8-13. Some languages such as Arabic and Hebrew are written right to left, so when text in these languages is mixed with text written left to right, as English is, the text is bidirectional, or bidi for short. The system software knows which characters go in which direction and works out their positions accordingly. Example 8-11 also overrides the implicit directionality of a segment of text by setting its direction style property to rtl, which stands for right-to-left. If you wish to change the direction of Hebrew or Arabic text, set it to ltr, which is left-to-right. You must also explicitly override the underlying Unicode bidirectionality algorithm by setting the unicode-bidi style property to bidi-override.
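
The following sketch overrides the direction of an English phrase and of a Hebrew word. The Hebrew word is written with numeric character references (values 1513, 1500, 1493, and 1501) so the file stays plain ASCII; the phrases are our own choices, and the viewer needs a font containing Hebrew glyphs.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="320" height="130"
  viewBox="0 0 320 130">
  <g style="font-size: 14pt;">
    <!-- Hebrew runs right to left on its own -->
    <text x="10" y="30">Hebrew: &#1513;&#1500;&#1493;&#1501;</text>
    <!-- force English text to run right to left -->
    <text x="10" y="70">Reversed:
      <tspan style="direction: rtl;
        unicode-bidi: bidi-override;">moon</tspan>
    </text>
    <!-- force Hebrew text to run left to right -->
    <text x="10" y="110">Forced ltr:
      <tspan style="direction: ltr;
        unicode-bidi: bidi-override;">&#1513;&#1500;&#1493;&#1501;</tspan>
    </text>
  </g>
</svg>
```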

The switch Element

The ability to display multiple languages in a single document is useful for such things as a brochure for an event that receives international visitors. Sometimes, though, you would like to create one document with content in two languages, say, Spanish and Russian. People viewing the document with Spanish system software would see the Spanish text, and Russians would see Russian text.

SVG provides this capability with the <switch> element. This element searches through all its children until it finds one whose systemLanguage attribute has a value that matches the language the user has chosen in the viewer software's preferences. The value of systemLanguage is a single value or comma-separated list of language names. A language name is either a two-letter language code, such as ru for Russian, or a language code followed by a country code, which specifies a sublanguage. For instance, fr-CA denotes Canadian French, while fr-CH denotes Swiss French.

Once a matching child element is found, all its children will be displayed. All the other children of the <switch> will be bypassed. Example 8-12 shows text in UK English, US English, Spanish, and Russian. Since a match of language code alone is considered a match, and country codes are used only to "break a tie," the text for UK English must come first.
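
A minimal sketch of such a <switch> follows. The strings are our own, en-GB is the language-plus-country code we assume for UK English, and the last group, which has no systemLanguage attribute, is our addition as a fallback for any other language preference. The file is assumed to be saved as UTF-8.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120"
  viewBox="0 0 200 120">
  <circle cx="40" cy="60" r="20" style="fill: none; stroke: black;"/>
  <switch>
    <!-- the more specific code comes before plain "en" -->
    <g systemLanguage="en-GB">
      <text x="10" y="30">A circle</text>
      <text x="10" y="100">without colour.</text>
    </g>
    <g systemLanguage="en">
      <text x="10" y="30">A circle</text>
      <text x="10" y="100">without color.</text>
    </g>
    <g systemLanguage="es">
      <text x="10" y="30">Un círculo</text>
      <text x="10" y="100">sin color.</text>
    </g>
    <g systemLanguage="ru">
      <text x="10" y="30">Круг</text>
      <text x="10" y="100">без цвета.</text>
    </g>
    <!-- no test attribute: chosen only if nothing above matched -->
    <g>
      <text x="10" y="30">A circle</text>
    </g>
  </switch>
</svg>
```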

Figure 8-14. Combined screenshots as seen with different language preferences

Using a Custom Font

Sometimes you need special symbols that are not represented in Unicode, or you want a subset of the Unicode characters without having to install an entire font. An example is Figure 8-15, which needs only a few of the over 2,000 Korean syllables. You can create a custom font as described in Appendix E and give its starting <font> tag a unique id. Here is the relevant portion of a file that contains six of the Korean syllables exported from the Batang TrueType font. The file is called kfont.svg:

Once that is done, Example 8-13 can reference the font in that external file. For the sake of consistency, the value of the font-family that you use in this SVG file should match the value in the external file.
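
As a hedged sketch of what such a reference can look like, the document below points a <font-face> element at the external file. The id kfont-defn, the family name bakbatn, and the two syllables (Unicode values 54620 and 44397) are assumptions for illustration; use whatever id, family name, and glyphs your own font file actually defines. Viewer support for SVG fonts varies, so the serif fallback may be what you see.

```xml
<svg xmlns="http://www.w3.org/2000/svg"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  width="200" height="80" viewBox="0 0 200 80">
  <defs>
    <!-- font-family should match the name used in kfont.svg -->
    <font-face font-family="bakbatn">
      <font-face-src>
        <font-face-uri xlink:href="kfont.svg#kfont-defn"/>
      </font-face-src>
    </font-face>
  </defs>
  <text x="20" y="50"
    style="font-family: bakbatn, serif; font-size: 28px;">&#54620;&#44397;</text>
</svg>
```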

Text on a Path

Text does not have to go in a straight horizontal or vertical line. It can follow any arbitrary path; simply enclose the text in a <textPath> element that uses an xlink:href attribute to refer to a previously defined <path> element. Letters will be rotated to stand "perpendicular" to the curve (that is, the letter's baseline will be tangent to the curve). Text along a gently curving and continuous path is easier to read than text that follows a sharply angled or discontinuous path.

Warning

The path you reference in the <textPath> element will not be displayed. That's why Example 8-14 has to draw the paths with <use> elements.

You may adjust the beginning point of the text along its path by setting the startOffset attribute to a percentage or to a length. For example, startOffset="25%" will start the text one-fourth of the distance along the path, and startOffset="30" will start the text at a distance of thirty user units from the beginning of the path. If you wish to center text on a path, as in Example 8-15, set text-anchor="middle" on the <text> element and startOffset="50%" on the <textPath> element. Text that falls beyond the ends of the path will not be displayed, as shown in the left half of Figure 8-18.
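
Putting these pieces together, here is a minimal sketch of text centered on a gently curving path. The path data, the id curve, and the string are illustrative assumptions; the <use> element is what makes the path itself visible.

```xml
<svg xmlns="http://www.w3.org/2000/svg"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  width="300" height="130" viewBox="0 0 300 130">
  <defs>
    <path id="curve" d="M 30 100 C 80 20, 220 20, 270 100"/>
  </defs>
  <!-- draw the path, since textPath does not display it -->
  <use xlink:href="#curve" style="stroke: gray; fill: none;"/>
  <!-- middle anchor plus 50% offset centers the text on the path -->
  <text style="font-size: 12pt; text-anchor: middle;">
    <textPath xlink:href="#curve"
      startOffset="50%">Centered on the curve</textPath>
  </text>
</svg>
```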

Whitespace and Text

You may change the way that SVG handles whitespace (blanks, tabs, and newline characters) within text by changing the value of the xml:space attribute. If you specify a value of default (which, coincidentally, is the default value), SVG will handle whitespace as follows:

Remove all newline characters

Change all tabs to blanks

Remove all leading and trailing blanks

Change any run of intermediate blanks to a single blank

Thus, where \t represents a tab, \n represents a newline, and an underscore represents a blank, this text:

\n\n___abc_\t\t_def_\n\n__ghi

will render as:

abc_def_ghi

The other setting of xml:space is preserve. With this setting, SVG will simply convert all newline and tab characters to blanks, and then display the result, including leading and trailing blanks. The same text:

\n\n___abc_\t\t_def_\n\n__ghi

then renders as:

_____abc____def_____ghi
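
To compare the two settings in a viewer, you can try a sketch like the following. Both elements contain the same string with runs of blanks; only the second sets xml:space="preserve". The string and the monospace font are our own choices, made so the extra blanks are easy to see.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="320" height="90"
  viewBox="0 0 320 90">
  <g style="font-family: monospace; font-size: 12pt;">
    <!-- default handling: blanks are trimmed and collapsed -->
    <text x="10" y="30">   abc    def     ghi</text>
    <!-- preserve: every blank is kept -->
    <text x="10" y="65" xml:space="preserve">   abc    def     ghi</text>
  </g>
</svg>
```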

Warning

SVG's handling of whitespace is not like that of HTML. SVG's default handling eliminates all newlines; HTML changes internal newlines to a space. SVG's preserve method converts newlines to blanks; HTML's <pre> element does not. There is no newline in SVG 1.0; this bothers people until they realize that SVG text is oriented towards graphic display, not textual content (as in XHTML).

Case Study -- Adding Text to a Graphic

Figure 8-19 adds Korean and English text to the Korean national symbol shown in Figure 6-5. The text is centered along an elliptical path. The additional SVG in Example 8-16 is shown in boldface.