Inside DITA: Line Breaks

One of the advantages of authoring content in DITA is the separation between content and presentation. This feature sometimes frustrates writers new to DITA who are used to adjusting layout and styling themselves. As my colleague Katriel Reichman is known to say, “Writing in DITA is like raising teenagers – you just got to learn how to let go!” We have to learn to let the style sheets handle all the presentation issues.

Occasionally there is a need for authors to tweak presentation within the DITA source. One example is deciding where a line break should occur. Often this decision has to be made by a human being rather than by a computer.

Bad Line Breaks

Say I have the title: “Specifications, Features and Limitations of the DITA Accelerator by Suite Solutions.” I want the company name “Suite Solutions” to stay together on one line rather than being split across two lines, as happened by default (see the screenshot below).

To solve this, I could configure the transform to always keep the words “Suite” and “Solutions” together on a line. But this would require knowing in advance every word pair that must not be split across lines. In addition, this would get quite complicated with localized or multilingual content.

PDF renderers have a built-in algorithm for calculating line breaks. The renderer will first look for a space in the text, then a hyphen, and so on. But in our case, I don’t want the renderer to break at the space.

This is a case where the author needs to have control of line breaks from within the DITA source. This can be accomplished using XML entities created for this purpose.

Non-Breaking Space

The first XML entity we will introduce is the non-breaking space. When we replace the space between “Suite” and “Solutions” with a non-breaking space, the renderer is prevented from breaking the line there. The XML entity (strictly speaking, a numeric character reference) is &#160;. Many XML editors have built-in tools for inserting this character.
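
In the DITA source, this might look like the following minimal sketch. The topic id and the bare topic structure are assumptions for illustration; only the &#160; between the two words matters.

```xml
<topic id="accelerator_specs">
  <!-- &#160; (U+00A0) replaces the ordinary space, so the renderer
       cannot split "Suite Solutions" across two lines -->
  <title>Specifications, Features and Limitations of the DITA Accelerator by Suite&#160;Solutions</title>
</topic>
```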

Zero Width Space

In other cases, rather than telling the renderer which words must be kept together, we may choose instead to tell it where it may break. Let’s say I have a long computer name or hostname included in my title, such as: “About the host c-61-123-45-67.hsd1.co.hostname.net”. By default the renderer will not know where it may break the hostname. The screenshot below shows the results with my current transform:

I probably want at least the “hostname” part of the text to stay together, but here a non-breaking space won’t help me, as I don’t want any spaces in the hostname. This is where I might use a zero-width space. A zero-width space gives the line-break algorithm a place where it may break, but is not visible at all in the presentation of the text. The XML entity for zero-width space is &#8203;.

Here is the zero-width space applied in the DITA source:
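
As a minimal sketch, assuming break opportunities are wanted after “hsd1.” and “co.” so that “hostname.net” stays together (the exact placement and the topic id are assumptions for illustration):

```xml
<topic id="about_host">
  <!-- &#8203; (U+200B) is invisible in the output; it only marks places
       where the renderer is allowed to break the hostname -->
  <title>About the host c-61-123-45-67.hsd1.&#8203;co.&#8203;hostname.net</title>
</topic>
```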

Zero-Width No-Break Space

Another useful XML entity is the zero-width no-break space, also known as a word joiner (U+2060 in Unicode). This entity is similar to the non-breaking space in that it indicates that the text may not be broken at this point. The difference between the two is that the zero-width no-break space does not add any visible space in the output. The XML entity for the zero-width no-break space is &#8288;.
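
For instance, to keep the first part of the earlier hostname from splitting at its hyphens, a word joiner could follow each hyphen. This is a sketch under assumptions: the topic id is hypothetical, and whether a given renderer and font honor the character should be checked against that renderer.

```xml
<topic id="about_host">
  <!-- &#8288; (U+2060) after each hyphen asks the renderer not to
       break there; it adds no visible space -->
  <title>About the host c-&#8288;61-&#8288;123-&#8288;45-&#8288;67.hsd1.co.hostname.net</title>
</topic>
```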

What are other scenarios where your authoring team has needed to tweak presentation via the DITA source? Let us know in the comments below.