listext: A text file listing utility

5.3. XSL-FO design: Line formatting

This project turned out to be a lot more work than the author
anticipated, even allowing for the challenges of dealing with the
enormous feature set of XSL-FO. The author cannot say for
sure whether the increased effort was due to problems with the
local FO-to-PDF toolchain or to his imperfect understanding of
XSL-FO, but two major features that were supposed to be
handled automatically had to be implemented with custom code.

Originally the wrapping of long lines was going to be
handled automatically using a “hanging
indentation”, implemented as a block
element with a positive start-indent and a
negative text-indent. The author expected
that a gray color in an enclosing block-container would be visible only in the
indented area.

However, this ran afoul of the requirement that overflow
lines be preceded by a gray indentation: the
“outdented” portion of the first line was also
displayed over the gray color, which is not an acceptable
rendering.

Similarly, the --break option seemed to the author to
translate directly into XSL-FO's keep-with-previous property. In
practice, however, the toolchain did not honor this property in a
predictable way.

Consequently, this program handles the breaking of overly long
lines, and the implementation of the --break
option, entirely with custom logic.

This custom logic requires that we compute the exact height
and width of each column so we can cut the input lines into
pieces that fit. To solve this problem in full
generality, we would need to know which font will be used
and read the font metrics files for that font, which is
a nontrivial task.

However, as long as this application is used only at the TCC,
where the FO processor (xep) uses
only a single monospaced font (Vera Sans Mono), we can rely on
the metrics of that font to predict line widths.
Experimentation shows that this font has a consistent 5:3
ratio of character height to width; that is, each character is
3/5 of the font size wide, so in a 9-point font every character
is 5.4 points wide.
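Given that 5:3 ratio, the width arithmetic reduces to a division. The sketch below is illustrative only: the function names are hypothetical, it assumes every glyph is exactly 3/5 of the font size wide, and it assumes overflow pieces get a narrower width (passed in by the caller) to leave room for the gray indentation. Exact fractions avoid floating-point rounding at the boundary.

```python
from fractions import Fraction

# Assumption: every glyph of the monospaced font is 3/5 of the font size
# wide (the 5:3 height-to-width ratio measured for Vera Sans Mono).
CHAR_WIDTH_RATIO = Fraction(3, 5)


def chars_per_line(column_width_pt, font_size_pt):
    """Largest whole number of characters that fit across one column."""
    char_width = CHAR_WIDTH_RATIO * Fraction(font_size_pt)
    # Fraction // Fraction floors the quotient to an integer.
    return int(Fraction(column_width_pt) // char_width)


def split_long_line(line, first_width, overflow_width):
    """Cut one input line into pieces that fit the column.

    The first piece may use first_width characters; each overflow piece
    uses overflow_width (hypothetically narrower, to leave room for the
    gray indentation that precedes overflow lines).
    """
    if first_width < 1 or overflow_width < 1:
        raise ValueError("widths must be at least one character")
    pieces = [line[:first_width]]
    rest = line[first_width:]
    while rest:
        pieces.append(rest[:overflow_width])
        rest = rest[overflow_width:]
    return pieces


# A 9-point font in a 252-point (3.5-inch) column: 252 / 5.4 = 46.67.
print(chars_per_line(252, 9))
print(split_long_line("abcdefghij", 4, 3))
```

An empty input line yields a single empty piece, so blank lines still occupy a row in the listing.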

To force verbatim treatment of spaces, each line block needs two attributes: white-space-treatment="preserve" and white-space-collapse="false".
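As a rough illustration, a line block carrying those two attributes could be built with Python's standard xml.etree.ElementTree. The helper name verbatim_block is hypothetical; only the namespace URI and the two attribute names come from XSL-FO itself.

```python
import xml.etree.ElementTree as ET

# The XSL-FO namespace; registering it makes output use the "fo:" prefix.
FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def verbatim_block(text):
    """Hypothetical helper: an fo:block whose spaces are kept as given."""
    block = ET.Element(f"{{{FO_NS}}}block", {
        # Keep leading/trailing spaces instead of discarding them.
        "white-space-treatment": "preserve",
        # Do not squeeze runs of spaces down to one.
        "white-space-collapse": "false",
    })
    block.text = text
    return block


print(ET.tostring(verbatim_block("x   =   1"), encoding="unicode"))
```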

One other subtle problem is how to handle
unprintable characters, that is, those outside the range from ASCII
SP (space) through “~”. Some of them
affect the display:

Tabs (ASCII HT) will be expanded to spaces using the tab
interval specified on the command line, or the
default interval if none is given.

If a form feed (ASCII FF) occurs at the beginning of
a line, and if it is the effective break string, it
will be removed before the line is displayed. In any
other case it will be treated as unprintable.

Carriage return (ASCII CR) may appear as part of the line
terminator, especially if the file came from the
Microsoft world, where lines end with CR LF. Carriage returns
are ignored and will not be displayed.

Linefeed (ASCII LF) is the line terminator character
and will not be displayed.
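The rules in the list above can be sketched as one cleanup step per input line. This is a hypothetical helper, not the program's own code. It assumes CR is dropped wherever it appears rather than only at the end of the line, and that the caller supplies the tab interval and the effective break string. The order matters: Python's str.expandtabs resets its column count at CR and LF, so those characters are removed before tabs are expanded.

```python
def clean_line(raw, tab_width, break_string):
    """Hypothetical cleanup of one raw input line before display.

    - LF (the terminator) is removed.
    - CR is removed everywhere (assumption: not only at the line end).
    - A leading FF is removed only when FF is the effective break string;
      otherwise it is left in place to be shown as unprintable.
    - Tabs are expanded last, after CR/LF are gone, because
      str.expandtabs restarts its column count at those characters.
    """
    line = raw.rstrip("\n").replace("\r", "")
    if break_string == "\f" and line.startswith("\f"):
        line = line[1:]
    return line.expandtabs(tab_width)


print(repr(clean_line("a\tb\r\n", 8, "\f")))
print(repr(clean_line("\fpage 2\n", 8, "\f")))
print(repr(clean_line("\fpage 2\n", 8, "%%")))
```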

For any other unprintable character, the report will display the
character's two-digit hexadecimal code in a tiny font, fitting both
digits into the space of one normal character. The tiny font
is half the current font size, rounded down; for example, with a
9-point font the tiny font is 4-point.
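The sketch below shows why two tiny digits fit in one normal character cell: at half the font size, each tiny glyph is half as wide, so two of them together are no wider than one normal glyph. Several details are assumptions, not taken from the program: the helper names, uppercase hex digits, the fo:inline markup, and that input characters are single bytes (codes up to 0xFF, which is what keeps the code to two digits).

```python
import math


def is_printable(ch):
    """True for ASCII SP (0x20) through '~' (0x7E)."""
    return " " <= ch <= "~"


def tiny_font_size(font_size_pt):
    """Half the body font size, rounded down (9 pt -> 4 pt)."""
    return math.floor(font_size_pt / 2)


def unprintable_markup(ch, font_size_pt):
    """Hypothetical FO markup showing ch as two tiny hex digits.

    With glyphs 3/5 of the font size wide, two tiny digits take
    2 * (3/5) * floor(size / 2) <= (3/5) * size points, which is at most
    the width of one normal character.
    """
    code = format(ord(ch), "02X")  # assumption: byte values, uppercase
    size = tiny_font_size(font_size_pt)
    return f'<fo:inline font-size="{size}pt">{code}</fo:inline>'


print(unprintable_markup("\x0c", 9))
```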

Two other options affect the layout of the report body:

The --leading value is implemented as a
space-after property on each line block.

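The --break option is not implemented with the
keep-with-previous property, since, as noted above, the
toolchain did not honor it predictably. Instead, the program's
own logic keeps each line with the line before it, except for
lines that start with the break string.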
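This section does not spell out the custom column-filling logic, so the following is only a hedged illustration of one way a program could place lines into columns itself. It obeys the rule that a line stays with its predecessor unless it starts with the break string. The function names, the greedy strategy, and the assumption that each line block occupies a fixed line height plus its --leading space-after are all hypothetical. A group longer than a whole column is simply split, and an empty break string lets every line start a new group.

```python
def lines_per_column(column_height_pt, line_height_pt, leading_pt):
    """How many line blocks fit in one column.

    Assumption: each line block takes line_height_pt of vertical space
    plus its space-after, which is the --leading value.
    """
    return int(column_height_pt // (line_height_pt + leading_pt))


def break_groups(lines, break_string):
    """Start a new group at each line beginning with break_string;
    every other line stays with the line before it."""
    groups = []
    for line in lines:
        if not groups or line.startswith(break_string):
            groups.append([line])
        else:
            groups[-1].append(line)
    return groups


def fill_columns(groups, capacity):
    """Greedy fill: if a group will not fit in the current column, start
    a fresh column; a group longer than a full column is split."""
    if capacity < 1:
        raise ValueError("a column must hold at least one line")
    columns, current = [], []
    for group in groups:
        if current and len(current) + len(group) > capacity:
            columns.append(current)
            current = []
        for line in group:
            if len(current) == capacity:
                columns.append(current)
                current = []
            current.append(line)
    if current:
        columns.append(current)
    return columns


lines = ["%a", "b", "c", "%d", "e"]
print(fill_columns(break_groups(lines, "%"), 4))
```

The greedy choice wastes some space at the bottom of a column whenever a group is pushed forward; that trade-off is exactly what keeping lines together with their predecessors asks for.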