Formatting HTML

AH Formatter V6.4 can format HTML designed for the Web (except for HTML that uses frames). However, few HTML documents produce a good result without some adjustment after formatting. The reasons are as follows:

The HTML document was designed especially for the browser and paginated media was not taken into consideration.

The HTML document does not follow the HTML specification.

The CSS used in the HTML document may not be used exactly according to the CSS specification.

For example, if the HTML can be printed from a Web browser without overflowing the right-hand side of the page, then formatting with AH Formatter V6.4 will produce a reasonable result. However, in order to achieve a better result, the HTML must be designed both for the browser and for printing. The CSS for printing may be precisely defined using rules such as:

@media print { ... }
@page { ... }

Moreover, the CSS implementations of current Web browsers differ considerably. If the HTML contains syntax errors because it was written for a particular browser, or uses CSS incorrectly, a good result is unlikely.

Many (X)HTML documents on the Web use only generic fonts. (This is desirable considering the characteristics of the Web). Since the font settings for every script in the Option Setting File always apply in AH Formatter V6.4 GUI on Windows, suitable fonts will be used. However, this applies only to AH Formatter V6.4 GUI and only on Windows. When using the Command-line Interface, please set appropriate <script-font> values in the Option Setting File and specify the Option Setting File.

CAUTION:

Since AH Formatter V6.4 formats the document for print media, @media screen is ignored even when viewing documents on-screen with the GUI.

CAUTION:

HTML saved from the web browser

Many web browsers have a feature to save the current (X)HTML document. However, a document saved this way may not be correct XHTML. When such a document is formatted as XHTML by AH Formatter V6.4, an error occurs and formatting fails. If this happens, please specify HTML as the formatter type.
In addition, unexpected white space may be added within Japanese text in the saved (X)HTML.
The added white space reduces the quality of the formatted text.

This can be specified by <usercss> in the Option Setting File and by -css or -s on the command line. (The .NET interface, Java interface, etc. are equivalent to the corresponding command-line options.) These are applied in the following order.

Applies the CSS specified by the Option Setting File and -css in order of appearance.

This can be specified by <link> or <style> inside HTML, or by the <?xml-stylesheet .. ?> processing instruction. These are applied in the following order.

Applies the processing instructions in XML in order of appearance. (XML or XHTML)

Applies <link> or <style> inside HTML in order of appearance. (XML or XHTML)

Default CSS for HTML

The default CSS for HTML is used as the first stylesheet (user agent declarations) when formatting (X)HTML. It is the file html.css, placed in the directory indicated by the environment variable AHF64_DEFAULT_HTML_CSS. (When html.css does not exist, all elements are formatted as inline.)

This stylesheet is based on how web browsers display HTML, the styles specified by CSS, and so on. However, depending on the environment, some of its settings may not display well, and preferences also differ. Users should adjust the default CSS to suit their own environment. Some examples are shown below.

<q>

The default CSS specifies it as follows.

q:before { content: '\201C' }
q:after { content: '\201D' }

The current AH Formatter V6.4 cannot change the quotation marks depending on the language. The following specification may be preferable.

q:before { content: '\22' }
q:after { content: '\22' }

footnote

The footnote number is specified to be placed in the left margin of the page. If you do not want it to overflow into the margin, specify padding-left or list-style-position:inside on @footnote. decimal is specified for numbering. Although CSS3 GCPM says that super-decimal is used, many fonts have no glyphs for super-decimal, so the default CSS does not adopt it. If you want to use super-decimal, it is probably best to correct the default CSS as follows.

Detection of Formatting Type

When formatting starts with automatic detection of the formatting type, the type is determined by the following procedure.

When MIME is specified, AH Formatter V6.4 will follow its settings. That is, if text/html is specified, it will be detected as HTML. When application/xhtml+xml is specified, it will be detected as XHTML.

When auto-formatter-type="html" is specified in the Option Setting File and the extension of the input document is known, AH Formatter V6.4 will follow its setting. That is, when the extension is for HTML such as .htm or .html, it will be detected as HTML. If the extension is for XHTML, such as .xht or .xhtml, it will be detected as XHTML.

When there is no XML declaration and DOCTYPE is for HTML, it will be detected as HTML.

When auto-formatter-type="xhtml" is specified in the Option Setting File and the namespace is for XHTML, it will be detected as XHTML.

When there is no XML declaration, no namespace exists, and the root element is <HTML> (case-insensitive), it will be detected as HTML.

When a stylesheet that is CSS, not XSLT, is specified (inside the document or as an external document), it will be detected as XML+CSS.

When the namespace is for XSL-FO, it will be detected as XSL-FO.

Anything else will be detected as XML+CSS.

For HTML formatting the document does not need to be XML, but for every type other than HTML the document must be well-formed XML.
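The procedure above can be sketched in Python as a chain of checks evaluated in order. This is only an illustration, not the formatter's implementation: the DocumentFacts class and its field names are hypothetical, and it assumes the facts (MIME type, extension, namespace, and so on) have already been extracted from the input. MIME types other than the two named above simply fall through to the later checks.

```python
from dataclasses import dataclass
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"
XSLFO_NS = "http://www.w3.org/1999/XSL/Format"


@dataclass
class DocumentFacts:
    """Hypothetical summary of an input document; every field name is illustrative."""
    mime: Optional[str] = None
    extension: Optional[str] = None            # e.g. ".html"
    auto_formatter_type: Optional[str] = None  # value from the Option Setting File
    has_xml_declaration: bool = False
    doctype_is_html: bool = False
    root_namespace: Optional[str] = None
    root_element: Optional[str] = None
    has_css_stylesheet: bool = False           # CSS (not XSLT) stylesheet specified


def detect_formatter_type(d: DocumentFacts) -> str:
    # 1. An explicit MIME type wins.
    if d.mime == "text/html":
        return "HTML"
    if d.mime == "application/xhtml+xml":
        return "XHTML"
    # 2. auto-formatter-type="html" plus a known extension.
    if d.auto_formatter_type == "html" and d.extension:
        ext = d.extension.lower()
        if ext in (".htm", ".html"):
            return "HTML"
        if ext in (".xht", ".xhtml"):
            return "XHTML"
    # 3. No XML declaration and an HTML DOCTYPE.
    if not d.has_xml_declaration and d.doctype_is_html:
        return "HTML"
    # 4. auto-formatter-type="xhtml" plus the XHTML namespace.
    if d.auto_formatter_type == "xhtml" and d.root_namespace == XHTML_NS:
        return "XHTML"
    # 5. No XML declaration, no namespace, root element "html" in any case.
    if (not d.has_xml_declaration and d.root_namespace is None
            and (d.root_element or "").lower() == "html"):
        return "HTML"
    # 6. A CSS stylesheet is attached.
    if d.has_css_stylesheet:
        return "XML+CSS"
    # 7. XSL-FO namespace.
    if d.root_namespace == XSLFO_NS:
        return "XSL-FO"
    # 8. Everything else.
    return "XML+CSS"
```

For example, a document with no XML declaration, no namespace, and a root element written as HTML is classified as HTML by the fifth check.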

Difference in Formatting with AH Formatter V6.3

There are some differences in formatting between AH Formatter V6.4 and AH Formatter V6.3 as listed below.

MathML

In the MathML settings in the Option Setting File, the default value for the alignment of subscripts and superscripts has been modified slightly. To use the same setting as AH Formatter V6.3, specify mathmlSettingsMode="6.3".

Difference in Formatting with AH Formatter V6.2

There are some differences in formatting between AH Formatter V6.3 and AH Formatter V6.2 as listed below.

keep-footnote-anchor

With AH Formatter V6.2, the block containing the anchor was sent to the following page due to conditions such as orphans, and as a result, the footnote itself was sometimes placed on the previous page. AH Formatter V6.3, on the other hand, tries to fit the divisible block after the anchor on the previous page. You can also get the same result by specifying axf:footnote-keep="always" in the original block.
To get the same result as AH Formatter V6.2 or earlier, please specify keep-footnote-anchor="false" in the Option Setting File.

list-style-type

The implemented list-style-type has been changed to use Predefined Counter Styles. The following style names were included in the previous list-style-type but are not included in the Predefined Counter Styles:

ethiopic-numeric

cjk-ideographic

japanese-formal-obsolete

urdu

lower-latin

upper-latin

upper-greek

lower-norwegian

upper-norwegian

hangul

hangul-consonant

fullwidth-lower-latin

fullwidth-upper-latin

halfwidth-katakana

halfwidth-katakana-iroha

Even where a style name is the same as in the previous list-style-type, the implementation of the Predefined Counter Style may differ in some cases. If you still want to use a style of the previous list-style-type, specify its name as a quoted string, for instance axf:number-transform="'lower-roman'" rather than axf:number-transform="lower-roman".

Difference in Formatting with AH Formatter V6.1

There are some differences in formatting between AH Formatter V6.2 and AH Formatter V6.1 as listed below.

latin-ligature / pair-kerning

The default values of latin-ligature and pair-kerning in the Option Setting File have been changed. Up to AH Formatter V6.1, these values were false. In AH Formatter V6.2, they are true. The change is intended to give a better formatting result by default. When axf:ligature-mode and axf:kerning-mode are specified explicitly in FO, these defaults do not influence the formatting result. These settings affect the formatting speed.

Splitting blocks

In CSS, when a block with auto height breaks, for example at the end of a page, up to AH Formatter V6.1 the block height ended at the break point. In AH Formatter V6.2, the height is extended to the end of the page.
The difference is noticeable when a background or a border is specified on the block. The same applies at the end of a column. ☞ 5.3. Splitting Boxes
This behavior is not applied to FO.

Text wrapping with before float

When the width of a float on the before side fills the region and leaves no room for wrapping text, the text is moved aside by the float, but up to AH Formatter V6.1 the block itself overlapped the float. This can be checked by adding a background or a border to the block. When intrusion-displace="block" is specified, the block itself is moved aside by the float. In AH Formatter V6.2, the block itself is moved aside by the float regardless of the setting of intrusion-displace.

Splitting footnotes

Up to AH Formatter V6.1, a page (column) break did not occur within footnote-body. In AH Formatter V6.2, it is possible to break pages (columns) within footnote-body. A footnote breaks by the setting of axf:footnote-max-height and it occurs by default. For this reason, the formatted result may differ from AH Formatter V6.1. In order to avoid the automatic break, please specify auto-break-footnote="false" in the Option Setting File.

BIDI

Up to AH Formatter V6.1, there was a known issue in the BIDI processing. With AH Formatter V6.2, BIDI processing was corrected. Therefore, the formatted result may differ from V6.1.

Difference in Formatting with AH Formatter V6.0

There are some differences in formatting between AH Formatter V6.1 and AH Formatter V6.0 as listed below.

normalize

In AH Formatter V6.1, Unicode normalization (UAX#15: Unicode Normalization Forms) can be performed on the input text. See also axf:normalize.
Normalization may affect the formatting speed. If you do not want normalization to be performed by default, please specify normalize="none" in the Option Setting File.

font-stretch-mode

In AH Formatter V6.1, when a family name is specified in font-family, a condensed font can be chosen, if one actually exists, by using information such as font-stretch="condensed". To enable this, specify font-stretch-mode="6" in the Option Setting File. The differences in behavior between font-stretch-mode="5" and "6" are as follows.

font-stretch-mode="5"

The behavior is the same as AH Formatter V5. The information on font-stretch is not used for font selection. That is, even if a condensed font exists in the family, it is not chosen. To choose a condensed font, it is necessary to specify the font name. When fonts called Foo-Regular.otf and Foo-Condensed.otf exist with the family name Foo, Foo-Condensed.otf is not chosen even if <fo:block font-family="Foo" font-stretch="condensed"> is specified. It is necessary to specify <fo:block font-family="Foo-Condensed">.

When <fo:block font-family="Foo" font-stretch="condensed"> is specified, Foo-Regular.otf is compressed and displayed. The compression ratio at that time is somewhat smaller (larger when expanding) than the value defined in the OpenType specification.

font-stretch-mode="6"

The information on font-stretch is used for font selection. In the example above, when <fo:block font-family="Foo" font-stretch="condensed"> is specified, Foo-Condensed.otf is chosen. When a numerical value is specified for font-stretch, no condensed font is searched for. When <fo:block font-family="Foo" font-stretch="extra-condensed"> is specified and there is no extra-condensed font, it is not necessarily a condensed font that is compressed; the regular font will be compressed.

When there is no condensed font, the compression ratio is the following value given in the OpenType specification.

font-stretch | Ratio
ultra-condensed | 50%
extra-condensed | 62.5%
condensed | 75%
semi-condensed | 87.5%
normal | 100%
semi-expanded | 112.5%
expanded | 125%
extra-expanded | 150%
ultra-expanded | 200%
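As a small illustration of how these ratios would apply when no matching condensed or expanded font exists, the Python sketch below scales a glyph advance width by the ratio for a font-stretch keyword. The names STRETCH_RATIO and scaled_advance are hypothetical and exist only for this example; the sketch covers the fallback scaling arithmetic only, not font selection.

```python
# Ratios (in percent) from the list above, keyed by font-stretch keyword.
STRETCH_RATIO = {
    "ultra-condensed": 50.0,
    "extra-condensed": 62.5,
    "condensed": 75.0,
    "semi-condensed": 87.5,
    "normal": 100.0,
    "semi-expanded": 112.5,
    "expanded": 125.0,
    "extra-expanded": 150.0,
    "ultra-expanded": 200.0,
}


def scaled_advance(advance: float, font_stretch: str) -> float:
    """Scale a glyph advance width by the ratio for the given keyword."""
    return advance * STRETCH_RATIO[font_stretch] / 100.0
```

With these values, a glyph whose regular advance is 1000 units would be set 750 units wide under font-stretch="condensed".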

baseline-mode

Although AH Formatter V5 improved the position of the baseline, a problem remained: when alphanumeric characters of European languages were rendered upright in vertical writing mode, their center position was not aligned. AH Formatter V6.1 improves this. Please specify baseline-mode="5" in the Option Setting File when you want to make it the same as V5.

viewport-length-units-mode

The interpretation of the vw and vh units has been changed. Formerly these units were based on the entire page size including page margins. In AH Formatter V6.1, they are based on the size excluding the page margins. In addition, the pvw and pvh units, based on the entire page size, have been added. Please specify viewport-length-units-mode="5" in the Option Setting File when you want to make it the same as V5. In this case, the units behave as vw=pvw, vh=pvh, vmin=pvmin and vmax=pvmax.
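The arithmetic can be illustrated with a short Python sketch. It assumes, as in CSS, that one unit is 1% of the reference length and that vmin and vmax take the smaller and larger of the width and height units; the function name, parameter names, and the v5_compatible flag are hypothetical stand-ins for this example only.

```python
def viewport_units(page_width, page_height,
                   margin_left, margin_right, margin_top, margin_bottom,
                   v5_compatible=False):
    """Return the length of one unit of each viewport unit, in the input's length unit."""
    # pvw/pvh always refer to the entire page, margins included.
    pvw = page_width / 100
    pvh = page_height / 100
    if v5_compatible:
        # Behaves like viewport-length-units-mode="5": vw=pvw, vh=pvh.
        vw, vh = pvw, pvh
    else:
        # V6.1 and later: based on the size excluding the page margins.
        vw = (page_width - margin_left - margin_right) / 100
        vh = (page_height - margin_top - margin_bottom) / 100
    return {
        "vw": vw, "vh": vh, "vmin": min(vw, vh), "vmax": max(vw, vh),
        "pvw": pvw, "pvh": pvh, "pvmin": min(pvw, pvh), "pvmax": max(pvw, pvh),
    }
```

For an A4 page (210mm x 297mm) with 20mm margins on every side, 1vw corresponds to 1.7mm while 1pvw corresponds to 2.1mm.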

Difference in Formatting with AH Formatter V5

There are some differences in formatting between AH Formatter V6.0 and AH Formatter V5 as listed below.

span

In AH Formatter V6.0, the behavior of span="all" differs from that in AH Formatter V5.

In AH Formatter V5, a span specified on an FO nested inside an FO that generates a reference area, such as fo:block-container, is also effective. In AH Formatter V6.0, however, a span specified on an FO nested inside an FO that generates a reference area is invalid. For instance,

In V5, span="all" was effective with <fo:block>ABC</fo:block>. However, it is invalid in AH Formatter V6.0. In addition, when span="all" is specified on an fo:block in a column of an fo:block-container that uses axf:column-count, the span is considered to be specified for the columns of the block-container.
To keep the same result as V5, please specify span="all" on the parent fo:block-container.

In V5, a forced page break specified between an empty block at the beginning of the document and a block with span="all" was disregarded. In AH Formatter V6.0, the forced page break is effective and a blank page is produced. To keep the same result as V5, please do one of the following:

Do not place an empty block before the block with break-before="page" specified, or

Do not specify break-before="page" (it is not necessary at the beginning of fo:flow), or specify it on the empty block instead.

In a one-column format, span="all" was not effective in V5. In AH Formatter V6.0, a reference area is generated even in a one-column format. This causes the following differences, for example:

In a one-column format, space was generated between AAA and BBB in V5, but it is not generated in AH Formatter V6.0. This is because span generates a reference area even in a one-column format, and space without space-before.conditionality="retain" is deleted at the beginning of a reference area. To keep the same result as V5, please do not specify span="all" in a one-column format.

text-underline-mode

In AH Formatter V5, there were the following problems with the position of underline, overline and line-through.

The behavior of the 'auto' value of the width of vertical-text block within horizontal-text flow (or the height of horizontal-text block within vertical-text flow) is changed with AH Formatter V6.0.

In AH Formatter V5, the width of vertical-text block was given by the width of the outer area. In AH Formatter V6.0, the 'auto' width of vertical-text block shrinks to fit the content. If you don't want this behavior you should specify the width explicitly such as width="100%". Also the same behavior will be applied to the height of horizontal-text block within vertical-text flow.

The specification is ambiguous about the handling of ZERO WIDTH SPACE (U+200B).
In AH Formatter V5, ZERO WIDTH SPACE is also a target of text-align="justify", so the space at that position becomes larger than elsewhere.
In addition, leading and trailing ZERO WIDTH SPACE in a block are not exceptions, so they are stretched as well.
AH Formatter V6.0 can format as follows:

Remove ZERO WIDTH SPACE from the target of justify.

Delete leading and trailing ZERO WIDTH SPACE of a block.

This will avoid the effect of having a one-line space in a block such as <fo:block>&#x200B;</fo:block>. Please specify zwsp-mode in the Option Setting File.

Difference in Formatting with XSL Formatter V4

There are some differences in formatting between AH Formatter V5 and XSL Formatter V4 as listed below.

capitalize

For example, V4 formats the following

<fo:block text-transform="capitalize">
HELLO world!
</fo:block>

as follows.

Hello World!

AH Formatter V5 formats as follows.

HELLO World!

That is, V4 changes the letters other than the initial letter to lower case, whereas AH Formatter V5 leaves them unchanged. To make it the same as V4, please specify as follows.

With AH Formatter V5, the initial value of otf-metrics-mode is changed from "windows" to "typographic". The baseline may slightly change depending on fonts. Especially, a difference will be clear with MORISAWA font.

text-justify-mode

AH Formatter V5 improves the processing of trimming a line of text. Although finer control was attained by axf:text-justify-trim with this enhancement, a difference may arise in the number of characters included in one line with XSL Formatter V4. When you want to make it the same as V4 by FO which does not use axf:text-justify-trim, please specify text-justify-mode="4" in the Option Setting File.

baseline-mode

AH Formatter V5 improves the processing when putting fonts with different baselines like a mixture of Western and Japanese text. For example,

As in the example above, you may specify font-family="'Times New Roman', 'MS Mincho'" so that Japanese fonts are not applied to Latin text. In XSL Formatter V4, the first font specified in font-family determines the baseline in this case, so a difference may arise in the height of a line.
Since AH Formatter V5 selects the font from font-family according to the script or the language specification, a suitable baseline is applied by specifying language="jpn" in the example above. When you want to make it the same as V4, please specify baseline-mode="4" in the Option Setting File.

Incompatibility of XSL1.0 and XSL1.1

Some incompatible changes from XSL1.0 are made to XSL1.1.

from-page-master-region()

In XSL1.1, even if writing-mode or reference-orientation is specified on fo:region-*, it is ignored and has no effect.
To make these specifications effective in XSL1.1, it is necessary to specify the following on fo:page-sequence.

In XSL1.0, fo:table is supposed to generate a reference area (see 5.6 in XSL1.0). In XSL1.1, however, this was corrected as an error. The difference arises mainly when margin-* specified on fo:table is converted to start-indent and end-indent. For example:

<fo:block margin-left="10pt">
<fo:table margin-left="0pt">
...

In a table like the one above, the left margin may differ between XSL1.0 and XSL1.1. If start-indent etc. are used instead of margin-*, this incompatibility does not arise.

Shorthand

Since the shorthand properties of XSL inherit their definition from CSS, their values are evaluated as in CSS. That is,

margin="0pt -10pt"

is evaluated as two values instead of one formula. However, when it's not a shorthand, this is evaluated as one formula. For example, the following is one formula.

margin-left="0pt -10pt"

AH Formatter V6.4 processes such ambiguous expressions in a shorthand as follows.

If the expression cannot be one formula, like "0pt 10pt", it is counted as two values.

If the sign is attached to the numerical value, like "0pt -10pt", it is counted as two values.

If white space is included between the sign and the numerical value, like "0pt - 10pt", it is counted as one formula.

"0pt-10pt" is an error. (Refer to 5.9.5 Numerics in the XSL specification.)

In FO, when using a formula in a shorthand, it can be enclosed in parentheses, etc.
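The four cases above can be mimicked with a small Python classifier. This is a simplified sketch, not the formatter's parser: it only handles two plain numeric terms such as the examples above, and the function name and the returned labels are made up for illustration.

```python
import re

# A plain number with an optional unit, e.g. "0pt", "10pt", "1.5em".
NUMERIC = r"\d+(?:\.\d+)?[a-z%]*"


def classify_shorthand(value: str) -> str:
    """Classify a two-term shorthand value following the rules listed above."""
    s = value.strip()
    # Operator with white space on both sides: "0pt - 10pt".
    if re.fullmatch(rf"{NUMERIC}\s+[-+]\s+{NUMERIC}", s):
        return "one formula"
    # No operator ("0pt 10pt") or a sign attached to the number ("0pt -10pt").
    if re.fullmatch(rf"[-+]?{NUMERIC}\s+[-+]?{NUMERIC}", s):
        return "two values"
    # Operator with no white space at all: "0pt-10pt".
    if re.fullmatch(rf"{NUMERIC}[-+]{NUMERIC}", s):
        return "error"
    return "unknown"
```

For example, classify_shorthand("0pt -10pt") gives "two values", while classify_shorthand("0pt - 10pt") gives "one formula".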

With CSS, when calc() is written as calc(10pt-5pt), - is evaluated as an operator. This is because the CSS3 specification does not describe whether - must be separated from <length-unit> in calc(). Syntactically, it is allowed to use <length-unit> and - in succession.

All values that are separated by a double ampersand && must appear in any order.

One or more of the values that are separated by a double bar || must appear, in any order.

Exactly one of the values that are separated by a bar | must appear.

Brackets [ ] are for grouping the content.

Component value multipliers

An asterisk * indicates that the content appears zero or more times.

A plus + indicates that the content appears one or more times.

A question mark ? indicates that the content appears zero or one time.

{N} indicates that the content appears N times.

{N,} indicates that the content appears N or more times.

{N,M} indicates that the content appears at least N and at most M times.

A hash mark # indicates that the content appears one or more times, separated by commas.
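As an illustration, the multipliers above can be expressed as a minimum and maximum repetition count. The Python sketch below is a hypothetical helper written for this explanation only; it returns None as the maximum when the count is unbounded, and it does not model the comma separation that # additionally requires.

```python
import re
from typing import Optional, Tuple


def multiplier_range(mult: str) -> Tuple[int, Optional[int]]:
    """Return (min, max) repetitions for a multiplier; max is None if unbounded."""
    fixed = {"*": (0, None), "+": (1, None), "?": (0, 1), "#": (1, None)}
    if mult in fixed:
        return fixed[mult]
    m = re.fullmatch(r"\{(\d+)(,(\d*))?\}", mult)
    if m is None:
        raise ValueError(f"unknown multiplier: {mult!r}")
    low = int(m.group(1))
    if m.group(2) is None:   # {N}
        return (low, low)
    if m.group(3) == "":     # {N,}
        return (low, None)
    return (low, int(m.group(3)))  # {N,M}
```

For instance, multiplier_range("{2,4}") yields (2, 4), and multiplier_range("+") yields (1, None).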

Table Auto Layout

The table (fo:table) has the table-layout attribute, which can be table-layout="fixed" or table-layout="auto".
The former specifies the fixed layout, in which the column widths are fixed, and the latter specifies the automatic layout, in which the column widths are calculated automatically.
When the value is omitted, the default value is table-layout="auto". In the XSL specification, the automatic layout is implementation-dependent. This document explains the implementation in AH Formatter V6.4.

An automatic layout can take a lot of time for calculating the width of columns. Please specify table-layout="fixed" if high-speed formatting is desired.

In AH Formatter V6.4, how a table is processed depends on the table-layout value and on whether a width is specified on fo:table. When the widths of all columns are specified, the table is treated as table-layout="fixed" even if table-layout="auto" is specified. Moreover, according to the XSL specification, proportional-column-width() can be specified only with table-layout="fixed".
In AH Formatter V6.4, when columns with proportional-column-width() and columns without a width specification are mixed, column-width="proportional-column-width(1)" is assumed for each column without a width specification. In addition, the table is processed as if table-layout="fixed" were specified. That is, in such a case, all columns have a width specification.
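The two rules just described can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the actual implementation: column widths are modeled as strings, None stands for a column without a width specification, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def effective_table_layout(table_layout: str,
                           column_widths: List[Optional[str]]
                           ) -> Tuple[str, List[Optional[str]]]:
    """Return the layout actually used and the column widths after defaulting."""
    widths = list(column_widths)
    has_proportional = any(
        w is not None and w.startswith("proportional-column-width(")
        for w in widths
    )
    if has_proportional:
        # Columns without a width are assumed to be proportional-column-width(1),
        # and the table is processed as table-layout="fixed".
        widths = [w if w is not None else "proportional-column-width(1)"
                  for w in widths]
        return "fixed", widths
    if widths and all(w is not None for w in widths):
        # Every column has a width: treated as fixed even if "auto" was given.
        return "fixed", widths
    return table_layout, widths
```

For example, effective_table_layout("auto", ["proportional-column-width(2)", None]) returns "fixed" together with a width of proportional-column-width(1) for the second column.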

table-layout | Width of fo:table | Processing Method
fixed | Yes | The width is divided equally among the columns whose width is not specified. When the content exceeds the width, it overflows.
fixed | No | The table width becomes 100%. The width is divided equally among the columns whose width is not specified. When the content exceeds the width, it overflows.
auto | Yes | The content of the columns is measured and a width is assigned to each column whose width is not specified. When the table exceeds its specified width even with the minimum column widths, the table width expands to the exceeded width.
auto | No | The content of the columns is measured and a width is assigned to each column whose width is not specified. When the table does not reach 100% even with the maximum column widths, that width becomes the table width. When the table exceeds 100% even with the minimum column widths, that width becomes the table width. Otherwise, the table width becomes 100%.

When table-layout="auto" is specified, the content of the columns whose width is not specified is examined.
A better column width could be determined by examining all rows, but that takes too much time for a big table.
By default, AH Formatter V6.4 examines the content of at most 100 rows to determine the column widths.
This number of rows can be changed with table-auto-layout-limit in the Option Setting File.

When table-layout="fixed" is specified, the content of the columns is not examined, so processing is always fast.

URI

<uri-specification> in XSL specification is supposed to specify the character string which fulfills IRI (RFC3987) specification in url().
IRI is called URI for convenience in this document.
Schemes which can actually be specified in AH Formatter V6.4 are as follows:

http:

https: (Websites cannot be accessed if they have any problem with their certificates)

For the user's convenience, AH Formatter V6.4 allows a file name on the local file system to be specified instead of a URI.
However, URIs and local file names are generally not compatible. For example, white space is not allowed in a URI but may be allowed in a local file name. Moreover, since % may be used literally in a local file name, the string foo%20bar.png points to a different resource depending on whether it is evaluated as a URI or as a local file name.

AH Formatter V6.4 solves this problem as follows:

When the scheme is specified, it is adopted as is.

When the scheme is not specified and surrounded by url(), it is processed as follows:

If URI is correct, it will be adopted as is.

If URI is incorrect, % escape processing is done.

When the scheme is not specified and the value is written bare (not surrounded by url()), it is processed as follows:

In the Windows environment, \ is changed into /.

% escape processing is done.

The relative URI is combined with base-uri and transformed into the absolute URI. All local file names are transformed into a file scheme at this time. For example, in the Windows environment, when base-uri is C:\dir\, it is transformed as follows:

Specified value | Resulting URI
foobar.png | file:///C:/dir/foobar.png
url('foobar.png') | file:///C:/dir/foobar.png
url('url(foobar.png)') | file:///C:/dir/url(foobar.png)
subdir\foobar.png | file:///C:/dir/subdir/foobar.png
url('subdir\foobar.png') | file:///C:/dir/subdir%5Cfoobar.png
url('subdir/foobar.png') | file:///C:/dir/subdir/foobar.png
foo bar.png | file:///C:/dir/foo%20bar.png
url('foo bar.png') | file:///C:/dir/foo%20bar.png
foo%20bar.png | file:///C:/dir/foo%2520bar.png
url('foo%20bar.png') | file:///C:/dir/foo%20bar.png
foo%%20bar.png | file:///C:/dir/foo%25%2520bar.png
url('foo%%20bar.png') | file:///C:/dir/foo%25%2520bar.png
foo#bar.png | file:///C:/dir/foo#bar.png
url('foo#bar.png') | file:///C:/dir/foo#bar.png
foo%23bar.png | file:///C:/dir/foo%2523bar.png
url('foo%23bar.png') | file:///C:/dir/foo%23bar.png
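The rules above can be approximated with the following Python sketch, which reproduces the listed examples. It is a simplified model rather than the formatter's own code: the function names are hypothetical, the base is assumed to be an absolute directory URI ending in /, plain string concatenation stands in for full relative-URI resolution, and the scheme check requires at least two characters before the colon so that a Windows drive letter is not mistaken for a scheme (an assumption of this sketch).

```python
import re
from urllib.parse import quote

# Characters allowed unescaped in a URI reference, or a well-formed %XX escape.
_VALID_URI = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?#]|%[0-9A-Fa-f]{2})*")
# Two or more characters before ":", so that "C:" is not taken as a scheme.
_HAS_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]+:")


def percent_escape(ref: str) -> str:
    """Escape everything except unreserved characters, '/' and '#'."""
    return quote(ref, safe="/#")


def to_absolute_uri(value: str, base: str = "file:///C:/dir/",
                    windows: bool = True) -> str:
    m = re.fullmatch(r"url\('(.*)'\)", value)
    ref = m.group(1) if m else value
    if _HAS_SCHEME.match(ref):
        return ref                       # a scheme is adopted as is
    if m:
        # Inside url(): keep a correct URI, escape an incorrect one.
        if not _VALID_URI.fullmatch(ref):
            ref = percent_escape(ref)
    else:
        # Bare value: treat it as a local file name.
        if windows:
            ref = ref.replace("\\", "/")
        ref = percent_escape(ref)
    return base + ref
```

Note how the bare value foo%20bar.png has its % escaped to %25, while the same string inside url() is already a correct URI and is kept unchanged.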

A local file name cannot be written directly into url(). For example:

url('C:\My Document\foobar.png')

The string above will not work as expected. Please specify a local file name without surrounding it with url().

# is the fragment separator.
In file:///C:/dir/foo#bar.png, the resource actually accessed is file:///C:/dir/foo.
Please specify url('foo%23bar.png') to access a resource called foo#bar.png.

UNC (Universal Naming Convention) in Windows, for example, \\host\My Document\foobar.png is transformed into file://host/My%20Document/foobar.png.
Also, //host/My Document/foobar.png will be transformed into http://host/My%20Document/foobar.png when base-uri is http:. (The same applies to https:). In non-Windows, file://host/... is not supported.

When accessing HTTP or HTTPS via a proxy in non-Windows environment, it's necessary to specify the proxy address by the environment variable.

When the root certificate is necessary in non-Windows environment, it's necessary to specify the directory of the root certificate by the environment variable. (V6.3MR2)

Unicode

AH Formatter V6.4 supports Unicode 7.0. Characters added after that version may not be handled correctly. In addition, characters of unsupported scripts cannot be handled correctly (☞ Scripts and Languages). The following characters are also not supported:

2066;LEFT-TO-RIGHT ISOLATE

2067;RIGHT-TO-LEFT ISOLATE

2068;FIRST STRONG ISOLATE

2069;POP DIRECTIONAL ISOLATE

U+2066 is treated as U+202D, U+2067 as U+202E, and U+2069 as U+202C.
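To preview how a text containing these characters would be interpreted, the substitution stated above can be written as a translation table in Python. The names below are hypothetical and the sketch only mirrors the three mappings given in this section; U+2068 has no stated substitute and is left untouched.

```python
# Substitutions stated above: isolate controls mapped to older BIDI controls.
ISOLATE_FALLBACK = {
    0x2066: 0x202D,  # LEFT-TO-RIGHT ISOLATE  -> U+202D
    0x2067: 0x202E,  # RIGHT-TO-LEFT ISOLATE  -> U+202E
    0x2069: 0x202C,  # POP DIRECTIONAL ISOLATE -> U+202C
}


def apply_isolate_fallback(text: str) -> str:
    """Replace the isolate controls with the characters they are treated as."""
    return text.translate(ISOLATE_FALLBACK)
```

Applying the function to "\u2066abc\u2069" gives "\u202dabc\u202c".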

Line Breaking

Nonstarter Japanese characters defined in JIS X 4051:2004 can be controlled by axf:line-break.

Although LB30 in UAX#14 prohibits line breaks before an open parenthesis and after a close parenthesis, AH Formatter V6.4 permits line breaks for full-width parentheses. The targets are the full-width open parenthesis, the full-width close parenthesis, and the full-width punctuation indicated in axf:punctuation-trim.

The line breaking class AI in a CJK script is processed as ID.
However, U+2015 (HORIZONTAL BAR) is processed as IN since it is a non-breaking character in JIS X 4051:2004.

The line breaking class of half-width kana is AL.
With that class, as with alphabetic text, no line break occurs unless there is a space between words.
AH Formatter V6.4 therefore treats half-width kana as full-width kana when processing line breaks.

UAX#14 allows a line break immediately after U+002F (SOLIDUS), so a line break can occur within abbreviations such as km/h and w/o. UAX#14 clearly states that such breaks are undesirable. AH Formatter V6.4 makes it possible to control breaks within such abbreviations with axf:abbreviation-character-count.

The ideographic space (U+3000) is treated as a non-starter character. This was decided in consideration of the specification change in Unicode 6.3. If you do not want to treat it as a non-starter character, please specify non-starter-ideographic-space="false" in the Option Setting File.

UAX#14 does not have descriptions of U+200C and U+200D. AH Formatter V6.4 processes line breaking for them as follows:

No line break occurs before or after U+200D.

A line break is permitted before and after U+200C.

Hyphenation

This section explains the behavior of the page (or column) break when hyphenation-keep="page" (or "column") is specified. Suppose there is the following sentence with hyphenation-keep="page" specified.

AH Formatter V6.4 cannot resolve the widows="3" problem caused by this side effect. This is a limitation of AH Formatter V6.4. widows="2" never causes such a scenario.

Variation Sequence

AH Formatter V6.4 supports the Unicode Character 'Variation Sequence'.
When the OpenType font has the capability of Variation Sequence (cmap Format14), it is processed appropriately. For example, Variant Sequences can be expressed as follows.

2007-12-14
Combined registration of the Adobe-Japan1 collection and of sequences in that collection

&#xE0100;, etc. are disregarded when the font does not support Variation Sequences, when there is no corresponding variant glyph, or when the specified Variation Sequence is out of range. This means that, even with the same settings, the displayed glyph may differ depending on which Variation Sequences the font supports.

CAUTION:

Variation Sequences other than Ideographic are not supported.

Font Selection

Fonts in FO or CSS are specified by the font-family property. There are various cases in settings when the candidates of the font are enumerated like font-family="'Courier New', serif", or when there is no specification of font-family, AH Formatter V6.4 determines which font should be applied to a character string as follows.

The text in the area is divided into character strings of the same script, based on the script information that Unicode defines for each character, the language specified in FO or CSS, the specified script, and so on, and the script of each divided character string is determined. This determination is complicated because Unicode contains characters for which it is ambiguous whether they are full-width, and because the language cannot be determined from a character string that consists only of kanji.

When font-selection-mode="6" is specified in the Option Setting File, each character of the character string is checked, in order, against the font-family values specified in FO or CSS to see whether the font has a glyph for it, and the first font found to have the glyph is adopted. When it is not specified, each character is checked, in order, to see whether the specified font-family has its glyph and also supports the Unicode range or script, and the first font found to support it is adopted. When no font-family is specified, the generic font family set as the standard font family is treated as specified.
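The per-character lookup described for font-selection-mode="6" can be sketched as follows. This is a simplified model under stated assumptions, not the formatter's actual algorithm: the function name is hypothetical, the coverage table is invented for the example and does not reflect the real glyph coverage of any font, and returning None when no family has the glyph is a placeholder, because the text does not say what happens in that case.

```python
def select_fonts_by_glyph(text, font_families, coverage):
    """For each character, pick the first listed family that has its glyph.

    font_families: family names in the order given by font-family.
    coverage: maps a family name to the set of characters it can render.
    Returns a list of (character, family or None) pairs.
    """
    result = []
    for ch in text:
        chosen = next(
            (family for family in font_families
             if ch in coverage.get(family, set())),
            None,  # placeholder: no listed family has a glyph
        )
        result.append((ch, chosen))
    return result


# Invented coverage data, for illustration only.
coverage = {
    "Courier New": set("Abc"),
    "MS Mincho": set("Abc\u6F22\u5B57"),
}
picked = select_fonts_by_glyph("A\u6F22", ["Courier New", "MS Mincho"], coverage)
```

With this data, "A" is assigned to the first family in the list and U+6F22 falls through to the second, since only the second family covers it.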

In XSL or CSS, the following five generic font families can be used: serif, sans-serif, cursive, fantasy, and monospace.

Since cursive is not specified here, the cursive setting of the default generic font is used for Hans. When <script-font script="Hans"/> itself is not specified, as is the case immediately after installation, the default group is treated as specified. The Windows version sets up the following default group. Scripts that are not listed here are not set up. An entry is also not set up when the font does not actually exist.

| Script | serif | sans-serif | cursive | fantasy | monospace |
|---|---|---|---|---|---|
| Standard | Times New Roman | Arial | Segoe Script or Comic Sans MS or Monotype Corsiva | Impact | Courier New |
| Jpan | MS Mincho | MS Gothic | MS Mincho or MS Gothic | MS Mincho or MS Gothic | MS Gothic or MS Mincho |
| Hans | SimSun or MS Song | SimHei or MS Hei or MS Song | SimSun or MS Song | SimSun or MS Song | SimHei or MS Hei or MS Song |
| Hant | MingLiU | ← | ← | ← | ← |
| Hang | Batang or BatangChe | Gulim or BatangChe | Batang or BatangChe | Batang or BatangChe | BatangChe |
| Arab | Arabic Typesetting | ← | ← | ← | ← |
| Hebr | FrankRuehl | ← | ← | ← | ← |
| Deva | Mangal | ← | ← | ← | ← |
| Beng | Vrinda | ← | ← | ← | ← |
| Guru | Raavi | ← | ← | ← | ← |
| Gujr | Shruti | ← | ← | ← | ← |
| Taml | Latha | ← | ← | ← | ← |
| Telu | Gautami | ← | ← | ← | ← |
| Knda | Tunga | ← | ← | ← | ← |
| Mlym | Kartika | ← | ← | ← | ← |
| Sinh | Iskoola Pota | ← | ← | ← | ← |
| Thai | Angsana New | ← | ← | ← | ← |
| Khmr | DaunPenh | ← | ← | ← | ← |
| Laoo | DokChampa | ← | ← | ← | ← |
| Mymr | Myanmar Text | ← | ← | ← | ← |

The following default group is set up with the Macintosh version.

| Script | serif | sans-serif | cursive | fantasy | monospace |
|---|---|---|---|---|---|
| Standard | Times or Times New Roman | Helvetica or Arial | Monaco or Chalkboard | Monaco or Chalkboard | Courier |
| Jpan | HiraMinPro W3 | HiraKakuPro W3 | HiraMaruPro W3 or HiraKakuPro W3 | HiraMaruPro W3 or HiraKakuPro W3 | HiraKakuPro W3 |
| Hans | STXihei | STSong | STXihei | STXihei | STSong |
| Hant | LiHeiPro | LiSongPro | LiHeiPro | LiHeiPro | LiSongPro |
| Hang | AppleMyungjo | AppleGothic | AppleMyungjo | AppleMyungjo | AppleGothic |
| Arab | Geeza Pro | ← | ← | ← | ← |
| Hebr | NewPeninimMT | ← | ← | ← | ← |
| Deva | DevanagariMT | ← | ← | ← | ← |
| Thai | Thonburi | ← | ← | ← | ← |

The following default group is set up with the other UNIX versions.

| Script | serif | sans-serif | cursive | fantasy | monospace |
|---|---|---|---|---|---|
| Standard | Times | Helvetica | Times | Times | Courier |

Upright Rendering of Text in Vertical Writing Mode

Text in Japanese or Chinese documents is basically oriented in one of the following three ways:

In horizontal writing

In vertical writing, SVO

In vertical writing, MVO

The orientation of text in vertical writing mode is expressed here as U or R. U is a character displayed upright on the paper. R is a character rotated 90 degrees clockwise on the paper. The text orientation in vertical writing mode is then as follows:

Japanese characters like "漢字" are U.

Brackets are R.

Punctuation marks are U once the glyph for vertical writing is used.

European characters like "Abc" are U in SVO, R in MVO.
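The four rules above can be written down as a small lookup. This is a sketch of the list only, not of the UTR#50 data or of the formatter's logic: the function name and the category labels ("japanese", "bracket", "punctuation", "european") are hypothetical, and cases the list does not cover return None.

```python
def text_orientation(kind, mode, vertical_glyph_used=False):
    """Return "U" (upright) or "R" (rotated 90 degrees clockwise).

    kind: "japanese", "bracket", "punctuation", or "european"
          (labels invented for this sketch).
    mode: "SVO" or "MVO".
    Returns None for cases the list does not cover.
    """
    if mode not in ("SVO", "MVO"):
        raise ValueError("mode must be 'SVO' or 'MVO'")
    if kind == "japanese":
        return "U"
    if kind == "bracket":
        return "R"
    if kind == "punctuation":
        # The list only states the case where the vertical glyph is used.
        return "U" if vertical_glyph_used else None
    if kind == "european":
        return "U" if mode == "SVO" else "R"
    return None
```

Only European characters depend on the mode: text_orientation("european", "SVO") gives "U", while text_orientation("european", "MVO") gives "R".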

Which characters should be upright and which should be rotated 90 degrees is discussed in UTR#50: Unicode Vertical Text Layout. At present, tr50-11.html describes only MVO (Mixed Vertical Orientation). However, an earlier revision (tr50-6.html) also described SVO (Stacked Vertical Orientation). AH Formatter V6.4 implements axf:text-orientation="mixed" in compliance with MVO and axf:text-orientation="upright" in compliance with SVO, but uses the data with some modifications (☞ tr50-x.Orientation.txt). This data can be modified arbitrarily in the Option Setting File. See also UTR50.

Usually, a font that supports vertical writing mode has dedicated glyphs for vertical writing for some characters.
This is because some glyphs cannot be used in vertical writing simply by rotating the glyph for horizontal writing mode.
Examples are small kana, punctuation marks, and the long vowel mark.
In vertical writing mode, if a character has a glyph for vertical writing, that glyph is used.

The orientation of text (U or R) is determined and expressed relative to the orientation of the glyph for horizontal writing mode. However, some glyphs for vertical writing mode differ from those for horizontal writing mode. The example below shows the glyphs of U+3083, U+FF08, and U+2190. U+FF08 and U+2190 have different orientations in vertical and horizontal writing mode.

Glyph for horizontal writing

Glyph for vertical writing

Although "brackets are R" as mentioned above, in practice they have to be displayed as U using the glyph for vertical writing mode. That is, there is a tacit assumption that the glyph for vertical writing mode is designed with a different orientation from the glyph for horizontal writing mode. Whether a font has a glyph for vertical writing mode, and whether its orientation is the same as that of the glyph for horizontal writing mode, depends on the font. The difference between fonts is particularly noticeable in the orientation of symbols such as arrows. Since there is no way to know in which orientation a glyph was designed, this problem generally cannot be solved.
Therefore, AH Formatter V6.4 controls the orientation of characters according to the major implementations.

Formatting Large Document

For example, when formatting a simple FO without <fo:page-number-citation> and outputting PDF, AH Formatter V6.4 discards each page once it has been formatted and output. As a result, however large the document is, processing never consumes more memory than a single page requires (except when formatting from the GUI). However, if a page refers to a later page with <fo:page-number-citation>, the page number of the referenced page cannot be known until that page is actually formatted. For that reason, when a page containing an unresolved <fo:page-number-citation> appears, AH Formatter V6.4 suspends output and keeps the intermediate formatting results in memory. When the document has a table of contents at the start, no output is performed until every page number that appears in the table of contents is resolved. This limits the number of pages that can be formatted, and formatting a large document can become impossible because of the large amount of memory consumed.

To solve this problem, AH Formatter V6.4 can process the document with 2-pass formatting. In the first pass, formatting is performed only to resolve <fo:page-number-citation>, and all the required page number information is collected. In the second pass, formatting starts again from the first page. Since every <fo:page-number-citation> is already resolved at this point, AH Formatter V6.4 can output the document while discarding pages that have already been formatted. Although the formatting time increases, most of the memory otherwise used for formatting is not consumed, which makes it possible to format a large document. However, 2-pass formatting has no effect on the memory needed for output.
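The memory trade-off can be illustrated with a toy model. It is only a sketch under simplifying assumptions: a document is a list of pages, each page is a dictionary with a hypothetical "id" and a list of "cites" (the ids of pages it refers to), and memory use is measured as the number of pages held back at once. It says nothing about how the formatter stores pages internally.

```python
def one_pass_peak(pages):
    """Hold back output while any buffered page has an unresolved citation.

    Returns the peak number of pages held in memory at once.
    """
    known = {}      # id -> page number, learned as pages are formatted
    buffered = []   # formatted pages that cannot be output yet
    peak = 0
    for number, page in enumerate(pages, start=1):
        known[page["id"]] = number
        buffered.append(page)
        peak = max(peak, len(buffered))
        # Output resumes only when every buffered citation is resolved.
        if all(c in known for p in buffered for c in p["cites"]):
            buffered.clear()
    return peak


def two_pass_peak(pages):
    """Pass 1 collects page numbers; pass 2 outputs one page at a time."""
    known = {page["id"]: n for n, page in enumerate(pages, start=1)}  # pass 1
    peak = 0
    for page in pages:                                                # pass 2
        _numbers = [known[c] for c in page["cites"]]  # all already resolved
        peak = max(peak, 1)  # only the current page is held
    return peak


# A table of contents on the first page that cites three later pages.
document = [
    {"id": "toc", "cites": ["ch1", "ch2", "ch3"]},
    {"id": "ch1", "cites": []},
    {"id": "ch2", "cites": []},
    {"id": "ch3", "cites": []},
]
```

For this document, one_pass_peak holds all four pages before anything can be output, while two_pass_peak never holds more than one, at the cost of walking the document twice.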

2-pass formatting is not available in AH Formatter V6.4 Lite.

Temporary File

AH Formatter V6.4 does not create temporary work files unless it is unavoidable.
It creates a temporary work file in the following cases.

With the COM interface, PDF of a formatted result is saved to a temporary file when outputting PDF to a Web browser directly.

An XML document passed by using DOM with the COM interface is processed using a temporary file. However, when FO is specified as the formatting type, the temporary file is not generated because DOM is processed directly.

When outputting a file while printing, a temporary file is generated.

When a file interface is required in the XSLT transformation using external XSLT, a temporary file is generated.

When the transformation from XML+XSL is required in the render method of a Java interface, the result FO is generated as a temporary file.

In the Windows version, when embedding an image that is not embeddable in PDF, a temporary file is generated in the conversion process.

A temporary file is generated when converting EPS to PDF using Distiller or Ghostscript.

When processing EPS using Distiller, if joboptions is not specified, the default joboptions is generated as a temporary file.

When the CGM Option is not installed, a temporary file is generated and rendered using a Windows plug-in.