Hyphenation

AH Formatter V6.5 can hyphenate text in more than 40 languages. You do not need to prepare a hyphenation dictionary.

Languages

AH Formatter V6.5 supports hyphenation for the following languages.

Code           Language                  Hyphenation Limited To
af     afr     Afrikaans                 Latin characters and Apostrophe
bg     bul     Bulgarian                 Cyrillic characters
ca     cat     Catalan                   Latin characters and Apostrophe and Decimal point (Full stop or Middle dot)
cs     ces     Czech                     Latin characters
cy     cym     Welsh                     Latin characters and Apostrophe
da     dan     Danish                    Latin characters and Apostrophe
de     deu     German / Swiss German     Latin characters and Apostrophe
el     ell     Greek                     Greek characters
en     eng     English                   Latin characters and Apostrophe
en-US  eng-US  American                  Latin characters and Apostrophe
eo     epo     Esperanto                 Latin characters
es     spa     Spanish                   Latin characters
et     est     Estonian                  Latin characters
eu     eus     Basque                    Latin characters
fi     fin     Finnish                   Latin characters
fr     fra     French / Canadian French  Latin characters and Apostrophe
ga     gle     Irish (Erse or Gaelic)    Latin characters and Apostrophe
hr     hrv     Croatian                  Cyrillic characters or Latin characters
hu     hun     Hungarian                 Latin characters
id     ind     Indonesian                Latin characters and Apostrophe and Digit 2
is     isl     Icelandic                 Latin characters
it     ita     Italian                   Latin characters and Apostrophe
la     lat     Latin                     Latin characters
lt     lit     Lithuanian                Latin characters
lv     lav     Latvian                   Latin characters
ms     msa     Bahasa Malay              Latin characters and Apostrophe and Digit 2
mt     mlt     Maltese                   Latin characters and Apostrophe
nb     nob     Norwegian (Bokmål)        Latin characters and Apostrophe
nl     nld     Dutch / Flemish           Latin characters and Apostrophe
nn     nno     Norwegian (Nynorsk)       Latin characters and Apostrophe
no     nor     Norwegian                 Latin characters and Apostrophe
pl     pol     Polish                    Latin characters
pt     por     Portuguese / Brazilian    Latin characters
ro     ron     Romanian / Moldavian      Latin characters and Apostrophe
ru     rus     Russian                   Cyrillic characters
sk     slk     Slovak                    Latin characters and Apostrophe
sl     slv     Slovenian                 Latin characters and Apostrophe
sr     srp     Serbian                   Cyrillic characters or Latin characters
sv     swe     Swedish                   Latin characters and Apostrophe
sw     swa     Swahili                   Latin characters and Apostrophe
th     tha     Thai                      Thai characters
tr     tur     Turkish                   Latin characters
uk     ukr     Ukrainian                 Cyrillic characters

AH Formatter V6.5 treats a character string as a word for hyphenation only if it consists entirely of the characters listed for that language in the table above. A string that contains any other character is not considered a word and is not hyphenated. To hyphenate text that contains unsupported characters, use a TeX dictionary.

Exception Dictionary

AH Formatter V6.5 does not require you to prepare a dictionary. However, you may occasionally want to override the way particular words are hyphenated. In that case, you can register those words in the exception dictionary.
If you edit the exception dictionary while working in the GUI, you can reload the hyphenation dictionary and reformat the document from [menu] - [Format] - [Reload Hyphenation Dictionary].

The exception dictionary is stored in the hyphenation folder under the AH Formatter V6.5 installation folder, or in the folder specified by the AHF65_HYPDIC_PATH environment variable (AHF65_64_HYPDIC_PATH for the 64-bit version). The dictionary file name follows the rules below, which are the same as those for TeX dictionaries.

The file name is derived from the Language Tag defined in RFC 1766: the hyphen is replaced by an underscore and the ".xml" extension is added. The Language Tag is formed by joining an ISO 639-2 language code and an ISO 3166 country code with a hyphen.
Sometimes it consists of the language code alone. Make sure that the file name uses an underscore, not a hyphen.

Specify the language code as a 2-letter code when one exists; otherwise, use the Terminology code. Likewise, specify the country code as a 2-letter code when one exists.

Examples are de.xml and en_US.xml. When xml:lang="nl-BE" is specified, dictionaries are searched for in the following order:

nl-BE.xml

nl_BE.xml

nl.xml
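The search order above can be sketched as a small helper. This only illustrates the naming rule and is not AH Formatter's own lookup code; the function name candidate_dictionary_files is hypothetical, and the behavior for a tag without a country code is an assumption.

```python
def candidate_dictionary_files(lang_tag):
    """Return the dictionary file names to try for an xml:lang value, in order."""
    # The tag as written, e.g. nl-BE.xml
    names = [lang_tag + ".xml"]
    if "-" in lang_tag:
        # Hyphen replaced by an underscore, e.g. nl_BE.xml
        names.append(lang_tag.replace("-", "_") + ".xml")
        # Language code alone, e.g. nl.xml
        names.append(lang_tag.split("-")[0] + ".xml")
    return names


print(candidate_dictionary_files("nl-BE"))
```

For a tag that is only a language code, such as "de", this sketch returns the single name de.xml.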

The exception dictionary consists of the following elements.

Element

Location

Description

<hyphenation-info>

root element

<hyphen-char>

child of <hyphenation-info>

Specifies a character that can be used in place of <hyphen/> inside the <exceptions> element. The character is given by the value attribute. The initial value is "-" (U+002D).

<exceptions>

child of <hyphenation-info>

The exception dictionary data. The text of the <exceptions> element is a list of hyphenated words separated by white space. A hyphenation point is indicated by the <hyphen> element; the character specified by the <hyphen-char> element can be used instead.

<hyphen>

child of <exceptions>

A fully functional hyphen, equivalent to a TeX discretionary. The <hyphen> element has the pre, post and no attributes. The pre attribute gives the string inserted before the hyphenation character when a hyphenation break occurs. The post attribute gives the string inserted after the hyphenation character when a hyphenation break occurs. The no attribute gives the string that appears when no hyphenation break occurs. Use the <hyphen> element when the spelling of a word changes at a hyphenation break.

<non-eol-words>

child of <hyphenation-info>

Specifies a list of non-end-of-line words separated by white space. A word listed here is kept from being placed at the end of a line where possible, although in some cases this is unavoidable. Non-end-of-line processing is always in effect, regardless of the hyphenate property in the FO.

The word ‘table’ will be hyphenated only as ‘ta-ble’; the word ‘present’ will never be hyphenated; and the word ‘backen’ will be hyphenated as ‘bak-ken’. Also, ‘ta<hyphen/>ble’ is equivalent to ‘ta-ble’ in this example.
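The dictionary that the sentence above describes is not reproduced here. One reconstruction consistent with it, together with a minimal reader, might look like the following sketch. The SAMPLE content is an assumption inferred from the description, and read_exceptions is a hypothetical helper, not part of AH Formatter.

```python
import xml.etree.ElementTree as ET

# Assumed sample: a reconstruction that matches the description in the text.
SAMPLE = """<hyphenation-info>
  <hyphen-char value="-"/>
  <exceptions>
    ta-ble
    present
    ba<hyphen pre="k" no="c"/>ken
  </exceptions>
</hyphenation-info>"""


def read_exceptions(xml_text):
    """Map each unbroken word to its spelling with every break taken."""
    root = ET.fromstring(xml_text)
    hc = root.find("hyphen-char")
    mark = hc.get("value", "-") if hc is not None else "-"
    exc = root.find("exceptions")
    whole, broken = [], []

    def add_text(text):
        # Plain text: the hyphen-char marks a break point and is not part of the word.
        if text:
            whole.append(text.replace(mark, ""))
            broken.append(text.replace(mark, "-"))

    add_text(exc.text)
    for hyphen in exc.findall("hyphen"):
        # <hyphen>: "no" appears when unbroken, "pre" and "post" surround the break.
        whole.append(hyphen.get("no", ""))
        broken.append(hyphen.get("pre", "") + "-" + hyphen.get("post", ""))
        add_text(hyphen.tail)
    return dict(zip("".join(whole).split(), "".join(broken).split()))


print(read_exceptions(SAMPLE))
```

A word such as "present", which contains no break point, maps to itself, which corresponds to "never hyphenated".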

The <hyphen> element can change the spelling of a word when it is hyphenated.

Exception Dictionary                     Word      Hyphenated Word
ab<hyphen/>def                           abdef     ab-def
ab<hyphen no="c"/>def                    abcdef    ab-def
ab<hyphen pre="x"/>def                   abdef     abx-def
ab<hyphen pre="x" no="c"/>def            abcdef    abx-def
ab<hyphen post="z"/>def                  abdef     ab-zdef
ab<hyphen no="c" post="z"/>def           abcdef    ab-zdef
ab<hyphen pre="x" post="z"/>def          abdef     abx-zdef
ab<hyphen pre="x" no="c" post="z"/>def   abcdef    abx-zdef
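The rule behind the table above can be written as a short function. This is a sketch of the attribute semantics as described in this section; the name hyphen_forms and its parameters are hypothetical.

```python
def hyphen_forms(before, after, pre="", no="", post="", hyphen_char="-"):
    """Return (unbroken, broken) spellings for before<hyphen .../>after."""
    # No break: only the "no" string appears at the hyphenation point.
    unbroken = before + no + after
    # Break: "pre" goes before the hyphenation character, "post" after it.
    broken = before + pre + hyphen_char + post + after
    return unbroken, broken


for attrs in ({}, {"no": "c"}, {"pre": "x"}, {"pre": "x", "no": "c", "post": "z"}):
    print(attrs, hyphen_forms("ab", "def", **attrs))
```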

The exception dictionary is also available for the following languages:

Code           Language                  Hyphenation Limited To
km     khm     Khmer V6.5                Khmer characters
lo     lao     Lao V6.5                  Lao characters
my     mya     Burmese (Myanmar) V6.5    Burmese characters
th     tha     Thai                      Thai characters

For these languages, the exception dictionary is not used for hyphenation but to list words that must not be broken. Each word may contain only the characters that make up the word. Neither hyphen characters nor <hyphen> elements can be used in <exceptions>.

TeX Dictionary

AH Formatter V6.5 can also hyphenate using TeX dictionaries. To do so, specify HyphenationOption="false" in the Option Setting File. A dictionary is required for every language you need. Dictionaries are XML files in the same format as those used by FOP. See also the Apache Website. Only the hyphenation dictionary for English (en.xml) is ready and provided with XSL Formatter V4.0.

The contents of a TeX hyphenation dictionary are defined by hyphenation.dtd, which is included in the FOP distribution. With AH Formatter V6.5, it is installed in the hyphenation folder under the installation folder. A brief explanation of the DTD follows. See hyphenation.dtd for more details.

Element

Location

Description

<hyphenation-info>

root element

<hyphen-char>

child of <hyphenation-info>

Specifies the hyphenation character used in the exception dictionary data. The character is given by the value attribute. The initial value is "-" (U+002D). The hyphenation character in the actual formatted result, however, is given by the hyphenation-character property of the XSL specification.

<hyphen-min>

child of <hyphenation-info>

The before and after attributes give the minimum number of characters that must appear before and after the hyphenation character when a hyphenation break occurs.
The before attribute corresponds to the XSL hyphenation-remain-character-count property, and the after attribute to hyphenation-push-character-count. AH Formatter V6.5 uses these properties and ignores the hyphen-min element in the dictionary.

<classes>

child of <hyphenation-info>

Defines character equivalence classes. The text of the classes element is a list of character groups separated by white space; all characters in a group are treated as equivalent. In practice, each group consists of the lowercase and uppercase forms of a character. The following is a sample from the English dictionary (en.xml).

<patterns>

child of <hyphenation-info>

The hyphenation patterns, separated by spaces. A pattern consists of characters and digits. Each character is the first character of a classes group (normally the lowercase form). A digit between characters indicates the strength of the hyphenation potential at that position (the hyphenation value).

<exceptions>

child of <hyphenation-info>

The hyphenation exception data. The text of the exceptions element is a space-separated list of hyphenated words. A hyphenation point is indicated by the hyphen element, or by the character defined in the hyphen-char element. Use the exceptions element when the hyphenation points determined by the hyphenation patterns are not appropriate, or when you want to specify hyphenation points of your own.

<hyphen>

child of <exceptions>

A fully functional hyphen. The hyphen element has the pre, post and no attributes. The pre attribute gives the string inserted before the hyphenation character when a hyphenation break occurs. The post attribute gives the string inserted after the hyphenation character when a hyphenation break occurs. The no attribute gives the string that appears when no hyphenation break occurs. Use the hyphen element when the spelling of a word changes at a hyphenation break.
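How the digits in the hyphenation patterns select break points can be illustrated with the pattern-matching scheme used by TeX (Liang's algorithm): every pattern that matches a substring of the word contributes its digits, the highest digit at each position wins, and an odd value permits a break while an even value forbids it. The following is a minimal sketch of that general technique, not AH Formatter's implementation. The function names are hypothetical, the small pattern set is the one commonly cited for the word "hyphenation", and the remain and push parameters stand in for hyphenation-remain-character-count and hyphenation-push-character-count.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into ('hyph', [0, 0, 3, 0, 0])."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)  # value for the gap before letters[pos]
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, remain=2, push=2):
    """Return the pieces of word split at the permitted break points."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # "." marks the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[i]: gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values:
                for offset, value in enumerate(values):
                    scores[start + offset] = max(scores[start + offset], value)
    pieces, last = [], 0
    for k in range(remain, len(word) - push + 1):
        if scores[k + 1] % 2 == 1:     # odd value: a break is allowed here
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return pieces


PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
print("-".join(hyphenate("hyphenation", PATTERNS)))
```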

Restrictions

If text is set in a narrow region and a single word is hyphenated more than once, the result sometimes does not follow the exception dictionary.
See also Hyphenation in Technical Notes.