Ellipsis Patterns

Ellipsis patterns are used when text is too long to fit in a display. They are used in environments with very little space, so the ellipsis should be a single character; where that really can't work, it should be as short as possible.

Three patterns need to be translated. Typically the same character is used in all three, but separate patterns are provided in case some languages need different characters in different contexts.

English Pattern | English Example | Meaning
{0}… or {FIRST_PART_OF_TEXT}… | The quick brown f… | The end of the string is being truncated.
{0}…{1} or {FIRST_PART_OF_TEXT}…{LAST_PART_OF_TEXT} | The quic…azy dog. | The middle of the string is being truncated.
…{1} or …{LAST_PART_OF_TEXT} | …ver the lazy dog. | The start of the string is being truncated.
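
To make the three cases concrete, here is a minimal Python sketch that applies the English patterns to a string. The names `PATTERNS`, `truncate`, `keep`, and `where` are hypothetical and chosen only for this illustration; they are not part of CLDR. The sketch slices by code point, so it could split a combining sequence in some scripts.

```python
# Hypothetical pattern table; the keys are names chosen for this sketch.
PATTERNS = {
    "final": "{0}\u2026",       # end of the string is truncated
    "medial": "{0}\u2026{1}",   # middle of the string is truncated
    "initial": "\u2026{1}",     # start of the string is truncated
}


def truncate(text, keep, where="final", patterns=PATTERNS):
    """Keep `keep` characters of `text` and mark the cut with an ellipsis pattern."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    if len(text) <= keep:
        return text
    if where == "final":
        first, last = text[:keep], ""
    elif where == "initial":
        first, last = "", text[-keep:]
    else:
        # Medial: split the budget between the start and the end.
        head = keep // 2
        first, last = text[:head], text[-(keep - head):]
    return patterns[where].format(first, last)


sentence = "The quick brown fox jumps over the lazy dog."
print(truncate(sentence, 17, "final"))    # The quick brown f…
print(truncate(sentence, 16, "medial"))   # The quic…azy dog.
print(truncate(sentence, 17, "initial"))  # …ver the lazy dog.
```

A locale that needs spaces around the ellipsis would only change the pattern strings, not the truncation logic.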

English uses the same basic text for all three cases and changes only the placeholders. A language might use different characters where, for example, a space should come between the placeholder and the ellipsis. In that case, the patterns would be as in the second column below.

English Pattern | With Spaces
{0}… | {0} …
{0}…{1} | {0} … {1}
…{1} | … {1}

English uses the ellipsis character (Unicode U+2026), which is preferred over three periods in a row. The latter may have a different appearance, as in the following table.

Ellipsis character | …
Three dots (periods/full stops) | ...

If your language also uses three dots to indicate that some text is being elided, then you should also use the ellipsis character unless three separate dots are strongly preferred.
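
The difference between the two forms is easy to check in code. The following Python sketch shows that U+2026 is a single character while three periods are three, and includes a small hypothetical helper, `to_ellipsis`, that swaps each run of three periods in a pattern for the ellipsis character.

```python
import unicodedata

ellipsis = "\u2026"
three_dots = "..."

print(len(ellipsis), unicodedata.name(ellipsis))  # 1 HORIZONTAL ELLIPSIS
print(len(three_dots), unicodedata.name("."))     # 3 FULL STOP


def to_ellipsis(pattern):
    """Replace each run of three periods with the single character U+2026."""
    return pattern.replace("...", "\u2026")


print(to_ellipsis("{0}...{1}"))  # {0}…{1}
```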

More Information Character

This character appears where the user can click to get more information. It is used in environments with very little space, so it should be a single character; where that really can't work, it should be as short as possible.

The English value is “?”, but another character might be better for your language.

Delimiters

The delimiters are the characters used for quoting text. For example, for English they are the “curly” left and right forms, as in “this phrase.” The alternate forms are for embedded quotations, such as “He yelled ‘Stop!’, and turned around.”

BIDI languages (Arabic, Hebrew,…):

“Start” means the character that starts the quotation, and “end” the one that finishes it. With most languages, the start quotation will appear on the left, while with BIDI languages, it will appear on the right.
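
As an illustration of how the start, end, and alternate delimiters fit together, here is a minimal Python sketch using the English values described above. The dictionary keys and the `quote` function are hypothetical names for this example, and alternating between the main and alternate forms at deeper nesting levels is an assumption of the sketch, not a rule stated here.

```python
# English values; other locales would supply their own characters.
DELIMITERS = {
    "start": "\u201C",      # left double quotation mark
    "end": "\u201D",        # right double quotation mark
    "alt_start": "\u2018",  # left single quotation mark
    "alt_end": "\u2019",    # right single quotation mark
}


def quote(text, depth=0, delimiters=DELIMITERS):
    """Wrap text in quotation marks, using the alternate forms at odd depths."""
    if depth % 2 == 0:
        return delimiters["start"] + text + delimiters["end"]
    return delimiters["alt_start"] + text + delimiters["alt_end"]


inner = quote("Stop!", depth=1)
print(quote("He yelled " + inner + ", and turned around."))
# “He yelled ‘Stop!’, and turned around.”
```

The code always places the start character first in the string. In a BIDI language the display layer, not the string order, puts that character on the right.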

Valid Delimiters

Currently the CLDR survey tool checks input delimiters against a predefined set of possibilities. The following delimiters are considered "valid" by the CLDR survey tool.

Exemplar Characters

The exemplar character sets contain the commonly used letters for a given modern form of a language. These are used for testing and for determining the appropriate repertoire of letters for various tasks, like choosing charset converters that can handle a given language. The term “letter” is interpreted broadly, and includes characters used to form words, such as 是 or 가. It should not include presentation forms, like U+FE90 ( ‎ﺐ‎ ) ARABIC LETTER BEH FINAL FORM, or isolated Jamo characters (for Hangul).

There are different categories:

Category

English Example

Meaning

standard

a b c d e f g h i j k l m n o p q r s t u v w x y z

The minimal characters required for your language (other than punctuation).

The test for whether a letter belongs in the standard set is whether it is acceptable in your language to always use spellings that avoid that character. For example, the English standard characters do not include the accented letters that are sometimes seen in words like résumé or naïve, because it is acceptable in common practice to spell those words without the accents.

If your language has both upper and lowercase letters, only include the lowercase (and İ for Turkish and similar languages).

auxiliary

Additional letters and punctuation (beyond the minimal set) used in foreign or technical words found in typical magazines, newspapers, etc.

For example, you could see the name Schröder in English in a magazine, so ö is in the set. However, it is very uncommon to see ł, so that isn't in the auxiliary set for English. Publication style guides, such as The Economist Style Guide for English, are useful for this.

If your language has both upper and lowercase letters, only include the lowercase (and İ for Turkish and similar languages).

index

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

The “shortcut” letters for quickly jumping to sections of a sorted, indexed list (for an example, see mu.edu).
The choice of letters should be appropriate for your language. Unlike the standard (minimal) or auxiliary (additional) characters, the index set should use either uppercase or lowercase, depending on what is typical for your language (typically uppercase).

Any range of characters, such as “a b c d e” can be represented compactly as “a-e”.
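
The compact form can be sketched in a few lines of Python. The function name `compact` is hypothetical, and the sketch assumes that every item is a single code point and that ranges follow code point order; items made of several characters would need separate handling.

```python
def compact(chars):
    """Collapse runs of three or more consecutive code points into 'a-e' ranges."""
    items = sorted(set(chars))
    out = []
    i = 0
    while i < len(items):
        j = i
        # Extend the run while the next item is the next code point.
        while j + 1 < len(items) and ord(items[j + 1]) == ord(items[j]) + 1:
            j += 1
        if j - i >= 2:
            out.append(items[i] + "-" + items[j])
        else:
            out.extend(items[i:j + 1])
        i = j + 1
    return " ".join(out)


print(compact("a b c d e".split()))  # a-e
print(compact(list("abcxz")))        # a-c x z
```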

If you see an escape sequence such as "\u0301" in one of the exemplar sets in your language, this indicates a non-spacing character (diacritic) or control character. You can use the utility at http://unicode.org/cldr/utility/list-unicodeset.jsp to help you determine the meanings of such sequences.
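
If the online utility is not at hand, a short script can give similar information. This Python sketch decodes `\uXXXX` escapes and reports each character's code point, name, and general category using the standard `unicodedata` module. The names `ESCAPE` and `describe` are hypothetical, and only four-digit `\u` escapes are handled.

```python
import re
import unicodedata

# Matches a backslash, the letter u, and exactly four hex digits.
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def describe(text):
    """Decode \\uXXXX escapes, then list (code point, name, category) per character."""
    decoded = ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in decoded
    ]


for row in describe(r"\u0301"):
    print(row)  # ('U+0301', 'COMBINING ACUTE ACCENT', 'Mn')
```

The category `Mn` (nonspacing mark) identifies a diacritic such as the one above.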

Handling Warnings

There are two kinds of warnings you can get with Exemplar Characters. While these are categorized as warnings, every effort should be made to fix them.

A. A particular translated item contains characters that aren't in the exemplars.

For example:

Suppose the currency code XAF is translated as "Φράγκο BEAC CFA" in Greek. That raises a warning because the letters in "BEAC CFA" are not in the Greek exemplars.

Suppose that a currency symbol contains ৲ (BENGALI RUPEE MARK). That also raises a warning, even though it is a symbol and not a letter, because it has a script (Bengali).

There are three possible remedies:

If the character really is used in the language, add it to the appropriate exemplar set (standard, auxiliary,…).

For example, the Bengali Rupee mark should be added to the currency exemplar set.

To add to the Exemplar Characters, go first to the main view for your locale, then select Other Items [Characters]. For example, see German characters.

If the character is part of a 'gloss', that is, it is parenthetically included for reference, and the gloss is all ASCII, then include it in brackets. You can use [square brackets] or (parentheses) in currencies. Everywhere else, please use only square brackets.

So the XAF above can be fixed by changing it to "Φράγκο [BEAC CFA]" or "Φράγκο (BEAC CFA)". For the timezone name "ACT (Ακρ)", the fix is to change to "Ακρ [ACT]".

If neither of these approaches is appropriate, try rephrasing the translated item to avoid the character.

If it really can't be avoided, then please file a new ticket describing the problem.
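
The check behind warning A, together with the bracketed-gloss remedy, can be approximated as follows. This is a rough Python sketch under stated assumptions, not the survey tool's actual logic: `missing_from_exemplars` and `ASCII_GLOSS` are hypothetical names, `GREEK` is an illustrative and incomplete subset of Greek letters, and only characters in a letter category are tested. Python's standard `unicodedata` module does not expose script data, so symbols that belong to a script, such as the Bengali Rupee mark, are not caught here.

```python
import re
import unicodedata

# Spans in [square brackets] or (parentheses) that contain only printable ASCII.
ASCII_GLOSS = re.compile(r"\[[\x20-\x7E]*?\]|\([\x20-\x7E]*?\)")

# Illustrative subset of Greek lowercase letters; not the real exemplar set.
GREEK = set(unicodedata.normalize("NFC", "αβγδεζηθικλμνξοπρσςτυφχψωάέήίόύώ"))


def missing_from_exemplars(text, exemplars, ignore_glosses=True):
    """Return the letters in `text` that are not in `exemplars`, in order of appearance."""
    text = unicodedata.normalize("NFC", text)
    if ignore_glosses:
        text = ASCII_GLOSS.sub("", text)
    missing = []
    for ch in text.lower():
        is_letter = unicodedata.category(ch).startswith("L")
        if is_letter and ch not in exemplars and ch not in missing:
            missing.append(ch)
    return missing


print(missing_from_exemplars("Φράγκο BEAC CFA", GREEK))    # ['b', 'e', 'a', 'c', 'f']
print(missing_from_exemplars("Φράγκο [BEAC CFA]", GREEK))  # []
print(missing_from_exemplars("ACT (Ακρ)", GREEK))          # ['a', 'c', 't']
print(missing_from_exemplars("Ακρ [ACT]", GREEK))          # []
```

The sketch accepts both bracket styles everywhere. A stricter version would allow parentheses only for currency names, as described above.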

B. The exemplar characters shouldn't contain a particular character.

The standard characters shouldn't contain punctuation. They also should not contain symbols, unless those symbols are only used with the language's writing system (aka script). For example, the standard Bengali currency symbols should contain the Bengali Rupee mark (which is Bengali-only), but should not include the $ Dollar Sign (which is common across all scripts).