Character sets and encodings

This topic discusses the differences between characters, glyphs,
character sets and encodings, and their impact on implementation.

Character vs. glyph vs. encoding

In fonts, glyphs are the visual representations of the related
characters. Each character can be encoded in various ways. For
example, the code value stored for the Latin character "A" depends on
the encoding used: ISO 8859-1, Windows code page 1252 and UTF-8 store
it as the single byte 0x41, while UCS-2 stores it as the two-byte
value 0x0041. The character "A" looks the same in the Cyrillic and
Greek scripts, but its code points there are different from the Latin
one.
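
As a rough illustration of how one character maps to different stored
values, the following Python sketch encodes two characters under the
encodings named above. The helper name encoded_forms is hypothetical,
and because Python ships no codec called "ucs-2", the sketch assumes
"utf-16-be" as a stand-in (the two produce the same bytes for
characters in the Basic Multilingual Plane).

```python
# Encode the same character under several encodings and show the bytes.
# Assumption: "utf-16-be" stands in for UCS-2 (big-endian), since Python
# has no "ucs-2" codec; the bytes match for BMP characters.
ENCODINGS = ["iso-8859-1", "cp1252", "utf-8", "utf-16-be"]


def encoded_forms(char):
    """Return a dict mapping each encoding name to the encoded bytes in hex."""
    return {name: char.encode(name).hex() for name in ENCODINGS}


# "A" (U+0041) and "e with acute" (U+00E9), written as escapes so the
# example does not depend on the encoding of this source file.
for ch in ("\u0041", "\u00e9"):
    print(f"U+{ord(ch):04X}", encoded_forms(ch))
```

For "A" the three single-byte-compatible encodings agree, while for
U+00E9 the UTF-8 form is two bytes and the ISO 8859-1 form is one, so
the difference between encodings shows up mostly outside ASCII.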

To clarify glyph: for the Latin letter A, the following are examples
of glyphs, each drawn in a different typeface or style: A, a, A, a,
a, A.

To clarify encoding: the capital letter A belongs to the Latin,
Cyrillic and Greek writing systems, but it has a different code point
in each script. In Unicode, for instance, this character can be
U+0041 (Latin capital letter A), U+0410 (Cyrillic capital letter A)
or U+0391 (Greek capital letter Alpha).
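
The three code points can be inspected with Python's standard
unicodedata module. This is only a small sketch; the function name
describe is a hypothetical helper introduced for the example.

```python
import unicodedata

# Three characters that usually share one glyph shape but are distinct
# code points in Unicode.
LOOKALIKES = ["\u0041", "\u0410", "\u0391"]


def describe(ch):
    """Return the code point in U+XXXX notation and the Unicode name."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


for ch in LOOKALIKES:
    print(*describe(ch))

# Although they look alike, the characters do not compare equal.
print(LOOKALIKES[0] == LOOKALIKES[1])
```

Because the comparison is made on code points and not on appearance,
text searches and sorting treat the three letters as unrelated.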

Character repertoires vary between languages even when they use the
same writing system. A single character may also have different
encoded forms depending on the character encoding scheme used by the
system. This is especially important in information exchange, because
different messaging and browsing applications follow different
standards for their default encoding.
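
The effect of mismatched defaults can be sketched in Python by
encoding text with one encoding and decoding it with another. The
sender/receiver roles and the function name exchange are assumptions
made for this example only; they do not describe any particular
application.

```python
def exchange(text, sender_encoding, receiver_encoding):
    """Encode text as the sender would, then decode it as the receiver would.

    errors="replace" keeps the sketch from raising when the receiver's
    encoding cannot interpret a byte; such bytes become U+FFFD.
    """
    wire_bytes = text.encode(sender_encoding)
    return wire_bytes.decode(receiver_encoding, errors="replace")


original = "caf\u00e9"  # "cafe" with an acute accent on the final letter

# Both sides agree on UTF-8: the text survives the round trip.
print(exchange(original, "utf-8", "utf-8") == original)

# The sender uses UTF-8 but the receiver assumes Windows code page 1252:
# the two UTF-8 bytes of U+00E9 are shown as two unrelated characters.
garbled = exchange(original, "utf-8", "cp1252")
print(len(original), len(garbled))
```

Declaring the encoding explicitly on both sides of an exchange, instead
of relying on each application's default, avoids this kind of
corruption.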