code_page(5) code_page(5)
NAME
code_page, cp437, cp737, cp775, cp850, cp852, cp855, cp857, cp860, cp861,
cp862, cp863, cp865, cp866, cp869, cp874, cp932, cp936, cp949, cp950,
cp1250, cp1251, cp1252, cp1253, cp1254, cp1255, cp1256, cp1257, cp1258,
dingbats, symbol - Coded character sets that are used on Microsoft Windows
and NT systems
DESCRIPTION
Code pages are coded character sets that are used on Microsoft Windows,
Windows 95, and NT systems. Just as there are different UNIX codesets,
there are different PC code pages, each supporting a particular set of
character encodings.
A Tru64 UNIX system supplies one locale, en_US.cp850, that directly
supports a PC code-page format (MS-DOS Latin-1). For all other locales, data
in code-page format is supported only through codeset converters. These
converters can be run directly by users or by software or applications that
exchange data between PC and Tru64 UNIX systems. Fonts and other kinds of
character support are available only for the native UNIX codeset to which a
code page can be converted. See the i18n_intro(5) reference page for
introductory information on locales and codesets. See the iconv_intro(5)
reference page for an introduction to codeset conversion and the name format and
location of codeset converters.
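Software that exchanges data with PC systems typically performs this conversion through the iconv(3) functions listed under SEE ALSO. The following C sketch converts one buffer from a code page to another codeset. The codeset name strings are assumptions: the example relies on the spellings accepted by GNU iconv (for example, "CP850" and "UTF-8"), and the names that Tru64 UNIX converters expect, described in iconv_intro(5), may differ.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Converts inlen bytes of text from codeset `from` to codeset `to`.
   Callers pass codeset names as the local iconv spells them (an
   assumption: "CP850" with GNU iconv; Tru64 UNIX names may differ).
   Returns the number of bytes written to out, or -1 with errno set
   (EINVAL if no converter exists, EILSEQ if a character has no
   mapping, E2BIG if out is too small). Code pages are stateless
   encodings, so no final flush call to iconv() is made. */
static long convert(const char *to, const char *from,
                    const char *in, size_t inlen,
                    char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;
    char *inp = (char *)in;      /* iconv() takes char **, not const */
    char *outp = out;
    size_t inleft = inlen, outleft = outcap;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    int err = errno;             /* iconv_close() may overwrite errno */
    iconv_close(cd);
    if (rc == (size_t)-1) {
        errno = err;
        return -1;
    }
    assert(inleft == 0);         /* success means all input was consumed */
    return (long)(outcap - outleft);
}
```

For example, convert("UTF-8", "CP850", "caf\x82", 4, out, sizeof out) turns the cp850 byte 0x82 (e with acute accent) into the two UTF-8 bytes 0xC3 0xA9 and returns 5.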
The following table lists and describes the code pages that have conversion
support on a Tru64 UNIX system. An asterisk (*) follows the names of code
pages that include support for the Euro currency sign (€).
______________________________________________________________
Code Page     Description
______________________________________________________________
cp437         MS-DOS United States
cp737         Greek
cp775         Baltic languages (1)
cp850         MS-DOS Multilingual (Latin-1)
cp852         MS-DOS Slavic (Latin-2) (2)
cp855         IBM Cyrillic (3)
cp857         IBM Turkish
cp860         MS-DOS Portuguese
cp861         MS-DOS Icelandic
cp862         Hebrew
cp863         MS-DOS Canadian French
cp865         MS-DOS Nordic languages
cp866         MS-DOS Russian
cp869         IBM Modern Greek
cp874 *       MS-DOS Thai
cp932         Japanese
cp936         Chinese (People's Republic of China)
cp949         Korean
cp950         Chinese (Hong Kong)
cp1250 *      Windows Latin-2 (2)
cp1251 *      Windows Cyrillic (3)
cp1252 *      Windows Latin-1
cp1253 *      Windows Greek
cp1254 *      Windows Turkish
cp1255 *      Windows Hebrew
cp1256 *      Windows Arabic
cp1257 *      Windows Baltic (1)
cp1258 *      Windows Vietnamese
dingbats      Microsoft dingbat characters
symbol        Microsoft miscellaneous symbol characters
______________________________________________________________
(1) Baltic languages include Estonian, Latvian, and Lithuanian.
(2) Latin-2 languages include Albanian, Croatian, Czech, Faeroese,
Hungarian, Polish, Romanian, Latin Serbian, Slovak, and Slovenian.
(3) Cyrillic languages include Byelorussian, Bulgarian, and Russian.
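An asterisk in the table means that the code page has a code point for the Euro sign. One way to check this from a program is to ask iconv(3) to encode U+20AC in the code page. The C sketch below assumes GNU iconv codeset names such as "CP1252" and "CP437". In cp1252 the Euro sign is the single byte 0x80; a code page without an asterisk, such as cp437, has no mapping for it, so iconv() fails with EILSEQ.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Tries to encode the Euro sign (U+20AC) in `codeset`, named as the
   local iconv spells it (an assumption: "CP1252" with GNU iconv).
   Returns 1 and stores the encoded bytes in out, setting *outlen to
   their count; returns 0 if the code page has no Euro sign (EILSEQ);
   returns -1 on any other failure, such as a missing converter. */
static int euro_in(const char *codeset, unsigned char *out, size_t *outlen)
{
    assert(out != NULL && outlen != NULL && *outlen > 0);
    char euro_utf8[] = "\xE2\x82\xAC";     /* U+20AC encoded as UTF-8 */
    iconv_t cd = iconv_open(codeset, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;
    char *inp = euro_utf8, *outp = (char *)out;
    size_t inleft = 3, outleft = *outlen;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    int err = errno;                       /* save before iconv_close() */
    iconv_close(cd);
    if (rc == (size_t)-1)
        return err == EILSEQ ? 0 : -1;
    *outlen -= outleft;                    /* bytes actually written */
    return 1;
}
```

The byte value differs between code pages (cp1251, for instance, places the Euro sign at 0x88 rather than 0x80), so a program should read the encoded bytes back from out rather than assume a fixed value.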
In all cases, a code page can be converted to and from the UCS-2, UCS-4,
and UTF-8 codesets. In addition, some code pages can be converted directly
to ISO codesets as shown in the following table, although some data loss
may occur.
_________________________________________
Code Page     Can Be Converted Directly to:
_________________________________________
cp437         ISO8859-1
cp737         ISO8859-7
cp775         ISO8859-4
cp850         ISO8859-1
cp852         ISO8859-2
cp855         ISO8859-5
cp857         ISO8859-9
cp860         ISO8859-1
cp861         ISO8859-1
cp862         ISO8859-8
cp863         ISO8859-1
cp865         ISO8859-1
cp866         ISO8859-5
cp869         ISO8859-7
cp874         TACTIS
cp1252        ISO8859-1, ISO8859-15
_________________________________________
See Unicode(5) for information about UCS-2, UCS-4, and UTF-8. Reference
pages for UNIX implementations of the ISO codesets have the name format
iso8859-number(5).
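The data loss mentioned above occurs when a code-page character has no equivalent in the target ISO codeset; for example, the box-drawing characters of cp437 have no ISO8859-1 equivalent. The following C sketch reports how much of a buffer converts before the first such character. It assumes GNU iconv names ("CP437", "ISO-8859-1", "UTF-8"); the function name clean_prefix is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: returns how many leading bytes of `in` convert
   from `from` to `to` before iconv() meets a character it cannot map
   (len if every byte converts), or -1 if no converter exists. Codeset
   names follow the local iconv spelling (an assumption). The output is
   discarded; only the position of the first loss matters here. */
static long clean_prefix(const char *to, const char *from,
                         const char *in, size_t len)
{
    assert(to != NULL && from != NULL);
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;
    char out[256];
    char *inp = (char *)in;
    size_t inleft = len;
    while (inleft > 0) {
        char *outp = out;                  /* reuse the scratch buffer */
        size_t outleft = sizeof out;
        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1
            && errno != E2BIG)
            break;      /* EILSEQ or EINVAL: inp now points at the byte */
    }
    iconv_close(cd);
    return (long)(len - inleft);
}
```

With cp437 input "ab" followed by byte 0xC9 (a box-drawing corner), clean_prefix("ISO-8859-1", "CP437", ...) stops after 2 bytes, while the same call with "UTF-8" as the target converts all 3, which matches the statement that conversion to UTF-8 is always available.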
For Traditional Chinese and Japanese, there are no codeset converters whose
names include the name of a code page because identical character encoding
is provided in existing UNIX codesets. For Traditional Chinese, character
encoding in PC code-page format (cp950) is identical to that in the Big-5
(big5) codeset. For Japanese, character encoding in PC code-page format
(cp932) is identical to that in the Shift JIS (SJIS) codeset. Therefore,
the codeset converters whose names include big5 and SJIS can be used to
convert data in and out of PC code-page format for the supported languages.
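In practice, this means that a program converting cp932 or cp950 data names the SJIS or big5 codeset instead of the code page. The C sketch below converts such data to UTF-8. The spellings "SJIS" and "BIG5" are those of GNU iconv and are assumptions here. Some other iconv implementations also ship a separate CP932 converter that maps a few characters differently from SJIS, so the usage example sticks to characters on which the encodings agree.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Converts a buffer in codeset `from` (for example "SJIS" for cp932
   data or "BIG5" for cp950 data; GNU iconv spellings, an assumption)
   to UTF-8. Returns the number of UTF-8 bytes written, or -1 with
   errno set on failure. */
static long to_utf8(const char *from, const char *in, size_t inlen,
                    char *out, size_t outcap)
{
    assert(outcap > 0);
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return -1;
    char *inp = (char *)in, *outp = out;
    size_t inleft = inlen, outleft = outcap;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    int err = errno;                       /* save before iconv_close() */
    iconv_close(cd);
    if (rc == (size_t)-1) {
        errno = err;
        return -1;
    }
    return (long)(outcap - outleft);
}
```

For example, the cp932/SJIS bytes 0x82 0xA0 (HIRAGANA LETTER A, U+3042) become the UTF-8 bytes 0xE3 0x81 0x82, and the cp950/Big-5 bytes 0xA4 0xA4 (U+4E2D) become 0xE4 0xB8 0xAD.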
Caution for Conversion of Korean and Simplified Chinese
Conversion of text that starts out in code-page format (cp949) to the
DEC Korean (deckorean) codeset may result in loss of data. All of the
Tru64 UNIX codeset equivalents for cp949 support all the Hanja and
miscellaneous characters also supported by the code page. However,
only the UCS-2, UCS-4, and UTF-8 codesets support the complete set of
Hangul characters supported by the cp949 code page. The deckorean
codeset supports only a subset of these Hangul characters. Therefore,
if data is converted from cp949 format to UCS-2, UCS-4, or UTF-8, no
data is lost. However, if the data is then converted from UCS-2, UCS-4,
or UTF-8 to deckorean, the unsupported Hangul characters are lost.
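The two-step loss described above can be observed with iconv(3). Because deckorean converters exist only on Tru64 UNIX, this C sketch substitutes EUC-KR as a stand-in: like deckorean, it covers only a subset of the Hangul syllables in cp949 (the 2,350 precomposed syllables of KS X 1001). That substitution, the GNU iconv names "CP949" and "EUC-KR", and the helper names conv and korean_path are all assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* One iconv() pass; returns output length, or -1 with errno set.
   Codeset names follow GNU iconv spelling (an assumption). */
static long conv(const char *to, const char *from, const char *in,
                 size_t inlen, char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;
    char *inp = (char *)in, *outp = out;
    size_t inleft = inlen, outleft = outcap;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    int err = errno;                       /* save before iconv_close() */
    iconv_close(cd);
    if (rc == (size_t)-1) {
        errno = err;
        return -1;
    }
    assert(inleft == 0);                   /* success consumed all input */
    return (long)(outcap - outleft);
}

/* Hypothetical helper: follows cp949 -> UTF-8 -> EUC-KR, with EUC-KR
   standing in for deckorean. Returns 2 if both steps succeed, 1 if
   only the lossless first step succeeds (a syllable is missing from
   the subset), and 0 if the first step already fails. */
static int korean_path(const char *cp949, size_t len)
{
    char utf8[64], subset[64];
    long n = conv("UTF-8", "CP949", cp949, len, utf8, sizeof utf8);
    if (n < 0)
        return 0;
    if (conv("EUC-KR", "UTF-8", utf8, (size_t)n, subset, sizeof subset) < 0)
        return 1;
    return 2;
}
```

For example, the syllable 똠 (U+B620) exists in cp949 but not in the KS X 1001 subset, so korean_path returns 1 for it, whereas 한 (U+D55C) is in both and yields 2.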
The DEC Hanzi (dechanzi) codeset uses the same encoding format as the
PC code page used for Simplified Chinese (cp936) but does not support
all the characters supported by the code page. Therefore, you can use
converters with dechanzi in the converter name to convert text to and
from cp936 format, but the operation may result in some loss of data.
SEE ALSO
Commands: iconv(1)
Functions: iconv(3), iconv_close(3), iconv_open(3)
Others: i18n_intro(5), iconv_intro(5), iso8859-1(5), iso8859-2(5),
iso8859-4(5), iso8859-5(5), iso8859-7(5), iso8859-8(5), iso8859-15(5),
Unicode(5)