DOS codepages (and their history)

DOS has supported numerous character sets, also called codepages. This article documents the official MS-DOS codepages, the Windows "OEM" codepages, and some rare Arabic codepages.

From their very beginning, PCs have run with an 8-bit character set of 256 characters. Compared to 7-bit ASCII, which only offered 95 visible characters, this was a great advantage. PC software could easily support many Latin-based languages. As PCs became widely popular, it turned out that 256 characters were not enough. More national characters were needed. Codepages were introduced to DOS in 1987 to meet this need.

This article is intended for computing experts who already know what character sets and codepages are. We look into MS-DOS and PC-DOS codepages, and also the codepages found in Windows command line mode ("DOS box"). We attempt to differentiate between "real" DOS codepages and "DOS-like" codepages. We compare the codepages with one another. We point out differences between documented and actual behavior. We also document old Arabic codepages, for which no other online documentation existed as of 2014.

Standalone DOS codepages

The DOS operating system originally supported just one character set, or code page: codepage 437, also known as PC-ASCII. As DOS went into widespread international use, several alternatives were released, starting with DOS 3.3 in 1987.

The following table summarizes the code pages officially supported by standalone versions of PC-DOS (IBM) and MS-DOS (Microsoft). The information is primarily based on MS-DOS versions from 3.3 to 6.22.

IBM: Year of first appearance in IBM registry (Graphic Character Sets and Code Pages).

* Support required a special language version of MS-DOS.

† "Windows ANSI and OEM" codepage (used both in DOS and Windows).

Additional OEM codepages

More codepages do exist. The following Microsoft-documented "OEM" codepages do not appear in any of the standalone PC-DOS or MS-DOS versions reviewed (up to MS-DOS 6.22 from 1994). Most of them seem to be supported by the command prompt under the Windows operating system.

* "Windows OEM" codepages are apparently supported by Windows command prompt.
** "Windows ANSI and OEM" codepages are supported by Windows. The same page is used in both Windows GUI and command prompt.
"First documented" refers to year when the earliest reference to the codepage has been found when writing this article.

Euro codepages (IBM)

The European Union introduced the euro currency symbol (€), which had consequences for codepages in 1998. Several updated and new codepages were defined on the basis of the existing DOS codepages. Either the euro symbol was added to an unused slot in an existing page, or a new page was created in which an old symbol was replaced by €.

The following table is based on documentation by IBM. Microsoft documentation does not mention any of these changes, except for codepages 858 and 874.

Note that the IBM and Microsoft Thai codepages are different despite similar numbering.

It remains unclear which systems actually supported any of these euro updated codepages.
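The second approach, replacing an old symbol with €, can be illustrated with codepages 850 and 858. The following is a minimal sketch that uses Python's bundled `cp850` and `cp858` codecs as stand-ins; that these codecs match what any DOS or Windows system actually shipped is an assumption, and the helper name `differing_positions` is made up for this example.

```python
def differing_positions(codec_a, codec_b):
    """Return the byte values that decode differently in the two codecs."""
    positions = []
    for byte in range(0x100):
        raw = bytes([byte])
        # errors="replace" turns unassigned bytes into U+FFFD instead of raising.
        if raw.decode(codec_a, errors="replace") != raw.decode(codec_b, errors="replace"):
            positions.append(byte)
    return positions


if __name__ == "__main__":
    for byte in differing_positions("cp850", "cp858"):
        old = bytes([byte]).decode("cp850", errors="replace")
        new = bytes([byte]).decode("cp858", errors="replace")
        print(f"{byte:02X} (hex): {old!r} in 850, {new!r} in 858")
```

In these Python codecs the two pages differ in a single position, where the dotless i of 850 gives way to the euro sign in 858.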

DOS codepage charts

The following codepage charts list all official Latin-based DOS codepages, and also Greek, Hebrew and Cyrillic. Arabic codepages appear in their own chapter. 874 Thai and 1258 Vietnamese are presented among other Windows codepages.

Asian double-byte codepages are missing for technical reasons. Unless otherwise mentioned, the codepage charts are screenshots that have been captured in MS-DOS.

Common area (00-7F)

Characters 00-7F (hex) in the following chart are common to all DOS codepages listed here.

The chart is similar to ASCII except for the control characters. Codepoints 00-1F and 7F (hex), marked in pink, have a dual nature: they can be used as invisible ASCII control characters or displayed on the screen as symbols. Because of this, DOS codepages are backward compatible with ASCII.

Exception: Codepage 864 Arabic is different from all other DOS codepages. It supports different symbols in the control character area. We are not going any further into that in this article.
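The dual nature of codepoints 00-1F and 7F can be sketched in Python. Python's `cp437` codec takes the "control character" view and decodes these bytes to ASCII control codes. The "displayed symbol" view needs a separate table; the one below follows the glyphs commonly documented for codepage 437 and is an assumption, not data taken from the charts in this article. The names `CP437_GLYPHS`, `as_control` and `as_glyph` are invented for this example.

```python
# Symbols conventionally drawn on screen for codepoints 00-1F (hex).
# Index 0 is a space because 00 is displayed as a blank cell.
CP437_GLYPHS = " ☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"


def as_control(byte):
    """Interpret the byte as Python's cp437 codec does (ASCII control code)."""
    return bytes([byte]).decode("cp437")


def as_glyph(byte):
    """Interpret the same byte as the symbol shown on screen."""
    if byte < 0x20:
        return CP437_GLYPHS[byte]
    if byte == 0x7F:
        return "⌂"
    # Outside the dual-nature range both views agree.
    return bytes([byte]).decode("cp437")


if __name__ == "__main__":
    for byte in (0x01, 0x0D, 0x1F, 0x7F):
        print(f"{byte:02X}: control={as_control(byte)!r} glyph={as_glyph(byte)!r}")
```

Which of the two interpretations applies depends on how the byte reaches the screen, so a converter to Unicode has to choose one view explicitly.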

Codepage 437 United States

Codepage 437 is the original IBM "PC-ASCII" codepage. Other codepages differ in the 80-FF (hex) range.

437

In the following charts, differences from 437 are highlighted in green.
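The same comparison against 437 can be approximated in code. This is a rough sketch built on Python's bundled codecs (`cp437`, `cp737` and so on), which are assumed here to agree with the screenshots; the article itself points out cases where documentation and actual behavior differ, so treat the output as indicative only. The function names are invented for this example.

```python
def high_half(codec):
    """Decode bytes 80-FF (hex); unassigned bytes become U+FFFD."""
    return [bytes([b]).decode(codec, errors="replace") for b in range(0x80, 0x100)]


def diff_from_437(codec):
    """Map each differing codepoint to (character in 437, character in codec)."""
    pairs = zip(high_half("cp437"), high_half(codec))
    return {0x80 + i: (a, b) for i, (a, b) in enumerate(pairs) if a != b}


def low_half_matches_437(codec):
    """Check that bytes 00-7F decode the same way as in 437."""
    low = bytes(range(0x80))
    return low.decode(codec) == low.decode("cp437")


if __name__ == "__main__":
    changes = diff_from_437("cp737")
    print(f"737 differs from 437 in {len(changes)} positions")
    for position, (old, new) in sorted(changes.items()):
        print(f"{position:02X}: {old} -> {new}")
```

Positions that do not show up in the result, such as the line drawing characters shared by 437 and 737, correspond to the cells left without highlighting in the charts.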

Codepage 737 Greek II

Alternative names: 437 G, MS-DOS Greek, OEM Greek

737

This codepage was formerly known as 437G.

Codepage 775 Baltic Rim

Alternative names: MS-DOS Baltic Rim, OEM Baltic

775

(captured in Windows 2000 SP4)

Codepage 775 is not a DOS codepage in the strictest sense. It never appeared in standalone MS-DOS. The page covers Estonian, Lithuanian and Latvian (and even Polish). It conforms to Lithuanian Standard LST 1590-1.

Codepage 869 Greek

According to IBM, a euro version exists where € was added to unused position hex 87.
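Whether a position is unused can be checked against a mapping table. The sketch below lists the bytes that Python's bundled `cp869` codec leaves unassigned; that codec reflects the version without the euro, and whether it matches the table used by any particular DOS version is an assumption. The helper name `unassigned_positions` is invented for this example.

```python
def unassigned_positions(codec):
    """Return the byte values that the codec refuses to decode."""
    unassigned = []
    for byte in range(0x100):
        try:
            bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            # Python's charmap codecs raise for positions with no mapping.
            unassigned.append(byte)
    return unassigned


if __name__ == "__main__":
    free = unassigned_positions("cp869")
    print("Unassigned in cp869:", " ".join(f"{b:02X}" for b in free))
```

Position 87 (hex) is among the bytes reported as unassigned, which is consistent with IBM placing € there.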

Arabic codepages

Arabic codepages are inadequately documented in online sources: the documentation that exists is sparse or inaccurate. The codepages are not well supported by English versions of either MS-DOS or Windows. To our knowledge, codepages 709, 710 and 711 had not been documented online before this article was written in 2014.

The following Arabic codepages have been captured in Arabic Windows 98 Second Edition (Arabic command line). Arabic command line means there is a special built-in utility in Windows that adds Arabic script support, such as right-to-left writing and joining of Arabic letters.

Microsoft has published online documentation for codepages 708, 720 and 864. Many characters appear to differ in Windows 98, however. Differences between Windows 98 and the Microsoft documentation have been highlighted in pink. These differences may be specific to the Windows 98 implementation.

Codepage 708 Arabic (ASMO 708)

708

(captured in Arabic Windows 98 SE)

Codepage 708 in Arabic Windows 98 SE differs from Microsoft documentation of 708 from 1995. The differences are in pink. Line drawing characters with double lines appear as single lines. Several other characters look different as well.

Codepage 708 is backward compatible with the standards ASMO 708 (1988) and ISO 8859-6 (Arabic). Codepage 708 adds characters to positions that are unused in the standards (for comparison see the ASMO 708 set in ISO-IR 127 and ECMA-114). A reference to codepage 708 appears in the RTF file format specification, where it was added during 1989–1993.
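The compatibility relationship described here, a codepage that keeps every assignment of a standard and only fills positions the standard leaves unused, can be expressed as a simple check. Python has no codec for codepage 708, so the sketch below uses its `iso8859_6` codec as the base and a hypothetical candidate table built for illustration only; it is not the real 708 table. The function names are likewise invented.

```python
def assigned(codec):
    """Return {byte: character} for every byte the codec assigns."""
    table = {}
    for byte in range(0x100):
        try:
            table[byte] = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            pass  # position is unused in this character set
    return table


def extends(base, candidate):
    """True if candidate keeps every assignment of base (it may add more)."""
    return all(candidate.get(byte) == char for byte, char in base.items())


iso = assigned("iso8859_6")

# Hypothetical candidate: ISO 8859-6 plus one extra character placed in the
# first slot of A0-FF (hex) that the standard leaves unused.
free_slot = next(b for b in range(0xA0, 0x100) if b not in iso)
candidate = dict(iso)
candidate[free_slot] = "█"

if __name__ == "__main__":
    print(f"first unused slot in ISO 8859-6: {free_slot:02X}")
    print("candidate extends ISO 8859-6:", extends(iso, candidate))
```

A table that reassigned one of the standard's Arabic letters would fail the `extends` check, which is the difference between extending a standard and merely resembling it.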

Codepage 709 Arabic (ASMO 449+, BCON V4)

709

(captured in Arabic Windows 98 SE)

Codepage 709 is inadequately documented in online sources. A reference to it appears in the RTF file format specification, where it was added during 1989–1993.

Codepage 709 appears to have been built on the ASMO 449 standard. ASMO 449 is a 7-bit ASCII-like encoding (see ISO-IR 089) that has Arabic letters in place of letters A-Z, and also some symbols. Codepage 709 has lifted ASMO 449 characters to the area 80-FF (hex) and added some extra characters to unused positions. The tilde (~) at FE (hex) is incompatible with ASMO 449, though. See also: ASMO 449+.

Codepage 709 is quite similar to 708 when it comes to Arabic letters and Arabic symbols, but not when it comes to Latin letters, ASCII symbols and digits.

Codepage 710 Transparent Arabic

710

(captured in Arabic Windows 98 SE)

Codepage 710 was introduced in Arabic MS-DOS 3.3. It is inadequately documented in online sources.

Codepage 711 Arabic (Nafitha Enhanced)

711

(captured in Arabic Windows 98 SE)

Codepage 711 is inadequately documented in online sources. A reference to it appears in the RTF file format specification, where it was added during 1989–1993. Nafitha was a program that added Arabic support to DOS.

Codepages 710 and 711 are somewhat similar, but not compatible with each other.

Codepage 720 Arabic (Transparent ASMO)

Alternative name: MS-DOS Arabic (Transparent ASMO)

720

(captured in Arabic Windows 98 SE)

Codepage 720 in Arabic Windows 98 SE differs from Microsoft documentation of 720. The differences are in pink. Line drawing characters with double lines appear as single lines. Several other characters look different as well.

Codepage 720 was added to MS-DOS 6.22 (1994). A reference to it appears in the RTF file format specification, where it was added during 1989–1993.

Codepage 864 Arabic (MS-DOS)

Alternative names: Arabic - Personal Computer, OEM Arabic

864

(captured in Arabic Windows 98 SE)

Codepage 864 in Arabic Windows 98 SE differs from Microsoft documentation of 864 from 1996. The differences are in pink. The 1996 documented version, which is a conversion table, supports more characters than the ones actually implemented in Windows 98.

According to MS-DOS 6.22, this was the only Arabic codepage available.