Control characters in ASCII and Unicode

Tens of odd control characters appear in ASCII charts. The same characters have found their way to Unicode as well. CR, LF, ESC, CAN... what are all these codes for? Should I care about them? This is an in-depth look into control characters in ASCII and its descendants, including Unicode, ANSI and ISO standards.

When ASCII first appeared in the 1960s, control characters were an essential part of the new character set. Since then, many new character sets and standards have been published. Computing is not the same either. What happened to the control characters? Are they still used and if yes, for what?

This article looks back at the history of character sets while keeping an eye on modern use. The information is based on a number of standards released by ANSI, ISO, ECMA and The Unicode Consortium, as well as industry practice. In many cases, the standards define one use for a character, but common practice is different. Some characters are used contrary to the standards. In addition, certain characters were originally defined in an ambiguous or loose way, which has resulted in confusion in their use.

The above clickable table summarizes the control characters. The character codes are given in hexadecimal. Color coding indicates character category. Click a character to jump to more information on it.

Groups of control characters

For the purposes of this document, control characters are divided into three groups.

1. ASCII control characters. The ASCII control character area covers code positions 0–31 (hex 00–1F). This area is also called the C0 set. Two additional controls appear at 32 and 127 (hex 20 and 7F). The ASCII control characters cover a wide range of uses, such as text layout, transmission and device control, and more.

2. C1 control characters. C1 covers positions 128-159 (hex 80-9F). C1 is primarily for displays and printers. This set is related to ANSI escape sequences and VT100.

3. ISO 8859 special characters. Two special characters, NBSP and SHY, are from ISO 8859. They are also used in Windows and Unicode. They appear at 160 and 173 (hex A0 and AD).
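The three groups above can be told apart by code position alone. The following is a minimal sketch of that grouping; the function name control_group and the group labels are made up for illustration and do not come from any standard.

```python
def control_group(code: int) -> str:
    """Return the group, as used in this article, for a code position."""
    if 0x00 <= code <= 0x1F:
        return "C0"                      # ASCII control area, positions 0-31
    if code in (0x20, 0x7F):
        return "ASCII additional"        # SP and DEL
    if 0x80 <= code <= 0x9F:
        return "C1"                      # positions 128-159
    if code in (0xA0, 0xAD):
        return "ISO 8859 special"        # NBSP and SHY
    return "not a control character"


if __name__ == "__main__":
    for code in (0x0A, 0x7F, 0x85, 0xA0, 0x41):
        print(f"{code:3d} ${code:02X} {control_group(code)}")
```

Note that the sketch classifies by position only. In a DOS or Windows codepage the positions 128–159 hold graphic characters, so the result is meaningful only for character sets that actually reserve those positions for controls.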

Note: These control character sets are not the only ones ever used. Other C0 and C1 sets do exist. Alternative sets were defined for special uses. In them, some of the standard C0/C1 controls have been deleted or replaced by new controls. Even totally different alternative sets exist. Alternative control characters are not discussed in this article. One can find them in the International Register of Coded Character Sets.

Control characters in standards

ASCII control characters

C0 = positions 0–31. Origin with ASCII and ISO 646 character sets. Characters SP and DEL appear together with C0.

The first group of control characters originates from ASCII. These characters consist of a set called C0 and two additional characters. The C0 set is in locations 0 to 31. Two additional ASCII characters, SP and DEL, fall outside the C0 area, but they are closely related to the C0 set. All of these characters are defined by the same standards.

This set of control characters covers many uses. There are "Format Effectors" that control the appearance of plain text. There are "Transmission Controls" for use with transmission protocols and "Device Controls" to start, operate and stop auxiliary devices. There are "Information Separators" that delimit various pieces of data. Other controls exist for producing alerts, filling a medium, indicating the end of a medium, and for dealing with errors. There are even controls to create new characters and controls. The C0 set was defined with perforated tape, punched cards and typewriter-like devices in mind. Devices have changed since then, but the C0 controls have survived.

History of ASCII control characters

The first version of ASCII was released in 1963. Like the ASCII of today, the 1963 version covered some letters and symbols, as well as control characters. While many of those 35 control characters were similar to those of modern ASCII, some were different. ASCII-1963 had some serious shortcomings, such as no support for lower case letters. It quickly turned out that the standard had to be revised. Today, ASCII-1963 is practically forgotten. Since ASCII-1963 also deviates considerably from later ASCII versions in the control character area, we will not go any deeper into it.

The next revision was ASCII-1965. This version, although formally accepted, was not published. Another revision was going to take place. ASCII as we know it is based on the ASCII-1967 standard (USAS X3.4-1967). This version was an important milestone. It was already very close to the version that then became widely used.

In 1968 ASCII was slightly updated and released as USAS X3.4-1968 (later retroactively renamed ANSI X3.4-1968). The actual updates were very small, only adding an option to use the character LF as a "newline", and designating ASCII and USASCII as the names of the standard. (Later on, the name USASCII was dropped, leaving ASCII as the official name.)

ASCII-1968 became immensely popular. Almost all of today's computer systems use ASCII or one of its descendants. (A notable exception is EBCDIC used on IBM mainframes, very different from ASCII.) The Internet is based on ASCII-1968 as well.

ASCII-1968 defined the 34 control characters that remained: the C0 set, SP and DEL. Included was a short description of the intended functionality of each control character. These definitions also made their way into RFC 20 word for word. Most of these definitions have remained materially unchanged for decades. Later standards have updated the text, but the basic functionality is still the same. That, at least, is the situation in the standards. Non-standard use is common and often contrary to the standards.

When ASCII emerged, computing equipment was quite different from the equipment that ASCII was going to be popularized with. Computers were regularly operated through punched cards, perforated tape and teletypewriters (TTYs). TTYs were typewriter-like devices, which were used as interactive computer terminals. Instead of a monitor they produced output on paper. The ASCII control characters were naturally designed considering the devices of those days. Since then, new devices such as monitors have emerged. It hasn't always been that simple to accommodate the control characters to the newer devices. Despite the challenges, the control characters of the 1960s are still with us.

ISO 646. ASCII evolved into an ISO standard, which is known as ISO 646. The first version came out in 1967. ISO 646 is the "international edition" of ASCII, with a few differences. Despite the differences, these standards were closely related. ISO 646 allowed national variants to support the national characters required for each country. The US national variant was ASCII. Several other national variants were released to support accented letters (à, ü and the like) and other symbols. The ISO variants including ASCII were a common way to express text in the 1970s and 1980s.

As to the control characters, the ASCII control characters set also appeared in ISO 646. The functionality of the control characters remained quite intact, even though the definitions were updated.

More standards. ISO 646 was also released as ECMA-6. The control characters in ECMA-6 are very similar to those of ISO 646.

Some of the C0 codes were further refined in other standards. SI, SO and ESC appeared as character set extension controls in ANSI X3.41, ISO 2022 and ECMA-35. These characters became widely used to invoke additional character sets. The Transmission Control characters (TC1 to TC10) appeared in ISO 1745 in 1975, which gave a detailed description of where and how they should be used. How widely ISO 1745 was actually used in transmission is another question.

Current status of ASCII control characters

ASCII was later updated in 1977 and again in 1986 to be in conformance with ISO 646. The control characters in ASCII-1986 and ISO 646/ECMA-6 are very similar, even though minor differences do exist.

The current ISO and ECMA versions, namely ISO 646:1991 and ECMA-6:1991, no longer define the C0 control characters. The control characters didn't go away, however. They now appear in ISO/IEC 6429:1992 and ECMA-48:1991, respectively. Simply put, the C0 set was lumped together with other control characters, the C1 set, which follows below.

As to some specific control characters, the current detailed definitions of SI, SO and ESC can be found in ANSI X3.41, ISO 2022 and ECMA-35. The current details for the Transmission Control characters (TC1 to TC10) appear in the old ISO 1745 from 1975.

Even though the history of the various standards related to the ASCII control codes may sound unnecessarily complicated, the standard functionality of the characters has not changed dramatically. It's still mostly the same as back in 1967. That, at least, is the situation in the standards. The practice is totally different. Some control characters are indeed commonly used the standard way. On the other hand, many are used contrary to the standards, or simply ignored. It's not uncommon to find control characters forbidden in data. Control characters can have unwanted or unknown side-effects. The easiest way for programmers to deal with them is to shut their eyes or reject such characters altogether.

C1 control characters

C1 = positions 128–159. Primarily for displays and printers.

The C1 set appeared in the late 1970s. It is primarily designed for controlling display and printer devices, even though some of the controls warrant other uses as well. The C1 set is intended for use with the C0 set.

The C1 set includes "Format effectors" that control horizontal and vertical movement when displaying or printing. There are "Presentation controls" for defining line-break behavior. There are "Area definition" controls for form filling. There are "Introducers" and "Shift Functions" to support extra controls and characters. Additional controls exist for sending command strings and setting an indicator. Some of the controls were intended to cover for shortcomings in the C0 set. Some controls were reserved: 2 controls are for private use, while 4 controls were (and still are) reserved for future standardization.

The C1 set occupies positions 128–159 in 8-bit environments. There are also escape codes to use the C1 set on 7-bit systems. The respective escape codes (ESC char) are given in the C1 list further below.
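As an illustration of the 7-bit representation, here is a small sketch that converts an 8-bit C1 code into its two-character escape form. It assumes the usual ISO 2022 / ECMA-48 convention that the character following ESC is the C1 code minus hex 40; the function name c1_to_7bit is invented for this example.

```python
ESC = 0x1B  # the C0 control character ESC


def c1_to_7bit(code: int) -> bytes:
    """Return the 7-bit form (ESC + final byte) of an 8-bit C1 control code."""
    if not 0x80 <= code <= 0x9F:
        raise ValueError(f"not a C1 control code: {code:#04x}")
    # Assumed convention: the final byte lies 0x40 below the C1 code,
    # so it always falls in the range 0x40-0x5F ('@' to '_').
    return bytes([ESC, code - 0x40])


if __name__ == "__main__":
    print(c1_to_7bit(0x85))  # NEL
    print(c1_to_7bit(0x9B))  # CSI
```

With this rule NEL (hex 85) becomes ESC E and CSI (hex 9B) becomes ESC [, the latter being the familiar start of an ANSI escape sequence.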

History of C1

In 1979 ANSI released additional controls for use with ASCII (ANSI X3.64). This came to be known as the C1 set. A similar set was also released as ECMA-48. According to ANSI, the C1 controls were intended for input/output control of two-dimensional character-imaging devices, including interactive terminals of both the cathode ray tube and printer types, as well as output to microfilm printers.

A bit later, in 1983, the C1 set was standardized as ISO 6429. Standard-wise, the C1 set has been volatile. Both ISO 6429 and ECMA-48 were updated several times. New control characters were added and definitions updated. One of the C1 characters (IND) was eventually deprecated and removed.

The standards actually cover more control codes than those that fit in the C1 area. These additional controls are used via control sequences (escape sequences). The sequences are beyond the subject of this article. Let it suffice that the sequences are an important part of the standards that should be used together with the C1 controls. The sequences, together with C1, are also known as VT100 and ANSI escape sequences.

Current status of C1

The current standards for C1 are ISO/IEC 6429:1992 and ECMA-48:1991. These standards now define both the C0 and C1 control characters.

Unicode allows the use of C1 (and C0 too). In fact, the C1 area has been entirely reserved for control codes in Unicode. In contrast, the (somewhat outdated) DOS and Windows codepages, i.e. character sets, have not reserved space for C1. Instead, they have included additional graphic characters in the C1 area. This doesn't prevent the use of C1 controls on DOS and Windows, though.

In practice, the C1 control characters are not very common. They are specialized codes for special applications.

ISO 8859 special characters NBSP and SHY

Positions 160 and 173.

ISO 8859 is a group of 8-bit extended character sets. The sets cover various Latin characters and also Cyrillic, Greek, Arabic, Hebrew and Thai characters. ISO 8859 is related to the Windows character sets ("ANSI codepages"), but these are actually different from each other.

Two characters in ISO 8859 are of interest to us: Non-Breaking Space (NBSP) and Soft Hyphen (SHY). They both have control character like properties, even though they are not actually called control characters in ISO 8859.

NBSP appears in position 160 (hex A0) and SHY in position 173 (hex AD). The same positions, and roughly the same meanings too, have been adopted in many of the Windows codepages and in Unicode.

Note: ISO 8859-8 Latin/Hebrew defines two additional special characters, namely LRM (left-to-right mark) and RLM (right-to-left mark). These characters are not universal in ISO 8859, but specific to Hebrew. Since LRM and RLM were not used in any other ISO 8859 character set, and since they do not appear in Unicode at the same positions, they are not further presented in this article.

Current status of NBSP and SHY

Several current standards include NBSP and SHY. They appear at the same positions in all of the following:

ISO 8859-1 to 8859-16. Exception: ISO 8859-11 Latin/Thai does not include SHY.

Windows codepages 1250–1258.

Unicode, block U+0080 C1 Controls and Latin-1 Supplement.

Control characters in Unicode

Control characters have made their way to Unicode as well. Unicode recognizes control characters and explicitly allows their use. While Unicode doesn't obsolete control characters, it defines special rules for just a handful of them. Let the standard speak for itself:

"The Unicode Standard provides for the intact interchange of these code points, neither adding to nor subtracting from their semantics. The semantics of the control codes are generally determined by the application with which they are used. However, in the absence of specific application uses, they may be interpreted according to the control function semantics specified in ISO/IEC 6429:1992." (Unicode 9.0 p. 822)

Unicode specifies semantics for the following control characters. The semantics appear to be in line with their original semantics, even though some differences may exist.

ASCII control characters:

HT and SP are considered whitespace.

LF, VT, FF and CR are considered whitespace, and also mandatory line breaks in the line breaking algorithm.

FS, GS, RS and US are considered separators in the bi-directional algorithm.

C1 control characters:

NEL is considered a mandatory line break in the line breaking algorithm, even though supporting it is optional.

ISO 8859:

NBSP and SHY. These characters are not actually control characters in Unicode. Instead, NBSP is "Separator, space" and SHY is "Other, format". Both characters have features in the line-breaking algorithm. With SHY, Unicode is significantly more elaborate than ISO 8859 in that Unicode suggests more hyphenation features than just displaying a hyphen.
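The general categories mentioned above can be checked against the Unicode character database. The following sketch uses Python's standard unicodedata module; the sample selection and the helper name general_categories are just for illustration. The second print shows how Python's own str.splitlines treats NEL and VT as line boundaries, which is a Python behavior and not a full implementation of the Unicode line breaking algorithm.

```python
import unicodedata

# Characters discussed above, keyed by their usual abbreviations.
SAMPLES = {"HT": "\t", "LF": "\n", "NEL": "\x85", "NBSP": "\xa0", "SHY": "\xad"}


def general_categories(samples=SAMPLES):
    """Map each abbreviation to the Unicode general category of its character."""
    return {name: unicodedata.category(ch) for name, ch in samples.items()}


if __name__ == "__main__":
    # "Cc" = Other, control; "Zs" = Separator, space; "Cf" = Other, format
    print(general_categories())
    # str.splitlines also splits on NEL (0x85) and VT (0x0B)
    print("a\x85b\x0bc".splitlines())
```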

Note: While no new control characters appear in Unicode, it does define some of its own special characters, such as formatting characters. These characters are beyond the scope of this article.

From ASCII via ISO to Unicode

The following diagram summarizes the development of character standards. You can see how the control characters were propagated from ASCII (X3.4) and other standards to Unicode.

Control characters in modern applications

With so many control characters coming from the 1960s and 1970s, are they still useful for application programmers?

It depends on the application. Generally speaking, one needs control characters to work with old interfaces or devices. New protocols and file formats tend to use some other mechanism than control characters. Current formats typically use textual markup such as XML, which has little use for control characters beyond whitespace. On the device control side, unless you are writing device drivers, you control devices through operating system calls or library routines rather than sending them control strings to do tricks.

The following is a subjective list of which characters are still in common use and which ones are used less. The list is based on experience writing application software for Windows and DOS.

ASCII control characters: some used, some not

NUL is still common in everyday use. NUL terminates a string in many programming languages and interfaces.

Transmission control characters (TC1 to TC10) are generally of little use. Data transfer is done through TCP/IP sockets, HTTP, FTP or some other protocol. Individual transmission control characters appear for special uses.

BEL probably no longer appears in its original use. Rather than sending BEL to produce a beep, applications tend to play a sound by other means.

Format effectors (FE0 to FE5) are possibly the most important control characters these days. Some of them, such as CR and LF, are essential for a system to work at all. HT is also very common, especially in plain text files. BS and FF are less common. VT appears only rarely if ever.

Device control characters (DC1 to DC4) are not required to control devices, really. To control a device from an application you rather make a system call. On the other hand, you might still need XOFF (^S) or XON (^Q) in a command line session from time to time.

SO, SI and ESC used to be common, but this has changed. One may still find them from time to time, presumably on older systems.

SUB might no longer appear as a substitute. You will more likely see something like "?" or the Unicode REPLACEMENT CHARACTER (U+FFFD) as a substitute for a bad character. Another use for SUB still exists, though. You could find it at the end of a text file.

Information separators (IS4 to IS1) are technically still valid. Whether anyone uses them to separate information is another question. Other techniques are used instead, such as XML or database systems. As a simple delimiter character a NUL, HT, CR/LF, comma or semicolon is more common than any of the information separators originally designed for the purpose.

Characters ^A to ^Z (1 to 26) frequently appear as keyboard shortcuts in various applications and operating systems. The actual feature triggered by a keyboard shortcut is often unrelated to the respective control character. More of that follows below.
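To show what the information separators mentioned above could look like in use, here is a minimal sketch of a hypothetical record format: US (hex 1F) separates fields and RS (hex 1E) separates records. The format and the function names encode and decode are assumptions made for this example, not something defined by the standards. The sketch also assumes the data itself never contains US or RS.

```python
US = "\x1f"  # Unit Separator (IS1), used here between fields
RS = "\x1e"  # Record Separator (IS2), used here between records


def encode(records):
    """Join a list of records (each a list of field strings) into one string."""
    return RS.join(US.join(fields) for fields in records)


def decode(data):
    """Split a string produced by encode() back into records and fields."""
    return [record.split(US) for record in data.split(RS)]


if __name__ == "__main__":
    rows = [["Smith, John", "line 1\nline 2"], ["Doe; Jane", "tab\there"]]
    packed = encode(rows)
    print(repr(packed))
    print(decode(packed) == rows)
```

The fields may contain commas, semicolons, tabs and line breaks without any quoting, which is the practical appeal of dedicated separator characters over the more common delimiters.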

C1 control characters: little use

NEL is the only C1 character recognized by Unicode. The most probable case to run into NEL is when EBCDIC compatibility is required.

The other C1 characters appear outdated now. Since VT100 (that uses C1 extensively) is still a current method with Unix shell sessions, C1 is alive, maybe even everyday business for you. From a programmer's point of view the entire C1 set is rarely used.

ISO 8859 special characters: in use

NBSP is an everyday character to suppress a line break. It is supported by several current standards, including HTML and Unicode.

Some frequently used characters, especially in a special field, may not have been mentioned. If you know frequent current uses for any of the characters, let us know.

Many of the control characters only appear rarely. How did this affect the space efficiency of 7-bit and 8-bit character sets? Instead of reserving space for control characters, it was possible to reuse these areas for additional graphics. This was actually done by DOS, Windows and Mac, all of which assigned graphic characters to the control character areas. Unicode chose to be different in this respect. Since its code space is much larger than 128 or 256, it was possible to reserve the C0/C1 areas entirely for control characters. This has helped the control characters to survive, if not in practical use, then at least in various code charts and lists.

Keyboards and control characters

Users can create many of the control characters from their keyboards. This usually happens in combination with the Ctrl key, and, more rarely, with the Esc key. There are also some special keys that produce control characters on their own. Backspace, Enter, Esc, Space and Tab are the usual ones.

Key presses and control characters, while having some things in common, are usually unrelated. Pressing a key combination doesn't generally trigger the functionality of the respective control character. As an example, while it's possible to press Ctrl+O to create an SO (Shift Out), pressing Ctrl+O seldom runs the operation associated with SO (pick an alternate character set). Instead, Ctrl+O might start an operation beginning with an O, such as "Open".

In some cases a key press does trigger the respective control character feature. Pressing the Tab key, or Ctrl+I, can indeed produce an HT (Horizontal Tabulation) and move the cursor forward on the line. This is an exception rather than the norm, though.

Some key combinations are more likely than others. Ctrl+A through Ctrl+Z (in other words, ^A to ^Z) are common keyboard shortcuts. Control key combinations with a symbol (^@, ^[, ^\, ^], ^^, ^_, ^?) are less common. There is a reason why such combinations should be avoided. Considerable variation exists with symbol keys in different keyboard layouts. A Ctrl and symbol key combination doesn't always produce the same control character, or any character at all, which makes it less useful as a keyboard shortcut.
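The caret notation used above follows a simple rule: the character after the caret is the control code with bit 6 flipped (XOR with hex 40). The sketch below applies that rule in both directions; the function names caret and from_caret are invented for this example, and it says nothing about what a particular keyboard layout actually produces.

```python
def caret(code: int) -> str:
    """Return the caret notation of a C0 control code or DEL, e.g. 1 -> '^A'."""
    if not (0x00 <= code <= 0x1F or code == 0x7F):
        raise ValueError(f"no caret notation for {code:#04x}")
    return "^" + chr(code ^ 0x40)


def from_caret(notation: str) -> int:
    """Return the control code for a caret notation such as '^A' or '^?'."""
    if len(notation) != 2 or notation[0] != "^":
        raise ValueError(f"not caret notation: {notation!r}")
    return ord(notation[1].upper()) ^ 0x40


if __name__ == "__main__":
    for code in (0x00, 0x09, 0x1A, 0x1B, 0x1F, 0x7F):
        print(f"${code:02X} {caret(code)}")
```

This is why NUL is written ^@, ESC is ^[ and DEL is ^?: the symbols are simply the graphic characters sitting 64 positions away from the respective control codes.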

In this article the focus is on the programmatic features of control characters. Less focus is put on the use of keyboard shortcuts.

About the character list

Next we are going to list every control character in detail. The column Dec refers to the decimal value of the control code ("ASCII value"). Hex is the same in hexadecimal, preceded by a dollar sign for clarity. An octal value is also given. The column Pos shows the row/column of the character in code charts.

The list shows key presses that (often) produce the control character on the keyboard. In addition, C-style escape sequences (\c) are provided where available, as are special constants supported by both classic Visual Basic and Visual Basic .NET.

The last column lists mnemonics and graphic symbols. The symbols (in black) have been standardized, but they have fallen into disuse. The 2-letter mnemonics are standardized for the ASCII section. Additional 2-letter mnemonics for the C1 and ISO 8859 sections are taken from RFC 1345, which is not a standard, but is frequently referred to in this context.
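The numeric columns described above can all be derived from the code value. The following sketch shows one way to compute them; the function name chart_columns is made up, and the Pos value is assumed to be the high four bits and the low four bits of the code, which matches entries such as 0/10 for LF.

```python
def chart_columns(code: int) -> dict:
    """Return the Dec, Hex, Octal and Pos column values for a code position."""
    return {
        "Dec": str(code),
        "Hex": f"${code:02X}",                  # dollar sign prefix, as in the list
        "Octal": f"{code:03o}",
        "Pos": f"{code >> 4}/{code & 0x0F}",    # chart column / chart row
    }


if __name__ == "__main__":
    for code in (0x00, 0x0A, 0x0D, 0x7F):
        print(chart_columns(code))
```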

Character list

ASCII control characters (C0)

The ASCII control characters work in 7-bit and 8-bit environments, as well as in Unicode. These controls originate from a set of related standards: ASCII, ISO 646 and ECMA-6, and also ISO 6429 and ECMA-48. All of these characters are available in Unicode, too. The actual C0 set consists of characters NUL through US (0–31). Two additional characters, SP and DEL, are a part of ASCII and the related standards as well.

*) The 2-character mnemonics for the ASCII set are from ANSI X3.32, ISO 2047 and ECMA-17. So are the graphic symbols. The symbols are outdated and rarely used. A couple of the symbols also have alternative forms.

Columns: Dec | Hex | Char | Description | Octal | Pos | Mnemonic and symbol *)

Dec 0 | Hex $00 | NUL | Null | Octal 000 | Pos 0/0 | Mnemonic NU | C escape \0 | Key ^@

NUL is defined in the standards as a filler character. It can be used as media-fill or time-fill. NUL doesn't affect the information content of a data stream. It may affect the information layout and the control of equipment, though.

Note: NUL was originally intended as an ignorable filler character with no meaning. Especially convenient on paper tape, where a NUL equals no holes punched, it could be used to reserve space for new information or correcting errors. ASCII-1986 even suggests NUL as a "time-waster" character to be added after a newline to accommodate mechanical devices where a carriage return works slowly. Despite this, NUL has been used contrary to the standards in null-terminated strings as an End-Of-String marker. Several programming languages use this convention.

Constant in Visual Basic and VB.NET: vbNullChar, NullChar
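The End-Of-String convention described above can be observed from Python through the standard ctypes module, which exposes C-style character buffers. This is only a sketch of the convention; the helper name c_string_view is made up for this example.

```python
import ctypes


def c_string_view(data: bytes) -> bytes:
    """Return what C-style string handling would see in the given bytes."""
    buf = ctypes.create_string_buffer(data)  # copies data and appends a NUL
    # .value stops at the first NUL, like a C string function would;
    # .raw would return the whole buffer, embedded NULs included.
    return buf.value


if __name__ == "__main__":
    data = b"abc\x00def"
    print(len(data))            # the Python bytes object keeps all 7 bytes
    print(c_string_view(data))  # the C view ends at the first NUL
```

Anything after the first NUL is invisible to code that follows the null-terminated convention, which is one reason why NUL is often forbidden in data.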

Dec 1 | Hex $01 | SOH | Start of Heading (TC1, Transmission control character 1) | Octal 001 | Pos 0/1 | Mnemonic SH | Key ^A

Indicates the beginning of a heading in a transmission. The heading can be terminated by STX. As per ASCII-1968, a heading constitutes a machine-sensible address or routing information. Later standards have dropped the explanation.

Note: SOH, along with STX and ETX, was intended for data transmission. It is not intended for marking a heading in a document.

Dec 2 | Hex $02 | STX | Start of Text (TC2, Transmission control character 2) | Octal 002 | Pos 0/2 | Mnemonic SX | Key ^B

STX has two functions in a transmission: it 1) indicates the beginning of a text and 2) may terminate a heading (see SOH). As per ASCII-1968, text is what should be transmitted to a destination. Later standards have dropped the explanation.

Dec 3 | Hex $03 | ETX | End of Text (TC3, Transmission control character 3) | Octal 003 | Pos 0/3 | Mnemonic EX | Key ^C

Terminates a text in a transmission. As per ASCII-1968, a text starts with STX and ends with ETX. Later standards don't necessarily require the pairing of STX with ETX.

Note: ETX may be used to call for reply from a slave station after a message has been sent. ETX is also commonly used to terminate an interactive process (Ctrl+C).

Ctrl+Break on a PC keyboard produces this character code.

Dec 4 | Hex $04 | EOT | End of Transmission (TC4, Transmission control character 4) | Octal 004 | Pos 0/4 | Mnemonic ET | Key ^D

Indicates the conclusion of a transmission. The transmission may have contained one or more texts and associated heading(s).

Note: EOT can be used to end or abort a transmission. It can also be a reply to indicate inability to receive further messages. EOT (Ctrl+D) is even used as an End-Of-File control in a Unix shell session.

Dec 5 | Hex $05 | ENQ | Enquiry (TC5, Transmission control character 5) | Octal 005 | Pos 0/5 | Mnemonic EQ | Key ^E

Requests a response from a remote station. The response may include station identification or status. ENQ can be used as a "Who Are You" (WRU) to identify a remote station, especially after a new connection has been established.

Dec 6 | Hex $06 | ACK | Acknowledge (TC6, Transmission control character 6) | Octal 006 | Pos 0/6 | Mnemonic AK | Key ^F

An affirmative response. Transmitted from a receiver as a response to the sender.

Note: ACK can indicate that a slave station has received a message correctly and is ready to receive more.

Dec 7 | Hex $07 | BEL | Bell | Octal 007 | Pos 0/7 | Mnemonic BL | C escape \a | Key ^G

Calls for human attention. BEL may control alarm or attention devices.

Note: BEL is the only control character with an audible effect. It has been used to ring a bell (indeed) or produce a beep sound. A visual alarm is also possible.

In Unicode, this control character is abbreviated BEL but named ALERT, while the name BELL is confusingly used for a graphic character (🔔).

Dec 8 | Hex $08 | BS | Backspace (FE0, Format effector 0) | Octal 008 | Pos 0/8 | Mnemonic BS | C escape \b | Key ^H

Moves one character position backwards (keeping the previous character).

Note: Contrary to the standards, BS has been used as a combined "move back and delete" operation to remove the previous character. This is not the standard meaning of BS, however. BS is defined as a non-destructive "move back" or "move left" operation, similar to a backspace in mechanical typewriters. To delete the previous character, BS should be followed by DEL. On paper tape the result would be the previous character being completely punched out (erased).

BS followed by another character would strike two characters in the same position. Overstriking was a way to produce combined characters. This option was intended to internationalize ASCII. A letter followed by BS followed by a diacritic symbol would produce an accented letter. As an example, u BS ^ would produce û. Several ASCII characters (" ' ` ^ ~ ,) were indeed defined to be used as diacritic symbols.

Overstriking could also be suitable with other characters, such as for underlining with the "_" character or printing a slash "/" over "=" to produce "not equal". It could even be used to achieve a strike-through effect (perhaps with -, / or X) to indicate removed text. A boldface effect could be achieved by striking the same character several times at the same position.

Overstriking was a useful option with printing devices, but displays hardly support it. With the advent of more capable character sets and formatting techniques overstriking can be considered outdated. ASCII-1986 does not require overstriking capabilities and suggests that overstriking may be proscribed in the future. ISO 8859 explicitly forbids overstriking.
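The non-destructive meaning of BS can be illustrated with a small simulation of a printing device. The sketch below processes one line of text and records every character struck at each column; it is a toy model invented for this article (the name overstrike_columns included), not the behavior of any particular terminal or printer.

```python
def overstrike_columns(text: str) -> list:
    """Return, for each column, the string of characters struck there.

    BS moves one position left without erasing anything, so a following
    character lands on top of the one already printed.
    """
    columns = []
    pos = 0
    for ch in text:
        if ch == "\b":
            pos = max(pos - 1, 0)   # cannot move left of the first column
        else:
            if pos == len(columns):
                columns.append("")  # first strike at a new column
            columns[pos] += ch
            pos += 1
    return columns


if __name__ == "__main__":
    print(overstrike_columns("u\b^"))       # 'u' overstruck with '^'
    print(overstrike_columns("ab\b\b__c"))  # "ab" underlined, then 'c'
```

A display that treats BS as "move back and delete" would instead show only the last character written at each position, which is exactly the non-standard use described in the note.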

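Dec 9 | Hex $09 | HT | Horizontal Tabulation (FE1, Format effector 1) | Octal 011 | Pos 0/9 | Mnemonic HT | C escape \t | Key ^I

Advances to the next pre-determined character position (horizontal tab stop). HT could also be used as a skip function on punched cards.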

Note: HT is commonly also abbreviated TAB.

Even though the standards don't set a universal tab width, a typical fixed tab width is 8 columns. Other tab widths, as well as custom tab positions, are used as well. HT is a simple method of data compression: a single character can represent several spaces in formatted text.
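The effect of a fixed tab width is easy to try out with Python's built-in str.expandtabs, which replaces each HT with enough spaces to reach the next multiple of the given width. The wrapper name expand_tabs and the sample line are only for illustration.

```python
def expand_tabs(line: str, width: int = 8) -> str:
    """Replace each HT with spaces up to the next tab stop (fixed width)."""
    return line.expandtabs(width)


if __name__ == "__main__":
    line = "a\tbc\tdef"
    for width in (8, 4):
        expanded = expand_tabs(line, width)
        print(f"width {width}: {expanded!r} ({len(line)} -> {len(expanded)} characters)")
```

The length difference between the original and the expanded line shows the compression effect mentioned above: each HT stands for anything from one space up to a full tab width of spaces.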

The TAB key on the keyboard is consistent with HT in that it usually produces the code HT. How the HT is treated in each application is another story. In windowing environments, there are three common alternative uses. Pressing TAB can either add an HT character into text, indent text (possibly by adding an appropriate number of spaces or shifting the margin), or do something completely different: jump to the next field or control in a graphical user interface. This way the TAB key has been extended to cover more uses than what HT was originally intended for.

The original name of HT is Horizontal Tabulation. It was later renamed as HT Character Tabulation, first in ECMA-48:1986.

↹ Tab on a PC keyboard produces this character code.

Constant in Visual Basic and VB.NET: vbTab, Tab

Dec 10 | Hex $0A | LF | Line Feed (FE2, Format effector 2) | Octal 012 | Pos 0/10 | Mnemonic LF | Key ^J

LF has two alternative functions. It advances to the same character position on the next line (move down), or optionally to the first position on the next line (move to start of next line, i.e. newline). Originally LF was a move-down. A newline option (NL) was added soon afterwards. The option allowed LF to be used as a newline, which works like a combined CR LF. Use of LF as a newline requires agreement between sender and recipient of data. Universal agreement has not been reached.

Note: LF, having two alternative functions, has been a major source of confusion. While LF was initially defined as a "move down" operator, standards began to allow LF as a newline too. As a result, operating systems differ in their definition of a newline. A newline is LF on Unix. Operating systems using CR LF include CP/M, DOS, OS/2 and Windows. Naturally, this caused an incompatibility. To solve the problem, control characters IND and NEL were added to the C1 area. This did not solve the issue, resulting in IND being removed later. ECMA-6:1985 and ASCII-1986 attempted to clarify the situation by declaring LF deprecated for a newline and recommending CR LF instead. ECMA-48:1991 no longer allows LF to function as a newline.

The escape sequence for newline and LF is another source of confusion. \n is the common sequence for a newline, whereas there is no such sequence for a line feed. The actual control character(s) represented by \n depend on the system. In some cases, \n indeed represents LF, but it can also represent another newline sequence.
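Because the newline conventions differ, programs that read text from several systems often normalize the line ends first. The following is a minimal sketch of such a normalization to a single LF; the function name normalize_newlines is made up, and the choice of LF as the target is just an assumption for the example.

```python
def normalize_newlines(text: str) -> str:
    """Convert CR LF and lone CR line ends to a single LF."""
    # CR LF must be handled first; otherwise it would become two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n")


if __name__ == "__main__":
    mixed = "dos line\r\nunix line\nold line\rlast"
    print(repr(normalize_newlines(mixed)))
    print(normalize_newlines(mixed).split("\n"))
```

In Python itself "\n" in a string literal is always LF (hex 0A). The translation to and from the platform convention happens when a file is opened in text mode, which is one instance of \n standing for "another newline sequence" on output.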

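Dec 11 | Hex $0B | VT | Vertical Tabulation (FE3, Format effector 3) | Octal 013 | Pos 0/11 | Mnemonic VT | C escape \v | Key ^K

Advances to the same character position on the next pre-determined line. ASCII-1977 and ASCII-1986 optionally allow VT to advance to the first position on the next pre-determined line, if agreed on.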

Note: The original name of VT is Vertical Tabulation. It was later renamed as VT Line Tabulation, first in ECMA-48:1986. VT has been used to jump down to the next pre-defined line when printing on a paper form. According to some sources, vertical tab stops were typically spaced 6 lines apart. VT is a simple data compression method where a single VT represents several LF characters (and optionally a CR too).

In modern use VT must be quite a rare character. As Bob Bemer, one of the original designers of ASCII, put it: "This is a very dangerous character to use. It cannot be used directly on any terminal that I know of. Even if it could, the implementation rules are not supplied unambiguously in the ASCII standard."

Constant in Visual Basic and VB.NET: vbVerticalTab, VerticalTab

12

$0C

FF

Form Feed — FE4 Format effector 4

014

0/12

FF

\f

^L

Advances to the next form or page. Standards differ in what column the subsequent character position will be in. Originally, ASCII-1968 did not define the column at all. ISO and ECMA standards declare that FF does not change the column. ASCII-1977 and ASCII-1986 optionally allow, by agreement, moving to the first column, as if FF was actually CR FF.

Note: FF has been used as "page break" in text files, "new page" on printers and "clear the screen" on displays. It was originally unclear whether FF was just a "new page" operator or "new page, move to column 1". ASCII-1977 and ECMA-6:1985 attempted to clarify the situation by recommending the use of CR FF. ASCII-1986 even implied that the "new page, move to column 1" option might be deleted in a future edition of ASCII.

Constant in Visual Basic and VB.NET: vbFormFeed, FormFeed

13

$0D

CR

Carriage Return — FE5 Format effector 5

015

0/13

CR

\r

^M

Traditional definition: Moves to the first position on the same line (ASCII, ISO 646, ECMA-6). Newer definition: Moves to the line home position or line limit position of the same line (ISO 6429, ECMA-48).

Note: The standard meaning of CR is "move to beginning of current line". This allows overprinting the line with new characters, which could be used to achieve underlining, for example. To advance to the next line, CR would be followed by LF. On CP/M, DOS, OS/2 and Windows the newline marker is CR LF, which conforms to the definition. CR alone has been used as the newline character on some systems, such as Commodore and Apple; that use does not conform to the standards in question. The order CR LF (instead of LF CR) may have been important on mechanical devices where a carriage return took relatively long to execute. A non-printing LF was a more suitable output while the print head was returning than a graphic symbol, which could have been struck in the middle of the line.
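The "move to beginning of current line" behavior can be sketched with a tiny simulation of a one-line display. This is an illustrative model only: the function render_line and its width parameter are assumptions made for this example, and a display replaces the old character where a printer would strike over it.

```python
def render_line(data: str, width: int = 40) -> str:
    """Simulate a one-line display where CR returns to the first position."""
    cells = [" "] * width
    col = 0
    for ch in data:
        if ch == "\r":
            col = 0               # CR: back to the first position, same line
        elif col < width:
            cells[col] = ch       # a later character replaces the earlier one
            col += 1
    return "".join(cells).rstrip()

print(render_line("Loading 10%\rLoading 99%"))  # Loading 99%
print(render_line("abc\rX"))                    # Xbc
```

The same trick is still common for progress indicators on terminals, where a line is rewritten in place by sending CR without LF.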

Used to extend the character set. SO may alter the meaning of the following bit combinations until an SI is reached. Between SO and SI, character positions 33-126 (decimal) may represent additional characters that would not otherwise fit in the regular character set.

Note: SO (Shift Out) is the normal name of this control. LS1 (Locking-Shift One) is used by ECMA-35 and ECMA-48. In those standards, SO is used in 7-bit environments and LS1 in 8-bit environments. The mechanism to select the alternative character set(s) was defined in ANSI X3.41, ISO 2022 and ECMA-35. It includes the use of escape sequences starting with ESC. SO has also been used on printers to select enlarged characters or another color.

15

$0F

SI

Shift In — LS0 Locking-Shift Zero

017

0/15

SI

^O

Used in conjunction with SO. It may reinstate the standard meanings of the characters following it.

Note: SI (Shift In) is the normal name of this control. LS0 (Locking-Shift Zero) is used by ECMA-35 and ECMA-48. In those standards, SI is used in 7-bit environments and LS0 in 8-bit environments. SI has also been used on printers to select condensed characters or to reset color.
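The locking shift performed by SO and SI can be sketched as a small decoder. The alternative set below (G1_EXAMPLE) is purely hypothetical and maps just three positions to Greek letters; real alternative sets are selected through the ISO 2022 / ECMA-35 mechanism, which this sketch does not implement. The input is assumed to be 7-bit data.

```python
SO, SI = 0x0E, 0x0F

# Hypothetical alternative set, for illustration only.
G1_EXAMPLE = {ord("a"): "α", ord("b"): "β", ord("g"): "γ"}

def decode_shifted(data: bytes) -> str:
    """Decode 7-bit data, applying the alternative set between SO and SI."""
    out = []
    shifted = False
    for byte in data:
        if byte == SO:
            shifted = True        # following positions 33-126 change meaning
        elif byte == SI:
            shifted = False       # standard meanings are reinstated
        elif shifted and 33 <= byte <= 126:
            out.append(G1_EXAMPLE.get(byte, "?"))
        else:
            out.append(chr(byte))
    return "".join(out)

print(decode_shifted(b"ab \x0eab\x0f ab"))  # ab αβ ab
```

The shift is "locking" because it stays in effect for every following character until SI arrives, unlike a single shift that affects one character only.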

16

$10

DLE

Data Link Escape — TC7 Transmission control character 7

020

1/0

DL

^P

Used to provide supplementary data transmission control functions. DLE changes the meaning of a limited number of following characters.

Note: DLE is the "escape" character for transmission control. DLE can potentially be placed in front of a transmission control character (TC1-TC10) to pass it through "as is" instead of letting it control the current transmission. This is not always the case, though. It is also possible to create new transmission control sequences with DLE, in the same way that ESC is used to create escape sequences for other purposes. Contrary to the standards, Ctrl+P has been used as a keyboard command to echo console activity to the printer.
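The "pass it through as is" idea can be sketched as byte stuffing. The scheme below is a made-up example, not the rule set of any particular protocol: the sender puts DLE in front of every payload byte that is one of TC1-TC10, and the receiver treats the byte after a DLE as literal data. Real link protocols define their own DLE rules, as the note above warns.

```python
DLE = 0x10
# TC1-TC10: SOH, STX, ETX, EOT, ENQ, ACK, DLE, NAK, SYN, ETB
TRANSMISSION_CONTROLS = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06,
                         0x10, 0x15, 0x16, 0x17}

def dle_stuff(payload: bytes) -> bytes:
    """Prefix each transmission control byte in the payload with DLE."""
    out = bytearray()
    for byte in payload:
        if byte in TRANSMISSION_CONTROLS:
            out.append(DLE)
        out.append(byte)
    return bytes(out)

def dle_unstuff(stuffed: bytes) -> bytes:
    """Reverse dle_stuff: the byte following a DLE is taken literally."""
    out = bytearray()
    literal = False
    for byte in stuffed:
        if literal:
            out.append(byte)
            literal = False
        elif byte == DLE:
            literal = True
        else:
            out.append(byte)
    return bytes(out)

payload = b"A\x03B\x10C"             # contains ETX and DLE as plain data
stuffed = dle_stuff(payload)
print(stuffed)                       # b'A\x10\x03B\x10\x10C'
print(dle_unstuff(stuffed) == payload)  # True
```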

17

$11

DC1

Device Control 1 — XON

021

1/1

D1

^Q

Intended to turn on or start an ancillary device, to restore it to the basic operation mode (see DC2 and DC3), or for any other device control function.

Note: DC1 is conventionally called XON when used in communication for software flow control. The meaning of XON is to continue data transmission after an XOFF (DC3) has been received. The name XON ("transmit on") does not come from a standard, but it is commonly used.

18

$12

DC2

Device Control 2

022

1/2

D2

^R

Intended for turning on or starting an ancillary device, setting it to a special mode (restored via DC1), or for any other device control function.

19

$13

DC3

Device Control 3 — XOFF

023

1/3

D3

^S

Intended for turning off or stopping an ancillary device. It may be a secondary level stop such as wait, pause, stand-by or halt (restored via DC1). It can also perform any other device control function.

Note: DC3 is conventionally called XOFF when used in communication for software flow control. An XOFF is issued to stop transmission when a device cannot accept more data. Transmission can be continued via XON (DC1). The name XOFF ("transmit off") does not come from a standard, but it is commonly used. The use of XOFF and XON is in line with the standards, even though not directly defined in them.

XOFF (^S) is sometimes used as a pause command. Continuing requires pressing XON (^Q). ^S even works as a pause on MS-DOS (pressing any key continues).
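Software flow control with XOFF and XON can be sketched as a simple tick-by-tick simulation. The function simulate_sender and its feedback list are assumptions made for this example; a real implementation would read the control characters from a serial line or similar channel.

```python
XON, XOFF = 0x11, 0x13   # DC1 and DC3

def simulate_sender(chunks, feedback):
    """Send one chunk per tick unless paused by XOFF.

    feedback holds one entry per tick: None, XON or XOFF, as received
    from the other end. Returns a list of (tick, chunk) pairs sent.
    """
    sent = []
    paused = False
    pending = list(chunks)
    for tick, signal in enumerate(feedback):
        if signal == XOFF:
            paused = True         # receiver cannot accept more data
        elif signal == XON:
            paused = False        # receiver is ready again
        if not paused and pending:
            sent.append((tick, pending.pop(0)))
    return sent

log = simulate_sender(["a", "b", "c"], [None, XOFF, None, XON, None])
print(log)  # [(0, 'a'), (3, 'b'), (4, 'c')]
```

Because XON and XOFF travel in the same channel as the data, this method only works when the data itself never contains codes 17 and 19, or when they are escaped somehow.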

20

$14

DC4

Device Control 4 (Stop)

024

1/4

D4

^T

Intended to turn off, stop or interrupt an ancillary device, or for any other device control function.

21

$15

NAK

Negative Acknowledge — TC8 Transmission control character 8

025

1/5

NK

^U

Negative response. Transmitted from a receiver as a response to the sender.

Note: NAK can be sent as a response to indicate inability to receive a message, or to request resending.

22

$16

SYN

Synchronous Idle — TC9 Transmission control character 9

026

1/6

SY

^V

Used as "time-fill" in synchronous transmission. Sent during an idle condition to retain a signal when there are no other characters to send.

Note: SYN has been used by synchronous modems, which have to send data constantly. Beginning each transmission with at least two SYN characters is a way to achieve synchronization. The receiving station may ignore SYN, since it doesn't belong to the actual data content.

23

$17

ETB

End of Transmission Block — TC10 Transmission control character 10

027

1/7

EB

^W

Indicates the end of a block of data. Used when data is divided into blocks for transmission.

Note: ETB, when used to end a block, may call for a reply from a slave station.

24

$18

CAN

Cancel

030

1/8

CN

^X

Indicates that data is in error or should be disregarded. Affects "the data with which it is sent" (ASCII-1968, ASCII-1977) or "the data preceding it" (ASCII-1986, ISO 646, ECMA-6, ECMA-48).

Note: There are 2 alternative definitions for the data to be disregarded. The actual scope of cancellation is undefined by the standards and should be defined case by case. Ctrl+X has been used as a keyboard shortcut to cancel (delete) the characters on the current line, which use conforms to the standards.

25

$19

EM

End of Medium

031

1/9

EM

^Y

Identifies 1) the physical end of a medium, 2) the end of the used portion of a medium, or 3) the end of wanted data on a medium.

Note: EM may have been suitable for paper tape or magnetic tape to say "no more data". Disk file systems use more sophisticated ways to keep track of the used and unused areas of the medium.

This character is commonly abbreviated EM. Unicode is the exception: it provides EOM as an alias abbreviation.

26

$1A

SUB

Substitute

032

1/10

SB

^Z

Used in place of an invalid or erroneous character. Introduced by automatic means in cases like a transmission error.

Note: When SUB is used as a substitution character, the reverse question mark symbol seems quite good as its visual representation. Compare SUB to Unicode U+FFFD REPLACEMENT CHARACTER.

SUB has often been used contrary to the standards. On CP/M and MS-DOS, it appears as an End-Of-File marker for text files (^Z). On Unix, Ctrl+Z is a keyboard signal to suspend a foreground process.
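The End-Of-File convention can be sketched in a few lines of Python. The helper name strip_ctrl_z_eof is made up for this example; it simply keeps the bytes that precede the first SUB, which is how a reader following the CP/M and MS-DOS convention would treat a text file.

```python
SUB = b"\x1a"   # Ctrl+Z

def strip_ctrl_z_eof(data: bytes) -> bytes:
    """Return the bytes before the first SUB, or all bytes if there is none."""
    head, _sep, _tail = data.partition(SUB)
    return head

print(strip_ctrl_z_eof(b"HELLO\r\n\x1a\x1a\x1a"))  # b'HELLO\r\n'
print(strip_ctrl_z_eof(b"no marker"))              # b'no marker'
```

This treatment is only safe for text. In binary data, code 26 is an ordinary byte value and must not be taken as the end of the file.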

27

$1B

ESC

Escape

033

1/11

EC

\e

^[

The first character of an escape sequence. Provides either supplementary characters or additional control functions. ESC changes the meaning of a limited number of following characters.

Note: ESC is used to form escape sequences, which perform various control functions or apply additional character sets. ESC can also be used to invoke the C1 control characters on a 7-bit system that only supports character positions 0–127.

On the keyboard, the Esc key sometimes does produce the ESC control character. In windowing environments, the Esc key typically cancels a dialog or an operation, rather than producing a control character or starting an escape sequence. This kind of "escape" is not based on the character standards, however. The closest ASCII equivalent for canceling a dialog would be CAN, but since there is no "Can" key on common keyboards, it can't be used.

Esc on PC keyboard produces this character code.

28

$1C

FS

File Separator — IS4 Information separator 4

034

1/12

FS

^\

The four information separators (FS, GS, RS and US) are used to separate and qualify data. Each separator has two alternative names: Information Separator Four equals File Separator, Information Separator Three equals Group Separator, Information Separator Two equals Record Separator and Information Separator One equals Unit Separator. The separators can be used either hierarchically or in a non-hierarchical manner. When used hierarchically, the order is US (least inclusive), RS, GS and FS (most inclusive). The content and length of a file, group, record or unit are not specified by the standards.

FS, when used in a hierarchical order, delimits a data item called a file. It can also delimit anything else.

29

$1D

GS

Group Separator — IS3 Information separator 3

035

1/13

GS

^]

GS, when used in a hierarchical order, delimits a data item called a group. It can also delimit anything else.

30

$1E

RS

Record Separator — IS2 Information separator 2

036

1/14

RS

^^

RS, when used in a hierarchical order, delimits a data item called a record. It can also delimit anything else.

31

$1F

US

Unit Separator — IS1 Information separator 1

037

1/15

US

^_

US, when used in a hierarchical order, delimits a data item called a unit. It can also delimit anything else.

Note: The information separators were deliberately arranged next to SPACE, which can also be used as an information separator (word separator).
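The hierarchical use of the separators can be sketched with the two lowest levels, RS and US. The sample data and the function parse_records are invented for this illustration; the standards do not prescribe what a record or a unit contains.

```python
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"

def parse_records(text: str):
    """Split text into records (RS) and each record into units (US)."""
    return [record.split(US) for record in text.split(RS) if record]

# Two records, each with two units: name and age.
data = "Alice" + US + "30" + RS + "Bob" + US + "25" + RS
print(parse_records(data))  # [['Alice', '30'], ['Bob', '25']]
```

Unlike a comma or a tab, these separators are unlikely to occur inside ordinary text, so the fields need no quoting. GS and FS could be layered on top in the same way to group records and files.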

32

$20

SP

Space

040

2/0

SP

Moves one character position forwards. Space may also have a function equivalent to that of an information separator.

Note: Space has a dual nature. It can be classified as both a control character and a (non-printing) graphic character. SP is similar to a Format Effector. It can also be used as a fifth Information Separator. Space is sometimes represented by the symbol ƀ or ␢ (b with a stroke) or ␣ (open box). SP does not belong to the C0 set.

Outdated. An ignorable character originally intended for erasing an erroneous or unwanted character in punched tape. In this standard use, DEL wouldn't affect the information content of data, even though it may have affected the information layout and the control of equipment. Standards also allowed DEL to be used as media-fill or time-fill (even though a NUL may be more appropriate).

Note: DEL is now outdated. It was removed from the latest standards (ECMA-48 in 1991 and ISO 6429 in 1992). The origin of DEL is with perforated paper. On that, DEL was equal to "all holes punched", which is a way to invalidate an erroneous character (rubout). In a sense, DEL is similar to NUL, since both characters mean "nothing". ASCII-1977 suggests the use of DEL as a "time waster" to accommodate mechanical devices where a carriage return takes time to execute. ASCII-1986 recommends NUL as a time waster instead of DEL. DEL does not belong to the C0 set, but is an individual control code.

\x is what you write in a C program to produce the given control character. ^X means you press Ctrl+X to produce the given control character.
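The relationship between a control code and its ^X notation is simple bit arithmetic, as the following Python sketch shows. The function names are made up for this example. Python accepts the same \f and \r escapes as C, but it has no \e, so ESC is written "\x1b".

```python
def caret_notation(code: int) -> str:
    """Return the ^X form for control codes 0-31 and 127."""
    if not (0 <= code <= 31 or code == 127):
        raise ValueError("not an ASCII control code")
    # Flipping bit 0x40 maps 0-31 to '@'..'_' and 127 to '?'.
    return "^" + chr(code ^ 0x40)

def ctrl_key(letter: str) -> int:
    """Return the code produced by Ctrl+letter: the low five bits."""
    return ord(letter) & 0x1F

print(caret_notation(0x0C))        # ^L
print(caret_notation(0x1B))        # ^[
print(ctrl_key("M") == ord("\r"))  # True
print(ord("\f"))                   # 12
```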

C1 control characters

The C1 control characters work in 8-bit environments. These controls come from three related standards: ANSI X3.64, ISO 6429 and ECMA-48. All of these characters are also available in Unicode. There are three unassigned control characters: PAD, HOP and SGCI. They were planned for use in a failed draft, DIS 10646, but they were never actually standardized or put to use. Despite this, one can find these control characters in various C1 lists online, and also as aliases in later Unicode standards.

†) The 2-character mnemonics for C1 are from RFC 1345. They are not standardized.

Dec

Hex

Char

Description

Octal

Pos

†)

128

$80

PAD

unassigned, "Padding Character"

200

8/0

PA

ESC @

A reserved control code. Intended for use as PAD Padding Character in draft DIS 10646, rejected, never standardized (not accepted to ISO 10646).

Note: Not part of ISO/IEC 6429 or ECMA-48.

Unicode lists this character as XXX and provides PAD as an alias.

129

$81

HOP

unassigned, "High Octet Preset"

201

8/1

HO

ESC A

A reserved control code. Intended for use as HOP High Octet Preset in draft DIS 10646, rejected, never standardized (not accepted to ISO 10646).

Note: Not part of ISO/IEC 6429 or ECMA-48.

Unicode lists this character as XXX and provides HOP as an alias.

130

$82

BPH

Break Permitted Here

202

8/2

BH

ESC B

A point where a line break may occur.

Note: Roughly equivalent to a soft hyphen except that the means for indicating a line break is not necessarily a hyphen. Compare to Unicode U+200B ZERO WIDTH SPACE.

131

$83

NBH

No Break Here

203

8/3

NH

ESC C

A point where a line break may not occur.

Note: Compare to Unicode U+2060 WORD JOINER.

132

$84

IND

Index

204

8/4

IN

ESC D

Moves to the next line keeping the current horizontal position.

Note: According to ECMA-48:1986, IND was provided for use in those cases where LF was implemented as New Line. IND was deprecated in 1988 and withdrawn in 1992 from ISO/IEC 6429 (1986 and 1991 respectively for ECMA-48).

Starts a string of character positions whose contents can be transmitted. The string ends at ESA (or end of display).

135

$87

ESA

End of Selected Area

207

8/7

ES

ESC G

Ends a string of character positions (started by SSA) whose contents can be transmitted.

136

$88

HTS

Horizontal Tabulation Set, Character Tabulation Set

210

8/8

HS

ESC H

Sets a tab stop at the active position.

Note: ISO 6429:1992, ECMA-48:1986 and ECMA-48:1991 have renamed HTS as Character Tabulation Set.

137

$89

HTJ

Horizontal Tabulation with Justification, Character Tabulation with Justification

211

8/9

HJ

ESC I

Moves text to the following tab stop. The text is what comes after the previous tab stop up to the active position.

Note: This character has several names. ANSI X3.64 originally called it Horizontal Tabulation with Justify. ISO 6429:1992, ECMA-48:1986 and ECMA-48:1991 have renamed HTJ as Character Tabulation with Justification.

138

$8A

VTS

Vertical Tabulation Set, Line Tabulation Set

212

8/10

VS

ESC J

Sets a vertical tab stop at the active line.

Note: ISO 6429:1992, ECMA-48:1986 and ECMA-48:1991 have renamed VTS as Line Tabulation Set.

139

$8B

PLD

Partial Line Down, Partial Line Forward

213

8/11

PD

ESC K

Moves down so that following characters will appear as subscripts. Subscripts end at the next PLU.

Used to extend the character set. The next character will be from the currently chosen G2 set.

Note: For more information see ISO 2022 or ECMA-35. The next character should be in the decimal range 33-126 or 32-127.

143

$8F

SS3

Single Shift Three

217

8/15

S3

ESC O

Used to extend the character set. The next character will be from the currently chosen G3 set.

Note: For more information see ISO 2022 or ECMA-35. The next character should be in the decimal range 33-126 or 32-127.

144

$90

DCS

Device Control String

220

9/0

DC

ESC P

Starts a device control string. ST ends the string. The control string may include commands to the receiving device, or a status report from the sending device.

145

$91

PU1

Private Use One

221

9/1

P1

ESC Q

Reserved for private use, no standardized meaning.

146

$92

PU2

Private Use Two

222

9/2

P2

ESC R

Reserved for private use, no standardized meaning.

147

$93

STS

Set Transmit State

223

9/3

TS

ESC S

Notifies that data is ready for transfer from a device (ANSI X3.64), or establishes the transmit state in the receiving device (ISO 6429, ECMA-48). Doesn't initiate the actual transmission.

148

$94

CCH

Cancel Character

224

9/4

CC

ESC T

Ignore the preceding graphic character (and CCH itself too). If the previous character is a control character or sequence, ANSI X3.64 says it should be ignored, while ISO 6429 and ECMA-48 leave the action undefined.

Note: Destructive backspace. Intended to eliminate ambiguity about the meaning of BS.

Start of Guarded Protected Area, Start of Protected Area, Start of Guarded Area

226

9/6

SG

ESC V

Starts a string of character positions that can't be altered manually or transmitted. Optionally protects against erasure too. EPA will end the string.

Note: SPA is known as Start of Protected Area (ANSI X3.64, ECMA-48:1979), Start of Guarded Protected Area (ISO 6429:1983, ECMA-48:1984) and Start of Guarded Area (ISO 6429:1992, ECMA-48:1986 and ECMA-48:1991).

151

$97

EPA

End of Guarded Protected Area, End of Protected Area, End of Guarded Area

227

9/7

EG

ESC W

Ends the area started by SPA.

Note: EPA is known as End of Protected Area (ANSI X3.64, ECMA-48:1979), End of Guarded Protected Area (ISO 6429:1983, ECMA-48:1984) and End of Guarded Area (ISO 6429:1992, ECMA-48:1986 and ECMA-48:1991).

152

$98

SOS

Start of String

230

9/8

SS

ESC X

Starts a control string. The string ends at ST. It cannot contain a SOS. The interpretation of the string depends on the application.

153

$99

SGCI

unassigned, "Single Graphic Character Introducer"

231

9/9

GC

ESC Y

A reserved control code. Intended for use as SGCI Single Graphic Character Introducer in draft DIS 10646, rejected, never standardized (not accepted to ISO 10646).

Note: Not part of ISO/IEC 6429 or ECMA-48.

Unicode lists this character as XXX and provides SGC as an alias.

154

$9A

SCI

Single Character Introducer

232

9/10

SC

ESC Z

A reserved control code. The name was standardized as SCI Single Character Introducer, but the actual functionality was not implemented in the standards.

Note: SCI was to be followed by a single byte, which would represent a control function or a graphic character. The functions or characters were not defined in the standards.

155

$9B

CSI

Control Sequence Introducer

233

9/11

CI

ESC [

Starts a control sequence.

156

$9C

ST

String Terminator

234

9/12

ST

ESC \

Closes a string opened by APC, DCS, OSC, PM or SOS.

157

$9D

OSC

Operating System Command

235

9/13

OC

ESC ]

Starts an operating system command string. The string ends at ST and its interpretation depends on the operating system.

158

$9E

PM

Privacy Message

236

9/14

PM

ESC ^

Starts a privacy message. ST will end the message.

159

$9F

APC

Application Program Command

237

9/15

AC

ESC _

Starts an application program command string. ST will end the command. The interpretation of the command is subject to the program in question.

ESC X means that the two-character sequence ESC followed by X represents this control character, for example in a 7-bit environment. On a keyboard, press Esc followed by X.
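The ESC column of the table follows a regular pattern: the second character is the C1 code minus hex 40. A minimal Python sketch of the conversion in both directions follows. The function names are assumptions made for this example.

```python
ESC = 0x1B

def c1_to_escape(code: int) -> bytes:
    """Return the two-byte ESC sequence for a C1 control code (0x80-0x9F)."""
    if not 0x80 <= code <= 0x9F:
        raise ValueError("not a C1 control code")
    return bytes([ESC, code - 0x40])

def escape_to_c1(seq: bytes) -> int:
    """Return the C1 code represented by ESC followed by '@'..'_'."""
    if len(seq) != 2 or seq[0] != ESC or not 0x40 <= seq[1] <= 0x5F:
        raise ValueError("not a two-byte ESC sequence for a C1 control")
    return seq[1] + 0x40

print(c1_to_escape(0x9B))            # b'\x1b['  (CSI)
print(hex(escape_to_c1(b"\x1bD")))   # 0x84      (IND)
```

The 7-bit form is the one commonly seen in practice, since the byte values 128-159 collide with printable characters in several 8-bit character sets and with multi-byte sequences in UTF-8.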

ISO 8859 special characters

The two special characters, NBSP and SHY, are not really control characters. They are graphic characters with a special feature. The characters also appear in Unicode. They are included here for the sake of completeness.

‡) The 2-character mnemonics for NBSP and SHY are from RFC 1345. They are not standardized.

Dec

Hex

Char

Description

Octal

Pos

‡)

160

$A0

NBSP

No-Break Space

240

10/0

NS

A space for use when a line break is to be prevented.

Note: NBSP can sometimes be produced by pressing Ctrl+Shift+SPACE. No universally supported key combination exists.

In HTML you can write &nbsp; or &#160; to add a no-break space to a web page.
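Both HTML notations resolve to the same character, code 160. This can be checked with the html module of the Python standard library, as in the short sketch below.

```python
import html

NBSP = "\u00a0"   # decimal 160, hex A0

print(html.unescape("&nbsp;") == NBSP)    # True
print(html.unescape("&#160;") == NBSP)    # True
print(NBSP == " ")                        # False: not the ordinary space (32)
print(repr(html.unescape("10&nbsp;km")))  # '10\xa0km'
```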

Format effectors are mainly intended for the control of the layout and positioning of information.
Format effectors (most of them) are data which happen to have a format representation rather than a graphic representation.

Device control characters are intended for the control of local or remote or ancillary devices. They are not intended to control data communication systems; this should be done with transmission control characters.

Unassigned control characters are ones that were not standardized. Their location was reserved for future standardization. These characters are known by names that appeared in a draft (DIS 10646), even though they didn't make it to the final standard.