Legacy applications sometimes create data that contains control codes. When migrating these applications or their data to the Web, it is therefore important to understand how markup languages support controls.

The Unicode character set assigns two ranges of code points to control codes. The Unicode Standard makes no particular use of
these controls and leaves their definition up to the application. If the application does not specify their use, they are to be interpreted
according to the semantics of ISO/IEC 6429. Most of you will recognize many of the ISO/IEC 6429 controls: ACK, NAK, BEL, LF, FF, VT, CR, and so on. The ISO 8859
family and other character encoding standards also base their control codes on ISO/IEC 6429.

The control codes in the range U+0000-U+001F are known as the "C0" range; this range begins with
the NUL (Null) control, U+0000. The control codes in the range U+0080-U+009F are known as the "C1" range. DEL (Delete) U+007F is also a control; it immediately precedes the C1 range.
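As a quick illustration of these ranges, the following minimal Python sketch classifies a code point as C0, DEL, or C1. The function names `control_range` and `is_unicode_control` are our own, not part of any library. The second function uses the standard `unicodedata` module to show that Unicode's general category `Cc` (Control) covers exactly these 65 code points.

```python
import unicodedata
from typing import Optional


def control_range(cp: int) -> Optional[str]:
    """Return "C0", "DEL" or "C1" for a control code point, otherwise None."""
    if 0x0000 <= cp <= 0x001F:
        return "C0"
    if cp == 0x007F:
        return "DEL"
    if 0x0080 <= cp <= 0x009F:
        return "C1"
    return None


def is_unicode_control(cp: int) -> bool:
    """True if Unicode gives the code point the general category Cc (Control)."""
    return unicodedata.category(chr(cp)) == "Cc"


# ESC is in C0, DEL stands alone, NEL is in C1, "A" is not a control.
for cp in (0x1B, 0x7F, 0x85, 0x41):
    print(f"U+{cp:04X}", control_range(cp))
# prints, one per line: U+001B C0, U+007F DEL, U+0085 C1, U+0041 None
```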

Control codes should be replaced with appropriate markup. Because XML provides a standard way of encoding
structured data, representing control codes other than as markup would undermine the very advantages of using XML. Control codes are never appropriate in HTML and
XHTML, since these markup languages are for representing text, not data. The information that follows should be
needed only in the rare case where legacy data containing control codes cannot be cleaned up.
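As an example of replacing a control with markup, the sketch below turns legacy text that uses FF (Form Feed, U+000C) as a page separator into XML, using Python's standard `xml.etree.ElementTree`. The element names `document` and `page` are hypothetical; a real migration would use whatever vocabulary the application defines. The sketch does not check the page text for other control codes.

```python
import xml.etree.ElementTree as ET


def pages_to_xml(legacy_text: str) -> str:
    """Replace FF-separated pages with hypothetical <page> elements."""
    root = ET.Element("document")
    for chunk in legacy_text.split("\f"):  # "\f" is FF, U+000C
        page = ET.SubElement(root, "page")
        page.text = chunk
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(root, encoding="unicode")


print(pages_to_xml("First page\fSecond page"))
# prints: <document><page>First page</page><page>Second page</page></document>
```

The page boundary that FF used to encode implicitly is now explicit structure that any XML tool can process.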

If the data is not really textual but binary, it may be more practical to encode it, for example using base64 or hexadecimal
values, so that only supported characters appear in the markup (the text must, of course, be decoded again when the files are read).
XML Schema provides data types for these encodings: xs:base64Binary and xs:hexBinary.
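The following minimal Python sketch shows both encodings using only the standard library (`base64.b64encode`, `base64.b64decode`, `bytes.hex`, `bytes.fromhex`). The function names are illustrative. The base64 output is a valid lexical form of xs:base64Binary, and the uppercase hexadecimal output is a valid lexical form of xs:hexBinary.

```python
import base64


def to_base64(data: bytes) -> str:
    """Encode binary data as base64 text (usable as xs:base64Binary content)."""
    return base64.b64encode(data).decode("ascii")


def from_base64(text: str) -> bytes:
    """Decode base64 text back into the original bytes."""
    return base64.b64decode(text)


def to_hex(data: bytes) -> str:
    """Encode binary data as uppercase hex text (usable as xs:hexBinary content)."""
    return data.hex().upper()


def from_hex(text: str) -> bytes:
    """Decode hexadecimal text back into the original bytes."""
    return bytes.fromhex(text)


# Legacy bytes containing ESC, NUL and BEL, none of which XML 1.0 allows as text.
payload = b"\x1b\x00\x07"
print(to_base64(payload))  # prints: GwAH
print(to_hex(payload))     # prints: 1B0007
```

Base64 is more compact (4 characters per 3 bytes); hexadecimal is easier to read when inspecting individual control bytes.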

Another alternative is to store the data in an external document and reference it from the XML document.

In XML 1.1, if you need to represent a control code explicitly, the simplest
alternative is to use an NCR (numeric character reference). For example, the control code ESC (Escape) U+001B can be represented
by the NCR &#x1B; (hexadecimal) or &#27; (decimal).
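Forming an NCR amounts to writing the code point in hexadecimal after `&#x`, or in decimal after `&#`, followed by `;`. The minimal Python sketch below (the function names `to_ncr` and `from_ncr` are our own) builds and parses both forms. XML requires a lowercase `x` in hexadecimal references, so the parser rejects `&#X1B;`.

```python
import re

# Hexadecimal (&#x1B;) or decimal (&#27;) numeric character reference.
_NCR = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def to_ncr(ch: str, hexadecimal: bool = True) -> str:
    """Return the numeric character reference for a single character."""
    cp = ord(ch)
    return f"&#x{cp:X};" if hexadecimal else f"&#{cp};"


def from_ncr(ref: str) -> str:
    """Return the character named by a single NCR such as '&#x1B;' or '&#27;'."""
    m = _NCR.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a numeric character reference: {ref!r}")
    if m.group(1) is not None:
        return chr(int(m.group(1), 16))
    return chr(int(m.group(2)))


esc = "\x1b"  # ESC (Escape), U+001B
print(to_ncr(esc))                     # prints: &#x1B;
print(to_ncr(esc, hexadecimal=False))  # prints: &#27;
```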

The following table summarizes which markup languages support the control codes:

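Controls               Range                   HTML 4     XHTML 1.0  XML 1.0    XML 1.1
C0, except HT, LF, CR  U+0000 (NUL)            Illegal    Illegal    Illegal    Illegal
C0, except HT, LF, CR  U+0001-U+001F           Illegal    Illegal    Illegal    NCR
HT, LF, CR             U+0009, U+000A, U+000D  Supported  Supported  Supported  Supported
DEL + C1               U+007F-U+009F           Illegal    Illegal    Supported  NCR
NEL                    U+0085                  Illegal    Illegal    (allowed)  Supported

Supported: the control may be encoded directly or as an NCR. NCR: the control may appear only as an NCR. Illegal: the control cannot appear in any form. (allowed): NEL is a legal XML 1.0 character but, unlike in XML 1.1, it is not treated as a line end.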
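The table can also be expressed as a lookup function, which is convenient when auditing legacy data before choosing a target format. This Python sketch simply encodes the rows of the table; the function name `control_support` and the `LANGUAGES` tuple are our own.

```python
from typing import Dict

LANGUAGES = ("HTML 4", "XHTML 1.0", "XML 1.0", "XML 1.1")


def control_support(cp: int) -> Dict[str, str]:
    """Look up a control code in the support table, one entry per language."""
    if not (0x00 <= cp <= 0x1F or 0x7F <= cp <= 0x9F):
        raise ValueError(f"U+{cp:04X} is not a control code")
    if cp == 0x00:  # NUL
        row = ("Illegal", "Illegal", "Illegal", "Illegal")
    elif cp in (0x09, 0x0A, 0x0D):  # HT, LF, CR
        row = ("Supported", "Supported", "Supported", "Supported")
    elif cp <= 0x1F:  # the rest of C0
        row = ("Illegal", "Illegal", "Illegal", "NCR")
    elif cp == 0x85:  # NEL
        row = ("Illegal", "Illegal", "(allowed)", "Supported")
    else:  # DEL and the rest of C1
        row = ("Illegal", "Illegal", "Supported", "NCR")
    return dict(zip(LANGUAGES, row))


print(control_support(0x1B)["XML 1.1"])  # prints: NCR
```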

The NUL (Null) control is illegal in all of these markup languages: it can be neither encoded directly nor represented by an NCR.

HTML, XHTML and XML 1.0 do not support the C0 range, except for HT (Horizontal Tabulation) U+0009, LF (Line
Feed) U+000A, and CR (Carriage Return) U+000D. XML 1.0 also supports DEL and the C1 range, so these controls can be encoded directly
or represented as NCRs (Numeric Character References); HTML 4 and XHTML 1.0 do not allow them.
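A minimal Python sketch of an XML 1.0 clean-up step follows. `is_xml10_char` implements the Char production of XML 1.0 (section 2.2), which admits HT, LF, CR, DEL and C1 but no other C0 control. Replacing forbidden characters with U+FFFD is an assumption made for illustration; dropping them or mapping them to markup may suit the data better. The `is_well_formed` helper uses Python's `xml.etree.ElementTree` (built on expat) to show that a raw ESC, or an NCR for it, makes an XML 1.0 document not well-formed.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def clean_for_xml10(text: str, replacement: str = "\uFFFD") -> str:
    """Replace every character XML 1.0 forbids (NUL, most of C0, ...)."""
    return "".join(ch if is_xml10_char(ord(ch)) else replacement for ch in text)


def is_well_formed(doc: str) -> bool:
    """Check well-formedness with the standard-library (expat) parser."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<p>Bell\x07</p>"))                          # prints: False
print(is_well_formed("<p>" + clean_for_xml10("Bell\x07") + "</p>"))  # prints: True
```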

XML 1.1 restricts the C0 range (except HT, LF, and CR) as well as DEL and the C1 range, apart from NEL U+0085 (the EBCDIC New Line), which may appear directly.
Restricted controls cannot be encoded directly, but XML 1.1 allows them to be represented by NCRs (Numeric Character References).
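The XML 1.1 rule can be sketched in Python as an escaping function for text content. `is_xml11_restricted` follows the RestrictedChar production of XML 1.1; the function names are our own. The sketch also escapes `&`, `<` and `>`, but it does not check the other characters XML 1.1 excludes (such as surrogates, U+FFFE and U+FFFF), and the resulting document must declare `version="1.1"`. Python's standard library has no XML 1.1 parser, so the output is not verified here.

```python
def is_xml11_restricted(cp: int) -> bool:
    """RestrictedChar from XML 1.1: allowed only as an NCR, never directly."""
    return (
        0x01 <= cp <= 0x08
        or 0x0B <= cp <= 0x0C
        or 0x0E <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


def escape_text_xml11(text: str) -> str:
    """Escape text content for XML 1.1, writing restricted controls as NCRs."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == 0x00:
            raise ValueError("NUL cannot appear in XML 1.1, even as an NCR")
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif is_xml11_restricted(cp):
            out.append(f"&#x{cp:X};")
        else:
            out.append(ch)  # includes HT, LF, CR and NEL, which may appear directly
    return "".join(out)


print(escape_text_xml11("ESC\x1b & DEL\x7f"))  # prints: ESC&#x1B; &amp; DEL&#x7F;
```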

Whereas the ISO 8859 family reserves the C1 range for controls, Microsoft code pages (e.g. Windows-1250 through Windows-1258) place printable characters in this range.
Content authors sometimes mistakenly use the Microsoft code page values when writing NCRs instead of the Unicode values: for example, &#150; (the en dash in Windows-1252) instead of &#x2013;. Because this
mistake is so common, many browsers display the Microsoft characters for NCRs in this range. This is incorrect behavior, and it misleads
developers further by appearing to confirm the mistaken value. The problem may eventually be discovered when the data is processed by another application, or when a standards-conforming browser fails to display the intended character.
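A repair step for this mistake can be sketched in Python, assuming the legacy content was authored in Windows-1252 (for another code page in the 1250-1258 family, pass the matching Python codec, such as "cp1250"). The hypothetical function `repair_codepage_ncrs` rewrites NCRs in the range 128-159 as NCRs for the character the author probably intended, using Python's built-in code page codecs. It is a heuristic: it also rewrites any genuine C1 control, and it leaves values that the code page does not define (such as 129 in Windows-1252) unchanged.

```python
import re

# Matches decimal (&#150;) and hexadecimal (&#x96; or &#X96;) NCRs.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def repair_codepage_ncrs(text: str, codec: str = "cp1252") -> str:
    """Rewrite NCRs in the C1 range (128-159) as NCRs for the intended character."""

    def fix(m):
        cp = int(m.group(1), 16) if m.group(1) is not None else int(m.group(2))
        if not 0x80 <= cp <= 0x9F:
            return m.group(0)  # outside C1: leave untouched
        try:
            intended = bytes([cp]).decode(codec)
        except UnicodeDecodeError:
            return m.group(0)  # undefined in this code page: leave untouched
        return f"&#x{ord(intended):X};"

    return _NCR.sub(fix, text)


print(repair_codepage_ncrs("1990&#150;1995"))  # prints: 1990&#x2013;1995
```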