When a process purports not to modify the interpretation of a valid coded character
sequence, it shall make no change to that coded character sequence other than the possible
replacement of character sequences by their canonical-equivalent sequences or the
deletion of noncharacter code points.

(omitted)

If a noncharacter that does not have a specific internal use is unexpectedly
encountered in processing, an implementation may signal an error or delete or
ignore the noncharacter. If these options are not taken, the noncharacter
should be treated as an unassigned code point. For example, an API that
returned a character property value for a noncharacter would return the same
value as the default value for an unassigned code point.
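The property-lookup behavior described above can be observed with Python's standard unicodedata module. This is only an illustrative sketch: the helper name general_category is hypothetical, and U+0378 is assumed to be an unassigned code point in the Unicode version bundled with the interpreter.

```python
import unicodedata


def general_category(cp: int) -> str:
    """Return the General_Category of a code point as reported by unicodedata."""
    return unicodedata.category(chr(cp))


if __name__ == "__main__":
    # Three noncharacters, then U+0378 (assumed unassigned) for comparison.
    for cp in (0xFFFE, 0xFDD0, 0x10FFFF, 0x0378):
        print(f"U+{cp:04X}: {general_category(cp)}")
```

In this sketch a noncharacter and an unassigned code point report the same General_Category value, which is the behavior the paragraph describes for a property API.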

A coded character sequence is also known as a coded character representation.

Normally a coded character sequence consists of a sequence of encoded characters,
but it may also include noncharacters or reserved code points.

Internally, a process may choose to make use of noncharacter code points in its
coded character sequences. However, such noncharacter code points may not
be interpreted as abstract characters (see conformance clause C2), and their
removal by a conformant process does not constitute modification of interpretation
of the coded character sequence (see conformance clause C7).

Noncharacter: a code point that is permanently reserved for internal use and that
should never be interchanged. Noncharacters consist of the values U+nFFFE and
U+nFFFF (where n is from 0 to 10₁₆) and the values U+FDD0..U+FDEF.
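The definition above can be sketched as a small predicate. The function name is_noncharacter is hypothetical, and the bit-mask test is one possible way to match the last two code points of each of the 17 planes (n from 0 through 10 hexadecimal).

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the noncharacter code points."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp!r}")
    # Contiguous range U+FDD0..U+FDEF in the BMP.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF: the low 16 bits are FFFE or FFFF on any plane.
    return (cp & 0xFFFE) == 0xFFFE
```

For example, is_noncharacter(0x1FFFE) is true, while is_noncharacter(0xFFFD) (the replacement character) is false.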

For more information, see Section 16.7, Noncharacters.

These code points are permanently reserved as noncharacters.

D15 Reserved code point

Any code point of the Unicode Standard that is reserved for
future assignment. Also known as an unassigned code point.

Surrogate code points and noncharacters are considered assigned code points,
but not assigned characters.

Noncharacters are code points that are permanently reserved in the Unicode Standard for
internal use. They are forbidden for use in open interchange of Unicode text data. See
Section 3.4, Characters and Encoding, for the formal definition of noncharacters and conformance
requirements related to their use.

The Unicode Standard sets aside 66 noncharacter code points. The last two code points of
each plane are noncharacters: U+FFFE and U+FFFF on the BMP, U+1FFFE and U+1FFFF
on Plane 1, and so on, up to U+10FFFE and U+10FFFF on Plane 16, for a total of 34 code
points. In addition, there is a contiguous range of another 32 noncharacter code points in
the BMP: U+FDD0..U+FDEF. For historical reasons, the range U+FDD0..U+FDEF is contained
within the Arabic Presentation Forms-A block, but those noncharacters are not
“Arabic noncharacters” or “right-to-left noncharacters,” and are not distinguished in any
other way from the other noncharacters, except in their code point values.
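The count given above can be checked by enumerating the code points directly. The following is a minimal sketch; the names plane_final and bmp_range are illustrative only.

```python
# Last two code points of each of the 17 planes (Plane 0 through Plane 16).
plane_final = [
    (plane << 16) | low
    for plane in range(0x11)
    for low in (0xFFFE, 0xFFFF)
]

# Contiguous BMP range U+FDD0..U+FDEF (inclusive).
bmp_range = list(range(0xFDD0, 0xFDEF + 1))

noncharacters = sorted(plane_final + bmp_range)

if __name__ == "__main__":
    print(len(plane_final), len(bmp_range), len(noncharacters))
```

The two groups contain 34 and 32 code points respectively, which together give the 66 noncharacters.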

Applications are free to use any of these noncharacter code points internally but should
never attempt to exchange them. If a noncharacter is received in open interchange, an
application is not required to interpret it in any way. It is good practice, however, to recognize
it as a noncharacter and to take appropriate action, such as removing it from the text.
Note that Unicode conformance freely allows the removal of these characters. (See conformance
clause C7 in Section 3.2, Conformance Requirements.)
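One way an application might follow this practice is to filter noncharacters out of received text. This is a sketch under stated assumptions: the function names are hypothetical, and removal is only one of the actions an application could choose to take.

```python
def _is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def strip_noncharacters(text: str) -> str:
    """Return a copy of text with every noncharacter code point removed."""
    return "".join(ch for ch in text if not _is_noncharacter(ord(ch)))


if __name__ == "__main__":
    received = "a\uFFFEb\uFDD0c\U0010FFFF"
    print(strip_noncharacters(received))
```

All other code points, including private-use characters, pass through this filter unchanged.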

In effect, noncharacters can be thought of as application-internal private-use code points.
Unlike the private-use characters discussed in Section 16.5, Private-Use Characters, which
are assigned characters and which are intended for use in open interchange, subject to
interpretation by private agreement, noncharacters are permanently reserved (unassigned)
and have no interpretation whatsoever outside of their possible application-internal private
uses.

U+FFFE

U+FFFE. This noncharacter has the intended peculiarity that, when represented in UTF-16
and then serialized, it has the opposite byte sequence of U+FEFF, the byte order mark. This
means that applications should reserve U+FFFE as an internal signal that a UTF-16 text
stream is in a reversed byte format. Detection of U+FFFE at the start of an input stream
should be taken as a strong indication that the input stream should be byte-swapped before
interpretation. For more on the use of the byte order mark and its interaction with the noncharacter
U+FFFE, see Section 16.8, Specials.
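The byte-order check described above might be sketched as follows. The function name detect_utf16_byte_order and its assumed parameter are hypothetical; the sketch inspects only the first 16-bit code unit and does not validate the rest of the stream.

```python
from typing import Optional

BOM = 0xFEFF          # byte order mark
SWAPPED_BOM = 0xFFFE  # noncharacter seen when the BOM is read byte-reversed


def detect_utf16_byte_order(data: bytes, assumed: str = "big") -> Optional[str]:
    """Infer UTF-16 byte order from the first code unit, read in the assumed order.

    Returns "big" or "little" when a byte order mark (or its byte-swapped
    image, U+FFFE) is found, and None when the stream gives no such signal.
    """
    if assumed not in ("big", "little"):
        raise ValueError("assumed must be 'big' or 'little'")
    if len(data) < 2:
        return None
    first_unit = int.from_bytes(data[:2], assumed)
    if first_unit == BOM:
        return assumed
    if first_unit == SWAPPED_BOM:
        # Reading U+FFFE means the stream is in the opposite byte order.
        return "little" if assumed == "big" else "big"
    return None
```

A caller could then skip the first two bytes and decode the remainder with the "utf-16-be" or "utf-16-le" codec, depending on the result.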

U+FFFF and U+10FFFF

U+FFFF and U+10FFFF. These two noncharacter code points have the attribute of being
associated with the largest code unit values for particular Unicode encoding forms. In
UTF-16, U+FFFF is associated with the largest 16-bit code unit value, FFFF₁₆. U+10FFFF is
associated with the largest legal UTF-32 32-bit code unit value, 10FFFF₁₆. This attribute
renders these two noncharacter code points useful for internal purposes as sentinels. For
example, they might be used to indicate the end of a list, to represent a value in an index
guaranteed to be higher than any valid character value, and so on.
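As an illustration of the sentinel idea, the sketch below merges two sorted lists of code points, terminating each with U+10FFFF so the loop needs no separate bounds checks. The function name merge_sorted is hypothetical, and the sketch assumes the inputs contain only valid characters, so the sentinel never occurs in the data itself.

```python
from typing import Iterable, List

SENTINEL = 0x10FFFF  # noncharacter, higher than any valid character value


def merge_sorted(a: Iterable[int], b: Iterable[int]) -> List[int]:
    """Merge two ascending lists of code points using an end-of-list sentinel."""
    left = list(a) + [SENTINEL]
    right = list(b) + [SENTINEL]
    i = j = 0
    merged: List[int] = []
    # Stop only when both cursors rest on their sentinels.
    while left[i] != SENTINEL or right[j] != SENTINEL:
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged
```

Because the sentinel compares higher than every real entry, an exhausted list simply loses every comparison until the other list is drained.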