5.7.2 Unicode Representations

The procedures in this section implement transformations that convert
between the internal representation of Unicode characters and several
standard external representations. These external representations are
all implemented as sequences of bytes, but they differ in their intended
usage.

UTF-8

Each character is written as a sequence of one to four bytes.

UTF-16

Each character is written as a sequence of one or two 16-bit integers.

UTF-32

Each character is written as a single 32-bit integer.
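
For example, the character U+1F600 (which lies outside the Basic
Multilingual Plane) occupies four bytes in UTF-8, two 16-bit integers
(a surrogate pair) in UTF-16, and a single 32-bit integer in UTF-32:

     UTF-8    F0 9F 98 80
     UTF-16   D83D DE00
     UTF-32   0001F600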

The UTF-16 and UTF-32 representations may be
serialized to and from a byte stream in either big-endian or
little-endian order. In big-endian order, the most significant
byte is first, the next most significant byte is second, etc. In
little-endian order, the least significant byte is first, etc. All of
the UTF-16 and UTF-32 representation procedures are
available in both orders, which are indicated by names containing
`utfNN-be' and `utfNN-le', respectively. There are also
procedures that implement host-endian order, which is either
big-endian or little-endian depending on the underlying computer
architecture.
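
For example, the 16-bit integer #x0041 (the UTF-16 representation of
the character `A') is serialized as the bytes 00 41 in big-endian
order and as 41 00 in little-endian order; the corresponding UTF-32
integer #x00000041 is serialized as 00 00 00 41 and 41 00 00 00,
respectively.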

Each of these procedures converts a byte vector to a wide string,
treating the byte vector as a stream of bytes encoded in the
corresponding `utfNN' representation. The arguments start and end
allow specification of a subsequence; they default to zero and the
byte vector's length, respectively.
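
As a sketch of how such a decoding procedure might be called (the name
`utf8->string' used here is the R7RS name and is assumed rather than
specified by this section; the `utf16' and `utf32' procedures follow
the same calling pattern):

     ; Bytes #xCE #xBB are the UTF-8 encoding of U+03BB (lambda).
     (utf8->string (bytevector #xCE #xBB))          => "λ"
     ; Decode only the first two bytes of a longer byte vector.
     (utf8->string (bytevector #xCE #xBB #x78) 0 2) => "λ"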

Each of these procedures counts the number of Unicode characters in a
byte vector, treating the byte vector as a stream of bytes encoded in
the corresponding `utfNN' representation. The arguments start and end
allow specification of a subsequence; they default to zero and the
byte vector's length, respectively.
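
A hypothetical call to such a counting procedure (the name
`utf8-string-length' below is illustrative only; the actual procedure
names are not shown in this section):

     ; Three bytes of UTF-8, but only two characters: U+03BB and U+0078.
     (utf8-string-length (bytevector #xCE #xBB #x78))     => 2
     ; Restrict the count to the first two bytes.
     (utf8-string-length (bytevector #xCE #xBB #x78) 0 2) => 1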

Each of these procedures converts a wide string to a stream of bytes
encoded in the corresponding `utfNN' representation, and returns that
stream as a byte vector. The arguments start and end allow
specification of a substring; they default to zero and the string's
length, respectively.
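
A sketch of the encoding direction, again assuming the R7RS name
`string->utf8' (the corresponding `utf16' and `utf32' procedures
differ only in the representation they produce):

     ; Encode a whole string, then just the substring "λ" (indices 1 to 2).
     (string->utf8 "aλb")     => #u8(#x61 #xCE #xBB #x62)
     (string->utf8 "aλb" 1 2) => #u8(#xCE #xBB)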