Documentation

The encode function translates elements of the input buffer (from) into
the output buffer (to). It should translate as many elements as possible
given the sizes of the buffers, including translating zero elements
if there is either not enough room in to, or from does not
contain a complete multibyte sequence.

The fact that as many elements as possible are translated is used by the IO
library in order to report translation errors at the point they
actually occur, rather than when the buffer is translated.
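
The contract described above can be illustrated with a small pure model. This is only a sketch and not GHC's implementation: the names Progress and decodeModel are hypothetical, buffers are modelled as plain lists, and the toy encoding assumes every character is exactly two big-endian bytes.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical stand-in for the progress report; not GHC's own type.
data Progress = InputUnderflow | OutputUnderflow
  deriving (Eq, Show)

-- Toy decoder. The first argument is the free space (in Chars) of the
-- output buffer. The result holds the reason for stopping, the
-- untranslated input, and the characters that were produced.
decodeModel :: Int -> [Word8] -> (Progress, [Word8], String)
decodeModel room0 = go room0 []
  where
    -- A complete two-byte sequence is available.
    go room acc (hi : lo : rest)
      | room > 0 =
          go (room - 1) (chr (fromIntegral hi * 256 + fromIntegral lo) : acc) rest
      | otherwise = (OutputUnderflow, hi : lo : rest, reverse acc)
    -- Zero or one byte left: nothing more can be translated.
    go _ acc leftover = (InputUnderflow, leftover, reverse acc)
```

In this model, decodeModel 0 [0, 65] translates zero elements because there is no room in the output, and decodeModel 4 [0] translates zero elements because the input holds only half of a sequence.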

To allow us to use iconv as a BufferCodec efficiently, character buffers are
defined to contain lone surrogates instead of those private use characters that
are used for roundtripping. Thus, Chars poked and peeked from a character buffer
must undergo surrogatifyRoundtripCharacter and desurrogatifyRoundtripCharacter
respectively.

For more information on this, see Note [Roundtripping] in GHC.IO.Encoding.Failure.

The recover function is used to continue decoding
in the presence of invalid or unrepresentable sequences. This includes
both those detected by encode returning InvalidSequence and those
that occur because the input byte sequence appears to be truncated.

Progress will usually be made by skipping the first element of the from
buffer. This function should only be called if you are certain that you
wish to do this skipping, and if the to buffer has at least one element
of free space.

recover may raise an exception rather than skipping anything.
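
Both outcomes of recover, skipping and raising, can be observed from user code. The sketch below is a hedged illustration: decodeWith is a hypothetical helper, and it assumes that GHC.Foreign.peekCStringLen drives the decoder and its recover function in the same way the IO library does. The "//IGNORE" suffix accepted by mkTextEncoding selects the skipping behaviour.

```haskell
import Control.Exception (IOException, try)
import Data.Word (Word8)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import System.IO (mkTextEncoding)

-- Hypothetical helper: decode raw bytes with the named encoding and
-- capture any decoding error instead of letting it propagate.
decodeWith :: String -> [Word8] -> IO (Either IOException String)
decodeWith name bytes = do
  enc <- mkTextEncoding name
  -- Copy the bytes into temporary memory and decode them from there.
  try (withArrayLen bytes (\n p -> GHC.peekCStringLen enc (castPtr p, n)))
```

With the bytes [0x61, 0xFF, 0x62], where 0xFF is never valid in UTF-8, the name "UTF-8//IGNORE" should skip the offending byte, while plain "UTF-8" should yield a Left carrying the exception.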

Currently, some implementations of recover may mutate the input buffer.
In particular, this feature is used to implement transliteration.

Many codecs are not stateful, and in these cases the state can be
represented as '()'. Other codecs maintain a state. For
example, UTF-16 recognises a BOM (byte-order-mark) character at
the beginning of the input, and remembers thereafter whether to
use big-endian or little-endian mode. In this case, the state
of the codec would include two pieces of information: whether we
are at the beginning of the stream (the BOM only occurs at the
beginning), and if not, whether to use the big or little-endian
encoding.
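
The two pieces of information mentioned above could be modelled as follows. This is a minimal sketch with hypothetical names (Endianness, Utf16State, consumeBom), not the representation GHC actually uses. Falling back to big-endian when no BOM is present is an assumption of this sketch.

```haskell
import Data.Word (Word8)

data Endianness = BigEndian | LittleEndian
  deriving (Eq, Show)

-- Either we are still at the start of the stream, or the byte order
-- has already been settled.
data Utf16State = AtStart | Decoding Endianness
  deriving (Eq, Show)

-- Inspect the start of the input and return the new state together
-- with the bytes that remain to be decoded.
consumeBom :: Utf16State -> [Word8] -> (Utf16State, [Word8])
consumeBom AtStart (0xFE : 0xFF : rest) = (Decoding BigEndian, rest)
consumeBom AtStart (0xFF : 0xFE : rest) = (Decoding LittleEndian, rest)
-- Assumption: without a BOM, default to big-endian.
consumeBom AtStart bytes@(_ : _ : _)    = (Decoding BigEndian, bytes)
-- Fewer than two bytes at the start, or the byte order is already
-- known: leave the state and the input unchanged.
consumeBom state bytes                  = (state, bytes)
```

Once the state is Decoding, a later 0xFE 0xFF pair is ordinary data, which matches the statement that the BOM only occurs at the beginning.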

The Latin1 (ISO8859-1) encoding. This encoding maps bytes
directly to the first 256 Unicode code points, and is thus not a
complete Unicode encoding. An attempt to write a character greater than
'\255' to a Handle using the latin1 encoding will result in an error.
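
The error can be reproduced without opening a file. The sketch below uses GHC.Foreign.withCStringLen instead of a Handle, on the assumption that it applies the same latin1 encoder. latin1Length is a hypothetical helper name.

```haskell
import Control.Exception (IOException, try)
import qualified GHC.Foreign as GHC
import System.IO (latin1)

-- Hypothetical helper: encode a String with latin1 and report how many
-- bytes were produced, or the exception raised by the encoder.
latin1Length :: String -> IO (Either IOException Int)
latin1Length s = try (GHC.withCStringLen latin1 s (\(_, len) -> return len))
```

A string such as "caf\233" stays within the first 256 code points and should encode to four bytes, whereas "\256" should produce a Left.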

The UTF-8 Unicode encoding, with a byte-order-mark (BOM; the byte
sequence 0xEF 0xBB 0xBF). This encoding behaves like utf8,
except that on input, the BOM sequence is ignored at the beginning
of the stream, and on output, the BOM sequence is prepended.

The byte-order-mark is strictly unnecessary in UTF-8, but is
sometimes used to identify the encoding of a file.
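
The input side of this behaviour can be checked in memory. The following is a hedged sketch: decodeBytes is a hypothetical helper, and it assumes GHC.Foreign.peekCStringLen creates a fresh decoder for each call, so the input is treated as the beginning of a stream.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import System.IO (TextEncoding, utf8, utf8_bom)

-- Hypothetical helper: decode a list of raw bytes with the given encoding.
decodeBytes :: TextEncoding -> [Word8] -> IO String
decodeBytes enc bytes =
  withArrayLen bytes (\n p -> GHC.peekCStringLen enc (castPtr p, n))
```

Decoding [0xEF, 0xBB, 0xBF, 0x68, 0x69] with utf8_bom should give "hi", the same result as for the input without the BOM. With plain utf8 the BOM is kept as the character U+FEFF at the start of the string.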