Good ole' ASCII

The common denominator of all modern character sets like the ones from
ISO 8859 is the very successful American
Standard Code for Information Interchange (ASCII, /'æski:/)
which was published thirty years ago in 1968 as ANSI X3.4 and looks like
this:

Nowadays most computers use encodings based on ASCII. The only
notable exception are the IBM
mainframes that never departed from the myriad of dialects of their
Extended Binary Coded Decimal Information Code (EBCDIC) that
predated ASCII and gave a more readable BCD coding on paper
punch-cards:

Don't let your computer manufacturer tell you that there are ASCII
characters beyond those shown above. ASCII uses only 7 bits and
allows the most significant eighth bit to be used as parity
bit, highlight bit, end-of-string bit (all of which are considered bad
practice nowadays) or to include additional characters for internationalization (i18n for which we need
8bit-clean programs that do none of afore-mentioned silly tricks) but
ASCII defined no standard for this and
many manufacturers invented their own proprietary codepages.

The only languages that can comfortably be written with the
repertoire of US-ASCII happen to be Latin, Swahili, Hawaiian and
American English without most typographic frills. It is rumoured that
there are more languages in the world.

That's why foreign standardizers crafted up national
variants of ASCII like the German DIN 66003 which my first CP/M
computer (a Sharp MZ-731) used in 1984 to communicate with its daisy wheel
printer or the Danish DS 2089 mentioned by Bjarne Stroustrup in §
6.5.3 of his C++
history. These 7bit charsets used the code positions of the less
needed ASCII characters

# $ @ [ \ ] ^ ` { | } ~

to insert their accented letters so that

,,schöne Grüße``

would get encoded as

,,sch|ne Gr}~e``

and C programs like

{*argv[1]='\0'}

showed up as

æ*argvÆ1Å='Ø0'å

because C didn't offer alternative digraphs like Pascal's (. for [ and
(* for {.

ASCII and its national variants were declared international
standard ISO 646 in 1972. Back then, the socialist countries
managed to substitute the international currency sign ¤ for ASCII's
capitalist dollar sign $ in the the first international reference
version ISO-646-IRV but this was revised in 1991 and
now ISO-646-IRV is a synonym for ISO-646-US or US-ASCII as it is used
in the core Internet protocols.
ISO-646's national variants have fallen out of use because of their
obvious problems and because the 8bit ASCII extension ISO-8859-1 has offered a more
interoperable standard.

ISO 8859 still isn't the ultimate
solution. The even wider Universal Character Set (UCS, Unicode) wants to simplify
world-wide multilingual and multi-platform text exchange just like the
ASCII standard simplified Latin text interchange and aims to become
equally successful as the ultimate encoding standard and thus
it was numbered ISO
10646 = ISO 646 + 10000. And the inevitable success seems to be
there:
RFC 2070
internationalized the Internet's hypertext markup language HTML and
declared ISO 10646 its new base charset.
RFC 2277
recommends the use of ISO 10646 to all new Internet protocols.

Unicode begins with US-ASCII and ISO-8859-1 but goes beyond the
8bit barrier and encodes all the
world's characters in a 16bit space and a 20bit extension zone for
everything that did not fit into the 16bit space.
The ASCII-compatible Unicode transformation scheme UTF-8 lets all ASCII characters pass through
transparently and encodes all other characters as unambiguous 8bit
character sequences.