Internationalization

Identifiers in programming languages have traditionally been restricted to ASCII; some languages are case-sensitive, some are not. More recent
programming languages such as Java allow identifiers drawn from a much wider character repertoire. Web formats also use
non-ASCII identifiers: HTML forms identify buttons by name; RDF gives names to the properties of
resources; XML allows elements and attributes in a document to have non-ASCII names.
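As an illustration (using Python rather than Java, purely for brevity), modern language implementations accept identifiers well beyond ASCII; Python 3 exposes its identifier rule through `str.isidentifier()`:

```python
# Python 3, like Java, accepts identifiers beyond ASCII.
# str.isidentifier() reflects the language's identifier rule.
assert "café".isidentifier()      # Latin letter with accent
assert "変数".isidentifier()       # CJK ideographs ("variable" in Japanese)
assert not "2var".isidentifier()  # still must not start with a digit
assert not "a-b".isidentifier()   # '-' is not an identifier character
```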

Restricting identifiers to ASCII invites assumptions that don't extend easily. For example, in ASCII every letter has exactly one upper- and one lowercase variant.
Beyond ASCII, there are scripts with no concept of case at all, there are single letters whose case equivalent is a letter pair (German 'ß' uppercases to 'SS'), and
upper-case/lower-case equivalences can depend on the language (Turkish pairs dotless 'ı' with 'I' and dotted 'i' with 'İ'). This is one reason why, for example, XML is case-sensitive.
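These pitfalls can be seen directly with Python's built-in (locale-independent) Unicode case mappings; a minimal sketch:

```python
# A single letter whose uppercase equivalent is a letter pair:
assert "ß".upper() == "SS"
# ...so uppercasing and then lowercasing does not round-trip:
assert "ß".upper().lower() == "ss"

# For caseless matching, Unicode defines case *folding* instead:
assert "straße".casefold() == "strasse".casefold()

# Language-dependent mappings are NOT applied here: Python always maps
# 'i' to 'I', which is wrong for Turkish (where 'i' pairs with 'İ').
assert "i".upper() == "I"
```

This is why case-insensitive identifier matching is harder than it looks: the result of a case operation can change the string's length and can depend on the language in use.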

Keyboard limitations are often cited as a potential problem. But even for languages written with thousands of characters, standard input-method
software exists for entering them from an ordinary keyboard.

ASCII is not totally unambiguous. People have learned to distinguish between 'l' and '1', or 'O' and '0', in ASCII. People using other scripts
similarly know where characters can be difficult to identify, and can handle them as long as the differences are visible.
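A hypothetical example of such a hard-to-identify pair (not taken from the original text): two characters from different scripts that render identically but are distinct code points, which Python's `unicodedata` module can tell apart by name:

```python
import unicodedata

# Latin capital A and Cyrillic capital A look the same in most fonts,
# yet they are different characters:
latin_a, cyrillic_a = "A", "\u0410"
assert latin_a != cyrillic_a
assert unicodedata.name(latin_a) == "LATIN CAPITAL LETTER A"
assert unicodedata.name(cyrillic_a) == "CYRILLIC CAPITAL LETTER A"
```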

Things that are indistinguishable visually may nevertheless be written in two different ways: 'é', for instance, can be a single precomposed
character or an 'e' followed by a combining (floating) accent.
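Unicode normalization resolves exactly this ambiguity; a minimal sketch using Python's `unicodedata` module:

```python
import unicodedata

# 'é' as one precomposed code point vs. 'e' plus a combining accent:
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT

# Visually identical, yet unequal as raw strings:
assert precomposed != decomposed

# Normalizing both to the same form (NFC here) makes them compare equal:
def nfc(s):
    return unicodedata.normalize("NFC", s)

assert nfc(precomposed) == nfc(decomposed)
```

A system that normalizes identifiers to one form before storing or comparing them avoids treating these two spellings as different names.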

To make internationalized identifiers work, it has been suggested that there should be a single convention that system designers are
encouraged to adhere to. That way the likelihood of mistakes by, and surprises for, users is minimized.

Martin Dürst has written a proposal in that direction (draft-duerst-i18n-norm-04.txt, now expired).

The issue affects not just the Web, but the whole Internet. Martin Dürst's draft is discussed on the URI mailing list (uri@bunyip.com).