Unicode resources

Unicode is essentially a universal character set. It contains nearly every character in every human language. However, Unicode is subtle. As I point out in my blog article on Unicode, it’s hard to say anything pithy about Unicode that is entirely correct. Every simple statement requires footnotes. Here are some resources I’ve found useful in understanding and using Unicode.

In general, Unicode characters can be inserted into HTML by putting their hexadecimal code point between &#x and a semicolon. For example, the Greek letter theta (θ) can be inserted into HTML by typing &#x03b8;. Some commonly used characters have mnemonic counterparts, such as &theta; for θ. However, HTML 4 defines only 252 such entities, while Unicode contains well over 100,000 characters. Also, HTML mnemonic entities generally cannot be used in XML. There are four exceptions: &amp;, &gt;, &lt;, and &quot;. (XML also predefines &apos;, which is not among the HTML 4 entities.) Note that just because a character is legal HTML does not mean the client’s browser will display it or display it correctly. See also math symbols and Greek letters.
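As a rough illustration of the numeric and mnemonic forms described above, here is a minimal Python sketch. The helper name to_hex_reference is made up for this example; html.unescape is from the Python standard library and recognizes HTML5 named references, a larger set than the 252 HTML 4 entities.

```python
import html

def to_hex_reference(ch: str) -> str:
    """Return the hexadecimal numeric character reference for one character.

    Hypothetical helper for illustration; not part of any library.
    """
    return f"&#x{ord(ch):04x};"

theta = "\u03b8"  # Greek small letter theta

# Build the numeric reference from the code point.
print(to_hex_reference(theta))               # &#x03b8;

# Both the numeric and the mnemonic form decode to the same character.
print(html.unescape("&#x03b8;") == theta)    # True
print(html.unescape("&theta;") == theta)     # True
```

The numeric form works for any code point, which is why it is the safer choice when no mnemonic entity exists.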

Unicode in XML
Unicode characters can be inserted into XML by writing their code points in hexadecimal between &#x and a semicolon, much as in HTML. However, some characters are illegal or at least discouraged because they could confuse XML processors.
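To make the notion of legal and illegal characters concrete, here is a small Python sketch. It assumes the Char production of XML 1.0 as the definition of "legal"; XML 1.1 uses different ranges, and the discouraged characters are not covered. The names is_xml10_char and parses are hypothetical helpers written for this example; the parser is the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

def is_xml10_char(ch: str) -> bool:
    """True if ch falls in the Char production of XML 1.0 (assumed definition)."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def parses(document: str) -> bool:
    """True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# A hexadecimal character reference is resolved by the parser.
element = ET.fromstring("<p>&#x03b8;</p>")
print(element.text == "\u03b8")    # True

print(is_xml10_char("\u03b8"))     # True
print(is_xml10_char("\x00"))       # False: NUL is not allowed
print(is_xml10_char("\x0b"))       # False: vertical tab is not allowed

# Writing an illegal character as a reference does not make it legal.
print(parses("<p>&#x0;</p>"))      # False
```

A check like this is mainly useful for sanitizing text before it is written into an XML document.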

XeTeX
XeTeX is a version of TeX that works with Unicode. There is a XeLaTeX version of LaTeX as well.