HTML Character Codes

HTML character references are short bits of HTML, commonly referred to as character entities or entity codes, that are used to display characters that have special meaning in HTML as well as characters that don't appear on your keyboard.

Characters with special meaning in HTML are called reserved characters. For example, the left (<) and right (>) angle brackets are reserved because they mark where an element's tags begin and end.

A Practical Example

Let's say that you want to display a block of HTML in a web page and have the element tags show up on the page. You might try wrapping the block in <code> tags. However, even with <code> tags around it, the HTML is still processed and rendered by the browser. To prevent that, replace each reserved character with the appropriate character reference.

On their own, <code> tags do not prevent the browser from processing the HTML inside them. Once the reserved characters in the block are replaced with HTML character references, the browser displays the block as markup instead of rendering it.
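When the markup comes from a script rather than being typed by hand, the replacement can be automated. Here is a minimal sketch in Python, assuming the standard-library html module is available. The snippet variable is a made-up example, not something from the original page.

```python
import html

# Hypothetical snippet we want to show on the page as literal markup.
snippet = "<p>Hello, <strong>world</strong>!</p>"

# html.escape replaces the reserved characters &, < and >
# (and, by default, quotation marks) with character references.
escaped = html.escape(snippet)

# Wrapped in <code> tags, the escaped text is displayed as markup
# instead of being rendered by the browser.
code_block = f"<code>{escaped}</code>"
print(code_block)
```

Decoding the result with html.unescape returns the original snippet, which is a quick way to check that nothing was lost in the conversion.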

Character Entity Format

In HTML, there are three ways to write a character entity. You can use the character name, a Unicode value (the character's code point written in hexadecimal), or a number (the same code point written in decimal). For example, an ampersand may be displayed using any of the following entities: &amp;, &#x00026;, or &#38;.

In all three cases, the format looks basically the same. Each entity begins with an ampersand (&), followed by the character name, Unicode, or number reference, and ends with a semicolon. When a number is used, it must be preceded by the pound symbol (#), and when a Unicode value is used, it must be preceded by a pound symbol and the letter x (#x).

Most people use character names rather than Unicode values or numbers when a character has a name, since names are much easier to remember, but the Unicode and number references are equally acceptable.
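To check that the three formats really are interchangeable, the references can be decoded and compared. This is a small sketch using Python's standard-library html module. It only illustrates the equivalence and is not how a browser is implemented.

```python
import html

# The three ampersand references given in the text above.
forms = ["&amp;", "&#x00026;", "&#38;"]

# html.unescape converts each reference back to the character it stands for.
decoded = [html.unescape(form) for form in forms]
print(decoded)

# The Unicode (hexadecimal) and number (decimal) forms name the same code point.
hex_value = int("00026", 16)
print(hex_value, ord("&"))
```

All three references decode to the same single character, and the hexadecimal value 26 is simply the decimal value 38 written in base 16.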

Diacritics

One subtype of character entity code merits special mention: diacritical marks. These are combining marks, such as accents and tildes, that are written immediately after a letter and appear directly over it. Here are three of the most common diacritics:

Mark     Character Name         Number     Example
Acute    &DiacriticalAcute;     &#769;     a&#769; produces á
Grave    &DiacriticalGrave;     &#768;     a&#768; produces à
Tilde    &DiacriticalTilde;     &#771;     a&#771; produces ã

Support for diacritical mark character names is inconsistent. In the HTML named character list, these names resolve to standalone accent characters (for example, &DiacriticalGrave; is the backtick) rather than to marks that combine with the preceding letter. To place a mark over a letter, stick with the number codes.
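The difference between the number codes and the character names can be seen by decoding both. The sketch below uses Python's standard-library html and unicodedata modules. The decode helper is a hypothetical name introduced for this illustration, and Python's table of named references is assumed to match what browsers use.

```python
import html
import unicodedata


def decode(reference: str) -> str:
    """Decode character references, then compose letter + mark pairs (NFC)."""
    return unicodedata.normalize("NFC", html.unescape(reference))


# Number codes: the letter a followed by a combining mark.
acute = decode("a&#769;")  # a + combining acute accent
tilde = decode("a&#771;")  # a + combining tilde
print(acute, tilde)

# Character name: decodes to a standalone accent, not a combining mark.
named = html.unescape("a&DiacriticalAcute;")
# A combining class of 0 means the accent does not attach to the letter.
print(len(named), unicodedata.combining(named[1]))
```

With the number codes, the letter and the mark compose into a single accented character. With the character name, the result stays two separate characters, which is why the accent appears beside the letter rather than over it.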

Most Common Character Codes

Here is a quick reference table with a few of the most commonly seen HTML character references: