Character Encoding

Each letter, number, punctuation character, and control character used on a computer has a binary value associated with it. Computer developers have defined schemes for coding the characters so that computers understand them. The process of coding the characters into machine-readable language is called character encoding.

Different encoding schemes encode characters differently. For example, encoding letters and numbers from a U.S. keyboard is usually not a problem in North America; however, in another country, there may not be characters that correspond to the North American characters. Common data values that can cause problems in other environments include null, TAB, and accented letters (such as Ñ).
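
As a rough illustration of how the same character can receive different binary values, the following Python sketch encodes two characters under three schemes. It assumes that the Python standard codecs latin-1, cp1140, and utf-8 are acceptable stand-ins for ISO 8859-1, EBCDIC-1140, and UTF-8; the helper name show_encodings is hypothetical.

```python
# Encode the same text under several schemes and compare the resulting bytes.
# Assumption: these Python codec names stand in for the schemes in this topic.
SCHEMES = ("latin-1", "cp1140", "utf-8")


def show_encodings(text):
    """Return a dict mapping each scheme name to the hex bytes for text."""
    return {scheme: text.encode(scheme).hex() for scheme in SCHEMES}


# "A" is a plain English letter; "\u00d1" is the accented letter N-tilde.
for ch in ("A", "\u00d1"):
    print(ascii(ch), show_encodings(ch))
```

The letter A comes out the same in latin-1 and utf-8 but not in cp1140, and the accented letter differs in all three, which is why data that looks fine on one system can arrive garbled on another.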

In AAMVAnet, you can use a variety of encoding schemes to make sure that characters in your data are translated correctly. For more information, see the tabs below or contact AAMVA Enterprise Architecture.

This scheme...

Is used on...

And has these features...

More Info

Has many variants; in the U.S., ASCII - ISO 8859-1 (Latin-1) is the most common. (In 2004, the 8859-1 working group stopped working on 8859-1 in order to concentrate on Unicode.)

Uses primarily eight binary bits for each character.

In all variants, numbers and English letters map to the same hexadecimal values in the range 00 to 7f.

Above 7f, characters are assigned differently in different variants. (The mapping in ASCII 8859-1 uses the Unicode table U0000 Basic Latin for hex values 00 to 7f, followed by Unicode Latin-1 for hex values 80 to ff, which supports characters from other West European languages.)
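
The difference between variants can be sketched in Python as follows. ISO 8859-15 is used here only as a second variant for comparison (it is not named in this topic), and the helper name decode_all is hypothetical.

```python
# Bytes in the range 00-7f decode identically in these variants;
# bytes above 7f may decode to different characters.
# Assumption: ISO 8859-15 is chosen only as a sample second variant.
VARIANTS = ("iso8859-1", "iso8859-15")


def decode_all(raw):
    """Decode the same bytes under each variant and return the results."""
    return {name: raw.decode(name) for name in VARIANTS}


low = decode_all(b"\x41\x39")   # same text in both variants
high = decode_all(b"\xa4")      # a different character in each variant

print(low["iso8859-1"] == low["iso8859-15"])    # True
print(high["iso8859-1"] == high["iso8859-15"])  # False
```

Byte a4 is the generic currency sign in ISO 8859-1 and the euro sign in ISO 8859-15, so a receiver that assumes the wrong variant silently shows the wrong character.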

If your system uses AMIE to communicate over AAMVAnet (see Note), messages are restricted to characters that can be used in the ASCII, EBCDIC-1140, and UTF-8 encoding schemes. These acceptable characters are sometimes referred to as “printable characters.”

This limitation exists because different types of computers use different data encoding schemes. Therefore, to communicate across the network, the only characters that can be used are those that are common to all computers connected to the network.

Note: If your application uses UNI, then AMIE is used to communicate over AAMVAnet.

Allowable Characters

Space

a to z

A to Z

0 to 9

! " # $ % & ' ( ) * + , - . / : ; < = > ? @ \ _ { | } ~
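
One way to check outgoing data against this list is sketched below in Python. The character set is copied from the list above; the names ALLOWED and find_disallowed are hypothetical and are not part of AMIE or any AAMVA software.

```python
import string

# Punctuation copied from the "Allowable Characters" list above.
ALLOWED_PUNCTUATION = "!\"#$%&'()*+,-./:;<=>?@\\_{|}~"
ALLOWED = set(" " + string.ascii_letters + string.digits + ALLOWED_PUNCTUATION)


def find_disallowed(text):
    """Return the characters in text that are not allowable,
    in order of first appearance and without repeats."""
    seen = []
    for ch in text:
        if ch not in ALLOWED and ch not in seen:
            seen.append(ch)
    return seen


print(find_disallowed("SMITH, JOHN"))   # []
print(find_disallowed("A[1]\tB"))       # ['[', ']', '\t']
```

Returning the offending characters, rather than only True or False, makes it easier to report exactly which values need to be replaced before a message is sent.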

If your system uses XML to communicate over AAMVAnet (for example, if you are using a web service), then the UTF-8 scheme is typically used to encode printable characters. UTF-8 uses the Unicode character tables which support characters from many alphabets, including characters used in Asian languages. UTF-8 supports the following white space characters:

Space

TAB

Paragraph return (carriage return and line feed)

Other binary data that cannot be mapped to a printable character must be excluded from message data.
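
A minimal Python sketch of such a check follows. It assumes that "cannot be mapped to a printable character" can be approximated by the Unicode general categories that begin with "C" (control, format, surrogate, private use, and unassigned); that interpretation and the name is_message_safe are assumptions, not AAMVAnet rules.

```python
import unicodedata

# White space characters listed above: space, TAB, carriage return, line feed.
ALLOWED_WHITESPACE = {" ", "\t", "\r", "\n"}


def is_message_safe(text):
    """Return True if every character is either allowed white space
    or outside the Unicode 'C' (control and similar) categories."""
    for ch in text:
        if ch in ALLOWED_WHITESPACE:
            continue
        # Assumption: any category starting with "C" is treated as non-printable.
        if unicodedata.category(ch).startswith("C"):
            return False
    return True


print(is_message_safe("Pe\u00f1a\tOK\r\n"))  # True
print(is_message_safe("bad\x00value"))       # False
```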

Translating Reserved XML Code Characters

In XML, angle brackets (< >), the ampersand (&), the apostrophe ('), and the quotation mark (") are all used to denote XML code. To represent these characters in message data in an XML document, you must replace them with the following escape sequences. When XML data is processed by a web service or a parser, the software may automatically convert these characters to and from their XML form.

Example

Before Translation: Speeding > 5mph over "posted limit"

After Translation: Speeding &gt; 5mph over &quot;posted limit&quot;

To represent this character...    This is used...
<                                 &lt;
>                                 &gt;
&                                 &amp;
'                                 &apos;
"                                 &quot;
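
The translation can be sketched with the Python standard library as follows. The functions escape and unescape come from xml.sax.saxutils; the wrapper names to_xml_text and from_xml_text are hypothetical. Note that each escape sequence ends with a semicolon.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, <, and > by default; the quotation mark and
# apostrophe are supplied as extra entities.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
REVERSE_ENTITIES = {seq: ch for ch, seq in QUOTE_ENTITIES.items()}


def to_xml_text(value):
    """Replace the five reserved characters with their escape sequences."""
    return escape(value, QUOTE_ENTITIES)


def from_xml_text(value):
    """Convert the escape sequences back to the original characters."""
    return unescape(value, REVERSE_ENTITIES)


original = 'Speeding > 5mph over "posted limit"'
encoded = to_xml_text(original)
print(encoded)                 # Speeding &gt; 5mph over &quot;posted limit&quot;
print(from_xml_text(encoded))  # Speeding > 5mph over "posted limit"
```

If a parser or web service framework already performs this conversion, applying it again by hand would escape the ampersands a second time, so it is worth confirming which layer is responsible.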

The following information describes delimiter and separator characters in other standards. If you use these standards, make sure that your vehicle and driver's license data does not contain these reserved characters.

This standard...

Uses these characters as separators and delimiters...

ASC X12

The asterisk (*) is the preferred separator for data elements, but you can specify others. If you communicate with organizations that use the asterisk as a separator (for example, organizations that use the Automated Commercial Environment's truck e-manifest), you may need to determine how you will exchange data that contains asterisks (for example, a driver's license number that contains an asterisk).

You can define delimiters for each interchange by specifying them in the interchange start segment. The only requirement is that the delimiter character you specify cannot be used elsewhere in the interchange.

ICAO 9303 (Names)

The less-than character (<) is the delimiter between names. This standard is used internationally on travel documents such as passports.

Line-Sequential Data

The paragraph mark (carriage return) is the delimiter between records or fields.
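
A check for these reserved characters might look like the following Python sketch. The table RESERVED and the function reserved_in are hypothetical names. The X12 entry assumes the preferred asterisk separator (an interchange may specify others), and the line-sequential entry assumes that a line feed should be treated as reserved along with the carriage return.

```python
# Reserved characters per standard, based on the descriptions above.
# Assumptions: X12 uses the preferred asterisk separator; line-sequential
# data treats both carriage return and line feed as delimiters.
RESERVED = {
    "X12": {"*"},
    "ICAO 9303": {"<"},
    "Line-Sequential": {"\r", "\n"},
}


def reserved_in(value, standard):
    """Return a sorted list of the standard's reserved characters found in value."""
    return sorted(RESERVED[standard] & set(value))


print(reserved_in("D123*456", "X12"))          # ['*']
print(reserved_in("D123456", "X12"))           # []
print(reserved_in("ANNA<MARIA", "ICAO 9303"))  # ['<']
```

An empty list means the value can be exchanged as is; otherwise, you and your trading partner need to agree on how to handle the characters that were found.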