Windows Collation Sorting Styles

During SQL Server Setup, you will be prompted to select either Windows collations or binary collations. Your choice of collations affects the data comparison and sort order behaviors of your instance of Microsoft SQL Server. This topic provides considerations for choosing Microsoft Windows collations or binary collations for your SQL Server implementation.

Note:

For Windows collations, the nchar, nvarchar, and ntext Unicode data types have the same sorting behavior as char, varchar, and text non-Unicode data types.

Windows collations define rules for storing character data based on an associated Windows locale. The base Windows collation rules specify which alphabet or language is used when dictionary sorting is applied, as well as the code page used to store non-Unicode character data.

Binary collations sort data based on the sequence of coded values defined by the locale and data type. A binary collation in SQL Server defines the language locale and the ANSI code page to be used, and enforces a binary sort order. Because they are relatively simple, binary collations can improve application performance. For non-Unicode data types, data comparisons are based on the code points defined in the ANSI code page. For Unicode data types, data comparisons are based on the Unicode code points, and the locale is not considered in data sorts. For example, Latin1_General_BIN and Japanese_BIN yield identical sorting results when used on Unicode data.
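The distinction between code-page comparisons and Unicode code-point comparisons can be illustrated outside SQL Server. The following Python sketch is only a model of the rules described above, not SQL Server code. It assumes that code page 1252 stands in for the ANSI code page of a Latin1_General collation, and the function names are hypothetical.

```python
# Illustrative model only; SQL Server performs these comparisons internally.
# Assumptions: code page 1252 stands in for the ANSI code page of a
# Latin1_General collation, and every sample character is in the
# Basic Multilingual Plane.

def unicode_binary_key(value):
    """Key for Unicode data: the sequence of Unicode code points."""
    return [ord(ch) for ch in value]

def non_unicode_binary_key(value, code_page="cp1252"):
    """Key for non-Unicode data: the byte values in the given code page."""
    return value.encode(code_page)

values = ["é", "€", "z"]

# Unicode code points: z = U+007A, é = U+00E9, € = U+20AC
unicode_order = sorted(values, key=unicode_binary_key)

# Code page 1252 bytes: z = 0x7A, € = 0x80, é = 0xE9
non_unicode_order = sorted(values, key=non_unicode_binary_key)

print(unicode_order)      # ['z', 'é', '€']
print(non_unicode_order)  # ['z', '€', 'é']
```

In this model the euro sign sorts after é for Unicode data but before it for non-Unicode data, because its byte value in the code page differs from its Unicode code point.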

Earlier binary collations in SQL Server performed an incomplete code-point-to-code-point comparison for Unicode data: they compared the first character as a WCHAR and then compared the remaining data byte by byte. For backward compatibility, the semantics of existing binary collations will not be changed.

Binary collations in this release of SQL Server also include a new set of pure code-point comparison collations. Customers can migrate to the new binary collations to take advantage of true code-point comparisons, and should use them when developing new applications. The BIN2 suffix identifies collation names that implement the new code-point collation semantics. In addition, a new comparison flag corresponding to BIN2 is added for the new binary sort. For more information, see Using Binary Collations.
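To see why the older semantics and the BIN2 semantics can order the same Unicode strings differently, consider the following Python sketch. It is a model of the behavior described in this topic, not SQL Server's implementation. It assumes that Unicode data is held as little-endian UTF-16, and the key function names are hypothetical.

```python
# Illustrative model only. Assumption: Unicode data is held as
# little-endian UTF-16, so each 16-bit unit has its low byte first.

def legacy_bin_key(value):
    """Older _BIN model: first character as a 16-bit unit, then raw bytes."""
    data = value.encode("utf-16-le")
    first_unit = int.from_bytes(data[:2], "little") if data else -1
    return (first_unit, data[2:])

def bin2_key(value):
    """_BIN2 model: every 16-bit unit is compared as a whole number."""
    data = value.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]

# U+0100 is stored as bytes 00 01; U+00FF is stored as bytes FF 00.
values = ["a\u0100", "a\u00ff"]

legacy_order = sorted(values, key=legacy_bin_key)
bin2_order = sorted(values, key=bin2_key)

print(legacy_order == ["a\u0100", "a\u00ff"])  # True: byte 00 sorts before FF
print(bin2_order == ["a\u00ff", "a\u0100"])    # True: U+00FF sorts before U+0100
```

The two models agree on the first character and can diverge only from the second character onward, which is where the byte-by-byte comparison begins.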

The following table describes Windows collation sort order options for SQL Server 2005.

Sort order (suffix)

Sort order description

Binary (_BIN) ¹

Sorts and compares data in SQL Server tables based on the bit patterns defined for each character. Binary sort order is case sensitive and accent sensitive. Binary is also the fastest sorting order. For more information, see Using Binary Collations.

If this option is not selected, SQL Server follows sorting and comparison rules as defined in dictionaries for the associated language or alphabet.

Binary-code point (_BIN2) ¹

Sorts and compares data in SQL Server tables based on Unicode code points for Unicode data. For non-Unicode data, Binary-code point uses comparisons identical to those of binary sorts.

The advantage of using a Binary-code point sort order is that no data resorting is required in applications that compare sorted SQL Server data. As a result, a Binary-code point sort order provides simpler application development and possible performance increases. For more information, see Using Binary Collations.

Case-sensitive (_CS)

Distinguishes between uppercase and lowercase letters. If selected, lowercase letters sort ahead of their uppercase versions.

If this option is not selected, SQL Server considers the uppercase and lowercase versions of letters to be identical for sorting purposes.

Accent-sensitive (_AS)

Distinguishes between accented and unaccented characters. For example, 'a' is not equal to 'ấ'.

If this option is not selected, SQL Server considers the accented and unaccented versions of letters to be identical for sorting purposes.

Kana-sensitive (_KS)

Distinguishes between the two types of Japanese kana characters: Hiragana and Katakana.

If this option is not selected, SQL Server considers Hiragana and Katakana characters to be equal for sorting purposes.

Width-sensitive (_WS)

Distinguishes between a single-byte character and the same character when represented as a double-byte character.

If this option is not selected, SQL Server considers the single-byte and double-byte representation of the same character to be identical for sorting purposes.

¹ If Binary or Binary-code point is selected, the case-sensitive, accent-sensitive, kana-sensitive, and width-sensitive options are not available.
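The effect of the four sensitivity options on equality comparisons can be approximated outside SQL Server. The following Python sketch is an approximation built on Unicode normalization, not the rule set that Windows collations actually apply. The function name comparison_key and its parameters are hypothetical, and compatibility normalization folds more than character width.

```python
import unicodedata

# Illustrative approximation only; Windows collations use their own tables.
# Map Katakana U+30A1..U+30F6 to the Hiragana 0x60 code points lower.
KATAKANA_TO_HIRAGANA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}

def comparison_key(value, case_sensitive=False, accent_sensitive=False,
                   kana_sensitive=False, width_sensitive=False):
    """Return a key; two strings compare equal when their keys are equal."""
    if not width_sensitive:
        # NFKC folds full-width and half-width forms (and other variants).
        value = unicodedata.normalize("NFKC", value)
    if not accent_sensitive:
        # Decompose, drop marks in U+0300..U+036F, then recompose.
        decomposed = unicodedata.normalize("NFD", value)
        stripped = "".join(ch for ch in decomposed
                           if not 0x0300 <= ord(ch) <= 0x036F)
        value = unicodedata.normalize("NFC", stripped)
    if not kana_sensitive:
        value = value.translate(KATAKANA_TO_HIRAGANA)
    if not case_sensitive:
        value = value.casefold()
    return value

# With every option off, the pairs from the table compare as equal.
print(comparison_key("a") == comparison_key("ấ"))            # True
print(comparison_key("あ") == comparison_key("ア"))          # True
# Turning an option on makes the corresponding pair distinct.
print(comparison_key("a", accent_sensitive=True)
      == comparison_key("ấ", accent_sensitive=True))         # False
```

Each option is modeled as a separate folding step, so any combination of suffixes (for example, case-insensitive and accent-sensitive) can be tried by passing the matching arguments.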