Configuring SQL Server Accent-Insensitivity

This article describes the steps to re-configure SQL Server so that queries return accent-insensitive results. It includes source code (using Microsoft SQL Server 2008 R2 Shared Management Objects) to automate some of the steps in this procedure.

Introduction

Our business applications frequently include functionality to support database queries based on first and last name, which is problematic when names contain
accent characters and the database collation is accent-sensitive. Suppose, for example, you need to query the database for users having a first name that starts
with "HELEN". Ideally, the result set should include:

HELEN
HELENA
HÉLÈNA
HELENE
HÉLÈNE

However, when your database collation is accent-sensitive, the result set includes only:

HELEN
HELENA
HELENE

Notice the names containing accented characters are excluded from the result set.
This is because accented characters are excluded from the query. Unfortunately, this is not the desired result.

In order to include the missing names, you could query for users having a first name that starts with "HELEN" OR "HÉLÈN", but it
may be impossible to predict all accent variations that might exist in a data source. More importantly for some, if query parameter values
are specified by the end-user, then you can't be certain that a typical user will know how to input accented characters from his or her computer keyboard.

Background

What is collation?

A collation in SQL Server is a defined set of rules for sorting and comparing data. This includes rules for case-sensitivity and accent-sensitivity.
When you select a collation (for your server, database, column, or expression), you assign specific characteristics to your data that affect the results of database operations.

When I installed SQL Server 2008 R2, the default collation was SQL_Latin1_General_CP1_CI_AS. The collation name is broken into these parts:

select * from users where name like'HELEN%'collate SQL_Latin1_General_CP1_CI_AI

Configure the database so that it is not accent-sensitive; change the collation to SQL_Latin1_General_CP1_CI_AI.

Options 1 and 2 are simplest from a database administrator's perspective, because they require no changes to the database.
Option 3 is simplest from a programmer's perspective, because it is a permanent solution, server-wide and database-wide, and it requires no programming changes
to applications. When you are installing a new SQL Server instance, then this is easy to do. However, if you have an existing SQL Server that you want to re-configure,
then the procedure is more difficult. This article provides a description of the steps you can follow to implement option 3 on an existing server with existing databases.

Steps to change the collation on SQL Server

Here is a high-level summary of the required steps:

Generate a script to re-create all user databases.

Export the data from all user databases.

Drop all user databases.

Rebuild the master database specifying the new collation.

Execute the script to re-create all user databases.

Import the data into all user databases.

It is important to note that you can't simply create a SQL Server backup of a database, then restore it to a server that is configured with a different default collation.
Collation settings are column-level attributes, which are restored "as is" from the backup. Therefore, you need to create new tables, with new collation settings,
and import the original data into the new tables.

Here it is important to note that you can change the collation on the tables in an existing database without changing the default collation for the server.
You can do this using a SQL script (e.g., Change Collation on All Tables.sql),
but this could leave unresolved issues with other database objects. Instead it might be better to use
a Change Collation utility program. Having said that, the problem with this approach is that
it requires you to remember to specify the collation whenever you add new columns to your tables. If you want a "fire and forget" solution, then you will want to change
the default collation for your server and for your database - not just the columns in your tables.

Step 1. Generate a SQL script to re-create all user databases

If you have a small number of databases, then you can use the Generate Scripts feature in SQL Server Management Studio.
Right-click the database and select Tasks / Generate Scripts. Follow the steps in the wizard to generate a SQL script that creates all of the objects in each user database.
To include INSERT statements for your data, click the Advanced button on the Scripting Options step, and select "Schema and data" for the setting named "Types of data to script".
Detailed instructions (with screenshots) are
available here.

The simple manual approach is not a workable solution if you have multiple servers hosting dozens (or perhaps hundreds) of databases.
Furthermore, if you have databases that contain several gigabytes of data, it isn't a viable option to script massive volumes of data into
INSERT statements in a SQL script file. Instead, you need a solution to automate the generation of SQL scripts for every database on a given server,
and a solution to automate the use of SQL Server's BCP utility to export and import the data quickly.

I created a console application called Siphon to generate a single-file SQL script to re-create each of the user databases on my server.
The code for this SQL Server utility program was inspired by Scriptio, an Open-Source project hosted
at CodePlex. Siphon is developed using Microsoft Visual Studio 2010, and it targets Microsoft SQL Server 2008 R2 servers.
If you are interested, you can download the source code.

Step 2. Export data from all user databases

Before you start, you should perform a regular SQL Server backup of your user databases. If something goes wrong when you run the next few steps, then having these
"emergency backups" will be critically important.

You can use the SQL Server bulk copy utility (BCP) to export the data in every table (in every user database)
to a file. The code to do this is straightforward using SMO and BCP together:

Step 6. Import data into all user databases

You can use the SQL Server bulk copy utility (BCP) to import data in every
table (in every user database) from the export files created in Step 2. The code to do this is straightforward using SMO and BCP together:

Share

About the Author

I have been consulting as a professional software systems expert for nearly twenty years, building solutions for organizations all over the world. My clients range from small non-profit associations in my own local community to global enterprises like Nestle, Disney, ABC News, and others.

I specialize in the design and implementation of Internet-based business solutions at InSite Systems. My work-related research interests include component-based software development, software development process and methodology, and information architecture and data mining.

I have hands-on experience with every aspect of the software project lifecycle, including management and technical roles. Customers value my ability to understand and appreciate their business needs, applying creativity and ingenuity to arrive at technology solutions that are flexible, efficient, and cost-effective.