
I am aware that when adding new fields to large tables, it is recommended to add them at the end of the field list rather than somewhere in the middle. Does something similar apply when changing field types?

I have a table with about a million records that has several VARCHAR type fields. I would like to change these to NVARCHAR, but as I understand it, this will take some time and resources, as the fields are in the middle of the table, and SQL Server has to do a bunch of copying/re-ordering.



Is this an effort to internationalize the database? Your index keys will double in size (AFAIR, NVARCHAR takes 2x the storage space of VARCHAR fields), and if you have a particularly long index key, that can produce a key that cannot be rebuilt (index keys are limited to 900 bytes, AFAIR).
– Fabricio Araujo, Jul 19 '12 at 18:10

And it will tax your I/O subsystem if that table is really big (example: a table with 1M rows and a varchar column whose strings average 40 chars. With varchar you'll have roughly 38 MB of character data; with nvarchar the same data will need about 76 MB to be stored/read/written).
– Fabricio Araujo, Jul 19 '12 at 18:23

Great discussion, and thank you for the insight. The information about indexes and sizes with nvarchar is most helpful. Does using nvarchar incur additional performance hits over varchar?
– ElHaix, Jul 20 '12 at 20:22

@ElHaix: I missed your comment in all the noise. Please see stackoverflow.com/questions/35366/… — nvarchar has considerable overhead in all respects over varchar. Having said that, if you have Unicode data, use nvarchar. Don't hack something together to work around this.
– gbn, Jul 24 '12 at 17:41

2 Answers

Directly answering the question, there are two ways to execute the operation.

If the number of varchar columns involved in the table is small (one or two), it is more practical to create pseudo-temp columns.

If the number of varchar columns is larger, the approach above is not very practical, so you create a pseudo-table instead. This is how the metadata update scripts of database tools like ERwin or ER/Studio do it (I have used both tools and reviewed the generated scripts before applying them).

Note on large tables: If the table has a few thousand records or fewer, you can do the operation in one go. For million-record tables, it is more practical to work in batches (say, 100 or 1,000 records at a time), as in the sketch below.
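A minimal sketch of that batching pattern, assuming a hypothetical table dbo.MyTable where a new column NewCol has already been added next to the old OldCol (all names are placeholders):

    -- Repeat small updates until nothing is left to convert.
    -- dbo.MyTable, OldCol and NewCol are placeholder names.
    WHILE 1 = 1
    BEGIN
        UPDATE TOP (1000) dbo.MyTable
        SET    NewCol = OldCol            -- implicit VARCHAR -> NVARCHAR conversion
        WHERE  NewCol IS NULL
               AND OldCol IS NOT NULL;

        IF @@ROWCOUNT = 0 BREAK;          -- all rows converted

        CHECKPOINT;                        -- keeps the log small under SIMPLE recovery
    END

Note that CHECKPOINT only helps truncate the log under the SIMPLE recovery model; under FULL recovery you would take log backups between batches instead.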

Pseudo-temp columns

Pseudo-temp columns (I forget whether there is another, more appropriate name) are columns used to store the result of a conversion. In this case, they also become the final columns once the process is complete.

Create the new columns with the intended length. Do not forget to include any check constraints or defaults in the new definition.

Perform an update (or updates, see the observation above) to store the data of the old column in the new one.

Perform a log backup and issue a checkpoint, so the log does not grow absurdly large.

If the old column has any constraints associated with it, drop them.

Drop the old column.

Rename the new column to the old column name.

Rebuild the affected indexes (or all of them, if the affected column was also part of a clustered primary key constraint; it is rare for someone to use an (n)varchar as a PK, but I have seen it done). A sketch of the whole sequence follows this list.
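Something like the following, using made-up table, column and index names (dbo.Customers, Name, IX_Customers_Name) purely for illustration:

    -- Hypothetical example: convert dbo.Customers.Name from VARCHAR(100) to NVARCHAR(100).
    ALTER TABLE dbo.Customers ADD Name_new NVARCHAR(100) NULL;     -- new column (add checks/defaults too)

    UPDATE dbo.Customers SET Name_new = Name;                      -- copy the data (batch this on big tables)

    -- BACKUP LOG MyDb TO DISK = N'...';  CHECKPOINT;              -- keep the log under control

    -- If Name is referenced by indexes, defaults or check constraints, drop those first, e.g.:
    -- DROP INDEX IX_Customers_Name ON dbo.Customers;
    -- ALTER TABLE dbo.Customers DROP CONSTRAINT DF_Customers_Name;

    ALTER TABLE dbo.Customers DROP COLUMN Name;                     -- remove the old column
    EXEC sp_rename 'dbo.Customers.Name_new', 'Name', 'COLUMN';      -- give the new column the old name

    -- Recreate/rebuild the affected indexes, e.g.:
    CREATE INDEX IX_Customers_Name ON dbo.Customers (Name);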

When the modification involves more than a handful of columns, it is more practical to create a new table based on the schema of the old one.

Create a new table without any table-level constraints (PK, FK, etc.). Bring over only the column-level ones at this point (NOT NULL, DEFAULT, CHECK, etc.).

Insert the data from the old table into the new table (see the note on large tables above). SET IDENTITY_INSERT is a must here.

Now drop all the table constraints (PK, FKs, checks) and triggers on the old table, and recreate those constraints and triggers on the new table.

Recreate all other indexes from the old table on the new table (all at once or one at a time, depending on your maintenance window). Unless the table has no clustered index, this has to be done after step 3, or at least after the PK constraint has been created.

Check that everything went right (that you did not forget a trigger or a constraint in the process) and, if all is OK, drop the old table.

Rename the new table to the name of the old table.

Note on step 4: If you detect any duplicate indexes (detecting duplicate indexes is a long subject in itself; see Kimberly Tripp's blog on SQLSkills.com), this is the chance to get rid of them. A sketch of this approach follows.
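A rough sketch of the pseudo-table variant, again with made-up names (dbo.Orders and its columns):

    -- Hypothetical sketch: dbo.Orders has VARCHAR columns that should become NVARCHAR.
    CREATE TABLE dbo.Orders_new
    (
        OrderID   INT IDENTITY(1,1) NOT NULL,    -- column-level definitions only at this stage
        Customer  NVARCHAR(200)     NOT NULL,     -- converted from VARCHAR(200)
        Notes     NVARCHAR(1000)    NULL
    );

    SET IDENTITY_INSERT dbo.Orders_new ON;        -- preserve the identity values
    INSERT INTO dbo.Orders_new (OrderID, Customer, Notes)
    SELECT OrderID, Customer, Notes
    FROM   dbo.Orders;                             -- batch this on million-row tables
    SET IDENTITY_INSERT dbo.Orders_new OFF;

    -- Recreate the PK/FK/check constraints, triggers and indexes on dbo.Orders_new,
    -- drop the ones pointing at dbo.Orders, then:
    DROP TABLE dbo.Orders;
    EXEC sp_rename 'dbo.Orders_new', 'Orders';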

Performance implications

Changing from VARCHAR to NVARCHAR has some performance implications, at least for any SQL Server version below 2008 R2. For SQL Server 2008 R2, Aaron Bertrand has some blog posts on the Unicode compression feature, which can counterbalance the cost when NVARCHAR columns are used to store content that could have been stored in VARCHAR columns. I did not read them as thoroughly as the articles deserve, but the subject is interesting.
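If I recall the feature correctly, Unicode compression is applied automatically to NVARCHAR(n) columns once row (or page) compression is enabled on the table or index, so something like the sketch below is all it takes to turn it on (the table and index names are placeholders, and data compression requires an edition that supports it):

    -- On SQL Server 2008 R2 and later, Unicode compression kicks in automatically
    -- for NVARCHAR(n) columns once row or page compression is enabled.
    ALTER TABLE dbo.Customers REBUILD WITH (DATA_COMPRESSION = ROW);

    -- Nonclustered indexes can be compressed the same way:
    -- ALTER INDEX IX_Customers_Name ON dbo.Customers REBUILD WITH (DATA_COMPRESSION = ROW);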

NVARCHAR columns typically (IOW, before 2008 R2) store all the characters in the column with 2 bytes per character. For example, the string 'MSSQL' is stored in 5 bytes in a VARCHAR column and 10 bytes in an NVARCHAR one. Since non-LOB string columns are limited to a maximum of 8000 bytes, this means that VARCHAR can store 8000 characters, while NVARCHAR is limited to 4000.
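You can see the difference with DATALENGTH (assuming a plain single-byte collation for the VARCHAR value):

    -- Storage difference for the same string in VARCHAR vs NVARCHAR:
    SELECT DATALENGTH(CAST('MSSQL'  AS VARCHAR(10)))  AS varchar_bytes,   -- 5
           DATALENGTH(CAST(N'MSSQL' AS NVARCHAR(10))) AS nvarchar_bytes;  -- 10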

Implications of those facts:

Since index keys are limited to 900 bytes (see the docs on CREATE INDEX), if you try to index an NVARCHAR(500) column the command will not fail (assuming it is the only column in the index key), but if you UPDATE or INSERT a row where the value exceeds 450 characters (minus whatever the other columns in the index key take up, if any), the operation will fail, as shown in the sketch after this list.

The more bytes there are to operate on, the more work there is to do. You read/write/compare/cache double the bytes.

Depending on how massive the table is, how much the string columns contribute to its stored size, and how much the table contributes to the database size, you can expect growth in the (used) database size and in everything it affects directly or indirectly (backup/restore time, index maintenance, etc.).
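Here is a small, hypothetical repro of the index key limit described above (table and index names are made up; the behaviour is as on the SQL Server versions discussed here, since later versions raised the nonclustered key limit):

    -- Hypothetical repro of the 900-byte index key limit:
    CREATE TABLE dbo.KeyLimitDemo (Val NVARCHAR(500) NOT NULL);

    -- Succeeds, but warns that the maximum key length is 900 bytes:
    CREATE INDEX IX_KeyLimitDemo_Val ON dbo.KeyLimitDemo (Val);

    -- 450 characters = 900 bytes: still fits.
    INSERT INTO dbo.KeyLimitDemo (Val) VALUES (REPLICATE(N'a', 450));

    -- 451 characters = 902 bytes: fails with an error about exceeding the key length.
    INSERT INTO dbo.KeyLimitDemo (Val) VALUES (REPLICATE(N'a', 451));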

EDIT: as gbn stated, it is not worth building something just to keep using VARCHAR when you have a clear requirement that needs NVARCHAR columns to be fulfilled.

Add the new NVARCHAR columns to the table first, then:

Using batches, update a number of rows at a time (e.g. 1000 or 10000 rows)

Back up the log, checkpoint, and whatever else you need to do in between batches

When all the rows have been updated, drop the old column and rename the new one

Rebuild indexes

This won't be faster in the long run, and it still requires a maintenance window (since you don't want users updating rows you've already updated, unless you put a temporary trigger in place to counter that), but it will prevent a huge transaction, and after a few batches it will give you more predictability about how long the rest will take.
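A hypothetical sketch of such a temporary trigger, assuming the batched-migration setup above with placeholder names dbo.MyTable, OldCol, NewCol and a key column PK:

    -- Temporary trigger: while the batched migration runs, any write to the old
    -- column is mirrored into the new one, so already-converted rows stay in sync.
    -- (Direct trigger recursion is off by default, so the trigger won't re-fire itself.)
    CREATE TRIGGER trg_MyTable_SyncNewCol
    ON dbo.MyTable
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        UPDATE t
        SET    NewCol = i.OldCol
        FROM   dbo.MyTable AS t
        JOIN   inserted     AS i ON i.PK = t.PK;   -- PK is a placeholder key column
    END;
    GO
    -- Drop the trigger once the old column has been dropped and the new one renamed:
    -- DROP TRIGGER trg_MyTable_SyncNewCol;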

You could do the same thing by creating a new table and renaming it once it's done... while this avoids the need for the index rebuild step, it would cause even more data churn and can be more problematic due to the constraints, foreign keys, triggers, etc. that might be involved with the table.