I’ve had multiple reports of a scam where persons have used my identity to entice IT professionals into applying for positions they have no business recruiting for. The scammers later ask for a PayPal payment for their services. The authorities have been notified of this fraudulent activity.

I urge everyone to be vigilant with regard to social engineering scams like this. Beware of unsolicited emails from persons you don’t know personally. It is easy for unscrupulous individuals to glean much information from publicly available sources (job descriptions, professional profiles, etc.) and tailor a message customized for a particular job seeker.

You might be surprised to learn that foreign keys bind to physical indexes when they are created. Furthermore, a foreign key does not necessarily bind to the primary key index of the referenced table; SQL Server allows a foreign key to refer to any column(s) that are guaranteed to be unique as enforced by a primary key constraint, unique constraint or unique index.

In this post, I’ll discuss the undocumented rules SQL Server uses to bind foreign key constraints to referenced table indexes so that you can achieve performance goals and protect yourself against unexpected errors in DDL modification scripts.

Background

Typically, one references the primary key in foreign key relationships. I’ve seen a foreign key (deliberately) reference columns other than the primary key only a couple of times in my career. The foreign key referenced an alternate key with a unique constraint in those cases. Why one would create such a relationship is an exercise for the reader. I’ll focus on the primary key here, although the same considerations apply to foreign keys referencing alternate keys.

As I mentioned earlier, SQL Server binds a foreign key to a physical unique index. This binding has performance implications because it determines the index SQL Server uses to enforce referential integrity as child table rows are inserted or updated. Also, SQL Server will not allow the index bound to a foreign key to be dropped since that could allow duplicate rows in the parent table and thus break the unique side of the relationship. This must be considered when developing scripts that drop unique indexes (including primary key and unique constraints) that may be bound to foreign keys.

A foreign key referencing the primary key will always be bound to the primary key index when that is the only unique index on the foreign key column(s). However, you might have additional unique indexes on the primary key column(s) for performance reasons. For example, consider the case of a clustered primary key. Performance of a frequently executed query may be improved with a covering non-clustered index:

--create parent table
CREATE TABLE dbo.ParentTable(
	ParentTableID int NOT NULL IDENTITY
		CONSTRAINT PK_ParentTable PRIMARY KEY CLUSTERED
	,Column1 int NOT NULL
	,Column2 varchar(100) NOT NULL
);
GO

--create a non-clustered covering index
CREATE UNIQUE NONCLUSTERED INDEX idx_ParentTable_ParentTableID
	ON dbo.ParentTable(ParentTableID) INCLUDE(Column1);
GO

INSERT INTO dbo.ParentTable VALUES(1, 'some data');
INSERT INTO dbo.ParentTable VALUES(2, 'some data');
INSERT INTO dbo.ParentTable VALUES(3, 'some data');
GO

--create child table
CREATE TABLE dbo.ChildTable(
	ChildTableID int NOT NULL IDENTITY
		CONSTRAINT PK_ChildTable PRIMARY KEY CLUSTERED
	,ParentTableID int NOT NULL
		CONSTRAINT FK_ChildTable_ParentTable
			FOREIGN KEY REFERENCES dbo.ParentTable(ParentTableID)
);
GO

INSERT INTO dbo.ChildTable VALUES(1);
INSERT INTO dbo.ChildTable VALUES(1);
INSERT INTO dbo.ChildTable VALUES(1);
INSERT INTO dbo.ChildTable VALUES(1);
INSERT INTO dbo.ChildTable VALUES(2);
INSERT INTO dbo.ChildTable VALUES(2);
INSERT INTO dbo.ChildTable VALUES(2);
INSERT INTO dbo.ChildTable VALUES(2);
INSERT INTO dbo.ChildTable VALUES(3);
INSERT INTO dbo.ChildTable VALUES(3);
INSERT INTO dbo.ChildTable VALUES(3);
INSERT INTO dbo.ChildTable VALUES(3);
GO

UPDATE STATISTICS dbo.ParentTable;
UPDATE STATISTICS dbo.ChildTable;
GO

--show the foreign key index binding
SELECT
	fki.name
FROM sys.foreign_keys AS f
JOIN sys.indexes AS fki ON
	fki.object_id = f.referenced_object_id
	AND fki.index_id = f.key_index_id
WHERE
	f.object_id = OBJECT_ID(N'dbo.FK_ChildTable_ParentTable');
GO

--this query uses the covering index instead of clustered PK index
SELECT p.ParentTableID, p.Column1
FROM dbo.ParentTable AS p
WHERE p.ParentTableID IN(1, 2, 3);
GO

The SELECT query in the above script uses the covering idx_ParentTable_ParentTableID index. While this is good for performance, it introduces ambiguity regarding index binding to the foreign key. Again, any primary key constraint, unique constraint or unique index on the referenced column(s) may be referenced by a foreign key. With two candidate unique indexes (PK_ParentTable and idx_ParentTable_ParentTableID), you have little control over which index is bound to the foreign key.

SQL Server chooses the index binding based on rules that vary by version, so you will get different bindings depending on your version of SQL Server. SQL Server 2005 chooses the clustered index when possible and, if no suitable clustered index exists, the first (lowest index_id) unique non-clustered index on the referenced column(s) is used. The sample script above binds the foreign key to the PK_ParentTable index under SQL Server 2005 because it is the clustered index, not because it is the primary key.

In later versions (SQL 2008, SQL 2008R2 and SQL 2012), the foreign key is bound to the unique non-clustered index on the referenced column(s) with the lowest index_id when possible. Only when no suitable unique non-clustered index exists is the unique clustered index chosen. So the foreign key in the above script is bound to idx_ParentTable_ParentTableID in SQL 2008 and later versions instead of the primary key index as one might expect.

Why Foreign Key Index Binding is Important

There are two reasons why it is important to control the index bound to a foreign key. One is performance. As I mentioned earlier, the index bound to the foreign key constraint is used at execution time to enforce the constraint as child table rows are inserted or the foreign key column(s) updated. If the parent table is large and not queried often but rows are inserted into the child table heavily, a unique non-clustered index that “covers” the referential integrity check may be more desirable than the clustered index. This can improve buffer efficiency and page life expectancy compared to using a clustered index (e.g. primary key). My assumption is that this is why SQL Server 2008 and later versions prefer the unique non-clustered index over the clustered index for constraint enforcement.

Another reason one should control the index bound to the foreign key is to facilitate index changes. If you try to drop an index bound to a foreign key, you’ll get an error like “An explicit DROP INDEX is not allowed on index 'dbo.ParentTable.idx_ParentTable_ParentTableID'. It is being used for FOREIGN KEY constraint enforcement.” You’ll need to drop the foreign key first and recreate it after dropping the index.
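Using the sample tables from earlier, the drop-and-recreate sequence looks like this (a sketch; WITH CHECK re-validates existing child rows when the constraint is recreated):

```sql
--drop the foreign key so the bound index can be dropped
ALTER TABLE dbo.ChildTable DROP CONSTRAINT FK_ChildTable_ParentTable;
DROP INDEX idx_ParentTable_ParentTableID ON dbo.ParentTable;
GO

--recreate the foreign key; it now binds to the remaining unique index (PK_ParentTable)
ALTER TABLE dbo.ChildTable WITH CHECK
	ADD CONSTRAINT FK_ChildTable_ParentTable
	FOREIGN KEY (ParentTableID) REFERENCES dbo.ParentTable(ParentTableID);
GO
```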

Since one can’t specify the bound foreign key index declaratively, the only guaranteed way to control the binding is to create the foreign key when only the desired unique index exists and create additional indexes afterward. This isn’t to say you can’t rely on the rules described earlier but you need to be aware that such rules vary depending on the SQL Server version and could change in the future.
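Applied to the earlier example (assuming dbo.ChildTable was created without the inline foreign key), controlling the binding is just a matter of statement order: create the foreign key while the primary key is the only unique index on the referenced column, then add the covering index:

```sql
--the foreign key can only bind to PK_ParentTable at this point,
--because no other unique index exists on ParentTableID yet
ALTER TABLE dbo.ChildTable WITH CHECK
	ADD CONSTRAINT FK_ChildTable_ParentTable
	FOREIGN KEY (ParentTableID) REFERENCES dbo.ParentTable(ParentTableID);
GO

--unique indexes created afterward do not affect the existing binding
CREATE UNIQUE NONCLUSTERED INDEX idx_ParentTable_ParentTableID
	ON dbo.ParentTable(ParentTableID) INCLUDE(Column1);
GO
```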

I was very surprised when Microsoft announced the deprecation of the OLE DB provider for SQL Server data access last week on the Data Access Blog and in an MSDN Forums announcement. The next release of SQL Server, code-named “Denali”, will be the last to ship a new SQL Server Native Client OLE DB provider. The SQL Server Native Client OLE DB driver will continue to be supported for 7 years after the Denali release, so we have plenty of time to plan accordingly.

The other Microsoft-supplied OLE DB driver for SQL Server, SQLOLEDB, has been deprecated for many years now. The deprecated SQLOLEDB driver (and deprecated SQLSRV32.DLL ODBC driver) is part of the older MDAC package and is currently included in Windows operating systems as part of Windows Data Access Components for backwards compatibility. Windows 7 is the last Windows version that will include a SQL Server OLE DB and ODBC driver out of the box. Microsoft recommends that we use the SQL Server Native Client ODBC driver as the SQL Server data access technology of choice from native code going forward.

Note that much is still unknown since current versions of SQL Server rely heavily on OLE DB. Although this is purely speculation on my part, it stands to reason that we will see improved ODBC support across all Microsoft products and SQL Server features that currently rely on OLE DB for relational data access.

New SQL Server Development

Use one of the following SQL Server relational database access technologies for new development:

·Managed code (e.g. C#, VB.NET, managed C++): Use System.Data.SqlClient. SqlClient is part of the .NET Framework and is the preferred way to access SQL Server from managed code. The only reason I can think of not to use SqlClient from managed code is if an application needs to also support other DBMS products using the same interface without coding an additional abstraction layer. In that case, System.Data.Odbc is an alternative for accessing different database products.

·Native code (e.g. unmanaged C++): Use ODBC with the SQL Server Native Client driver. The ODBC call-level interface can be used directly or via the higher-level ADO API. The SQL Server Native Client ODBC driver is included with SQL Server and also available as a separate download.

Migrating Existing Applications

I sometimes see existing managed applications use ADO (e.g. ADODB.Connection) instead of SqlClient. ADO is a COM-based API primarily intended to be used from native code rather than managed code. Typically, these applications were either converted from VB 6 or the developer used ADO instead of ADO.NET due to unfamiliarity with the ADO.NET object model. This is a good opportunity to convert such code to use System.Data.SqlClient, which will perform better than OLE DB or ODBC from managed code.

If you have an ADO application where performance is not a concern or the conversion is not worth the effort, an alternative is to simply change the provider to MSDASQL (OLE DB Provider for ODBC Drivers) and add the SQL Server Native Client ODBC driver specification. This can be done with a simple connection string change and the MSDASQL provider will translate the ADO OLE DB calls to ODBC. For example, to use the SQL Server 2008 SNAC ODBC driver:
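For example, a connection string along these lines (server and database names are placeholders) routes ADO through MSDASQL to the SQL Server 2008 SNAC ODBC driver:

```
Provider=MSDASQL;Driver={SQL Server Native Client 10.0};Server=MyServer;Database=MyDatabase;Trusted_Connection=Yes
```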

The same connection string change can be used for any ADO application, including ASP classic, legacy VB 6 or unmanaged C++.

Perhaps the biggest challenge will be native code that uses the OLE DB COM interfaces directly instead of going through higher level APIs like ADO. I’ve seen this most commonly done for performance sensitive applications in C++. The best approach here will be to convert the application to use the ODBC call-level interface directly. This will provide the highest SQL Server data access performance from native code. The difficulty of such a change will depend much on the application object model and design. Ideally, data access libraries are shared and abstracted so that low-level data access code changes only need to be made in one place.

Why SQLOLEDB and SQLNCLI Were Deprecated

If you’ve used SQL Server for a long time like me, you’ve seen a number of APIs come and go (http://blogs.msdn.com/b/data/archive/2006/12/05/data-access-api-of-the-day-part-i.aspx). APIs are largely driven by changes in development and platform technologies that change over time. It is possible for Microsoft to support legacy APIs indefinitely but doing so would waste precious development resources on maintenance instead of adding new features that are important to us. COM-based APIs like OLE DB are complex and it just doesn’t make sense to have many APIs that basically do the same thing.

So we now have the short list of SQL Server relational data access APIs going forward:

Not to mince words, T-SQL error handling has historically sucked. I’m excited that SQL Server “Denali” CTP3 (a.k.a. SQL11) includes a long-awaited THROW statement that I hope to see in the final release. In this post, I’ll dive into how this seemingly minor T-SQL enhancement will make it much easier for T-SQL developers to write robust and bug-free error handling code.

T-SQL Error Handling Ugliness

Unlike compiled application code, which halts execution upon an unhandled exception, a T-SQL script might continue code execution after an error. T-SQL developers must include error checking/handling to ensure code doesn’t continue down the “happy” path oblivious to an error, report the error to the caller, perform any necessary cleanup operations (typically ROLLBACK) and continue or halt execution as desired. The script below shows how one might accomplish this without structured error handling:

--Unstructured error handling example
BEGIN TRAN;

SELECT 1/0 AS CauseAnError; --report error to caller

IF @@ERROR <> 0 GOTO ErrorHandler; --detect error

COMMIT;
GOTO Done;

ErrorHandler:
IF @@TRANCOUNT > 0 ROLLBACK; --cleanup after error
RETURN; --stop further code execution

Done:
PRINT 'Done'; --not executed after error
GO

This script results in the error:

Msg 8134, Level 16, State 1, Line 3

Divide by zero error encountered.

Unstructured error handling like this is especially a pain for multi-statement scripts and stored procedures. One has to include a repetitive “IF @@ERROR” check to detect errors after each statement, along with error-prone unstructured GOTO code. It’s easy to miss error checking/handling bugs in unit testing.

On a positive note, no T-SQL code is necessary to raise the error; SQL Server automatically reports errors to the calling application without any T-SQL code to do so (unless TRY/CATCH is used). This guarantees the calling application is notified of errors during execution.

Two Steps Forward, One Step Back

The introduction of structured error handling (TRY/CATCH) in SQL 2005 is both a blessing and a curse. The good is that TRY/CATCH avoids the repetitive, error-prone and ugly procedural code needed to check @@ERROR after each T-SQL statement and allows one to more easily centralize error handling. The structured error-handling paradigm in T-SQL is more aligned with most application languages.

Consider the equivalent script with TRY/CATCH:

--Structured error handling example
DECLARE
	@ErrorNumber int
	,@ErrorMessage nvarchar(2048)
	,@ErrorSeverity int
	,@ErrorState int
	,@ErrorLine int;

BEGIN TRY --detect errors
	BEGIN TRAN;
	SELECT 1/0 AS CauseAnError;
	COMMIT;
END TRY
BEGIN CATCH
	SELECT
		@ErrorNumber = ERROR_NUMBER()
		,@ErrorMessage = ERROR_MESSAGE()
		,@ErrorSeverity = ERROR_SEVERITY()
		,@ErrorState = ERROR_STATE()
		,@ErrorLine = ERROR_LINE();
	IF @@TRANCOUNT > 0 ROLLBACK; --cleanup after error
	RAISERROR('Error %d caught at line %d: %s' --report error to caller
		,@ErrorSeverity
		,@ErrorState
		,@ErrorNumber
		,@ErrorLine
		,@ErrorMessage);
	RETURN; --stop further code execution
END CATCH
PRINT 'Done'; --not executed after error
GO

Msg 50000, Level 16, State 1, Line 21

Error 8134 caught at line 10: Divide by zero error encountered

I really like the way structured error handling catches errors declaratively with centralized error handling. But TRY/CATCH introduces a couple of issues. Foremost is reporting of the error to the caller. A caught error prevents the error message from being returned to the client. When TRY/CATCH is employed, the developer assumes responsibility to notify the application that an error occurred. Failure to do so will result in a silent error undetectable by the calling application, which is seldom desirable. Using TRY/CATCH necessitates that you write a bit of code in the CATCH block to capture, report and/or log error details as well as control code flow after the error.

Another downside of TRY/CATCH before Denali is that you cannot raise the original error because RAISERROR does not allow a system error number to be specified (8134 in this example). Consequently, the divide by zero system error here cannot be raised in the CATCH block; a user-defined error in the 50000+ error number range must be raised instead, obfuscating the original error and line number. So instead of returning error information natively, you must write code to return original error details by some other means, such as in the error message text. This often leads to inconsistencies in the way errors are reported.

THROW to the Rescue

Denali introduces a simple THROW statement. THROW in a CATCH block with no parameters raises the caught error and stops further code execution unless an outer CATCH block exists. This greatly simplifies CATCH block error reporting and control flow code since this THROW behavior is exactly what one typically does after handling a T-SQL error. Furthermore, unlike RAISERROR, THROW retains the original error number, message text, state, severity and line number. This is the biggest T-SQL error handling enhancement since the introduction of TRY/CATCH in SQL Server 2005.

The THROW example below raises the original error and stops further code execution and is less verbose and error-prone than other methods:

--Structured error handling example in Denali CTP3
BEGIN TRY --detect errors
	BEGIN TRAN;
	SELECT 1/0 AS CauseAnError;
	COMMIT;
END TRY
BEGIN CATCH
	IF @@TRANCOUNT > 0 ROLLBACK; --cleanup after error
	THROW; --report error to caller and stop further code execution
END CATCH
PRINT 'Done'; --not executed after error
GO

Msg 8134, Level 16, State 1, Line 4

Divide by zero error encountered.

There are only a couple of scenarios I can think of not to use THROW in a CATCH block. One is when you need to continue code execution in the same scope after an error. Another is in an outermost catch block when you want to prevent the error from being returned to the client. However, these cases are the exception (no pun intended) rather than the rule.

Summary

THROW is a simple, yet powerful extension to SQL Server error handling. I’ll discuss some other enhancements to the core database engine as outlined in the What’s New section of the SQL Server “Denali” Books Online in future posts as well.

A database created by a more recent version of SQL Server cannot be attached or restored to an earlier version. This restriction is simply because an older version cannot know about file format changes that were introduced in the newer release.

If you attempt to attach a database to an earlier version, you will get SQL Server error 948 with the internal version numbers listed in the error message text. For example, the following error occurs if you try to attach a SQL Server 2008 R2 database to a SQL Server 2008 server:

The database 'MyDatabase' cannot be opened because it is version 665. This server supports version 661 and earlier. A downgrade path is not supported.

Sample text from SQL Server error 948

The cryptic version numbers in the error message refer to the internal database version. These internal version numbers are undocumented but are (at least currently) the same value reported by the DATABASEPROPERTYEX function 'Version' property of the source database. If you are unsure of the source database version, the table below maps the internal version numbers to SQL Server versions so you can determine the minimum version you need for the attach to succeed:

SQL Server Version                              Internal Database Version
SQL Server 2008 R2                              665
SQL Server 2008                                 661
SQL Server 2005 SP2+ with vardecimal enabled    612
SQL Server 2005                                 611
SQL Server 2000                                 539
SQL Server 7                                    515

SQL Server versions and internal database versions

Below are the allowable SQL Server upgrade paths for a database attach or restore. The internal database version will be as above after a successful attach or restore.

Target SQL Server Version   Source SQL Server Version                    Internal Database Version
SQL Server 2008 R2          SQL Server 2008 R2                           665
                            SQL Server 2008                              661
                            SQL Server 2005 with vardecimal enabled      612
                            SQL Server 2005                              611
                            SQL Server 2000                              539
SQL Server 2008             SQL Server 2008                              661
                            SQL Server 2005 with vardecimal enabled      612
                            SQL Server 2005                              611
                            SQL Server 2000                              539
SQL Server 2005 SP2+        SQL Server 2005 with vardecimal enabled      612
                            SQL Server 2005                              611
                            SQL Server 2000                              539
                            SQL Server 7                                 515
SQL Server 2005             SQL Server 2005                              611
                            SQL Server 2000                              539
                            SQL Server 7                                 515
SQL Server 2000             SQL Server 2000                              539
                            SQL Server 7                                 515
SQL Server 7                SQL Server 7                                 515

Database File Versions and Upgrade Paths

As I mentioned earlier, downgrades are not supported. You’ll need to copy objects and data from the newer source database to the older target if you need to downgrade; attach or restore is not an option to copy a database to an earlier version.

This is the first of a series of posts on SQL Server connection strings. I don’t think connection strings are all that complicated but I often see developers have problems because they simply cloned an existing connection string (or found one on the internet) and tweaked it for the task at hand without really understanding what the keywords and values mean. This often results in run-time errors that can be tricky to diagnose.

In this post, I’ll provide a connection string overview and discuss SqlClient connection strings and examples. I’ll discuss OLE DB and ODBC (used via ADO or ADO.NET) and JDBC in more detail in future articles.

Overview

SQL Server can be accessed using several technologies, each of which has different connection string particulars. Connection strings are provider/driver specific, so one first needs to decide on a client API before the proper string can be formulated.

All connection strings share the same basic format, name/value pairs separated by semicolons, but the actual connection string keywords may vary by provider. Which keywords are required or optional also varies by provider, and providers often share the same keywords (or provide synonyms) to minimize connection string changes when switching between different providers. Most connection string keywords are optional and need to be specified only when the default is not appropriate. Connection string values should be enclosed in single or double quotes when the value may include a semicolon or equal sign (e.g. Password="a&==b=;1@23").

The purpose of a connection string is to supply a SQL Server provider/driver with the information needed to establish a connection to a SQL Server instance. It may also be used to specify other configuration values, such as whether connection pooling is used. At the end of the day, the provider/driver needs to know at least:

·SQL Server name (or address)

·Authentication method (Windows or SQL Server)

·Login credentials (login and password for SQL Server authentication)

SqlClient

One typically uses the .Net Framework Provider for SQL Server (abbreviated to SqlClient here) in managed code and a SQL Server OLE DB provider or ODBC driver from unmanaged code. It is possible to use OLE DB or ODBC for SQL Server data access in managed code but there is seldom a reason to do so since SqlClient offers high-performance access to SQL Server natively.

The SqlConnectionStringBuilder class provides a programmatic way to build connection strings needed by the SqlConnection class. The nice thing about SqlConnectionStringBuilder is that it provides IntelliSense and avoids connection string typos. It should always be used when constructing connection strings based on user input (e.g. user id and password prompt). But you still need to know which connection string properties (keywords) you need to set along with the default values. The examples here apply regardless of whether or not you use the SqlConnectionStringBuilder class.

SqlClient Connection String Keyword Examples

Unlike other providers, there is no “Provider” or “Driver” connection string keyword in a SqlClient connection string. The .Net Framework Provider for SQL Server is implicit with a SqlConnection class so it is redundant to also specify the provider.

I’ll start with the minimal keyword(s) needed. The minimal SqlClient connection string need only specify the authentication method. The example below specifies Windows authentication using “Integrated Security=SSPI”. This connection string will connect to the default instance on the local machine using Windows authentication under the current process Windows security credentials.

Integrated Security=SSPI

Listing 1: Connect to local default instance using Windows authentication

To connect to the local default instance using SQL authentication, just specify the credentials using the “User ID” and “Password” keywords instead of the “Integrated Security=SSPI” keyword. SQL authentication is the default when neither the “Integrated Security” nor “Trusted_Connection” keyword is specified. Although I commonly see "Persist Security Info=False" also specified (a best practice from a security perspective), that is the default setting and may be omitted. Be aware that you should encrypt connection strings (or passwords in general) stored in configuration files when using SQL authentication.

User ID=MyLogin;Password=MiP@ssw0rd

Listing 2: Connect to local default instance using SQL authentication

One often connects to a remote SQL Server. Along with the authentication method, add the Data Source keyword to specify the desired SQL Server name or network address.
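For example, the remote equivalents of Listings 1 and 2 look like the strings below (the fully qualified host name is a placeholder):

```
Data Source=SQLSERVERNAME.MYDOMAIN.COM;Integrated Security=SSPI
Data Source=SQLSERVERNAME.MYDOMAIN.COM;User ID=MyLogin;Password=MiP@ssw0rd
```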

Note that these same connection strings may be used to connect locally or remotely. Personally, I recommend always specifying the Data Source even when connecting locally. This makes it easy to move the application to another machine using the same configuration and helps avoid oversights.

It is usually best to let SqlClient determine the appropriate network library to use rather than an explicit specification. SqlClient will figure out the appropriate network library based on the specified Data Source value. When you connect to a local instance using an unqualified name (or the value “(local)”), Shared Memory is used by default. SqlClient will use TCP/IP if a FQDN (e.g. SQLSERVERNAME.MyDOMAIN.COM) or IP address is specified regardless of whether the instance is local or remote. Since TCP/IP is most commonly used nowadays, I’ll focus on TCP/IP in this article and use a FQDN in the subsequent examples to avoid ambiguity.

It is often desirable to specify the initial database context in the connection sting. If omitted, the default database of the authenticated account is used. This is accomplished using either the “Initial Catalog” or “Database” keyword. I suggest always including the “Initial Catalog” keyword.
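Combining the keywords discussed so far (the database name is a placeholder):

```
Data Source=SQLSERVERNAME.MYDOMAIN.COM;Integrated Security=SSPI;Initial Catalog=MyDatabase
```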

The connection strings I’ve shown so far assume the target is a default SQL Server instance listening on port 1433. One can run multiple instances of SQL Server on the same host using the named instance feature. If your target database instance is a named instance, SqlClient will also need to know the instance name or instance port number. The instance name can be specified by appending a backslash and instance name to the Data Source value:
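For instance (the instance name MyInstance is a placeholder):

```
Data Source=SQLSERVERNAME.MYDOMAIN.COM\MyInstance;Integrated Security=SSPI;Initial Catalog=MyDatabase
```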

Listing 5: Connect to named instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase

As an aside, I often see connectivity problems with named instances due to oversights in the SQL Server configuration. When an instance name is specified, SqlClient interrogates the SQL Server Browser service on the SQL Server host to determine the instance port (or named pipe name). The SQL Server Browser service is disabled by default, so you need to enable and start it in order to connect by instance name. This can be done using the SQL Server Configuration Manager tool. Also, since the SQL Server Browser service communicates over UDP port 1434, that port must be allowed through firewalls.

You can specify a port number instead of an instance name to connect directly to a named instance (or to a default instance listening on a non-standard port). The port may be specified by appending a comma and port number to the Data Source value. The needed port number can be ascertained from the SQL Server Configuration Manager tool.
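For example (port 50000 is a placeholder):

```
Data Source=SQLSERVERNAME.MYDOMAIN.COM,50000;Integrated Security=SSPI;Initial Catalog=MyDatabase
```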

In addition to the “Data Source”, “Initial Catalog” and “Integrated Security” (or “User Id” and “Password”) keywords I’ve discussed so far, I recommend that “Application Name” also be specified. The specified string helps identify the application when monitoring activity on the database server. This is especially useful when an application server or client hosts multiple applications.
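For example (the application name is a placeholder):

```
Data Source=SQLSERVERNAME.MYDOMAIN.COM;Integrated Security=SSPI;Initial Catalog=MyDatabase;Application Name=MyApplication
```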

Listing 7: Connect to default instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase with application name specification

In my opinion, the many other keywords are noise unless the default values are inappropriate for your environment.

Summary

You can get by nicely in most cases with only the 4 or 5 SqlClient connection string keywords I’ve discussed here. I suggest you establish a connection string standard that includes the “Data Source”, “Initial Catalog” and “Application Name” keywords plus the authentication method, “Integrated Security=SSPI” or “User Id” and “Password”.

SQL Server table partitioning can reduce storage costs associated with large tables while maintaining performance SLAs. Table partitioning, available in Enterprise and above SKUs, allows you to keep frequently used current data on fast storage while storing infrequently accessed older data on slower, less expensive storage. But moving vast amounts of data efficiently as data ages can be a challenge. This post will discuss alternate techniques to accomplish this task.

Consider the scenario of a table partitioned on a datetime column by month. Your objective is to keep recent (current and prior month) data on a solid state disk and older data on traditional spinning media. Two filegroups are used for this table, one with files on a solid state device and the other with files on spinning disks. The table is partitioned with a RANGE RIGHT partition function (inclusive date boundary) and monthly sliding window maintenance is scheduled to create a partition for the new month and perhaps remove the oldest month. Every month after the slide, you want to move an older partition (prior month minus 1) from fast to slow storage to make room for new data on the fast filegroup.
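As a sketch of the DDL described above (boundary dates abbreviated to a few recent months; the complete setup script referenced later has the full list), the partition function and scheme might look like:

```sql
--RANGE RIGHT: each boundary date is the inclusive lower bound of the partition to its right
CREATE PARTITION FUNCTION PF_Last12Months (datetime)
	AS RANGE RIGHT FOR VALUES ('20110201', '20110301', '20110401', '20110501');

--older partitions on spinning disk, newer partitions on solid state
--(N boundaries require N+1 filegroup slots)
CREATE PARTITION SCHEME PS_Last12Months
	AS PARTITION PF_Last12Months
	TO (PartitioningDemo_OlderData, PartitioningDemo_NewerData,
		PartitioningDemo_NewerData, PartitioningDemo_NewerData,
		PartitioningDemo_NewerData);
```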

The Simple Method

The easiest way to move a partition from the NewerData filegroup to the OlderData filegroup is with MERGE and SPLIT. The example below will move the February partition from the NewerData to the OlderData filegroup:

Simple maintenance script example:

-- Monthly Partition Move Script

-- merge month to be moved into prior month partition
ALTER PARTITION FUNCTION PF_Last12Months()
	MERGE RANGE ('20110201');

-- set partition scheme next used to the OlderData filegroup
ALTER PARTITION SCHEME PS_Last12Months
	NEXT USED OlderData;

-- move data from NewerData to OlderData filegroup
ALTER PARTITION FUNCTION PF_Last12Months()
	SPLIT RANGE ('20110201');

The figures below show the partitions before and after this script was run against a 10M row test table (setup script with complete DDL and sample data at the end of this post). Although this method is quite easy, it can take quite a bit of time with large partitions. This MERGE command will merge February data into the January partition on the OlderData filegroup, requiring all of February’s data to be moved in the process, and then remove the February partition. The SPLIT will then create a new February partition on the OlderData filegroup, move February data to the new partition and finally remove the February data from the source partition. So February data is actually moved twice, once by the MERGE and again by the SPLIT.

This MERGE/SPLIT process took 52 seconds on my test system with a cold buffer cache, and I was only moving 738,780 rows. Think about the performance impact of this method against a much larger production table partition. The atomic MERGE and SPLIT are offline operations, so the entire table is unavailable while those statements are running. Also, these operations are resource intensive when a lot of data needs to be moved and/or you have many indexes.

Before maintenance:

Rows      Partition  Filegroup                   Lower Boundary  Upper Boundary
0         1          PartitioningDemo_OlderData  (none)          4/1/2010
791,549   2          PartitioningDemo_OlderData  4/1/2010        5/1/2010
817,935   3          PartitioningDemo_OlderData  5/1/2010        6/1/2010
791,550   4          PartitioningDemo_OlderData  6/1/2010        7/1/2010
817,935   5          PartitioningDemo_OlderData  7/1/2010        8/1/2010
817,935   6          PartitioningDemo_OlderData  8/1/2010        9/1/2010
791,550   7          PartitioningDemo_OlderData  9/1/2010        10/1/2010
817,935   8          PartitioningDemo_OlderData  10/1/2010       11/1/2010
791,550   9          PartitioningDemo_OlderData  11/1/2010       12/1/2010
817,935   10         PartitioningDemo_OlderData  12/1/2010       1/1/2011
817,935   11         PartitioningDemo_OlderData  1/1/2011        2/1/2011
738,780   12         PartitioningDemo_NewerData  2/1/2011        3/1/2011
817,935   13         PartitioningDemo_NewerData  3/1/2011        4/1/2011
369,476   14         PartitioningDemo_NewerData  4/1/2011        5/1/2011
0         15         PartitioningDemo_NewerData  5/1/2011        (none)

After maintenance:

Rows      Partition  Filegroup                   Lower Boundary  Upper Boundary
0         1          PartitioningDemo_OlderData  (none)          4/1/2010
791,549   2          PartitioningDemo_OlderData  4/1/2010        5/1/2010
817,935   3          PartitioningDemo_OlderData  5/1/2010        6/1/2010
791,550   4          PartitioningDemo_OlderData  6/1/2010        7/1/2010
817,935   5          PartitioningDemo_OlderData  7/1/2010        8/1/2010
817,935   6          PartitioningDemo_OlderData  8/1/2010        9/1/2010
791,550   7          PartitioningDemo_OlderData  9/1/2010        10/1/2010
817,935   8          PartitioningDemo_OlderData  10/1/2010       11/1/2010
791,550   9          PartitioningDemo_OlderData  11/1/2010       12/1/2010
817,935   10         PartitioningDemo_OlderData  12/1/2010       1/1/2011
817,935   11         PartitioningDemo_OlderData  1/1/2011        2/1/2011
738,780   12         PartitioningDemo_OlderData  2/1/2011        3/1/2011
817,935   13         PartitioningDemo_NewerData  3/1/2011        4/1/2011
369,476   14         PartitioningDemo_NewerData  4/1/2011        5/1/2011
0         15         PartitioningDemo_NewerData  5/1/2011        (none)

SWITCH and DROP_EXISTING Method

An alternative to the method above is to employ SWITCH along with the DROP_EXISTING option of CREATE INDEX. As you may know, SWITCH of an aligned partition is a metadata-only operation and is very fast because no physical data movement is required. Furthermore, CREATE INDEX…WITH DROP_EXISTING = ON avoids sorting when the existing table index is already suitably sorted, and is especially appropriate for improving performance of large index rebuilds. Using these commands, instead of relying on SPLIT and MERGE to move data, greatly reduces the time needed to move a partition from one filegroup to another. The maintenance script below cut the partition move from 52 seconds down to 7 seconds, reducing maintenance time by over 85% compared to the MERGE/SPLIT script above.

Demo Maintenance Script

-- Monthly Partition Move Script

DECLARE @MonthToMove datetime = '20110201';

-- create staging table on NewerData filegroup with aligned indexes
IF OBJECT_ID(N'dbo.PartitionMoveDemoStaging') IS NOT NULL
    DROP TABLE dbo.PartitionMoveDemoStaging;

CREATE TABLE dbo.PartitionMoveDemoStaging(
    PartitioningDateTimeColumn datetime NOT NULL
    ,Column1 bigint NOT NULL
) ON PartitioningDemo_NewerData;

CREATE CLUSTERED INDEX cdx_PartitionMoveDemoStaging_PartitioningColumn
    ON dbo.PartitionMoveDemoStaging(PartitioningDateTimeColumn)
    ON PartitioningDemo_NewerData;

CREATE NONCLUSTERED INDEX idx_PartitionMoveDemoStaging_Column1
    ON dbo.PartitionMoveDemoStaging(Column1)
    ON PartitioningDemo_NewerData;

-- switch partition into staging table
ALTER TABLE dbo.PartitionMoveDemo
    SWITCH PARTITION $PARTITION.PF_Last12Months(@MonthToMove)
    TO dbo.PartitionMoveDemoStaging;

-- remove partition
ALTER PARTITION FUNCTION PF_Last12Months()
    MERGE RANGE (@MonthToMove);

-- set next used to OlderData filegroup
ALTER PARTITION SCHEME PS_Last12Months
    NEXT USED PartitioningDemo_OlderData;

-- recreate partition on OlderData filegroup
ALTER PARTITION FUNCTION PF_Last12Months()
    SPLIT RANGE (@MonthToMove);

-- recreate staging table indexes using the partition scheme
-- this will move the staging table to the OlderData filegroup with aligned indexes
CREATE CLUSTERED INDEX cdx_PartitionMoveDemoStaging_PartitioningColumn
    ON dbo.PartitionMoveDemoStaging(PartitioningDateTimeColumn)
    WITH (DROP_EXISTING = ON)
    ON PS_Last12Months(PartitioningDateTimeColumn);

CREATE NONCLUSTERED INDEX idx_PartitionMoveDemoStaging_Column1
    ON dbo.PartitionMoveDemoStaging(Column1)
    WITH (DROP_EXISTING = ON)
    ON PS_Last12Months(PartitioningDateTimeColumn);

-- switch staging table back into primary table partition
ALTER TABLE dbo.PartitionMoveDemoStaging
    SWITCH PARTITION $PARTITION.PF_Last12Months(@MonthToMove)
    TO dbo.PartitionMoveDemo PARTITION $PARTITION.PF_Last12Months(@MonthToMove);

The maintenance steps here are similar to the first method except that the partition is SWITCHed into a staging table before the MERGE and SPLIT. This way, no data movement is needed during the MERGE or SPLIT. After the MERGE and SPLIT, staging table indexes are recreated using the same partition scheme as the primary table. This moves the staging table from the NewerData to the OlderData filegroup and ensures staging table indexes are aligned for the SWITCH. The DROP_EXISTING = ON option allows the CREATE INDEX to leverage the existing staging table index sequence, eliminating the need to sort the index keys. Finally, the staging table is SWITCHed back into the moved partition.

I hope you find this method useful. Below is the script I used to create the demo database and objects.

SQLServerCentral.com launched a new Stairway content series today, targeting specific areas of SQL Server. Each Stairway includes a series of up to 12 levels focused on a specific SQL Server topic. The goal is to guide DBAs and developers with little or no understanding of a subject through a sequence of tutorials in order to quickly gain the knowledge needed to use a SQL Server feature confidently in a production environment. Kalen Delaney, editor of the Stairway series, is one of the most respected experts in the worldwide SQL Server community.

I was flattered when Kalen gave me the opportunity to contribute to the series with a Stairway on Server-side Tracing. For years I've cautioned against using Profiler indiscriminately, both here and in the MSDN forums and newsgroups. But it seems many DBAs still don't differentiate between Profiler and server-side tracing. I'm hoping this Server-side Tracing Stairway will empower DBAs with the knowledge to choose the right tool for the job.

My apologies for having gone dark for the last several months. The subject of this post is the primary reason; there are only so many hours in the day.

I frequently see questions in the forums and newsgroups about how to best query date/time data and perform date manipulation. Let me first say that a permanent calendar table that materializes commonly used DATEPART values along with time periods you frequently use is invaluable. I've used such a table for over a decade with great success and strongly recommend you implement one on all of your database servers. I've included a sample calendar table (and numbers table) later in this post, and you can find other variations of such a table via an internet search.

Removing the Time Portion

A common requirement I have is to remove the time portion from a date/time value. This is easy in SQL 2008 since you can simply "CAST(SomeDateTimeValue AS date)". But the date data type is not available in older SQL Server versions, so you need an alternate method. In SQL 2005 and earlier versions, I recommend the DATEADD…DATEDIFF method below with an arbitrary base date value specified in a format that is independent of the session DATEFORMAT setting:
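A minimal sketch of that method, using the unseparated literal '19000101' as the arbitrary base date:

```sql
-- strip the time portion by counting whole days from an arbitrary base date;
-- the unseparated yyyymmdd literal is interpreted the same way regardless of
-- the session DATEFORMAT setting
SELECT DATEADD(day, DATEDIFF(day, '19000101', GETDATE()), '19000101');
```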

I often see a variation of the DATEADD…DATEDIFF technique with the integer zero (no quotes) specified as the base date. Although this may provide the expected results (I've done it myself), I caution against it because it relies on implicit conversion from the internal SQL Server integer date/time storage format. If you want to be concise, a better approach is to specify an empty string for the base date value, since the default date value is '1900-01-01 00:00:00'. In my opinion, an explicit date value is more intuitive, though.

SELECT DATEADD(day, DATEDIFF(day, '', GETDATE()), '');

I also sometimes see code that extracts the year, month and day date parts and concatenates them with separators. However, that method is dependent on session DATEFORMAT settings and is slower than other methods. See Tibor Karaszi's The ultimate guide to the datetime datatypes article for details.

First and Last Day of Period

Another common task is to determine the first or last day of a given period. The script below shows how to accomplish this if you don't have a calendar table with the calculated values available.

DECLARE @Date date = GETDATE();

SELECT 'First day of year' AS [DateDescription], DATEADD(year, DATEDIFF(year, '19000101', @Date), '19000101') AS [CalendarDate]
UNION ALL
SELECT 'Last day of year', DATEADD(day, -1, DATEADD(year, DATEDIFF(year, '19000101', @Date) + 1, '19000101'))
UNION ALL
SELECT 'First day of month', DATEADD(month, DATEDIFF(month, '19000101', @Date), '19000101')
UNION ALL
SELECT 'Last day of month', DATEADD(day, -1, DATEADD(month, DATEDIFF(month, '19000101', @Date) + 1, '19000101'));

SELECT 'Last day of week (based on DATEFIRST setting)', (SELECT LastDateOfWeek FROM dbo.Calendar WHERE CalendarDate = @Date);

Calendar and Numbers Table

I think auxiliary calendar and numbers tables are a must-have on every database server. These objects allow you to easily perform set-based processing in a number of scenarios. In fact, the calendar table population script below uses a numbers table to populate the calendar table with several thousand rows in under a second. This is much more efficient than a WHILE loop.

This calendar table population script also updates the table with most US holidays and adjusts business/non-business days accordingly. In addition to customizing the script for holidays as observed by your organization, you might add fiscal period start/end dates to facilitate querying based on those cycles. Also consider creating user-defined functions or stored procedures to encapsulate frequently used code that uses the calendar table. For example, here is a function that returns the date that is a specified number of business days from the date provided:
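A minimal sketch of such a function, assuming the calendar table flags working days with a BusinessDay column (a hypothetical name; CalendarDate appears in the earlier query):

```sql
-- Sketch, not the original listing: returns the date that is @Days business
-- days after @Date, assuming dbo.Calendar has CalendarDate and BusinessDay.
CREATE FUNCTION dbo.GetBusinessDayDate(@Date date, @Days int)
RETURNS date
AS
BEGIN
    RETURN (
        SELECT MAX(CalendarDate)
        FROM (
            -- the first @Days business days after @Date, in ascending order;
            -- MAX of that set is the @Days-th business day
            SELECT TOP (@Days) CalendarDate
            FROM dbo.Calendar
            WHERE CalendarDate > @Date
                AND BusinessDay = 1
            ORDER BY CalendarDate
        ) AS NextBusinessDays
    );
END;
```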

Why would a trace of long-running queries not show all queries that exceeded the specified duration filter? We have a server-side SQL Trace that includes RPC:Completed and SQL:BatchCompleted events with a filter on Duration >= 100000. Nearly all of the queries on this busy OLTP server run in under this 100 millisecond threshold, so any that appear in the trace are candidates for root cause analysis and/or performance tuning opportunities.

After an application experienced query timeouts, the DBA looked at the trace data to corroborate the problem. Surprisingly, he found no long-running queries in the trace from the application that experienced the timeouts, even though the application's error log clearly showed detail of the problem (query text, duration, start time, etc.). The trace did show, however, that there were hundreds of other long-running queries from different applications during the problem timeframe. We later determined those queries were blocked by a large UPDATE query against a critical table that was inadvertently run during this busy period.

So why didn’t the trace include all of the long-running queries?The reason is because the SQL Trace event duration doesn’t include the time a request was queued while awaiting a worker thread.Remember that the server was under considerable stress at the time due to the severe blocking episode.Most of the worker threads were in use by blocked queries and new requests were queued awaiting a worker to free up (a DMV query on the DAC connection will show this queuing: “SELECT scheduler_id, work_queue_count FROM sys.dm_os_schedulers;”).Technically, those queued requests had not started.As worker threads became available, queries were dequeued and completed quickly. These weren’t included in the trace because the duration was under the 100ms duration filter.The duration reflected the time it took to actually run the query but didn’t include the time queued waiting for a worker thread.

The important point here is that duration is not end-to-end response time. Duration of RPC:Completed and SQL:BatchCompleted events doesn't include time before a worker thread is assigned, nor does it include the time required to return the last result buffer to the client. In other words, duration only includes time after the worker thread is assigned until the last buffer is filled. But be aware that duration does include the time needed to return intermediate result set buffers back to the client, which is a factor when large query results are returned. Clients that are slow in consuming result sets can increase the duration value reported by the trace "completed" events.

Arbitrary Intervals

The simple rollup method works well for any of the pre-defined units provided by the DATEADD function (year, quarter, month, day, hour, minute, second or week). However, it lacks the flexibility to roll up to an arbitrary interval like 15 minutes or 30 seconds. A little DATEADD/DATEDIFF math addresses this gap. Below is an example of a 30-minute interval rollup using this technique:
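A minimal sketch of such a rollup, assuming a hypothetical dbo.EventLog table with an EventTime datetime column:

```sql
-- roll up to 30-minute intervals: integer-divide the minute offset from an
-- arbitrary base date by 30, then multiply back to get the interval start
SELECT
    DATEADD(minute, (DATEDIFF(minute, '20000101', EventTime) / 30) * 30, '20000101') AS IntervalStart
    ,COUNT(*) AS EventCount
FROM dbo.EventLog
GROUP BY DATEADD(minute, (DATEDIFF(minute, '20000101', EventTime) / 30) * 30, '20000101')
ORDER BY IntervalStart;
```

The same expression works for any interval length; substitute the unit and divisor (e.g. second and 30 for a 30-second rollup).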

Missing Intervals

You probably noticed that periods with no activity at all are omitted rather than reported with a zero value. One method to include the missing intervals is an outer join to a temporal table containing all the desired intervals. Ideally, the temporal table would be a permanent one, but I've found it impractical to maintain such a table for ad-hoc needs. Fortunately, a utility numbers CTE is a handy way to generate the needed intervals dynamically. The example below provides up to 65,536 interval values and can be easily extended as needed.

DECLARE
    @StartTimestamp datetime = '2010-01-01T00:00:00'
    ,@EndTimestamp datetime = '2010-01-01T04:00:00'
    ,@IntervalSeconds int = 1800; -- 30 minutes

WITH
    T2 AS (SELECT 0 AS Num UNION ALL SELECT 0),
    T4 AS (SELECT 0 AS Num FROM T2 AS A CROSS JOIN T2 AS B),
    T256 AS (SELECT 0 AS Num FROM T4 AS A CROSS JOIN T4 AS B CROSS JOIN T4 AS C CROSS JOIN T4 AS D),
    T65536 AS (SELECT ROW_NUMBER() OVER(ORDER BY A.Num) AS Num FROM T256 AS A CROSS JOIN T256 AS B)
-- final SELECT completed here: generate one row per interval in the range
SELECT DATEADD(second, (Num - 1) * @IntervalSeconds, @StartTimestamp) AS IntervalStart
FROM T65536
WHERE Num <= DATEDIFF(second, @StartTimestamp, @EndTimestamp) / @IntervalSeconds;

In this final post of my Collation Hell series, I'll discuss techniques to change a SQL Server instance collation along with the collation of all databases and columns. The objective is to ensure the standard collation is used throughout the entire SQL Server instance. See part 1 and part 2 of this series for more information on selecting a standard collation and planning such a collation change.

Be aware that a complete collation change is not unlike a major version upgrade, except tools to facilitate the change are limited. You'll need to build new system databases, change user databases and change every character column to conform to the new collation. These collation changes can be done using either a side-by-side migration technique or performed in-place.

Changing the Instance Collation

The SQL Server setup REBUILDDATABASE option (see Books Online) is used to create new system databases for an existing instance with the desired collation. One advantage of using REBUILDDATABASE over a complete reinstall is that post-RTM service packs and patches don't need to be reapplied afterward. However, all server-level objects like logins, linked servers, jobs, etc. need to be recreated after the rebuild, so you'll need to script those out beforehand. User databases and columns will need to be changed separately, which I'll discuss in more detail later.

You can also perform a fresh SQL Server install on another instance for a side-by-side migration. One of the advantages of this side-by-side migration technique is that fallback is fast and relatively easy. The side-by-side migration method is attractive if you plan a server hardware and/or SQL version upgrade anyway. However, as with REBUILDDATABASE, you will need to create server-level objects after the install.

Changing User Database Collation

Before I get into the details of a database collation change, please vote on Connect feedback item Make it easy to change collation on a database. Until such a feature is available, we will endure the pain of performing this task manually.

Assuming you have performed due diligence and remediation beforehand (see my collation change planning article), changing the database collation in-place is relatively easy. A simple ALTER DATABASE will change the collation of all user database system objects as well as the database default collation:

ALTER DATABASE Foo
COLLATE Latin1_General_CI_AS;

But note that this database collation change does not actually change the collation of existing user table columns. Columns that do not match the database collation must be changed individually to conform, which is why a mass collation change is such a PITA. You might choose to rebuild the database using a side-by-side method so that both the database and column collations can be changed during the rebuild process. I generally recommend such a side-by-side method unless you are constrained by storage space.

Changing Column Collation Using ALTER TABLE...ALTER COLUMN

The syntax for changing a column collation is simple; just execute ALTER TABLE...ALTER COLUMN using the same column definition except for the new column collation:

ALTER TABLE dbo.Foo ALTER COLUMN
Bar varchar(50) COLLATE Latin1_General_CI_AS NOT NULL;

The above DDL method appears simple at first glance, but there are many caveats that make it problematic, especially when it must be repeated for many tables, large databases and/or a code page change is involved. ALTER TABLE...ALTER COLUMN may be acceptable for an isolated change but not necessarily for a mass one. The major issues are:

· Each column must be changed individually

You'll need a separate ALTER COLUMN statement for each character column in the database. A T-SQL script that generates the needed DDL using the catalog views is a must. See Louis Davidson's Change table collations en masse article for an example, and be aware that text columns are problematic.

· Column references must be dropped

The altered column cannot be referenced by a constraint, index, statistic, computed column or schemabound object. This means that all of these references must be dropped before the column is altered and recreated afterward.

· Data are updated with a code page change

ALTER TABLE...ALTER COLUMN is always a fast metadata-only change with a Unicode column. The operation is also a metadata-only change for a non-Unicode column, but only if the old and new collations have the same code page/character set.

When the old and new collations have a different code page/character set, every row must be updated when a non-Unicode column is changed. The performance ramifications of such an update are huge, especially with large tables. A full table scan is required for each ALTER statement and every row in the table will be updated. Also, since SQL Server internally drops the old column and adds a new one, the internal row size increases considerably. Be aware that space requirements for modified non-Unicode columns will more than double until the clustered index is (re)built. To reclaim the space of a heap, you'll need to create and drop a clustered index. Keep in mind that the ALTER operation is fully logged regardless of the database recovery model, so you need to plan log space requirements accordingly.

Because of these considerations, I do not recommend using ALTER TABLE...ALTER COLUMN for a mass collation change, especially when non-Unicode columns are involved and the code page/character set of the collations are different. Instead, migrate data to a new table with columns of the desired collation.

Changing Column Collation Using a New Table

If you cannot perform a side-by-side migration of the entire database due to storage constraints, an alternative to ALTER TABLE...ALTER COLUMN is to create a new table with the desired collation and then copy data from the original table. I also recommend this method over ALTER TABLE...ALTER COLUMN when migrating to a different code page/character set for the reasons I previously mentioned.

3. Drop all non-clustered indexes to free up disk space for the migration
4. For each table:
   o Create a new table exactly like the original, except with a different name and the new collation for all character columns
   o Create the clustered index and check constraints
   o Load data
     · Use INSERT...SELECT to load the new table. Be sure to specify a TABLOCKX hint on the INSERT so that the operation is minimally logged. If the table has an identity column, be sure to SET IDENTITY_INSERT...ON to retain the existing identity values.
   o Drop the old table after successful copy and rename the new table to the old name
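The load and swap steps above can be sketched as follows (dbo.Foo, dbo.Foo_NewCollation and the column names are hypothetical):

```sql
-- allow explicit identity values so existing keys are retained
SET IDENTITY_INSERT dbo.Foo_NewCollation ON;

-- copy rows into the new-collation table; the TABLOCKX hint allows the
-- insert to be minimally logged
INSERT INTO dbo.Foo_NewCollation WITH (TABLOCKX)
    (FooID, Bar)
SELECT FooID, Bar
FROM dbo.Foo;

SET IDENTITY_INSERT dbo.Foo_NewCollation OFF;

-- after verifying the copy, drop the old table and rename the new one
DROP TABLE dbo.Foo;
EXEC sp_rename 'dbo.Foo_NewCollation', 'Foo';
```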

Summary

I cannot overstate the importance of choosing the right collation during the initial install, since it is difficult to change after the fact. Unfortunately, we often inherit instances and databases of varying collations and must evaluate the effort of the collation change against the benefits of a consistent collation. If you are considering a collation change, be sure to test beforehand to avoid surprises during and after the migration, and have a solid fallback plan.

In my last post, I discussed why one should avoid a mixed collation environment and how to choose the right collation for your environment. This post focuses on planning a collation change.

Should You Change Existing Collations?

Once you choose a standard collation (or at least a preferred one) for your organization, you'll need to decide if the change to existing instances, databases and columns is worth the effort and risk. Keep in mind that the effort involves not only the actual collation change but also testing, along with possible changes to code and data to maintain the desired behavior. Such a remediation project can be quite significant depending on the old/new collation and scope of the change, so you need to weigh the pros and cons to determine if the effort is justified.

Note that changing collations need not be an all-or-none decision; you might choose to convert only some (or none) of your existing instances/databases while enforcing the collation standard for new installations. You can identify the instances that are causing the most grief and weigh those accordingly.

A number of factors influence the effort and risk of a collation change. A change to language, sensitivity and/or code page is often more complex than a conversion from a SQL collation to a Windows collation (or Windows to SQL) of the same language and sensitivity. Let me discuss these scenarios in more detail so that you can better ascertain the effort and risk involved in your environment for planning purposes.

Windows vs. SQL Collation Change

A conversion between a SQL and Windows collation of the same language, sensitivity and code page ought to be fairly straightforward due to the same character set and similar comparison rules. As with any collation change, there are differences in behavior, though. The main difference here is that Windows collations use word sort behavior, so slightly different sorting/comparison behavior will result. The script below shows such a difference with identical data.
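A sketch of such a comparison, contrasting a SQL collation with a Windows collation of the same language and sensitivity (the hyphenated test values are my own):

```sql
-- the same values can sort differently under SQL vs. Windows collations
-- because Windows collations apply word sort rules to the hyphen
DECLARE @Names TABLE(Name varchar(10) NOT NULL);
INSERT INTO @Names VALUES('co-op');
INSERT INTO @Names VALUES('con');
INSERT INTO @Names VALUES('coop');

-- SQL collation: string sort, the hyphen is compared as a character
SELECT Name FROM @Names ORDER BY Name COLLATE SQL_Latin1_General_CP1_CI_AS;

-- Windows collation: word sort, the hyphen carries a low sort weight
SELECT Name FROM @Names ORDER BY Name COLLATE Latin1_General_CI_AS;
```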

All things being equal, a conversion from/to a Windows collation will likely require few changes, if any, to code and schema (besides the collation change itself). On the other hand, converting to a collation of different sensitivity and/or character set is often more challenging.

Sensitivity Change

You might recall that the instance collation determines the sensitivity for variable names and labels, while the database collation determines sensitivity of identifiers and literals. I always match characters exactly in variable names, labels and identifiers (including table aliases) regardless of whether I'm using a sensitive or insensitive collation, and never use names that differ only by case. Not only does naming consistency make code cleaner, this practice facilitates moving between collations. However, it is unlikely that all database developers were so anal in their naming, so be aware that you'll probably need to make code or schema changes in order to convert between collations of different sensitivity.

A change from a case-sensitive collation to a case-insensitive one is usually minor, at least from a code perspective. The same schema/code that runs in a case-sensitive environment will run under a case-insensitive collation as long as you don't encounter names and identifiers in the same scope that differ only by case (e.g. @customerID and @CustomerID). Such a deliberate practice is uncommon in my experience, but these conflicts must be addressed before changing to a case-insensitive collation.

One usually strives to store and query data using a consistent case (especially all upper/lower) under a case-sensitive collation. If this practice was not followed, data that was unique under a case-sensitive collation will not be regarded as such under case-insensitive rules, preventing unique indexes (including primary key or unique constraints) from being created. This might actually be a good thing when the real issue is bad data (i.e. duplicates inadvertently allowed due to inconsistent case). However, you may need to deviate from the case-insensitive standard at the column level in some situations due to business requirements, such as to enforce uniqueness of case-sensitive part numbers.
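Before such a change, a query like the following sketch (dbo.Part and PartNumber are hypothetical names) can flag values that are unique only by case:

```sql
-- values that collide once case is ignored will block creation of a unique
-- index under the new case-insensitive collation
SELECT
    PartNumber COLLATE Latin1_General_CI_AS AS CollidingValue
    ,COUNT(*) AS DuplicateCount
FROM dbo.Part
GROUP BY PartNumber COLLATE Latin1_General_CI_AS
HAVING COUNT(*) > 1;
```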

Going from a case-insensitive to a case-sensitive or binary collation (which I don't personally recommend) will typically require more changes. Developers tend to be a bit sloppy with matching case under a case-insensitive collation because there is no requirement to do so. Don't be surprised if a lot of code and queries must be changed once variables and identifiers become case sensitive. Furthermore, you may need to update data to a consistent case and also make application changes to ensure data are stored in a consistent case.

The considerations that apply to case sensitivity also apply to the other collation sensitivity options (accent, Kana and width). I wouldn't expect as many issues compared to a change in case sensitivity in most cases, though.

Character Set Change

A change in code page is a non-issue when char/varchar/text data contains only ASCII characters. If you have a character outside the ASCII range (0-127, 0x00-0x7F), a code page change will present a problem when the character doesn't also exist in the target collation's code page. Such a character will instead be mapped to an alternate character (e.g. 'À' to 'A') or the catch-all '?' (e.g. '€' to '?'). If this mapping is unacceptable, you'll need to change the data type to Unicode (nchar/nvarchar/ntext) or update data to conform to the target code page.
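A sketch of both the mapping behavior and a detection query, using the document's dbo.Foo/Bar example names and a CP437 target collation chosen purely for illustration:

```sql
-- mapping demo: 'À' and '€' do not exist in code page 437, so inserting
-- them into a CP437 column maps them to an alternate character or '?'
DECLARE @t TABLE(
    Original varchar(10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
    ,Converted varchar(10) COLLATE SQL_Latin1_General_CP437_CI_AS NOT NULL);
INSERT INTO @t VALUES(N'À€', N'À€');
SELECT Original, Converted FROM @t;

-- detection sketch: round-trip each value through the target collation and
-- compare the Unicode byte images; a mismatch flags an inexact mapping
SELECT Bar
FROM dbo.Foo
WHERE CAST(CAST(Bar AS nvarchar(50)) AS varbinary(100))
    <> CAST(CAST(Bar COLLATE SQL_Latin1_General_CP437_CI_AS AS nvarchar(50)) AS varbinary(100));
```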

If you are unsure whether you have problem characters, the above script shows one method to identify them. It converts the original collation characters to Unicode and then to varbinary, and repeats the technique for the target collation. An inequality of the two values indicates an inexact mapping that may require remediation.

Language Change

I'm sure some of you have inherited different language collations due to mergers and acquisitions or inattention to detail during installation. Be mindful that the topic of supporting multiple languages/locales is much larger than collation alone. I'm only discussing a collation language change here, but if you need to fully support multiple languages in a single database, you must also consider other factors such as a schema that supports multiple translations, currency and UOM conversion, and applications that are sensitive to client locale.

You may experience different behavior after a collation language change due to the different sorting and comparison semantics. The script below illustrates such a difference. Even if you chose a collation that supports the majority of your users' languages, that collation might be less than ideal for the user minority. Consider performing some operations in application code instead of SQL Server when the standard collation behavior is unacceptable for the task at hand.

-- returns both 'Schröder' and 'Schroeder'
DECLARE @Foo TABLE(
    LastName nvarchar(10) COLLATE German_PhoneBook_CI_AS);
INSERT INTO @Foo VALUES(N'Schröder');
INSERT INTO @Foo VALUES(N'Schroeder');
SELECT LastName FROM @Foo
WHERE LastName LIKE N'%oe%';
GO

-- returns only 'Schroeder'
DECLARE @Foo TABLE(
    LastName nvarchar(10) COLLATE Latin1_General_CI_AS);
INSERT INTO @Foo VALUES(N'Schröder');
INSERT INTO @Foo VALUES(N'Schroeder');
SELECT LastName FROM @Foo
WHERE LastName LIKE N'%oe%';
GO

Summary

A collation change effort varies considerably depending on the size and complexity of the environment. Perform due diligence before embarking on a collation change. I don't want to discourage anyone from changing collations, but as much as a mixed collation environment is a pain, a botched remediation project is even worse. Be sure to plan accordingly.

I'll share different methods to change collations in my last post of this series.

I inherited a mixed collation environment with more collations than I can count on one hand. The different collations require workarounds to avoid "cannot resolve collation conflict" errors, and those workarounds kill performance due to non-sargable expressions. Dealing with mixed collations is a real pain, so I strongly recommend you standardize on a single collation and deviate only after careful forethought. Here's a brief overview of collations and some guidance to help you choose the right collation for your organization and new SQL installations.

A collation determines which characters can be stored in non-Unicode character data types and the bit patterns used for storage. The char, varchar and text data types can store only 256 different characters due to the single-byte limitation. The first 128 characters (0-127, 0x00-0x7F) are the same for all collations as defined by the ASCII character set, and the remaining 128 characters (128-255, 0x80-0xFF) vary according to the code page associated with the collation. Characters without an associated code point are mapped to either an alternate character or to the catch-all '?' character.

Collations are grouped into Windows and SQL collations. Windows collations provide sorting and comparison behavior consistent with applications running on a computer with the corresponding Windows operating system locale. Windows collations also provide consistent behavior for both Unicode and non-Unicode data types.

SQL collations use different rules for non-Unicode and Unicode types. SQL Server collations, identified by the SQL_ collation name prefix, use the character set and sort order settings from older SQL Server versions for non-Unicode types and are provided specifically to maintain compatibility with existing SQL Server installations. Both SQL and Windows collations use the same rules for Unicode types.

Specifying a Collation

Collation can be specified at the instance, database, column and expression level. The SQL Server instance collation is determined during SQL Server installation and cannot be changed without a reinstall/rebuild, so it's a good idea to get the collation right the first time unless you need practice re-installing SQL Server. Keep in mind that the instance collation determines the collation (including case sensitivity) of instance-level objects like logins and database names as well as identifiers for variables, GOTO labels and temporary tables. Passwords are always case-sensitive in SQL Server 2005 and above, although collation determined password case sensitivity in earlier versions.

The database collation is determined when the database is created. If not specified otherwise, the instance default collation is used as the database collation. Database-level identifiers like table and column names use the database collation, as do literal expressions. The database collation can be changed at any time, but this does not change the collation of existing table columns.

Column collation for character data is specified when the table is created or when the column is added to the table. If not specified otherwise, the database collation is used. A column's collation can be changed only by altering the column with the new collation or recreating the table with the new collation specified on the column definition. If you want a column's collation to remain different than the database default collation, you must be careful to explicitly specify the collation whenever the column is altered so that it is not inadvertently changed to the database default collation.
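As a sketch (the table and collation names are hypothetical), here is a column created with a non-default collation, and the COLLATE clause repeated when the column is altered so the collation is preserved:

```sql
CREATE TABLE dbo.Customer
(
    CustomerID int NOT NULL,
    -- explicit collation that differs from the database default
    LastName nvarchar(50) COLLATE Latin1_General_CS_AS NOT NULL
);

-- COLLATE must be specified again on ALTER COLUMN; omitting it would
-- silently revert the column to the database default collation
ALTER TABLE dbo.Customer
    ALTER COLUMN LastName nvarchar(100) COLLATE Latin1_General_CS_AS NOT NULL;
```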

Choosing the Right Collation

The default collation that the SQL Server installer chooses is not necessarily the one Microsoft recommends or the one that is best for your environment. SQL Server setup examines the operating system locale and chooses as the default the oldest available collation version associated with that locale. For example, a SQL Server installation in the US will default to SQL_Latin1_General_CP1_CI_AS and the installation default in the UK will be Latin1_General_CI_AS. In both cases, Microsoft recommends a Windows collation (e.g. Latin1_General_CI_AS) unless one needs to maintain compatibility with existing installations. More on that shortly.

Language is the most important consideration in choosing a collation for a new installation. This is one reason why the SQL Server installer chooses the default collation based on the operating system locale. If all users speak the same language, choose a collation that supports the language/locale. This will help ensure expected sorting and comparison behavior along with alphabet support for non-Unicode types. In a multi-language environment, choose a collation with the best overall support for the languages used.

Another major consideration is collation compatibility. If you have existing SQL installations, consider using the same collation for a new instance if you envision sharing data via replication or SSIS, or a future server consolidation. I previously mentioned that Microsoft recommends a Windows collation, but it may be better to revert to a SQL collation for compatibility with older instances in your environment that already use the SQL collation. Compatibility is another reason why the installation default is the SQL_Latin1_General_CP1_CI_AS collation in the US. Unfortunately, this default has the side effect of DBAs unwittingly installing new instances with a SQL collation instead of a Windows collation like Latin1_General_CI_AS even when compatibility isn't needed.

The choice of whether or not to use a case-sensitive collation is a bit subjective. A case-insensitive collation is appropriate when you need to query data regardless of the case of the actual data. For example, this allows one to easily find customers with a last name of 'Smith' even when the data is not stored in proper case. With a case-sensitive collation, it is important that one stores data in a consistent case (not to say that one shouldn't anyway), and this places more burden on application and database developers.

Collation Performance

Collation performance was a bigger deal back in the days of 486 processors (instead of collation, it was actually character set and sort order back then). The comparative performance on modern processors is usually insignificant. SQL collations should provide better performance than Windows collations for non-Unicode types due to simpler comparison rules, but the difference is significant only in the most severe circumstances, such as a table scan with LIKE '%Some String%' in the WHERE clause. See Comparing SQL collations to Windows collations. Binary collations are said to provide the best performance, but the cost of unnatural (non-dictionary) comparisons and sort order is high; most users would expect 'a' to sort before 'B', but that is not the case with binary collations.

I personally don't think performance should even be considered in choosing the proper collation. One of the reasons I'm living in collation hell is that my predecessors chose binary collations to eke out every bit of performance for our highly transactional OLTP systems. With the sole exception of a leading wildcard table scan search, I've found no measurable performance difference with our different collations. The real key to performance is query and index tuning rather than collation. If performance is important to you, I recommend you run a performance test with your actual application queries before you choose a collation based on performance expectations.

Summary

My general recommendation is that you should use a case-insensitive Windows collation appropriate for your locale unless you need to maintain compatibility with existing SQL instances or have special considerations. In my next post, I'll discuss changing collations so that you can avoid a mixed collation environment and show different methods to accomplish the task.

I never had the need to turn on the PARAMETERIZATION FORCED database option until this week. We pretty much use only stored procedures for our internal applications, so the execution plans are almost always in cache and reused. This practice of using parameterized stored procedure calls, together with attention to detail in query and index tuning, allows us to comfortably handle several thousand requests per second on commodity hardware without taking special measures.

The Perfect Storm

We acquired a third-party application which had to sustain thousands of batch requests per second in order to keep up with our peak demand. Our first attempt to use the application out of the box failed miserably when the 16-core database server quickly hit 100% CPU and stayed there. An examination of the most frequently run query soon revealed why CPU was so high. Not only was the moderately complex query not parameterized, each invocation required a full table scan. The schema (EAV model, missing primary keys and indexes), application code (ad-hoc, non-parameterized queries) and inattention to indexing seemed the perfect storm to guarantee failure.

Our hands were tied in what the vendor could/would do to address our performance concerns. We worked with the vendor to optimize indexes, and this brought the CPU down to about 65%, but the batch requests/sec rate and slow response time were still unacceptable. We needed to increase performance by at least an order of magnitude to meet SLAs.

The Perfect Fix

CPU was 95%+ at peak time (several thousand batch requests/second, via an ASP (classic) front end), and the peak time lasted 8+ hours every day. The server was one of the big HP boxes -- not sure if it was a Superdome or some other model -- with something like 56 cores and 384 GB of RAM. The database itself was only 40 or 50 GB, as I recall, so the entire thing was cached. Long story short, I logged in during peak load, did a quick trace and noticed right away that none of the queries were parameterized. I decided to throw caution to the wind and just go for it. Flipped the thing into Forced Parameterization mode and held my breath as I watched the CPU counters *instantly* drop to 7% and stay there. I thought I'd broken the thing, but after checking my trace queries were running through the system same as before, and with the same number of errors (another story entirely <g>). Luckily the head IT guy happened to be watching his dashboard right as I made the change, and after seeing such an extreme result thought I was a god...

I knew of PARAMETERIZATION FORCED but never realized how big a difference the option could make until I learned of Adam's experience. I'm not quite as adventuresome as he is, so I restored the production database to a separate environment for some cursory testing. To my amazement, I watched the rate of my single-threaded test jump from a few dozen batch requests/sec to several hundred immediately after I executed "ALTER DATABASE...SET PARAMETERIZATION FORCED". CPU dropped by half even with the tenfold increase in throughput.

The production improvement was even more impressive - the 16-core Dell R900 hasn't exceeded 8% CPU since the change. Response time is excellent, we have happy users and plenty of CPU headroom to spare.

A Turbo Button?

Despite anecdotal success with PARAMETERIZATION FORCED, I wouldn't turn it on indiscriminately. When the PARAMETERIZATION FORCED database option is on, all queries are parameterized, including complex ones. This is good in that compilation costs are avoided due to cache hits. The bad news is that a single plan might not be appropriate for all possible values of a given query. Worse overall performance will result when higher execution costs (due to sub-optimal plans) exceed compilation savings, so you should understand the query mix before considering the option.

In contrast, SQL Server parameterizes only relatively simple "no brainer" queries in the default PARAMETERIZATION SIMPLE mode. This behavior promotes reuse of plans for queries that will yield the same plan anyway regardless of the literal values in the query. Complex queries are not parameterized automatically so that the optimizer can generate the optimal plan for the values of the current query in the event of a cache miss. The downside with simple parameterization, as Adam and I observed, is that complex queries not already in cache will incur costly compilations that become a CPU hog in a high-volume OLTP workload.

There is also middle ground between PARAMETERIZATION SIMPLE and PARAMETERIZATION FORCED. One can use plan guides with PARAMETERIZATION SIMPLE to avoid compilation for selected queries while other complex queries are compiled as normal. In my case, a plan guide may have been a better option because the culprit was a single query rather than many different unpredictable ones.
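As a sketch of that plan-guide approach (the query, table and guide names here are hypothetical): sp_get_query_template derives the parameterized form of the problem query, and a TEMPLATE plan guide then forces parameterization for that query shape only:

```sql
DECLARE @stmt nvarchar(max), @params nvarchar(max);

-- derive the parameterized template of the problem query
EXEC sp_get_query_template
    N'SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = 42;',
    @stmt OUTPUT,
    @params OUTPUT;

-- force parameterization for this query shape only, leaving the
-- database in the default PARAMETERIZATION SIMPLE mode otherwise
EXEC sp_create_plan_guide
    @name = N'ForceParam_OrdersByCustomer',
    @stmt = @stmt,
    @type = N'TEMPLATE',
    @module_or_batch = NULL,
    @params = @params,
    @hints = N'OPTION (PARAMETERIZATION FORCED)';
```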

In my opinion, the best solution is to use stored procedures and/or parameterized queries in the first place. These methods provide the performance benefits of PARAMETERIZATION FORCED and add other security and application development benefits. Unfortunately, third-party vendors are notorious for not following parameterization best practices, so DBAs need to keep PARAMETERIZATION FORCED and plan guides in their tool belts.

A user in the SQL Server public newsgroups asked about how to restore a database with many files and rename during the process:

I am restoring a database onto another server with different drive
sizes and mappings.
The thing is, I have over 100 catalogs to restore. I don't want to
have to define each catalog name and its new location Like below:

RESTORE DATABASE Northwinds
FROM DISK = 'C:\db.bak'
WITH MOVE 'Catalog1' TO 'D:\Catalog1'
WITH MOVE 'Catalog2' TO 'D:\Catalog2'
WITH MOVE 'Catalog3' TO 'D:\Catalog3'
WITH MOVE 'Catalog4' TO 'D:\Catalog4'
WITH MOVE 'Catalog5' TO 'D:\Catalog5'
WITH MOVE 'Catalog6' TO 'D:\Catalog6'
...WITH MOVE 'Catalog100' TO 'D:\Catalog100'

This reminded me of a stored procedure I wrote several years ago for SQL Server 2000 that would be perfect for such a task. The proc generates and optionally executes the necessary RESTORE and ALTER commands to make quick work of what is otherwise a long and tedious process if you have many files and databases. I updated my old proc for SQL Server 2008 and thought I'd share it here. Below is the proc with documentation and samples in the comments. I hope you find this useful.

IF OBJECT_ID(N'tempdb..#RestoreDatabase_SQL2008') IS NOT NULL
    DROP PROCEDURE #RestoreDatabase_SQL2008
GO

CREATE PROCEDURE #RestoreDatabase_SQL2008
    @BackupFile nvarchar(260),
    @NewDatabaseName sysname = NULL,
    @FileNumber int = 1,
    @DataFolder nvarchar(260) = NULL,
    @LogFolder nvarchar(260) = NULL,
    @ExecuteRestoreImmediately char(1) = 'N',
    @ChangePhysicalFileNames char(1) = 'Y',
    @ChangeLogicalNames char(1) = 'Y',
    @DatabaseOwner sysname = NULL,
    @AdditionalOptions nvarchar(500) = NULL
AS
/*
This procedure will generate and optionally execute a RESTORE DATABASE

I recently had to set up Database Mail on dozens of SQL Server instances. Rather than perform this tedious task using the SSMS GUI, I developed a script that saved me a lot of time, which I'm sharing here.

My needs were simple, so I only needed a single SMTP account and profile. I decided to make the profile the default public one so that all msdb users would use this profile unless a different sp_send_dbmail @profile_name value was explicitly specified. You might want to extend this script if you need other accounts/profiles, such as separate ones for administrative alerts or user reports.

Setup Script

Below is the template script I used for my task. The sysmail_add_account_sp @username and @password parameters might be required depending on your SMTP server authentication, and you will of course need to customize the mail server name and addresses for your environment.
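The original script isn't reproduced in this excerpt, but a minimal sketch of such a setup (the server name, addresses, and account/profile names are placeholders) would look like this:

```sql
DECLARE @AccountName sysname = N'Default SQL Mail Account';
DECLARE @ProfileName sysname = N'Default Public Profile';

-- create the SMTP account (add @username/@password if your SMTP
-- server requires authentication)
EXEC msdb.dbo.sysmail_add_account_sp
    @account_name = @AccountName,
    @email_address = N'sqlserver@example.com',
    @display_name = N'SQL Server',
    @mailserver_name = N'smtp.example.com';

-- create the profile and bind the account to it
EXEC msdb.dbo.sysmail_add_profile_sp
    @profile_name = @ProfileName;

EXEC msdb.dbo.sysmail_add_profileaccount_sp
    @profile_name = @ProfileName,
    @account_name = @AccountName,
    @sequence_number = 1;

-- make the profile the default public profile so sp_send_dbmail
-- uses it when no @profile_name is specified
EXEC msdb.dbo.sysmail_add_principalprofile_sp
    @profile_name = @ProfileName,
    @principal_name = N'public',
    @is_default = 1;
```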

In case you haven't yet heard, Microsoft SQL Server 2008 Service Pack 1 was released on April 7. This milestone is especially significant for those of you who could not previously deploy the latest SQL Server release because your organization has a "not before the first service pack" policy. I want to go on record as one who believes that such a policy is flawed and has needlessly delayed many organizations from using the new SQL Server 2008 features.

There is nothing magical about the first service pack compared to the initial RTM release with regards to production readiness. SQL Server releases nowadays are scheduled based on quality rather than just hitting a date. Buggy features will be dropped from a release rather than included and in need of a service pack. I'm not saying that every SQL Server release is flawless, but serious bugs (e.g. corruption or wrong results) are few, thanks to internal testing by Microsoft as well as those in the community who kick the tires with the pre-release CTP bits.

It's understandable that those who are risk-averse might wait until after the first service pack in the belief that other adopters may have smoothed out the bumps in the road a bit. I can see how postponing installation in this way might mitigate some of the risk, but SP1 is a completely arbitrary milestone that is a holdover from before SQL Server 7 was released over a decade ago. I think a better approach is to adopt new releases based on quality as determined in one's own environment. Whether the target is a new SQL Server installation or an upgrade of an existing instance, one still needs to perform testing before installing any new version, service pack or patch in production. It is those test results that should determine production readiness, not the results of SELECT SERVERPROPERTY('ProductLevel').

I suggest that one always turn on both the QUOTED_IDENTIFIER and ANSI_NULLS session settings. Not only do these settings provide ANSI-standard behavior, they must be turned on in order to use features like indexed views, indexes on computed columns and query notifications. It is tricky to ensure the settings are as desired, though, because the default session settings differ depending on the tools you use.

DDL Script Considerations

It is especially important to ensure the QUOTED_IDENTIFIER and ANSI_NULLS session settings are correct with DDL scripts because both settings are "sticky". The settings in effect when a stored procedure, view, function or trigger is created are also used at execution time; the create-time settings override the run-time session settings.
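One can verify the settings that were captured at create time with OBJECTPROPERTY; a quick sketch (the procedure name is hypothetical):

```sql
-- 1 = the setting was ON when the module was created, 0 = OFF
SELECT
    OBJECTPROPERTY(OBJECT_ID(N'dbo.MyProc'), 'ExecIsQuotedIdentOn') AS QuotedIdentOn,
    OBJECTPROPERTY(OBJECT_ID(N'dbo.MyProc'), 'ExecIsAnsiNullsOn') AS AnsiNullsOn;
```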

SQLCMD and OSQL Turn Settings Off

QUOTED_IDENTIFIER and ANSI_NULLS are on by default when you connect using modern client APIs like ODBC, SQLOLEDB, SQL Native Client and SqlClient. The SQL Server Management Studio and Query Analyzer tools keep those settings on unless you override the connection behavior under the tool connection options or run SET QUOTED_IDENTIFIER OFF or SET ANSI_NULLS OFF commands in the query window.

The SQLCMD and OSQL command prompt utilities are different, though. These tools explicitly turn off QUOTED_IDENTIFIER after connecting, presumably to provide backwards compatibility. One must either specify the "-I" (upper-case "eye") command-line argument to turn on QUOTED_IDENTIFIER or include a SET QUOTED_IDENTIFIER ON command in all the SQL scripts run from those utilities. I personally like to avoid SET commands in my DDL scripts, so I make it a habit to specify the -I command-line option.

I mentioned in Conditional INSERT/UPDATE Race Condition that most "UPSERT" code is defective and can lead to constraint violations and data integrity issues in a multi-user environment. In this post, I'll show how to prevent duplicate key errors and data problems with the MERGE statement too. You might want to peruse Conditional INSERT/UPDATE Race Condition before reading this for background on these concurrency concerns.

Background on MERGE

Microsoft introduced the ANSI-standard MERGE statement in SQL Server 2008. MERGE is very powerful in that it can perform multiple actions in a single statement that previously required separate INSERT/UPDATE/DELETE statements. MERGE is also a good alternative to the proprietary UPDATE...FROM syntax allowed in the T-SQL dialect.

MERGE can (and in my opinion should) be used to address the requirement to either INSERT or UPDATE depending on whether the source data already exists. One need only include the MERGE clauses WHEN MATCHED THEN UPDATE and WHEN NOT MATCHED THEN INSERT in order to take the proper action, all within a single statement.

“UPSERT” MERGE Concurrency Test

Even though MERGE provides the means to perform multiple actions within a single statement, developers still need to consider concurrency with MERGE to prevent errors and data issues. Let me illustrate using the table and stored procedure that I originally posted in Conditional INSERT/UPDATE Race Condition:

CREATE TABLE dbo.Foo
(
    ID int NOT NULL
        CONSTRAINT PK_Foo PRIMARY KEY,
    Bar int NOT NULL
);
GO

CREATE PROCEDURE dbo.Merge_Foo
    @ID int,
    @Bar int
AS
SET NOCOUNT, XACT_ABORT ON;

MERGE dbo.Foo AS f
USING (SELECT @ID AS ID, @Bar AS Bar) AS new_foo
ON f.ID = new_foo.ID
WHEN MATCHED THEN
    UPDATE SET f.Bar = new_foo.Bar
WHEN NOT MATCHED THEN
    INSERT (ID, Bar)
    VALUES (new_foo.ID, new_foo.Bar);

RETURN @@ERROR;
GO

I ran the script below from 2 different SSMS windows after changing the time to the near future so that both executed at the same time. My test box had a single quad-core processor with SQL Server 2008 Developer Edition installed, which I expected to have enough multi-processing power to create the error.

WAITFOR TIME '08:00:00';

EXEC dbo.Merge_Foo
    @ID = 1,
    @Bar = 1;

I got a primary key violation error, showing that MERGE is vulnerable to concurrency problems just like a multi-statement conditional INSERT/UPDATE technique. However, I couldn't reproduce the error with MERGE nearly as consistently as I could with the conditional INSERT/UPDATE in Conditional INSERT/UPDATE Race Condition. This could be due to a number of reasons (e.g. faster processor, different SQL Server version, MERGE locking behavior), but I wanted to make sure I could reproduce the error reliably, so I created a more robust test to exercise MERGE in a loop:

CREATE TABLE dbo.Foo2
(
    ID int NOT NULL
        CONSTRAINT PK_Foo2 PRIMARY KEY,
    InsertSpid int NOT NULL,
    InsertTime datetime2 NOT NULL,
    UpdateSpid int NULL,
    UpdateTime datetime2 NULL
);
GO

CREATE PROCEDURE dbo.Merge_Foo2
    @ID int
AS
SET NOCOUNT, XACT_ABORT ON;

MERGE dbo.Foo2 AS f
USING (SELECT @ID AS ID) AS new_foo
ON f.ID = new_foo.ID
WHEN MATCHED THEN
    UPDATE
    SET f.UpdateSpid = @@SPID,
        UpdateTime = SYSDATETIME()
WHEN NOT MATCHED THEN
    INSERT (ID, InsertSpid, InsertTime)
    VALUES (new_foo.ID, @@SPID, SYSDATETIME());

RETURN @@ERROR;
GO

I ran the script below from 4 different SSMS windows after changing the time to the near future so that all executed at the same time.

DECLARE
    @NextTime datetime,
    @ID int,
    @MillisecondDelay int;

SELECT
    @NextTime = '08:10:00',
    @ID = 1,
    @MillisecondDelay = 100;

--execute 10 times per second for 1 minute
WHILE @ID <= 600
BEGIN
    --pause and sync with other sessions
    WAITFOR TIME @NextTime;

    EXEC dbo.Merge_Foo2
        @ID = @ID;

    SELECT
        @ID = @ID + 1,
        --assume no more than 100ms per execution
        @NextTime = DATEADD(MILLISECOND, @MillisecondDelay, @NextTime);
END;

I was able to reproduce the primary key violation every time with this test script.

Addressing the MERGE Race Condition

The underlying issue with any conditional insert technique is that data must be read before the determination can be made whether to INSERT or UPDATE. To prevent concurrent sessions from inserting data with the same key, an incompatible lock must be acquired to ensure only one session can read the key, and that lock must be held until the transaction completes.

I showed how one might address the problem using both UPDLOCK and HOLDLOCK locking hints in Conditional INSERT/UPDATE Race Condition. MERGE is slightly different, though. I repeated the test with only the HOLDLOCK hint added:

ALTER PROCEDURE dbo.Merge_Foo2
    @ID int
AS
SET NOCOUNT, XACT_ABORT ON;

MERGE dbo.Foo2 WITH (HOLDLOCK) AS f
USING (SELECT @ID AS ID) AS new_foo
ON f.ID = new_foo.ID
WHEN MATCHED THEN
    UPDATE
    SET f.UpdateSpid = @@SPID,
        UpdateTime = SYSDATETIME()
WHEN NOT MATCHED THEN
    INSERT (ID, InsertSpid, InsertTime)
    VALUES (new_foo.ID, @@SPID, SYSDATETIME());

RETURN @@ERROR;

This test showed that simply adding the HOLDLOCK hint prevented the primary key violation error. Unlike the conditional INSERT/UPDATE in Conditional INSERT/UPDATE Race Condition, MERGE acquired a key update lock by default, so UPDLOCK was not needed. Also, in contrast to the multi-statement conditional INSERT/UPDATE technique, no explicit transaction is required because MERGE is an atomic DML statement. The HOLDLOCK hint was still needed, though, because MERGE otherwise releases the update key lock before the insert. I gleaned this by examining the locks from a Profiler trace of the MERGE without the HOLDLOCK:

EventClass     TextData                     Mode           ObjectID    Type
SP:Starting    EXEC dbo.Merge_Foo2 @ID = 1                 1314103722
Lock:Acquired                               8 - IX         1330103779  5 - OBJECT
Lock:Acquired  1:173                        7 - IU         0           6 - PAGE
Lock:Acquired  (10086470766)                4 - U          0           7 - KEY
Lock:Released  (10086470766)                4 - U          0           7 - KEY
Lock:Released  1:173                        7 - IU         0           6 - PAGE
Lock:Acquired  1:173                        8 - IX         0           6 - PAGE
Lock:Acquired  (10086470766)                15 - RangeI-N  0           7 - KEY
Lock:Acquired  (10086470766)                5 - X          0           7 - KEY
Lock:Released  (10086470766)                5 - X          0           7 - KEY
Lock:Released  1:173                        8 - IX         0           6 - PAGE
Lock:Released                               8 - IX         1330103779  5 - OBJECT
SP:Completed   EXEC dbo.Merge_Foo2 @ID = 1                 1314103722

If another concurrent MERGE of the same key occurs after the update lock is released and before the exclusive key lock is acquired, a duplicate key error will result.

The trace below of the MERGE with the HOLDLOCK hint shows that locks aren't released until the insert (and statement) completes, thus avoiding the concurrency problem with MERGE.

I've used sp_detach_db and sp_attach_db to relocate database files for many years. I know that sp_attach_db was deprecated in SQL Server 2005 but, like most DBAs, I've continued to use sp_attach_db mostly out of habit. I want to share with you why I've decided to change my ways.

Planned File Relocation

Let's say you want to move the log file to a separate drive. The following script shows how to accomplish this in SQL Server 2000 using sp_attach_db. The only sp_attach_db parameters required are the database name, the primary data file path, and the path of the log file that was moved from the original location.

EXEC sp_detach_db
    @dbname = N'MyDatabase';

--move log file to E drive manually and attach from new location
EXEC sp_attach_db
    @dbname = N'MyDatabase',
    @filename1 = N'D:\DataFiles\MyDatabase_Data.mdf',
    @filename2 = N'E:\LogFiles\MyDatabase_Log.ldf';

The deprecated sp_attach_db procedure still works in SQL Server 2005 and SQL Server 2008 but is not recommended. Instead, the proper method to relocate files in these later versions is with ALTER DATABASE...MODIFY FILE. Simply execute an ALTER DATABASE...MODIFY FILE for each moved file and toggle the ONLINE/OFFLINE database state. The script example below shows how the log file would be moved to a different drive with this method, which is described in detail in the Books Online.

ALTER DATABASE MyDatabase SET OFFLINE;

--move log file to E drive manually and attach from new location
ALTER DATABASE MyDatabase
    MODIFY FILE (
        NAME = 'MyDatabase_Log',
        FILENAME = 'E:\LogFiles\MyDatabase_Log.ldf');

ALTER DATABASE MyDatabase SET ONLINE;

Unfortunately, the Books Online doesn't provide much info as to why ALTER DATABASE...MODIFY FILE and ONLINE/OFFLINE is preferred over detach/attach for planned file relocations. One explanation is illustrated by an issue I ran into recently that motivated this post. After using the detach/attach method, we ended up with Service Broker disabled. This is documented behavior that we simply overlooked and didn't catch until subsequent application problems were reported. Since exclusive database access was needed to re-enable Service Broker, we had to close all user database connections before altering the database ENABLE_BROKER setting, and this was a real pain.

This problem wouldn't have happened had we used the recommended method and toggled the OFFLINE/ONLINE database state, because the database settings would have remained unchanged. I wouldn't be surprised if there were other gotchas with the detach/attach method. The bottom line is that there is no reason not to use the ALTER DATABASE...MODIFY FILE and OFFLINE/ONLINE method to move files.

Attaching a Database to Another Server or Instance

Note that the deprecated sp_attach_db stored procedure is basically just a wrapper for CREATE DATABASE...FOR ATTACH. You can use CREATE DATABASE...FOR ATTACH much like you would sp_attach_db: specify the database name and primary (mdf) file path along with any file paths that differ from the original locations. For example:

EXEC sp_detach_db
    @dbname = N'MyDatabase';

--move database files manually to new server
CREATE DATABASE MyDatabase
ON (NAME = 'MyDatabase_Data',
    FILENAME = 'C:\DataFiles\MyDatabase_Data.mdf')
LOG ON (NAME = 'MyDatabase_Log',
    FILENAME = 'C:\LogFiles\MyDatabase_Log.ldf')
FOR ATTACH
WITH ENABLE_BROKER;

The ENABLE_BROKER option is appropriate if the purpose of the attach is to completely move a Service Broker-enabled database to another instance or in a DR scenario. When attaching to create a database replica (e.g. a copy for testing), the NEW_BROKER option is appropriate.

Summary

I suggest ALTER DATABASE...MODIFY FILE and OFFLINE/ONLINE for planned file relocation, and sp_detach_db/CREATE DATABASE...FOR ATTACH for other scenarios. In any case, sp_attach_db should be avoided going forward.

I welcome any feedback you might have, either here or via the CodePlex project discussion page. It's been a while since I've done any Reporting Services development, so there is certainly room for improvement. Additional project team members are also welcome.

You will likely find the following query useful if you work with partitioned objects. I developed this when I first started using table partitioning in order to verify proper partition boundaries, filegroups and row counts. Not only does this provide much more information than can be obtained by querying the underlying table with the partition function to get partition numbers, it runs much faster because only catalog views are used.

As-is, the query includes both partitioned and non-partitioned user objects in the context database, but you can customize the WHERE clause as desired. I think this query would make a perfect source for an SSMS custom report so that it can be easily invoked from the SSMS Object Explorer. That's on my to-do list.
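The query itself isn't reproduced in this excerpt, but a sketch along the same lines (catalog views only, with no access to the partitioned tables themselves; the boundary join here assumes RANGE LEFT numbering, where boundary_id matches the partition it closes) might look like this:

```sql
SELECT
    OBJECT_SCHEMA_NAME(p.object_id) AS SchemaName,
    OBJECT_NAME(p.object_id) AS ObjectName,
    i.name AS IndexName,
    p.partition_number,
    prv.value AS UpperBoundaryValue,  -- NULL for the last/non-partitioned case
    fg.name AS FileGroupName,
    p.rows
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON i.object_id = p.object_id AND i.index_id = p.index_id
LEFT JOIN sys.partition_schemes AS ps
    ON ps.data_space_id = i.data_space_id
LEFT JOIN sys.partition_range_values AS prv
    ON prv.function_id = ps.function_id
    AND prv.boundary_id = p.partition_number
LEFT JOIN sys.destination_data_spaces AS dds
    ON dds.partition_scheme_id = ps.data_space_id
    AND dds.destination_id = p.partition_number
LEFT JOIN sys.filegroups AS fg
    ON fg.data_space_id = COALESCE(dds.data_space_id, i.data_space_id)
WHERE OBJECTPROPERTY(p.object_id, 'IsMSShipped') = 0
ORDER BY SchemaName, ObjectName, i.index_id, p.partition_number;
```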

I posted example scripts to automate RANGE LEFT sliding window maintenance in my last post. As promised, I am sharing a RANGE RIGHT version in this post.

I personally prefer a RANGE RIGHT partition function when partitioning on a data type that includes time. RANGE RIGHT allows specification of an exact date boundary instead of the maximum date/time value needed for RANGE LEFT to keep all data for a given date in the same partition. Another nicety with RANGE RIGHT is that the same boundaries can be used in a RANGE RIGHT partition function of any date/time data type. In contrast, the time component of RANGE LEFT boundary values must be customized for the specific data type, as I described in Sliding Window Table Partitioning.

The downside with RANGE RIGHT is that maintaining the sliding window isn't quite as intuitive as with RANGE LEFT. Instead of switching out and merging the first partition during purge/archive, one needs to switch out and merge the second partition. This practice avoids the costly movement of retained data that merging a populated partition would require; both the first and second partitions are empty during the merge, so no data needs to be moved. The first partition is normally kept empty at all times.
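A sketch of that purge step under these assumptions (the table, partition function, staging table and boundary value are hypothetical, and an automation proc would normally build these statements with dynamic SQL):

```sql
-- switch the expired second partition into an identically structured
-- staging table on the same filegroup (a metadata-only operation)
ALTER TABLE dbo.SalesFact
    SWITCH PARTITION 2 TO dbo.SalesFact_Purge;

-- partition 1 is kept empty and partition 2 is now empty, so merging
-- the boundary between them moves no data
ALTER PARTITION FUNCTION PF_SalesDate()
    MERGE RANGE ('20090101');

-- discard (or archive) the switched-out rows
TRUNCATE TABLE dbo.SalesFact_Purge;
```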

The stored procedure below shows how you can automate a RANGE RIGHT daily sliding window. The main differences between this version and the RANGE LEFT version I posted in Automating Sliding Window Maintenance are