Leonard Lobel

Leonard Lobel (Microsoft MVP, Data Platform) is the chief technology officer and co-founder of Sleek Technologies, Inc., a New York-based development shop with an early adopter philosophy toward new technologies. He is also a principal consultant at Tallan, Inc., a Microsoft National Systems Integrator and Gold Competency Partner.

Programming since 1979, Lenni specializes in Microsoft-based solutions, with experience that spans a variety of business domains, including publishing, financial, wholesale/retail, health care, and e-commerce. Lenni has served as chief architect and lead developer for various organizations, ranging from small shops to high-profile clients. He is also a consultant, trainer, and frequent speaker at local usergroup meetings, VSLive, SQL PASS, and other industry conferences.

Lenni has also authored several MS Press books and Pluralsight courses on SQL Server programming

I want to…

I’ve got a juicy SQL Server post for you to ring in the new year – Transparent Data Encryption, available only in SQL Server 2008 Enterprise edition. This is good and powerful stuff that works, folks, so check it out here.

Sensitive information (passwords, credit card numbers, salary information, and so on) in your database needs to be encrypted. As of SQL Server 2005, you can encrypt and decrypt sensitive data columns in your tables using symmetric keys. You identify the columns that will hold sensitive information, and then invoke ENCRYPTBYKEY to store data in those columns, and DECRYPTBYKEY to retrieve data from those columns. The process is fairly straightforward, but it does require programming in your application for each encrypted column.

In SQL Server 2008 (Enterprise edition only), Microsoft has added a new feature called Transparent Data Encryption (TDE). This feature automatically encrypts the entire database (data and log files), as well as database backups, without requiring any programming or code changes to your application. The process is entirely transparent, hence the name Transparent Data Encryption. In this blog post, I’ll explain TDE, and demonstrate how to use this new security feature.

(Note that the NTFS file system in Windows Server 2000 and later provides a feature called Encrypted File System [EFS]. This feature also applies transparent encryption to any data stored on the hard drive, but it will not protect databases or backups that have been copied onto a CD or other media. TDE in SQL Server 2008 is based on a certificate that is needed to decrypt or restore any encrypted database, regardless of where the data is transferred.)

When TDE is first enabled for a specific database, SQL Server encrypts the database in the background. During this process, the database remains online and responsive to client requests (similarly, when encryption is disabled, SQL Server decrypts the database in the background). Encryption is performed at the page level, and does not increase the size of the database in any way. Once the entire database is encrypted, new data gets encrypted on the fly as it is written to disk, and all data gets decrypted when read back.

Multiple Protection Layers

Databases protected with TDE are encrypted with a Database Encryption Key (DEK). You create the DEK and store it in the database, but the DEK itself is associated with a certificate that you create separately in the master database. This means that a backup of the database includes the DEK, but doesn’t include the certificate upon which the DEK is based. Hence, TDE database backups are useless to prying eyes, since they cannot be restored without the certificate. Finally, the certificate itself is encrypted by the Service Master Key (SMK), also contained in the master database.

To get started, you’ll need to create an SMK, if your server doesn’t have one already. Then you can create a certificate for TDE that is encrypted by the SMK which can be used to create one encrypt one or more DEKs. Finally, you create a DEK against the certificate in each database to be encrypted and then enable encryption on the database.

The following diagram illustrates how TDE might be used to encrypt two databases on one server instance:

In this diagram, you can see that the master database holds the SMK (there can be one and only one SMK on any server instance). The master database also holds a certificate whose private key is encrypted by the SMK. The two databases MyDB1 and MyDB3 are each encrypted with DEKs that are, in turn, encrypted by the certificate. The DEKs are entirely dependent on the certificate, so copying or restoring these databases to another server instance without also transferring the certificate upon which the DEKs are based yields a totally unusable database.

Creating a Service Master Key (SMK)

If your server already has an SMK, you can skip this step. An SMK can only be created in the master database, and there can only be one SMK per server instance. If you don’t already have an SMK, you can create one as follows:

Creating a TDE Certificate

In general, certificates can be created in any database. However, certificates used for TDE must be created in the master database. It should be fairly obvious why the certificates used to encrypt DEKs in each encrypted database are stored outside the encrypted database in master (you wouldn’t want them stored in the encrypted database, as that would defeat the whole protection scheme!)

Creating a Database Encryption Key (DEK)

Each database to be encrypted requires its own DEK, and each database’s DEK is in turn encrypted by the TDE certificate we just created in the master database. When creating the DEK, you can specify a particular encryption algorithm to be used. Supported algorithms include AES_128, AES_192, AES_256, TRIPLE_DES_3KEY. The DEK protects not only the data and log files, but backups too. Attempting to restore an encrypted database without the certificate is an exercise in futility. The following T-SQL code creates a DEK for the MyDB database that specifies 128-bit encryption:

Notice the ENCRYPTION BY SERVER CERTIFICATE clause that references the TDE certificate MyEncryptionCert we just created in the master database. This means that the MyEncryptionCert certificate must be present and available in the master database of the same server instance as MyDB, or the database will be rendered inaccessible.

(Because we have not yet backed up the TDE certificate, SQL Server issues a warning at this time alerting you to the fact that the certificate being used to encrypt the DEK has not been backed up. This warning should be taken seriously, since you will not be able to access any database encrypted by the DEK without the certificate. Should the certificate be lost or damaged, your encrypted databases will be completely inaccessible. Later in this post, I will show you how to back up and restore the certificate.)

Enabling TDE

With the SMK, certificate, and DEK prepared, you can start transparent data encryption on the database using the ALTER DATABASE…SET ENCRYPTION ON statement. For example:

ALTER DATABASE MyDB SET ENCRYPTION ON

That’s all there is to it! From this point forward, the database and all of its backups will be encrypted. If an unauthorized party somehow gains access to the physical media holding any backups of MyDB, the backups will be useless without the certificate protecting the DEK.

Querying TDE Views

You can query the catalog view sys.databases to see which databases are protected by TDE. For example:

SELECT name, is_encrypted FROM sys.databases

The query results show that MyDB is the only encrypted database on the server:

This output is somewhat misleading, however, since encrypting one or more databases results in the encryption of tempdb as well. This is absolutely necessary since tempdb is shared by all databases, and SQL Server must therefore implicitly protect temporary storage placed into tempdb by databases encrypted by TDE. But because the encryption in tempdb is implicit, is_encrypted is returned as 0 (false) by sys.databases for tempdb (you’ll see next that SQL Server does actually create DEK for tempdb). This can have an undesirable performance impact for unencrypted databases on the same server instance. For this reason, you may wish to consider isolating separate SQL Server instances; one for encrypted databases and one for non-encrypted databases.

You can also query the dynamic management view sys.dm_database_encryption_keys to see all the DEKs and to monitor the progress of encryption (or decryption, when you disable TDE) running on background threads managed by SQL Server. This view returns the unique database ID that can be joined on sys.databases to see the actual database name. For example, if we run the following query after enabling TDE, we can obtain information about the DEK and background encryption process:

Certain database operations cannot be performed during any of the “in progress” states (2, 4, or 5). These include enabling or disabling encryption, dropping or detaching the database, dropping a file from a file group, taking the database offline, or transitioning the database (or any of its file groups) to a READ ONLY state. Also note the implicit DEK for tempdb created by SQL Server, which always uses AES_256 encryption.

Backing Up the Certificate

It is extremely important to back up the server certificates you use to encrypt your databases with TDE. Without the certificate, you will not be able to access the encrypted database or restore encrypted database backups (which, of course, is the point of TDE). Attempting to restore an encrypted database without the certificate will fail with an error similar to this from SQL Server:

Use the following statement to back up the server certificate to a file. In addition to the certificate itself, the certificate’s private key must also be saved to a file and protected with a password:

This statement creates two files: MyEncryptionCert.certbak is a backup of the server certificate, and MyEncryptionCert.pkbak is a backup of the certificate’s private key protected with the password Pr!vK3yP@ssword. Password protection is absolutely required when backing up the certificate’s private key. Both of these files and the password will be needed to restore an encrypted database backup onto another server or instance. At the risk of stating the obvious, these backup files and the private key password should be closely safeguarded.

Restoring the Certificate

Before an encrypted database can be restored elsewhere, the server certificate that its DEK is encrypted by must be restored first. And if the target instance does not have a master key, one must be created for it before the server certificate can be restored, as shown here:

This statement restores the MyEncryptionCert server certificate from the certificate backup file MyEncryptionCert.certbak and the certificate’s private key backup file MyEncryptionCert.pkbak. Naturally, the password provided in the DECRYPTION BY PASSWORD clause must match the one that was used when the certificate’s private key was backed up or the certificate will fail to restore. With a successfully restored certificate, you can then restore the backup of any encrypted database whose DEK is based on the MyEncryptionCert certificate.

Summary

With the growing concern about personal data protection and the proliferation of computer viruses, developing a methodology for secure computing continues to be a vital task for developers. With support for Transparent Data Encryption in SQL Server 2008, you can easily implement an additional layer of security by encrypting your entire database without making any code changes in your application. Now go give it a try, and enjoy!

This is the final installment in a 3-post series covering the new FILESTREAM feature in SQL Server 2008. In part 1, I explained how FILESTREAM works at a high level, and part 2 showed you the step-by-step procedure for enabling and using this awesome new feature. In this post, I’ll show you how to use the OpenSqlFilestream function exposed by the SQL Server Native Client API to achieve the maximum FILESTREAM performance possible in your .NET applications.

What Is OpenSqlFilestream?

OpenSqlFilestream is a function provided by the SQL Server Native Client API, sqlncli10.dll, and gets installed on your development machine when you install the SQL Server 2008 client tools. This function can be called at the point in time that you want to store and retrieve BLOBs from varbinary(max) FILESTREAM columns, where SQL Server will “step aside,” and allow you to call OpenSqlFilestream to obtain a file handle. With the file handle returned by OpenSqlFilestream, you can stream directly against the file system—a native environment optimized for streaming. Contrast such direct file system access with storing and retrieving BLOBs against varbinary(max) FILESTREAM columns the “old-fashioned” way (either by embedding/extracting byte arrays, or in-lining binary streams as hexadecimal values in T-SQL as I demonstrated in part 2), which carries the additional overhead of the FILESTREAM abstraction layer. Using OpenSqlFilestream instead will give you lightning-fast BLOB performance. Let’s dive in!

Creating the Database

Before getting started, be sure that FILESTREAM is enabled for remote file system access at both the Windows Service and SQL Server instance levels (as explained in part 2). Then create a FILESTREAM-enabled database as follows (be sure to create the directory, C:\DB in this example, before creating the database):

In this table, the Photo column is declared as varbinary(max) FILESTREAM, and will hold pictures that will be stored in the file system behind the scenes. This is virtually the same CREATE TABLE statement as shown in my last post (refer to part 2 for a complete explanation of the varbinary(max) FILESTREAM and ROWGUIDCOL columns). The only thing different here is the default value for the BLOB column. The value 0x represents a zero-length binary stream, which is different than NULL. Think of it as the difference between a zero-length string and a null string in .NET; the two are not the same. Similarly, you won’t be able to use OpenSqlFilestream against NULL instances of varbinary(max) FILESTREAM columns, and you’ll soon see why.

Storing BLOBs Using OpenSqlFilestream

I’ll walk you through the complete steps for building a C# data access class in Visual Studio that demonstrates how to use OpenSqlFilestream. In order to keep things simple for demonstration purposes, we won’t be following best practices; namely, we’ll be executing direct T-SQL statements instead of using stored procedures (as you should be using in production code).

Start Visual Studio and create a new Class Library project named PhotoLibraryDAL with a single class file named PhotoData.cs, and add the following public static method to the class named InsertPhoto:

Client applications call the InsertPhoto method to insert a new photo into the PhotoAlbum table by passing in a photo ID, description, and filename pointing to a local file containing the actual photo image. It then opens a connection, begins a transaction, and executes an INSERT statement. Notice that the INSERT statement supplies values only for the PhotoId and Description columns. What about the RowId and Photo columns? Because we’ve omitted them, the defaults we established in the table definition are applied. The RowId column gets assigned the next available GUID by the GETSEQUENTIALID function. And for the PhotoId BLOB column, the default 0x (zero-length binary stream) applies.

The result of supplying a zero-length binary stream value for the Photo column is that SQL Server creates an empty file in the file system that is linked to the Photo column of the row just inserted. However, because we’ve begun a transaction, SQL Server has automatically initiated an NTFS file system transaction over the empty file added to the file system. So now we have a new row inserted in the table—but not committed, and we have a new empty file created in the file system—but not committed. This only works because we specified (albeit by default) a zero-length binary stream (0x) rather than NULL when inserting the varbinary(max) FILESTREAM value. If an exception occurs, or if the database transaction rolls back, or if any other condition occurs in which the database transaction doesn’t commit successfully, the NTFS file system transaction will get rolled back automatically as well. In that case, both the inserted row in the table and the empty file in the file system go away.

Now is that “point in time” that we’d like SQL Server to “step aside” so we can call OpenSqlFilestream to store the BLOB for the photo image. To do that, we call the SavePhotoFile method, which is coded as follows:

There’s actually nothing magical about this method. You can see that it simply streams, in 512K chunks at a time, from a source stream to a destination stream. This is just the way you’d implement a simple, conventional, file-copy routine. Getting a handle on the source stream is also a no-brainer; since the source file is local, you simply invoke the System.IO.FileStream constructor that accepts a filename and enumerations specifying that the file should be opened for read access. So the question then becomes, how do we get a handle on the destination stream, which is a direct channel to the empty and not-yet-committed file on the server’s NTFS file system that SQL Server associates with the Photo column? The answer lies in the GetOutputFileHandle method which we call to obtain a SafeFileHandle object to the destination file:

This code is at the heart of the matter. In order to call OpenSqlFilestream, we need to obtain two key pieces of information. First, we need a logical path name to the destination file. We obtain that by selecting back the not-yet-committed row we just inserted, and invoking the PathName method on the Photo column. Nobody else can access this row because it hasn’t been committed yet (that would be a dirty read); it will never actually come into existence if the transaction rolls back. But we can read it because we’re inside the transaction, and so we can easily get the path name. No, the path name returned by PathName is not a UNC path to the physical file system on the server, so just lay that security concern to rest. It’s just a bogus path that has meaning only in the context of this transaction which SQL Server can use to map to the real file in the file system. Secondly, we need a token that identifies the NTFS file system transaction that SQL Server initiated behind the scenes, which we obtain with the GET_FILESTREAM_TRANSACTION_CONTEXT function.

Armed with these two key pieces of information, we can call OpenSqlFilestream. Because OpenSqlFilestream is a native code function, I’ve place it in a separate GetSqlFilestreamHandle method in a separate NativeSqlClient class to keep the DllImport and other native code details isolated from our .NET data access class (yes, I’m a neat freak when it comes to code, and you should be too):

As you can see, the GetSqlFilestreamHandle method accepts the transaction context token and the path obtained by the GET_FILESTREAM_TRANSACTION_CONTEXT function and PathName method respectively. It also accepts an enumeration that specifies the desired access mode, which can be Read, Write, or ReadWrite. The OpenSqlFilestream function requires other parameters that are not generally applicable for standard FILESTREAM usage, such as the unsigned 32-bit options and 64-bit allocation size arguments. These simply get passed in as 0. The SafeFileHandle returned by OpenSqlFilestream is defined by the .NET framework in the core library assembly mscorlib.dll, so no special reference needs to be set to access this class. Control then gets passed back up the call stack, to the SavePhotoFile method, which calls an overloaded version of the System.IO.Filestream constructor that accepts a SafeFileHandle object. After the source stream is copied entirely to the destination stream (overwriting the empty file on the server), control returns to the InsertPhoto method and the database transaction is finally committed. At that point, both the inserted row and the file in the file system are permanently saved, and the connection is closed. And that’s the way to stream BLOBs into varbinary(max) FILESTREAM columns in SQL Server 2008!

Retrieving BLOBs Using OpenSqlFilestream

Reading FILESTREAM data back out from the database to your application follows a very similar pattern. You open a connection and start a transaction. Note that this is normally not be considered best practice, as you should always try to have your read operations execute outside the context of a transaction. But to implement OpenSqlFilestream for read operations, this is exactly what you do. Then you use OpenSqlFilestream in the very same was we did for writing the BLOB to the database. The exact implementation depends on where you want to stream the content to.

For example, to stream a photo into an Image object for display in a Windows Forms PictureBox control, you can implement a SelectPhoto method like this:

Because the Image class has a static FromStream method that consumes the stream passed as a parameter to the method, this code essentially firehouses the stream at the fastest possible speed directly from the file system on SQL Server into the Image object.

Have Fun with FILESTREAM!

This concludes my 3-part series on using FILESTREAM in SQL Server 2008, which explains everything you need to know to get the most out of this truly awesome new feature. I hope you enjoyed it, and look forward to hearing your FILESTREAM success stories!