Using SQL Server 2008 FILESTREAM Storage

SQL Server 2008's FILESTREAM storage offers exciting new performance improvements for storing BLOB data as full-blown files managed by the OS. But as great as the FILESTREAM storage feature is, it’s not exactly trivial to use and requires some legwork to set up correctly. Let’s look at what you need to do to set up and configure FILESTREAM storage, and the code you need to write to take advantage of it.

Why Use FILESTREAM Storage?

Conventional wisdom has long maintained that storing and retrieving files and images within a relational database incurs significant performance penalties. That logic seems sensible, but I had always wondered whether it was based on empirical evidence or mere superstition.

In 2006, Microsoft Research published an excellent white paper that tackled this subject in detail. Its authors conclude that databases could learn a thing or two from file systems, which are much better at managing fragmentation than are relational database engines.

Interestingly enough, they also determined that file systems could learn from database engines by virtue of how databases handle smaller bursts of data. (See “To BLOB or not to BLOB: Large Object Storage in a Database or File System.”) The rough translation: File systems generally work best with blocks of data greater than 1MB in size, while databases work best with blocks of data less than 256KB in size. To address this split, SQL Server 2008 retains its highly performant storage and retrieval of smaller blocks of data and adds FILESTREAM storage, which lets SQL Server work closely with the file system to get the best performance from both worlds.

FILESTREAM Storage Hurdles

Setting up FILESTREAM storage correctly, however, can cause a few headaches. For example, security considerations in distributed environments require a bit more planning to make sure that client applications are able to take advantage of streamed files that are proxied through SQL Server. Similarly, because FILESTREAM storage represents a joint interaction between SQL Server and the host OS, setting up FILESTREAM storage on your servers requires admin permissions at the OS level.

Furthermore, if you want to take advantage of Win32 streaming capabilities, you need to use Windows integrated security (because SQL Server logins can’t generate the Windows security tokens needed to access the underlying file system). These minor limitations, however, help safeguard your infrastructure and data. A bigger hurdle is that neither SQL Server Management Studio (SSMS) nor SQL Server itself has any UI, or native way, to let you stream the contents of a file into a table that’s been marked with the FILESTREAM attribute on one of your varbinary(max) columns.

In other words, if you've got a .jpg file that you want to store within SQL Server, there's no native functionality to convert that image's byte stream into something that you could put, for example, into an INSERT statement. Accordingly, you have to use middle-tier and client applications to leverage this new storage functionality. The problem with accessing BLOB data from within client and middle-tier applications is that the transactional choreography and overhead needed to safely and securely stream file contents in and out of the native file system via SQL Server adds a bit more complexity than you'd normally see when working with queries that involve simple data readers.

FILESTREAM Storage Benefits

Those hiccups aside, FILESTREAM storage is worth using in many scenarios. Besides its performance benefits, it also lets you exceed the 2GB limit previously associated with storing BLOBs, so you can store BLOBs as large as the OS, or file system, permits.

FILESTREAM storage also provides transactional safety when storing files and offers increased file manageability—including the obvious benefit of file backup and restore functionality. Other great benefits include the ability to leverage replication, log shipping, and the power of full-text indexing directly against files stored and managed by SQL Server. (However, operations against FILESTREAM file groups aren’t supported by database snapshots or database mirroring; see SQL Server 2008 Books Online, “Using FILESTREAM with Other SQL Server Features.”)

Configuring Your Server for FILESTREAM Storage

Given that the file system directly manages FILESTREAM data, with SQL Server handling file names, paths, and even security (by proxy), the first thing you need to do when using FILESTREAM storage is to grant SQL Server permission to work directly with the host OS’s file system. Because this is an OS-level change, enabling FILESTREAM storage requires administrative rights on the host server.

As you can see from Figure 1, to enable FILESTREAM functionality you open SQL Server Configuration Manager, then access the Properties page for the individual SQL Server instance that you wish to enable. As part of this configuration process, you specify what kinds of access and streaming rights are available to SQL Server and remote clients.

When it comes to enabling FILESTREAM functionality, you have three choices: 0, which disables FILESTREAM support for this instance; 1, which enables FILESTREAM functionality for T-SQL access; and 2, which enables FILESTREAM support for T-SQL and Win32 access.

After you make the change from within SQL Server Configuration Manager, a new share is created on the host system with the name specified. This isn’t a typical share, though; it’s intended only to allow very low-level streaming interaction between SQL Server and authorized clients. As such, Microsoft recommends that only the service account used by the SQL Server instance have access to this share. Moreover, because this change takes place at the OS, or service, level, you can’t enable it from within SQL Server, and you need to restart your SQL Server instance for the change to take effect.

After you restart, you also need to enable SQL Server to utilize this storage option. Run sp_configure with the appropriate filestream access level (0, 1, or 2) to meet your needs. For my sample application I set the value to the most permissive option, 2, which you can see in Listing 1.
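As a sketch of that step (the article’s own Listing 1 isn’t reproduced here), the T-SQL boils down to:

```sql
-- Allow both T-SQL and Win32 streaming access (level 2)
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
```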

Configuring Databases for FILESTREAM Storage

After configuring FILESTREAM storage at the OS and SQL Server levels, you can start creating tables to house your documents. The key to taking advantage of FILESTREAM storage is to decorate a chosen varbinary(max) field with the FILESTREAM attribute as I’ve done in Listing 2.

As long as you have created a specialized FILESTREAM filegroup within your database, the data stored in this column won’t be persisted to pages or extents managed by SQL Server. Instead, it streams to the file system, where larger files can be stored and retrieved far more efficiently.

Note, however, in Listing 2 that the FILEGROUP that I’ve specified for use with FILESTREAM storage (with the CONTAINS FILESTREAM attribute) doesn’t actually point to a file such as files.ndf. Instead, it points to a folder, which is where files and file data will actually end up being stored.
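As a hedged sketch of that setup (the database name, filegroup name, and path here are my own, not the article’s), adding such a filegroup looks like this:

```sql
-- The FILENAME argument names a folder, not an .ndf file; its parent
-- directory (C:\SQLData) must already exist, while the leaf folder must not.
ALTER DATABASE FileStreamDemo
    ADD FILEGROUP FileStreamGroup CONTAINS FILESTREAM;

ALTER DATABASE FileStreamDemo
    ADD FILE (
        NAME = FileStreamData,
        FILENAME = 'C:\SQLData\FileStreamData'
    )
    TO FILEGROUP FileStreamGroup;
```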

FILESTREAM-enabled tables also require the existence of a UNIQUE ROWGUIDCOL column to support storage interaction, which I've handled by using the FileId column in my sample table. Likewise, in the code displayed in Listing 2 I’m also specifying a default of (0x) on my FileData column. This default causes SQL Server to create a new, blank file within the FILESTREAM storage folder for Win32 streaming clients to access and use as a path when attempting to upload file contents. Without this default creation of a "blank" file, new files would have to be created by streaming serialized contents directly through T-SQL before they could be accessed via Win32 client applications.
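Pulling those requirements together, a minimal sketch of such a table might look like the following (FileId, FileData, and the FileSamples table name come from the article; the FileName column is my own addition for illustration):

```sql
CREATE TABLE dbo.FileSamples (
    -- ROWGUIDCOL with a UNIQUE constraint is required for FILESTREAM
    FileId   UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    FileName NVARCHAR(256) NOT NULL,
    -- DEFAULT (0x) creates an empty on-disk file for Win32 clients to open
    FileData VARBINARY(MAX) FILESTREAM NOT NULL DEFAULT (0x)
);
```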

Listing 3 shows typical FILESTREAM use where only T-SQL is involved. The INSERT statement creates a new row in the table (and the bogus file contents are actually persisted to disk as a file). Then, the SELECT statement uses the new GET_FILESTREAM_TRANSACTION_CONTEXT() built-in function and the new .PathName() method to get a transaction context, or ID, along with the file’s path. That context can then be handed off to a Win32 client application to access the contents of the file at the FileData.PathName() specified.
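The pattern just described can be sketched as follows (column names beyond FileId and FileData are my own; note that GET_FILESTREAM_TRANSACTION_CONTEXT() returns NULL unless you’re inside an explicit transaction):

```sql
INSERT INTO dbo.FileSamples (FileName, FileData)
VALUES (N'bogus.txt', CAST('Bogus file contents' AS VARBINARY(MAX)));

BEGIN TRANSACTION;

SELECT FileData.PathName() AS FilePath,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS TransactionContext
FROM dbo.FileSamples;

-- Hand FilePath and TransactionContext to a Win32 client here,
-- then commit once streaming is complete.
COMMIT TRANSACTION;
```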

Client Applications and FILESTREAM Storage

For a real-world example of using FILESTREAM data, I’ve created a very simple FILESTREAM Browser WinForms app, which Figure 2 shows, to display the contents of the FileSamples table created earlier. This simple application also lets users upload files from their desktop (or elsewhere) to SQL Server FILESTREAM storage or download a file placed in FILESTREAM storage to whatever path they specify.

In the code accompanying this article, you’ll see that UI logic has been relegated to Form1.cs while FILESTREAM interactions have all been placed within a helper class found in the FileStreamSample.cs file. This class, in turn, contains three methods: one pulls back a list of the current files in the FileSamples table, and the other two handle downloads and uploads as needed. Furthermore, since files are identified by the UNIQUEIDENTIFIER FileId column, logic in all three methods is closely centered on these GUIDs as handles to help direct operations.

In Listing 4, code from my upload method creates a new file by firing off an INSERT statement that populates metadata for the file that will be uploaded. The code then creates a new transaction, enlists a command object within the transaction (to gain a transaction context), and executes the command through a SqlDataReader to pull back the full path (in SQL Server) to the file that will be uploaded, along with a transaction context ID that can be used to initiate the Win32 file-streaming process.

By using a context ID and the actual FILESTREAM storage path for the new file, the code can then initiate a straight file-streaming operation using the System.Data.SqlTypes.SqlFileStream class, which debuted in .NET 3.5 SP1. Then, as you can see in Listing 5, I simply create a new System.IO.FileStream object to read my file locally and buffer bytes along in 512k chunks (an arbitrary selection) to my SqlFileStream object until there are no more bytes to transfer.
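A minimal C# sketch of this upload pattern follows (method and variable names are mine, not the article’s; it assumes an already-open SqlConnection using integrated security and the FileSamples table with FileId and FileData columns):

```csharp
using System;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;

static void UploadFile(SqlConnection conn, Guid fileId, string localPath)
{
    using (SqlTransaction tx = conn.BeginTransaction())
    {
        // Grab the server-side file path and a transaction context ID.
        SqlCommand cmd = new SqlCommand(
            @"SELECT FileData.PathName(),
                     GET_FILESTREAM_TRANSACTION_CONTEXT()
              FROM dbo.FileSamples
              WHERE FileId = @id", conn, tx);
        cmd.Parameters.AddWithValue("@id", fileId);

        string serverPath;
        byte[] txContext;
        using (SqlDataReader rdr = cmd.ExecuteReader())
        {
            rdr.Read();
            serverPath = rdr.GetString(0);
            txContext = (byte[])rdr[1];
        }

        // Stream the local file into FILESTREAM storage in 512K chunks.
        using (SqlFileStream dest =
                   new SqlFileStream(serverPath, txContext, FileAccess.Write))
        using (FileStream src = File.OpenRead(localPath))
        {
            byte[] buffer = new byte[512 * 1024];
            int read;
            while ((read = src.Read(buffer, 0, buffer.Length)) > 0)
                dest.Write(buffer, 0, read);
        }

        tx.Commit();
    }
}
```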

At this point, I can close the transaction, and my file has been successfully uploaded to SQL Server as file data instead of normal varbinary(max) data. (Note, however, that because we're actively streaming file contents to a Win32 process, we have to use integrated security; SQL logins can't generate the tokens needed to access the underlying file system.)

Download operations follow the same paradigm of creating a new transaction, then grabbing a transaction context ID and the path for the file you want to stream to your client. However, when it comes to the streaming operation with downloads, you do everything in reverse—pulling data from a SqlFileStream object into a buffer and pushing it into a local FileStream object until there are no more bytes left to transfer.
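Under the same assumptions, a sketch of the download direction (with serverPath and txContext obtained exactly as in the upload case) simply reverses the streams:

```csharp
using System.Data.SqlTypes;
using System.IO;

static void DownloadFile(string serverPath, byte[] txContext, string localPath)
{
    // Pull bytes out of FILESTREAM storage and push them into a local file.
    using (SqlFileStream src =
               new SqlFileStream(serverPath, txContext, FileAccess.Read))
    using (FileStream dest = File.Create(localPath))
    {
        byte[] buffer = new byte[512 * 1024];
        int read;
        while ((read = src.Read(buffer, 0, buffer.Length)) > 0)
            dest.Write(buffer, 0, read);
    }
}
```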

Getting the Best of Both Worlds

By leveraging the approach outlined above, you can gain greater performance for operations involving large files and still take advantage of SQL Server's ability to provide high performance processing of small files. In this way, you get the best of both worlds: high performance and relational database integrity.
