File Operations Made Easy

T-SQL is a great language for managing databases, but it’s lacking in some areas. Shortcomings in its file operations make it difficult to automate tasks such as having an extraction, transformation, and loading (ETL) process check whether a file exists, move a processed file, or check when a file was last modified. Another example is writing to files. When I was working on a project that required creation of XML files, I was surprised to discover that SQL Server's rich XML functionality lacks XML file export. You can easily import an XML file into SQL Server, but you can’t use SQL Server to create one.

Although difficult in T-SQL, these tasks are easily automated using SQL CLR. Tasks that require many lines of T-SQL code can often be handled with a single function call. I’ll show you how to augment T-SQL’s functionality in file operations using CLR.

CLR Primer

CLR was introduced in SQL Server 2005 to allow integration of Microsoft .NET Framework code into SQL Server. Initially, there was a lot of confusion about CLR’s purpose, with some people going as far as to predict the demise of T-SQL. There were also many horror stories circulating about performance, security, and other problems experienced by early adopters. When the dust settled, it turned out that, like many other tools, CLR can be quite useful when used sensibly. T-SQL will always be better at data manipulation, but CLR can be valuable in areas where T-SQL is lacking.

In a nutshell, you can use CLR to create a class by following these steps:

Create a project in Visual Studio.

Write some methods (.NET jargon for procedures) using C# or Visual Basic .NET.

Compile the methods into a DLL.

Create an assembly to deploy the DLL to your SQL Server database.

Create CLR stored procedures and functions for the methods.

After you’ve completed these steps, you can take advantage of the CLR class’s functionality in your T-SQL code.

Step1: Create the Project

To create a project, you need a version of Microsoft Visual Studio 2005 or later that includes C# or Visual Basic .NET—in other words, a full version and not Business Intelligence Development Studio (BIDS). For the examples here, I used Visual Studio 2008 and C#. Note that you need Visual Studio only if you want to create and modify projects. To deploy existing DLLs to SQL Server, you just need SQL Server Management Studio (SSMS).

To begin, create a new Windows class library project in C# in Visual Studio and name it FileCLR. This will create a class module with some prepopulated code, which Listing 1 (below) shows. Rename the Class1.cs file to FileCLR.cs in Solution Explorer. Visual Studio will ask if you also want to change the class reference, which you should allow. You now have an empty FileCLR class library.

Next, change the target framework to .NET Framework 2.0, which is required by SQL Server. To do this, select FileCLR Properties on the Project menu, then select .NET Framework 2.0 in the Target Framework combo box on the Application tab. Finally, change the using directives in callout A in Listing 1 (below) with the using directives in Listing 2 (below). You’re now ready to start writing C# methods.

Step 2: Write the C# Methods

To get an idea of how to write C# methods, I’ll walk you through a few of the methods I’ve written.

DeleteFile. When you want to delete files, you can use the DeleteFile method. As Listing 3 (below) shows, its code is pretty simple. The first line declares the method. It begins with the public static void statement, which is similar in function to T-SQL's CREATE PROCEDURE. The void keyword indicates that nothing is returned, so the method is a procedure and not a function. (Functions would specify the output’s data type instead of specifying void.) The public and static keywords are outside the scope of this discussion. If you’re curious about their purpose, you can check out MSDN’s C# Reference page.

The declaration continues by defining the method’s name and its two parameters (FilePath and ErrorMessage). In C#, you specify the parameter type (e.g., string) before the parameter name. The out keyword in C# is similar to the OUTPUT parameter in T-SQL.

In the second and last lines, notice the curly braces. In C#, curly braces are used to define blocks of statements, similar to the way that T-SQL uses the BEGIN and END keywords.

Let’s now jump to the heart of the code. The .NET Framework has a wealth of classes whose methods you can use to create powerful code. In this case, you call the File class’s Delete method in the .NET Framework's System.IO namespace. So, instead of building a long OS command string and passing it to xp_cmdshell, you can use a single line of code:

System.IO.File.Delete(FilePath);

This call is wrapped inside a try-catch statement (same concept as T-SQL’s TRY…CATCH statement) in case you encounter problems, such as if the file doesn’t exist or is locked. The ErrorMessage output parameter is used to communicate the result to a calling SQL Server procedure. On success, the DeleteFile method returns an empty string (ErrorMesage = ""). On failure, it returns the error message.

After you deploy the DLL to SQL Server and create the CLR stored procedure (which I’ll discuss later), you can delete a file by simply calling the CLR stored procedure using standard T-SQL code:

EXECUTE dbo.clr_DeleteFile

@FilePath,

@ErrorMessage OUTPUT;

WriteStringToFile. One task that’s difficult to script in T-SQL is writing text to a file. With the .NET Framework, it’s fairly easy, as the WriteStringToFile method in Listing 4 (below) shows. The WriteStringToFile method is similar to the DeleteFile method in many ways. For example, like DeleteFile, WriteStringToFile uses the void keyword to indicate that it doesn’t return anything and a try-catch statement to handle errors. The main difference is that WriteStringToFile calls the File class’s WriteAllText method. WriteStringToFile uses WriteAllText to create the specified file and write the specified text to it. If the file already exists, it’s overwritten.

The WriteStringToFile method is especially useful for XML processing. You can use T-SQL's rich XML generation functionality, then use WriteStringToFile to export the result to an XML file.

To call the WriteStringToFile method in T-SQL code, you use a statement such as

EXECUTE dbo.clr_WriteStringToFile

@FilePath,

@FileContents

@ErrorMessage OUTPUT;

Although both WriteStringToFile and DeleteFile use methods in the File class, the System.IO namespace offers many other classes with useful methods. For example, the Path class includes the GetFileName method, which extracts a filename from a path, and the GetDirectoryName method, which extracts the folder name from a Universal Naming Convention (UNC) path. They can save you from having to write logic to parse text strings. Another useful class for file operations is the Directory class. You can use its Move method to move a folder or its GetFiles method to return the names and paths of files that match a particular search pattern. The next example uses some of these methods.

GetFolderList. Getting a list of the files in a folder in T-SQL is notoriously difficult. You need to use xp_cmdshell with the Windows Dir command. The results are verbose, so there’s a lot of extraneous data, which isn’t easily interpreted. To make sense of it, you need to perform some potentially complex and error-prone text manipulation.

In .NET, you can write a much more elegant and reliable method to get directory contents. However, as the GetFolderList method in Listing 5 (below) shows, the code is more complicated than the code in the previous methods. Unlike the previous methods, which didn’t returning anything (as indicated by void), the GetFolderList method uses the IEnumerable interface to return a collection of items. This collection is similar to a table variable in T-SQL.

The GetFolderList method begins by defining a SQL function. This function uses a virtual table to return its results to SQL Server. As callout A shows, the TableDefinition parameter defines that table's sole column as FileName nvarchar(255), which will eventually contain the names of the files in the specified folder.

To fill the rows in the virtual table, the SQL function uses the FillRow method, which is defined in callout B using the FillRowMethodName parameter. FillRow is called internally by the .NET Framework for every item in the collection being returned.

The list of files is built by iterating through the output returned from the Directory class’s GetFiles method, then extracting the filename using the Path class’s GetFileName method. Each filename is passed to the FillRow method as an object that’s converted to .NET string, then to a SQL nvarchar data type. After this rather convoluted process completes, you get a table of filenames. To call the GetFolderList method in T-SQL, you use the SELECT statement:

SELECT [FileName]

FROM dbo.clr_GetFolderList

(@FolderPath, @SearchPattern);

Step 3: Compile the DLL

After you’ve written the C# methods, it’s time to compile them into a DLL. In Visual Studio's Build menu, select Build Solution. This will produce a compiled DLL file in the project's output directory, which you can access from the project's Build Properties tab.

The full code—including the Visual Studio project files and the compiled DLL—for the FileCLR class is available for download. Go to the top of this page and click the 129745.zip link. Besides the DeleteFile, WriteStringToFile, and GetFolderList methods that I’ve shown you, the FileCLR class contains these methods:

AppendStringToFile. Appends the specified string to an existing file, creating the file if it doesn’t exist.

ChangeFileExtensions. Changes the extension of multiple files (e.g., changes all *.txt files to *.csv files) in a single operation.

CreateFolder. Creates a new folder (similar to the Windows Mkdir command).

DeleteFolder. Deletes a folder (similar to the Windows Rmdir command).

DeleteFiles. Deletes all the files matching a search pattern from a folder.

FileExists. Checks for the existence of a file, returning True or False.

GetFileDateCreated. Gets a file’s creation date from the file attributes.

GetFileDateModified. Gets a file’s last modified date from the file attributes.

GetFileSizeByte. Gets a file’s size in bytes.

GetFolderList2. Lists the name, extension, size, creation date, and last modified date of each file in a folder.

MoveFile. Moves a file.

MoveFiles. Moves multiple files (e.g., all *.txt files) in a single operation.

RenameFile. Renames a file (similar to the Windows Ren command).

RenameFolder. Renames a folder.

Step 4: Create the Assembly

To deploy the DLL you just created to your SQL Server database, you need to create an assembly. The CreateFileClrAssembly.sql script in Listing 6 (below) demonstrates how to do this. To prepare for the assembly, the script enables CLR on the SQL Server instance, creates a test database, and enables the TRUSTWORTHY option for that database. By default, the TRUSTWORTHY option is disabled to guard against threats that could arise if a malicious user modifies detached database files, then attaches them to a production server. However, it’s generally safe to enable this option because such threats are relatively rare and users already need high-level privileges to access detached database files and perform file operations. However, you should assess your situation before enabling this option. SQL Server Books Online (BOL) has more information about the TRUSTWORTHY option and its security implications.

After the preparations, the script uses the CREATE ASSEMBLY statement to create the assembly from the DLL file. You need to make sure the DLL file is stored in C:\temp or change the script to reflect its location. The script sets the permission for the assembly to EXTERNAL_ACCESS because the CLR is accessing the file system.

Step 5: Create the CLR Stored Procedures and Functions

After the DLL is deployed, you need to create a CLR stored procedure or CLR function for each C# method. The CLR stored procedures and functions basically point to the C# methods. CLR stored procedures and functions are similar to those in T-SQL in that the functions return variables and can be embedded in SELECT statements, whereas stored procedures return recordsets and have to be executed using an EXEC statement.

Listing 7 (below) contains the code to create the CLR stored procedures for the DeleteFile and WriteStringToFile methods, and the code to create the CLR function for the GetFolderList method. The CreateClrProcs.sql script (which is in the 129745.zip file) contains the code for all 17 methods.

Time to Take Advantage

It’s now time to take advantage of the new functionality you’ve created. Listing 8 (below) provides examples of how you might use the FileCLR class’s DeleteFile, WriteStringToFile, and GetFolderList methods. The ExploreFileClr.sql script (which is in the 129745.zip file) provides sample code for all 17 methods.

As noted previously, to use FileCLR, .NET Framework 2.0 must be installed on the Windows server hosting your SQL Server instance, which should be patched with the latest service pack. If you’re still running SQL Server 2005, SP3 or later is recommended.

Endless Possibilities

I demonstrated how you can use CLR and the .NET Framework to augment T-SQL functionality in file operations. I also used this technology to create the DateCLR class, which consists of five methods that extend T-SQL’s functionality in date operations. (See the web-exclusive sidebar “Date Operations Made Easy,” for more information about this class.) With all the classes and methods that the .NET Framework offers, the possibilities are virtually endless.