July 18, 2008

Since putting a computer on the desktop of almost everyone in the workforce, there has been a proliferation in the amount of unstructured data that is being generated. From a DBA's perspective unstructured data is any data that does not have a data structure, in other words the data is not stored in a database. A typical example of unstructured data is image and movie files or e-mails and spreadsheets. A recent study by Merrill Lynch found that more than 85 percent of all business information is stored in unstructured data.
The management of unstructured data is one of the biggest problems facing the IT industry. As despite being able to generate the data, it is extremely difficult to provide a mechanism to access this data and derive business value from it. According to Gartner, knowledge workers spend up to 30% of a working week managing unstructured data. Several approaches are currently available to try and assist with this challenge; however, each approach has its own drawbacks. For example, storing unstructured data in a file server or BLOB store adds complexity when developing applications. The complexity is introduced as the application needs to manage the link between the records stored in the database and the system that stores the BLOBs.
Over the last few days I have been exploring the FILESTREAM attribute in SQL Server 2008 and this feature looks like it has finally solved the problem of unstructured data. The FILESTREAM attribute can be applied to a varbinary column allowing SQL Server to store unstructured data for a column on the local NTFS file system. A great article for understating the features in SQL Server 2008 for unstructured data can be found here; however, I would love to hear about how you are currently dealing with this challenge.
- Peter Ward