Configure and Manage Filters for Search

Indexing documents in an varbinary, varbinary(max), image, or xml data type column requires extra processing. This processing must be performed by a filter. The filter extracts the textual information from the document (removing the formatting). The filter then sends the text to the word-breaker component for the language associated with the table column.

A given filter is specific to a given document type (.doc, .pdf, .xls, .xml, and so forth). These filters implement the IFilter interface. For more information about these document types, query the sys.fulltext_document_types catalog view.

Binary documents can be stored in a single varbinary(max) or image column. For each document, SQL Server chooses the correct filter based on the file extension. Because the file extension is not visible when the file is stored in a varbinary(max) or image column, the file extension (.doc, .xls, .pdf, and so forth) must be stored in a separate column in the table, called a type column. This type column can be of any character-based data type and contains the document file extension, such as .doc for a Microsoft Word document. In the Document table in Adventure Works, the Document column is of type varbinary(max), and the type column, FileExtension, is of type nvarchar(8).

Note

A filter might be able to handle objects embedded in the parent object, depending on its implementation. However, SQL Server does not configure filters to follow links to other objects.

SQL Server installs its own XML and HTML filters. In addition, any filters for Microsoft proprietary formats (.doc, .xdoc, .ppt and so on) that are already installed on the operating system are also loaded by SQL Server. To identify the filters that are currently loaded on an instance of SQL Server, use the sp_help_fulltext_system_components stored procedure, as follows: