Featured Database Articles

Full Text Catalog Configuration in SQL Server 2005 Express Edition

In the previous installment of our series dedicated to the most prominent
features available in SQL Server 2005 Express Edition, we have concluded the
coverage of its implementation of Reporting Services. There are a few remaining
topics that we have not discussed yet, whose importance warrants our attention.
One of them is Full Text Indexing and Search, whose characteristics we will be
exploring starting with this article.

You are likely quite familiar with standard T-SQL capabilities allowing you
to query character-based data according to full or partial matches (typically
relying on wildcards, such as %,
_, [], or [^]). There are, however, situations where these options
turn out to be too limited (or simply too inefficient from a performance
standpoint, resulting in full table scans), especially when dealing with
unstructured text containing inflections, diminutives, and misspellings, or
based on unsupported by LIKE
statements data types. Such needs are addressed by the Full Text Search
functionality, allowing you to delve deeper into a variety of lexical
arrangements, including locating phrases that are similar to each other. The
level of similarity can be adjusted by assigning a desired weigh factor, words
sharing a common base, or a combination of several words appearing near each
other (so called proximity searches). In addition, it becomes possible to
perform localization-specific queries, which take into consideration rules
pertinent to an arbitrarily chosen language.

In its original implementation, full text indexing relied on the Microsoft Search service built into the
host operating system. Starting with the SQL Server 2005 platform, its
responsibilities have been relegated to the Full-Text Engine, implemented as
the MSFTESQL.EXE process and
operating as SQL Server FullText Search (SQLEXPRESS)
service (where SQLEXPRESS
is the name of the SQL Server 2005 Express Edition instance where this feature
has been installed). This new component interacts with the SQL Server database
engine to facilitate creating, populating, and managing full-text indexes and
their catalogs, as well as to support execution of queries based on them. It is
aided by the Full-Text Engine Filter Daemon (running as MSFTEFD process) that enhances indexing
and searching functionality through filters (making possible to analyze data in
formats typically not associated with SQL Server, such as Word, Excel, or
PowerPoint, stored in columns of image and varbinary(max)
types), word breakers (delimiting word boundaries) and stemmers (assisting with
organizing conjugated verbs). The resulting indexes are stored within catalogs
residing in the local file system (by default, in the Program Files\Microsoft SQL Server\MSSQL.x\MSSQL\FTData
folder, where x
designates an integer assigned to a local SQL Server 2005 instance), rather
than SQL Server-hosted databases (keep this fact in mind when devising a backup
strategy). Full-Text Search, however, does keep some of its metadata in
internally managed tables (fulltext_index_map
and fulltext_catalog_freelist).

Due to its potential performance implications, full text indexing is not
enabled by default in the SQL Server 2005 Express Edition. To make its
functionality available, you need first to explicitly add it during the setup
process. With the base component in place, you will have an option to enable
full-text indexing on a target database (note that this happens to be the
default applied when creating new user databases via CREATE DATABASE statement, but not when
performing the same task via graphical interface within SQL Server Management
Studio Express) and create a full-text catalog associated with it (remember
that it is not possible to create full-text catalogs in master, model, or tempdb
system databases). The final steps consist of creating and populating full-text
indexes. It is important to realize that they are applicable only to columns of
specific data types (namely character-based including char, nvarchar,
and varchar, as well as
formatted as binary type, such as image,
and varbinary(max)), and
that tables containing these columns must have a unique, single-column, non-nullable,
preferably clustered index (which is referenced during index creation). It is
also helpful to know that there are several factors affecting the process of
populating and querying indexes. One of them is the presence of so-called noise
words (such as pronouns, conjunctions, or prepositions), which are excluded
from indexing or search results. Another involves the use of thesaurus,
allowing you to incorporate synonyms into search criteria (language-specific
noise and thesaurus files reside in the Program
Files\Microsoft SQL Server\MSSQL.x\MSSQL\FTData folder and
can be customized to match your requirements). You should also be aware that
the index population mechanism depends on the value of its CHANGE_TRACKING option (associated with
the index at its creation), which, in addition, determines whether SQL Server
maintains a list of changes to indexed data (we will discuss these topics in
more details in our next article). Finally, keep in mind that there is a limit
of a single full-text index per table.

Let's walk through a sample implementation of a full-index catalog and
demonstrate a few searches that take advantage of its benefits. We will be
working with the AdventureWorks
database (refer to one
of our earlier articles for specific instructions regarding adding it to
your SQL Server 2005 Express Edition instance). Start by verifying whether the
Full-Text Index is present on your computer. One way to confirm whether this is
the case is to review the list of active services and look for SQL Server FullText Search (SQLEXPRESS)
entry, where SQLEXPRESS
matches the name of the local database engine instance. Another possibility is
to examine the outcome of SELECT fulltextserviceproperty('IsFulltexIinstalled')
T-SQL statement, which should yield 1.

If it turns out that the Full-Text Search feature is missing, you will have
to extract the installation files included in the Microsoft SQL Server 2005
Express Edition with Advanced Services, available from the Microsoft Download
Center as a single, compressed executable SQLEXPR_ADV.EXE. This can be accomplished by invoking the
executable with the -x
switch, after which you will be prompted to Choose
Directory For Extracted Files. You will need to point to the setup.exe file at that location when
selecting the Change option
from the context sensitive menu of SQL
Server 2005 entry of the listing presented in Add or Remove Programs or Programs and Features (depending on the
operating system version) Control Panel applet. On the other hand, if you are
installing a new instance of SQL Server 2005 Express Edition, simply make sure
to add the Full-Text Search
feature (via the Feature Selection
dialog box in the Microsoft SQL Server
2005 Setup wizard) to the list of other components you intend to
use.

Once you have confirmed that the SQL
Server FullText Search service is present and operational,
disable user instances (since they are not compatible with full-text indexing)
by running sp_configure 'user instances
enabled', '0' via Query
Editor. Next, verify that the indexing is enabled on a target
database. This can be accomplished either by checking the state of the Use full-text indexing checkbox in the Files section of the Database Properties dialog box in the SQL Server Management Studio Express
interface or by running SELECT
DATABASEPROPERTY('database_name','IsFullTextEnabled'),
where the 'database_name'
parameter is set in our case to 'AdventureWorks').
Modifying this option involves either selecting the checkbox or invoking sp_fulltext_database stored procedure
with 'enable' switch while
connected to the target database. (Similarly, when creating new databases, you
could simply check the Use full-text
indexing checkbox in the New
Database dialog box). Assuming that the outcome is successful,
you are ready to create a full-text catalog by running the CREATE FULLTEXT CATALOG T-SQL statement,
which, by default, will result in the creation of a directory structure under the
Program Files\Microsoft SQL Server\MSSQL.x\MSSQL\FTData
folder (carrying out this command requires at the minimum CREATE FULLTEXT CATALOG permissions). By
employing the optional parameters of this statement, you also have the ability
to alter the location of catalog files (by pointing to another local disk using
the IN PATH attribute, which
is recommended if you expect their size to be significant), designate a catalog
as the database default (by including AS
DEFAULT clause) or accommodate accent-sensitive searches (by
appending WITH ACCESS_SENSITIVITY=ON).
In our example (with ftcAdventureWorks
catalog residing in D:\AppData\SQL\FTData
folder), this procedure consists of the following steps:

Note that, unlike full-featured editions of SQL Server 2005, the graphical
interface of SQL Server Management Studio Express does not expose options for
creating new or viewing existing catalogs. However, you can simplify your
management tasks by taking advantage of templates available in SQL Server
Management Studio, by connecting it to your SQL Server 2005 Express Edition
instance and following the steps described in the Microsoft Knowledge Base
article 916784. In either
case, the end result is the same, yielding a full text catalog associated with
the AdventureWorks database.
In the upcoming articles of our series, we will describe the remaining steps
necessary to create a full-text index on one of tables in the AdventureWorks database as well as
present sample queries that demonstrate the power and flexibility of full-text
searches.