12.9.6. Fine-Tuning MySQL Full-Text Search

MySQL's full-text search capability has few user-tunable parameters. You can exert more control over full-text
searching behavior if you have a MySQL source distribution because some changes require source code
modifications. See Section 2.9, "Installing MySQL from Source".

Note that full-text search is carefully tuned for effectiveness. Modifying the default behavior in most cases
can actually decrease effectiveness. Do not alter the MySQL sources unless you know
what you are doing.

Most full-text variables described in this section must be set at server startup time. A server restart is
required to change them; they cannot be modified while the server is running.

Some variable changes require that you rebuild the FULLTEXT indexes in your tables.
Instructions for doing so are given later in this section.

The minimum and maximum lengths of words to be indexed are defined by the innodb_ft_min_token_size
and innodb_ft_max_token_size
for InnoDB search indexes, and ft_min_word_len and ft_max_word_len for MyISAM ones. After
changing any of these options, rebuild your FULLTEXT indexes for the change
to take effect. For example, to make two-character words searchable, you could put the following lines
in an option file:

[mysqld]innodb_ft_min_token_size=2ft_min_word_len=2

Then restart the server and rebuild your FULLTEXT indexes. For MyISAM tables, note particularly the remarks regarding myisamchk in the instructions following this list.

To override
the default stopword list, set the ft_stopword_file system variable. (See Section
5.1.4, "Server System Variables".) The variable value should be the path name of the file
containing the stopword list, or the empty string to disable stopword filtering. The server looks for
the file in the data directory unless an absolute path name is given to specify a different directory.
After changing the value of this variable or the contents of the stopword file, restart the server and
rebuild your FULLTEXT indexes.

The stopword list is free-form, separating stopwords with any nonalphanumeric character such as
newline, space, or comma. Exceptions are the underscore character ("_") and a single apostrophe ("'") which are
treated as part of a word. The character set of the stopword list is the server's default character
set; see Section 10.1.3.1, "Server Character
Set and Collation".

The 50% threshold for natural language searches is determined by the particular
weighting scheme chosen. To disable it, look for the following line in storage/myisam/ftdefs.h:

#define GWS_IN_USE GWS_PROB

Change that line to this:

#define GWS_IN_USE GWS_FREQ

Then recompile MySQL. There is no need to rebuild the indexes in this case.

Note

By making this change, you severely decrease
MySQL's ability to provide adequate relevance values for the MATCH() function. If you really need to search for such
common words, it would be better to search using IN BOOLEAN MODE
instead, which does not observe the 50% threshold.

To change the operators used for boolean full-text searches on MyISAM tables, set the ft_boolean_syntax system variable. (InnoDB
does not have an equivalent setting.) This variable can be changed while the server is running, but you
must have the SUPER
privilege to do so. No rebuilding of indexes is necessary in this case. See Section
5.1.4, "Server System Variables", which describes the rules governing how to set this variable.

You can change the set of characters that are considered word characters in several
ways, as described in the following list. After making the modification, rebuild the indexes for each
table that contains any FULLTEXT indexes. Suppose that you want to treat
the hyphen character ('-') as a word character. Use one of these methods:

Modify the MySQL source: In storage/myisam/ftdefs.h,
see the true_word_char() and misc_word_char()
macros. Add '-' to one of those macros and recompile MySQL.

Modify a character set file: This requires no recompilation. The true_word_char() macro uses a "character type" table to distinguish letters and numbers
from other characters. . You can edit the contents of the <ctype><map>
array in one of the character set XML files to specify that '-'
is a "letter." Then use the given
character set for your FULLTEXT indexes. For information about
the <ctype><map> array format, see Section
10.3.1, "Character Definition Arrays".

Alternatively, use ALTER TABLE with the DROP INDEX and ADD INDEX options to drop and re-create each FULLTEXT
index. In some cases, this may be faster than a repair operation.

Each table that contains any FULLTEXT index must be repaired as just shown.
Otherwise, queries for the table may yield incorrect results, and modifications to the table will cause the
server to see the table as corrupt and in need of repair.

Note that if you use myisamchk to perform an operation that modifies table indexes
(such as repair or analyze), the FULLTEXT indexes are rebuilt using the default full-text parameter values for minimum word length, maximum
word length, and stopword file unless you specify otherwise. This can result in queries failing.

The problem occurs because these parameters are known only by the server. They are not stored in MyISAM index files. To avoid the problem if you have modified the minimum or maximum
word length or stopword file values used by the server, specify the same ft_min_word_len, ft_max_word_len, and ft_stopword_file values for myisamchk that you use for mysqld. For example, if you have set the minimum word length
to 3, you can repair a table with myisamchk like this:

shell> myisamchk --recover --ft_min_word_len=3 tbl_name.MYI

To ensure that myisamchk
and the server use the same values for full-text parameters, place each one in both the [mysqld]
and [myisamchk] sections of an option file: