The MATCH predicate performs a fulltext search on one or more indexed columns
or indices and supports different matching techniques. It can also be used to
perform geographical searches on
geo_shape indices.

The actual applicability of the MATCH predicate depends on the index’s type. In
fact, the availability of certain match_types and match_parameters
depend on the index. This section however, only covers the usage of the MATCH
predicate on fulltext indices on text columns. To use MATCH on
geo_shape indices, see Geo search.

In order to use fulltext searches on a column, a fulltext index with an
analyzer must be created for this column. See
Fulltext indices for details. There are different types of
Fulltext indices with different goals, however it’s not possible to
query multiple index columns with different index types within the same MATCH
predicate.

To get the relevance of a matching row, a specific system column
_score can be selected. It
contains a numeric score relative to the other rows: The higher, the more
relevant the row:

The MATCH predicate in its simplest form performs a fulltext search against a
single column. It takes the query_term and, if no analyzer was
provided, analyzes the term with the analyzer configured on
column_or_idx_ident. The resulting tokens are then matched against the
index at column_or_idx_ident and if one of them matches, MATCH returns
TRUE.

The MATCH predicate can be also used to perform a fulltext search on multiple
columns with a single query_term and to add weight to specific columns it’s
possible to add a boost argument to each column_or_idx_ident. Matches
on columns with a higher boost result in a higher _score value for that document.

The match_type argument determines how a single query_term is applied
and and how the resulting _score is computed. For more information see
Match Types.

Results are ordered by _score
by default, but can be overridden by adding an ORDERBY clause.

If the column has an implicit index (e.g. created with something like
TEXTcolumn_aINDEXUSINGFULLTEXT) this should be the name of the
column.

If the column has an explicit index (e.g. created with something like INDEX"column_a_idx"USINGFULLTEXT("column_a")WITH(...)) this should be the
name of the index.

By default every column is indexed but only the raw data is stored, so
matching against a text column without a fulltext index is
equivalent to using the = operator. To perform real fulltext
searches use a Fulltext index with analyzer.

boost:

A column ident can have a boost attached. That is a weight factor that
increases the relevance of a column in respect to the other columns.
The default boost is 1.

query_term:

This string is analyzed (using the explicitly given analyzer or
the analyzer of the columns to perform the search on) and the
resulting tokens are compared to the index. The tokens used for search
are combined using the boolean OR operator unless stated otherwise
using the operator option.

match_type:

Optional. Defaults to best_fields for fulltext indices. For
details see Match Types.

Note

The MATCH predicate can only be used in the WHERE clause
and on user-created tables. Using the MATCH predicate on system tables is
not supported.

One MATCH predicate cannot combine columns of both relations of a join.

Additionally, MATCH predicates cannot be used on columns of both
relations of a join if they cannot be logically applied to each of them
separately. For example:

The match type determines how the query_term is applied and the
_score is created, thereby
influencing which documents are considered more relevant. The default
match_type for fulltext indices is best_fields.

best_fields:

Use the _score of the
column that matched best. For example if a column contains all the
tokens of the query_term it’s considered more relevant than other
columns containing only one.

This type is the default, if omitted.

most_fields:

This match type takes the _score of every matching column and
averages their scores.

cross_fields:

This match type analyzes the query_term into tokens and searches
all tokens in all given columns at once as if they were one big column
(given they have the same analyzer). All tokens have to be present in
at least one column, so querying for foobar should have the
tokens foo in one column and bar in the same or any other.

phrase:

This match type differs from best_fields in that it constructs a
phrase query from the query_term. A phrase query will only match
if the tokens in the columns are exactly in the same order as the
analyzed columns from the query_term. So, querying for foobar
(analyzed tokens: foo and bar) will only match if one of the
columns contains those two token in that order - without any other
tokens in between.

phrase_prefix:

This match type is roughly the same than phrase but it allows to
match by prefix on the last token of the query_term. For example
if your query for fooba, one of the columns has to contain
foo and a token that starts with ba in that order. So a column
containing foobaz would match and foobar too.

The match options further distinguish the way the matching process using a
certain match type works. Not all options are applicable to all match types.
See the options below for details.

analyzer:

The analyzer used to convert the query_term into tokens.

boost:

This numeric value is multiplied with the resulting _score of this match call.

If this match call is used with other conditions in the where
clause a value above 1.0 will increase its influence on the overall
_score of the whole query, a
value below 1.0 will decrease it.

cutoff_frequency:

The token frequency is the number of occurrences of a token in a
column.

This option specifies a minimum token frequency that excludes matching tokens
with a higher frequency from the overall _score. Their _score is only included if another token
with a lower frequency also matches. This can be used to suppress
results where only high frequency terms like the would cause a
match.

fuzziness:

Can be used to perform fuzzy full text search.

On numeric columns use a numeric, on timestamp columns a long
indicating milliseconds, on strings use a number indicating the
maximum allowed Levenshtein Edit Distance. Use prefix_length,
fuzzy_rewrite and max_expansions to fine tune the fuzzy
matching process.

fuzzy_rewrite:

The same than rewrite but only applies to queries using
fuzziness.

max_expansions:

When using fuzziness or phrase_prefix this options controls to
how many different possible tokens a search token will be expanded.
The fuzziness controls how big the distance or difference between
the original token and the set of tokens it is expanded to can be.
This option controls how big this set can get.

minimum_should_match:

The number of tokens from the query_term to match when or is
used. Defaults to 1.

operator:

Can be or or and. The default is or. It is used to combine
the tokens of the query_term. If and is used, every token from
the query_term has to match. If or is used only the number of
minimum_should_match have to match.

prefix_length:

When used with fuzziness option or with phrase_prefix this
options controls how long the common prefix of the tokens that are
considered as similar (same prefix or fuzziness
distance/difference)has to be.

rewrite:

When using phrase_prefix the prefix query is constructed using all
possible terms and rewriting them into another kind of query to
compute the score. Possible values are constant_score_auto,
constant_score_boolean, constant_score_filter,
scoring_boolean,``top_terms_N``, top_terms_boost_N. The
constant_... values can be used together with the boost option to set
a constant _score for rows
with a matching prefix or fuzzy match.

slop:

When matching for phrases this option controls how exact the phrase
match should be (proximity search). If set to 0 (the default), the
terms must be in the exact order. If two transposed terms should
match, a minimum slop of 2 has to be set. Only applicable to
phrase and phrase_prefix queries. As an example with slop
2, querying for foobar will not only match foobar but also
foowhatabar.

tie_breaker:

When using best_fields, phrase or phrase_prefix the _score of every other column will be
multiplied with this value and added to the _score of the best matching column.

Defaults to 0.0.

Not applicable to match type most_fields as this type is executed
as if it had a tie_breaker of 1.0.

zero_terms_query:

If no tokens are generated analyzing the query_term then no
documents are matched. If all is given here, all documents are
matched.

cr>selectname,description,_scorefromlocations...wherematch(...(name_description_ft),'end of the galaxy'...)usingphrasewith(analyzer='english',slop=4);+------+-------------------------+-----------+| name | description | _score |+------+-------------------------+-----------+| NULL | The end of the Galaxy.% | 1.5614429 |+------+-------------------------+-----------+SELECT 1 row in set (... sec)

A vast amount of options exist to fine-tune your fulltext search. A detailed
reference can be found here MATCH Predicate.

It is possible to filter results by the _score column but as its value is a computed
value relative to the highest score of all results and consequently never
absolute or comparable across searches the usefulness outside of sorting is
very limited.

Although possible, filtering by the greater-than-or-equals(‘>=’) operator on
the _score column would not
make much sense and can lead to unpredictable result sets.

As you might have noticed, the _score value has changed for the same query
text and document because it’s a ratio relative to all results, and by
filtering on _score, ‘all
results’ has already changed.

Caution

As noted above _score is a
relative number and not comparable across searches. Filtering is therefore
greatly discouraged.