Fuzzy Fulltext Search with Mysql

I just was thinking about
Fuzzy Search
and why it isn't possible in mysql. Sure there is
SOUNDEX
in Mysql, but that's not what I want. I want search results from the
Levenshtein distance.
But Mysql didn't has such a function. I guess the best solution would be a search server like
SOLR or
Elasticsearch. But on a shared
hosting you haven't such a cool thing. A possible solution is to return all results of the table and work with the
levenshtein
function of php. I guess in the most cases this is the best solution. But I just want to test an other way, I guess
maybe a very stupid way.

Levenshtein distance

Just a quick explanation.

"The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the
Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions,
deletions or substitutions) required to change one word into the other."wikipedia.org

For example the Magento and the word Magneto have a distance from 2. With other metrics the distance
maybe would be 1 cause we just have to switch the n and the e. But the Levenshtein distance is 2,
cause we can't switch, we just can replace. You can test it on your own.

levenshtein('Magento', 'Magneto');

The idea behind Fuzzy Search with the Levenshtein distance

The idea is really simple. If we search for example with the query:

SELECT * FROM `catalog_product_flat_1` WHERE `name` LIKE '%Magento%'

We can just find products with Magento in the name. But if Mysql has no possible way to search with the
Levenshtein distance and we didn't want to return the hole table we just have to search with every regex. So we can
find every entry with a Levenshtein distance of 1 for example. What does that mean. Let's take a look at every regex
that we can produce with a distance of 1.

Insertions

Deletions

Substitutions

All three operations together produce now 22 new regex.

SELECT * FROM `catalog_product_flat_1` WHERE
`name` LIKE '%_agento%' OR
`name` LIKE '%M_gento%' OR
`name` LIKE '%Ma_ento%' OR
`name` LIKE '%Mag_nto%' OR
`name` LIKE '%Mage_to%' OR
`name` LIKE '%Magen_o%' OR
`name` LIKE '%Magent_%' OR
`name` LIKE '%_Magento%' OR
`name` LIKE '%M_agento%' OR
`name` LIKE '%Ma_gento%' OR
`name` LIKE '%Mag_ento%' OR
`name` LIKE '%Mage_nto%' OR
`name` LIKE '%Magen_to%' OR
`name` LIKE '%Magent_o%' OR
`name` LIKE '%Magento_%' OR
`name` LIKE '%agento%' OR
`name` LIKE '%Mgento%' OR
`name` LIKE '%Maento%' OR
`name` LIKE '%Magnto%' OR
`name` LIKE '%Mageto%' OR
`name` LIKE '%Mageno%' OR
`name` LIKE '%Magent%'

This query needs more time, but we are searching fuzzy. If we search with a distance of 1 we have to produce
3*n new regex (n is the length of the word). If we want to search with a distance d, we have to
produce (3*n)d. Not really cool. But I guess we can get some of these regex together. Here is a
little function that can produce very naive regexs with levenshtein distance 1.