This article is an introduction to Fuzzy Matching and how it can improve an Autocomplete widget. Fuzzy Matching is used to find the most appropriate strings in a set of strings, like finding "Sinatra" when you misspelled it as "Senatra".

We will set up a Sinatra application displaying an Ajax autocomplete widget, which calls the backend to get the best matching results, even if the match is not strictly equal.

Fuzzy Matching?

Fuzzy Matching, known on Wikipedia as approximate string matching, is used mainly in spell checkers and in biology to measure the variation between DNA sequences.

In this article, we will use the Levenshtein distance algorithm to fetch results when standard methods would return none. Some other matching algorithms are also popular: the Damerau–Levenshtein distance (Levenshtein with transposition of adjacent letters), Soundex (a phonetic algorithm for indexing names by sound) and Bitap. Many of them are available in Ruby, or can be coded by hand.

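Using the Levenshtein algorithm, we get a distance between two strings: the minimum number of single-character insertions, deletions, or substitutions needed to turn one into the other. For example, the distance between "Sinatra" and "Senatra" is 1 (one substitution), and the distance between "kitten" and "sitting" is 3.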
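To make the distance concrete, here is a minimal hand-coded sketch in plain Ruby. The method name `levenshtein` is an assumption made for illustration; it is a textbook dynamic-programming version, not necessarily the implementation used in the application, and it compares strings case-sensitively.

```ruby
# Levenshtein distance: minimum number of single-character
# insertions, deletions, or substitutions to turn `a` into `b`.
# Hypothetical helper, written for illustration only.
def levenshtein(a, b)
  # previous[j] = distance between the prefix of `a` processed so far
  # and the first j characters of `b`. Starts as the distance from "".
  previous = (0..b.length).to_a

  a.each_char.with_index(1) do |char_a, i|
    current = [i] # distance from the first i chars of `a` to ""
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution (or match)
      ].min
    end
    previous = current
  end

  previous.last
end

puts levenshtein("Sinatra", "Senatra") # prints 1
puts levenshtein("kitten", "sitting")  # prints 3
```

Only two rows of the matrix are kept at a time, so memory stays proportional to the length of the second string.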

The find_countries method can serve as an example. It uses exact and partial matching, and uses the Levenshtein distance to add some more results. A real-world production implementation would be different: it would narrow the results (fewer results, lower maximum distance).
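A sketch of what such a method might look like is shown here. The `COUNTRIES` list, the `max_distance` parameter, its default of 2, and the ordering of the results (exact, then partial, then fuzzy) are all assumptions made for illustration; only the name find_countries and the three kinds of matching come from the description above.

```ruby
# Hypothetical in-memory data set; a real app would load a full list.
COUNTRIES = %w[France Finland Germany Greece Ireland Iceland Spain].freeze

# Textbook Levenshtein distance (illustrative helper).
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# Returns exact matches first, then partial (substring) matches,
# then fuzzy matches within `max_distance`, closest first.
def find_countries(term, max_distance: 2)
  query = term.downcase

  exact   = COUNTRIES.select { |name| name.downcase == query }
  partial = COUNTRIES.select { |name| name.downcase.include?(query) }
  fuzzy   = COUNTRIES
            .map { |name| [name, levenshtein(name.downcase, query)] }
            .select { |_, distance| distance <= max_distance }
            .sort_by { |name, distance| [distance, name] }
            .map(&:first)

  (exact + partial + fuzzy).uniq
end

p find_countries("Frence") # prints ["France", "Greece"]
p find_countries("land")   # prints ["Finland", "Ireland", "Iceland"]
```

Note that "Greece" shows up for "Frence" at distance 2, which illustrates why a production version would want a lower maximum distance or a cap on the number of fuzzy results.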

Wrapping up

With a minimal mathematical background and a minimal technical setup (no indexing, no DB-specific feature), we have improved our autocomplete results.

We've seen how to set up a quick-and-simple Sinatra app which computes the Levenshtein distance on the backend. The frontend was easily done using the jQuery UI Autocomplete widget.

For advanced use cases, we should improve the way we mix the fuzzy-matching results with the strictly matching results. Switching to a better algorithm, or a set of algorithms (Longest common substring + Damerau–Levenshtein), could also be easily done.
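As one possible direction, here is a minimal sketch of the optimal string alignment distance, a commonly used restricted variant of Damerau–Levenshtein that counts a swap of two adjacent letters as a single edit. The method name `osa_distance` is hypothetical, and this is not the unrestricted Damerau–Levenshtein algorithm.

```ruby
# Optimal string alignment (restricted Damerau-Levenshtein) distance:
# Levenshtein plus transposition of two adjacent characters.
# Hypothetical helper, written for illustration only.
def osa_distance(a, b)
  # d[i][j] = distance between the first i chars of `a`
  # and the first j chars of `b`.
  d = Array.new(a.length + 1) { |i| [i] + [0] * b.length }
  (0..b.length).each { |j| d[0][j] = j }

  (1..a.length).each do |i|
    (1..b.length).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      d[i][j] = [
        d[i - 1][j] + 1,        # deletion
        d[i][j - 1] + 1,        # insertion
        d[i - 1][j - 1] + cost  # substitution (or match)
      ].min

      # Transposition: the last two characters are swapped.
      if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
        d[i][j] = [d[i][j], d[i - 2][j - 2] + 1].min
      end
    end
  end

  d[a.length][b.length]
end

puts osa_distance("Sinatra", "Sintara") # prints 1
```

Plain Levenshtein gives 2 for the same pair, so typos where two neighbouring letters are swapped rank closer with this variant.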