Beider-Morse Phonetic Matching (BMPM)

For examples of how to use this encoding in your analyzer, see Beider Morse Filter in the Filter Descriptions section.

Beider-Morse Phonetic Matching (BMPM) is a "soundalike" tool that lets you search using a new phonetic matching system. BMPM helps you search for personal names (or just surnames) in a Solr/Lucene index, and is far superior to the existing phonetic codecs, such as regular soundex, metaphone, caverphone, etc.

In general, phonetic matching lets you search a name list for names that are phonetically equivalent to the desired name. BMPM is similar to a soundex search in that an exact spelling is not required. Unlike soundex, it does not generate a large quantity of false hits.

From the spelling of the name, BMPM attempts to determine the language. It then applies phonetic rules for that particular language to transliterate the name into a phonetic alphabet. If it is not possible to determine the language with a fair degree of certainty, it uses generic phonetic instead. Finally, it applies language-independent rules regarding such things as voiced and unvoiced consonants and vowels to further insure the reliability of the matches.

For example, assume that the matches found when searching for Stephen in a database are "Stefan", "Steph", "Stephen", "Steve", "Steven", "Stove", and "Stuffin". "Stefan", "Stephen", and "Steven" are probably relevant, and are names that you want to see. "Stuffin", however, is probably not relevant. Also rejected were "Steph", "Steve", and "Stove". Of those, "Stove" is probably not one that we would have wanted. But "Steph" and "Steve" are possibly ones that you might be interested in.

For Solr, BMPM searching is available for the following languages:

English

French

German

Greek

Hebrew written in Hebrew letters

Hungarian

Italian

Polish

Romanian

Russian written in Cyrillic letters

Russian transliterated into English letters

Spanish

Turkish

The name matching is also applicable to non-Jewish surnames from the countries in which those languages are spoken.

Daitch-Mokotoff Soundex

The Daitch-Mokotoff Soundex algorithm is a refinement of the Russel and American Soundex algorithms, yielding greater accuracy in matching especially Slavic and Yiddish surnames with similar pronunciation but differences in spelling.

The main differences compared to the other soundex variants are:

coded names are 6 digits long

initial character of the name is coded

rules to encoded multi-character n-grams

multiple possible encodings for the same name (branching)

Note: the implementation used by Solr (commons-codec’s DaitchMokotoffSoundex ) has additional branching rules compared to the original description of the algorithm.

Double Metaphone

To use this encoding in your analyzer, see Double Metaphone Filter in the Filter Descriptions section. Alternatively, you may specify encoder="DoubleMetaphone" with the Phonetic Filter, but note that the Phonetic Filter version will not provide the second ("alternate") encoding that is generated by the Double Metaphone Filter for some tokens.