Now that we have the names in memory, how can we classify this information and search it?

Classify, classify, classify

While analysing the data, I looked for my own name (Clément) and saw that it appears with several spellings (with or without the accent, among other differences).
To reduce the number of duplicates that sound the same, I would like to group names phonetically.

The main problem with phonetic algorithms was their low tolerance for cross-language spelling variations.

Here are some cases where I did not find the right match:

“Yolaine” : “Yolène”, “Yolene”

“Clément” : “Clement”, “Klement”

I tried different French algorithms such as Soundex and Phonex, and the best one was Phonex (a good balance between redundancy and spelling variation).

I patched the Phonex algorithm to match the two cases “Yolaine” and “Clement”.

This is not the best implementation (because of the string allocations), but it is good enough for our case.
I tried to use FParsec for this, but the performance was not really impressive (maybe I was doing it wrong).
Here is my attempt, if you want to suggest a better implementation: https://github.com/cboudereau/firstname/blob/master/phonex.fsx

The main point: the Phonex hash function is built by function composition

Pipeline pattern

Easy to fix

Easy to test (unit tests or integration tests)

Sounds good :)
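
To illustrate the pipeline idea, here is a minimal sketch of a phonetic key built by composing small string functions with `>>`. This is not the patched Phonex from phonex.fsx: the steps (`lower`, `stripAccents`, `replace`, `squeeze`) and the two rewrite rules are hypothetical, chosen only so that the example names above end up with the same key.

```fsharp
open System
open System.Globalization
open System.Text

// Hypothetical step: lower-case the input.
let lower (s: string) = s.ToLowerInvariant()

// Hypothetical step: decompose accented letters and drop the combining marks ("é" -> "e").
let stripAccents (s: string) =
    let chars =
        s.Normalize(NormalizationForm.FormD)
        |> Seq.filter (fun c ->
            CharUnicodeInfo.GetUnicodeCategory(c) <> UnicodeCategory.NonSpacingMark)
        |> Seq.toArray
    String(chars)

// Hypothetical step: apply a single spelling rewrite.
let replace (pattern: string) (by: string) (s: string) = s.Replace(pattern, by)

// Hypothetical step: collapse runs of the same letter ("ll" -> "l").
let squeeze (s: string) =
    let chars =
        s
        |> Seq.fold (fun acc c ->
            match acc with
            | last :: _ when last = c -> acc
            | _ -> c :: acc) []
        |> List.rev
        |> List.toArray
    String(chars)

// The key is just the composition of the steps: each rule is one line,
// so a rule can be added, removed or tested on its own.
let phoneticKey =
    lower
    >> stripAccents
    >> replace "ai" "e"
    >> replace "k" "c"
    >> squeeze
```

With these assumed rules, `phoneticKey "Yolaine"` and `phoneticKey "Yolène"` produce the same key, and so do `phoneticKey "Clément"` and `phoneticKey "Klement"`. Fixing a missed match means inserting one more step into the chain.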

A quick test

Now that we have an algorithm to hash names phonetically, we can write a function that reduces the number of names!
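
A minimal sketch of such a reduction, assuming the names are held in a plain `string list`: group them by their phonetic key and keep one entry per key. `groupByKey` and `standInKey` are hypothetical names; `standInKey` is a deliberately tiny stand-in for the real hash, so that the example runs on its own.

```fsharp
// Hypothetical stand-in for the phonetic hash: lower-case, then two ad hoc rewrites.
let standInKey (s: string) =
    s.ToLowerInvariant().Replace("é", "e").Replace("k", "c")

// Group the names by key; each group holds the distinct spellings sharing that key.
let groupByKey (key: string -> string) (names: string list) =
    names
    |> List.groupBy key
    |> List.map (fun (k, spellings) -> k, List.distinct spellings)

let names = [ "Clément"; "Clement"; "Klement"; "Yolene"; "Clement" ]

// Five entries collapse into two groups, one per key.
let groups = groupByKey standInKey names
```

Passing the key function as a parameter keeps the grouping independent of the hash, so the stand-in can be swapped for the patched Phonex function without touching `groupByKey`.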