Another Porter Stemmer in F#

2014-04-20 22:30:47

Stemming

Stemming is the process of reducing words to their root form, e.g. both "acceptable" and "acceptance" might be reduced to "accept".

I'm working on a side project where I will have use for a stemmer, so I decided to look around a bit for a straightforward and well-explained solution. Many resources seemed to point towards the Porter Stemmer. This is an algorithm created by Martin Porter. It works by applying a fixed sequence of rules, each of which pairs a word ending with a condition on the remaining stem, to decide which suffixes should be stripped or replaced.

Please note that as the title suggests, this is hardly the only F# solution. A search quickly reveals at least two others:

Using the algorithm description, and drawing quite a lot of inspiration from Faisal's solution, I have put together my own implementation:

Type

The only type in this implementation denotes either a vowel or a consonant:

type private Kind =
    | V
    | C

Base Functions

Here are a few helper functions that convert a word to a list of vowels/consonants, collapse runs of the same kind so that e.g. VCCVC becomes VCVC, and finally compute the measure of a word (m in Porter's description). The measure is calculated by counting the number of VC pairs.
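
A minimal sketch of what such helpers might look like is shown here. The names toKinds, collapse and measure are assumptions made for illustration and are not necessarily the ones used in the implementation; the type is also left public so that the snippet runs on its own as a script.

```fsharp
// Illustrative sketch only: function names are hypothetical.
type Kind =
    | V
    | C

// Classify each letter. Following Porter's definition, 'y' counts as a
// vowel only when the letter before it is a consonant.
let toKinds (word: string) : Kind list =
    word.ToLowerInvariant()
    |> Seq.fold (fun acc ch ->
        let kind =
            match ch, acc with
            | ('a' | 'e' | 'i' | 'o' | 'u'), _ -> V
            | 'y', (C :: _) -> V   // acc is reversed, so its head is the previous letter
            | _ -> C
        kind :: acc) []
    |> List.rev

// Collapse runs of the same kind, e.g. VCCVC becomes VCVC.
let collapse (kinds: Kind list) : Kind list =
    List.foldBack (fun kind acc ->
        match acc with
        | head :: _ when head = kind -> acc
        | _ -> kind :: acc) kinds []

// The measure is the number of VC pairs in the collapsed list.
let measure (word: string) : int =
    word
    |> toKinds
    |> collapse
    |> List.pairwise
    |> List.filter (fun pair -> pair = (V, C))
    |> List.length
```

With these definitions, measure "tree" gives 0, measure "trouble" gives 1 and measure "troubles" gives 2, which matches the examples in Porter's description of the algorithm.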

Steps

And finally, here are the steps that each word flows through, along with a function that composes them in the right order. The steps contain the specific conditions and word suffixes that must be matched for a change to be made.
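
To give a feel for how a step can map onto pattern matching, here is a hedged sketch covering only step 1a and the first rule of step 1b from Porter's description. The EndsWith active pattern, the function names and the compact measure stand-in are all assumptions made for illustration; a complete stemmer has several more steps and rules.

```fsharp
// Illustrative sketch only: names are hypothetical and most rules are omitted.

// Partial active pattern: matches when the word ends with the given suffix
// and yields the remaining stem.
let (|EndsWith|_|) (suffix: string) (word: string) =
    if word.EndsWith(suffix, System.StringComparison.Ordinal) then
        Some (word.Substring(0, word.Length - suffix.Length))
    else
        None

// Compact stand-in for the measure: counts vowel-to-consonant transitions,
// which equals the number of VC pairs after grouping.
let measure (stem: string) : int =
    let rec isVowel i =
        match stem.[i] with
        | 'a' | 'e' | 'i' | 'o' | 'u' -> true
        | 'y' -> i > 0 && not (isVowel (i - 1))
        | _ -> false
    seq { 1 .. stem.Length - 1 }
    |> Seq.filter (fun i -> isVowel (i - 1) && not (isVowel i))
    |> Seq.length

// Step 1a: plural endings, with no condition on the stem.
let step1a (word: string) : string =
    match word with
    | EndsWith "sses" stem -> stem + "ss"   // caresses -> caress
    | EndsWith "ies" stem -> stem + "i"     // ponies   -> poni
    | EndsWith "ss" _ -> word               // caress   -> caress
    | EndsWith "s" stem -> stem             // cats     -> cat
    | _ -> word

// First rule of step 1b: (m > 0) EED -> EE. The guard carries the condition.
let step1bEed (word: string) : string =
    match word with
    | EndsWith "eed" stem when measure stem > 0 -> stem + "ee"   // agreed -> agree
    | _ -> word                                                  // feed stays feed

// Function composition runs the steps in order.
let stemSketch : string -> string = step1a >> step1bEed
```

Each match case reads much like a line of the rule table: the suffix is the pattern, the condition is the guard, and the replacement is the result. Ordering the cases from longest suffix to shortest keeps "sses" from being caught by the plain "s" rule.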

Source Code

Conclusion

In the end I'm pretty happy with the result. I've tried to make it as easy as possible to read, and I hope I have reached that goal. I feel that F# has let me write code that stays very close to the description of the original algorithm, thanks in large part to the amazing pattern matching that the language has to offer.