DESCRIPTION

This module contains a collection of stemmers for multiple languages based on stemming algorithms provided by Jacques Savoy of the University of Neuchâtel (UniNE). The languages currently implemented are Bulgarian, Czech, German, and Persian. Work is ongoing for Arabic, Bengali, Finnish, French, Hindi, Hungarian, Italian, Marathi, Portuguese, Russian, Spanish, and Swedish, with priority given to languages for which no stemmer is yet available on CPAN.

Country codes such as cz for the Czech Republic are not supported, nor are IETF language tags such as fa-AF or fa-IR.
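Because region-qualified tags are rejected, a caller that works with IETF tags has to reduce them to a bare language code first. The sketch below is a hypothetical helper, not part of this module. It assumes the supported codes are bg, cs, de, and fa (one per implemented language). It deliberately does not map the country code cz to cs.

```perl
use strict;
use warnings;

# Assumed set of supported language codes for the implemented languages
# (Bulgarian, Czech, German, Persian). Hypothetical, not the module's API.
my %supported = map { $_ => 1 } qw(bg cs de fa);

# Hypothetical helper: reduce an IETF tag such as "fa-IR" to its primary
# language subtag ("fa") and return it only if it is supported.
sub primary_language {
    my ($tag) = @_;
    my ($primary) = split /[-_]/, lc($tag // '');
    return defined $primary && $supported{$primary} ? $primary : undef;
}

for my $tag (qw(fa-IR fa-AF de-CH cs cz)) {
    my $lang = primary_language($tag);
    printf "%-6s => %s\n", $tag, defined $lang ? $lang : '(unsupported)';
}
```

The country code cz is reported as unsupported, matching the rule above that country codes are not accepted in place of language codes.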

aggressive

By default, when a language has stemmers of more than one strength, the light stemmer is used. When aggressive is set to a true value, the aggressive stemmer is used instead, if one is available.

$stemmer->aggressive(1);

Czech and German have aggressive stemmers. For the other languages, the aggressive setting has no effect.
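The selection rule can be sketched as a small decision function. This is a hypothetical illustration of the documented behavior, not the module's internals. The availability table and the language codes (bg, cs, de, fa) are assumptions based on the languages listed above.

```perl
use strict;
use warnings;

# Assumed availability table: only Czech (cs) and German (de) have an
# aggressive stemmer in addition to the default light one.
my %has_aggressive = (bg => 0, cs => 1, de => 1, fa => 0);

# Hypothetical selection rule: use the aggressive stemmer only when it is
# requested AND available; otherwise use the default stemmer (the light
# one where several strengths exist, or the only one otherwise).
sub strength_for {
    my ($language, $aggressive) = @_;
    return $aggressive && $has_aggressive{$language} ? 'aggressive' : 'default';
}

print strength_for('cs', 1), "\n";   # aggressive
print strength_for('fa', 1), "\n";   # default: no aggressive Persian stemmer
print strength_for('de', 0), "\n";   # default: aggressive not requested
```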

Methods

stem

Accepts a list of words, stems each word, and returns a list of stems. The list returned will always have the same number of elements in the same order as the list provided. When no stemming rules apply to a word, the original word is returned.
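To make this contract concrete, here is a toy stemmer with the same calling convention: one output per input, in the same order, with unmatched words returned unchanged. The suffix list is hypothetical and far simpler than the UniNE algorithms; only the input/output shape is the point. Returning the first stem in scalar context is this sketch's own choice, so that a single-word call like the one shown here works.

```perl
use strict;
use warnings;
use utf8;                                  # source literals are UTF-8
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical suffix list for illustration only; NOT the UniNE rules.
my @suffixes = qw(ern en er e s);

# Stem each word independently: strip the first matching suffix while
# keeping at least 3 characters; return unmatched words unchanged.
sub toy_stem {
    my @stems = map {
        my $word = $_;
        for my $suffix (@suffixes) {
            my $n = length $suffix;
            if (length($word) - $n >= 3 && substr($word, -$n) eq $suffix) {
                $word = substr($word, 0, -$n);
                last;
            }
        }
        $word;
    } @_;
    # One stem per input word, same order; first stem in scalar context.
    return wantarray ? @stems : $stems[0];
}

my @stems = toy_stem(qw(Kindern Häuser Tisch));
print "@stems\n";                          # Kind Häus Tisch
my $stem = toy_stem('Tischen');
print "$stem\n";                           # Tisch
```

"Tisch" matches no suffix rule in the first call, so it is passed through as is, and the output list still has three elements.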

@stems = $stemmer->stem(@words);
# get the stem for a single word
$stem = $stemmer->stem($word);

Words must be provided as character strings, and stems are returned as character strings. Byte strings in arbitrary character encodings are intentionally not supported, so decode input text before stemming it.
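Input read as raw bytes therefore has to be decoded first. This is a minimal sketch using the core Encode module and assuming UTF-8 input. The stemmer call is commented out so the snippet runs without this module installed. `$stemmer` stands for a stemmer object like the one in the examples above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes as they might arrive from a file or socket:
# "Häuser" is H, 0xC3 0xA4 (ä), u, s, e, r.
my $bytes = "H\xC3\xA4user";

# Decode to a character string before stemming.
my $word = decode('UTF-8', $bytes);
print length($bytes), " bytes, ", length($word), " characters\n";   # 7 bytes, 6 characters

# my $stem = $stemmer->stem($word);   # takes and returns character strings

# Encode the character-string result back to bytes when writing it out.
my $out = encode('UTF-8', $word);
print length($out), " bytes\n";                                    # 7 bytes
```

Reading files through an `:encoding(UTF-8)` I/O layer achieves the same decoding implicitly.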