DESCRIPTION

Stemming reduces related words to a common root form -- for instance, "horse", "horses", and "horsing" all become "hors". Most commonly, stemming is deployed as part of a search application, allowing searches for a given term to match documents which contain other forms of that term.

This module is very similar to Lingua::Stem. However, Lingua::Stem is pure Perl, while Lingua::Stem::Snowball is an XS module that provides a Perl interface to the C version of the Snowball stemmers (http://snowball.tartarus.org).

Supported Languages

The following stemmers are available (as of Lingua::Stem::Snowball 0.95):

Benchmarks

Here is a comparison of Lingua::Stem::Snowball and Lingua::Stem, using The Works of Edgar Allan Poe, volumes 1-5 (via Project Gutenberg) as source material. It was produced on a 3.2GHz Pentium 4 running FreeBSD 5.3 and Perl 5.8.7. (The benchmarking script is included in this distribution: devel/benchmark_stemmers.plx.)

Be careful with the values you supply to new(). If lang is invalid, Lingua::Stem::Snowball does not throw an exception, but instead sets $@, so check $@ after construction. Also, if you supply an invalid combination of values for lang and encoding, Lingua::Stem::Snowball will not warn you, but will fail silently: stem() will always return undef, and stem_in_place() will be a no-op.
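
Because both failure modes are silent, it can help to wrap construction in a small check. The following is a minimal sketch, not part of the module: make_stemmer() is a hypothetical helper, and the probe word is arbitrary. It assumes that 'en' with 'UTF-8' is a valid combination.

```perl
use strict;
use warnings;
use Lingua::Stem::Snowball;

# make_stemmer() is a hypothetical helper, not part of the module.
sub make_stemmer {
    my ( $lang, $encoding ) = @_;

    $@ = '';    # clear any stale error so the check below is meaningful
    my $stemmer = Lingua::Stem::Snowball->new(
        lang     => $lang,
        encoding => $encoding,
    );
    die "Cannot create stemmer for '$lang': $@\n" if $@;

    # An invalid lang/encoding pair produces no warning, so probe with a
    # known word: stem() returns undef in that case.
    my $probe = $stemmer->stem('horses');
    die "No stemmer for lang '$lang' with encoding '$encoding'\n"
        unless defined $probe;

    return $stemmer;
}

my $stemmer = make_stemmer( 'en', 'UTF-8' );
```

Dying early like this turns a silent no-op into a visible error at start-up, which is usually easier to debug than an index full of unstemmed words.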

LOCALE has no effect; it is only there as a placeholder for backwards compatibility (see Changes). IS_STEMMED must be a reference to a scalar; if it is supplied, it will be set to 1 if the output differs from the input in some way, 0 otherwise.

stem_in_place

$stemmer->stem_in_place(\@words);

This is a high-performance, streamlined version of stem() (in fact, stem() calls stem_in_place() internally). It has no return value, instead modifying each item in an existing array of words. The words must already be in lower case.
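
Since stem_in_place() expects lower-case input, a typical call lower-cases the words first. This is a minimal sketch; the word list is illustrative only, and 'en' with 'UTF-8' is assumed to be a valid combination.

```perl
use strict;
use warnings;
use Lingua::Stem::Snowball;

my $stemmer = Lingua::Stem::Snowball->new(
    lang     => 'en',
    encoding => 'UTF-8',
);
die $@ if $@;

# Lower-case up front: stem_in_place() expects lower-case input.
my @words = map { lc } qw( Horse Horses Horsing );

# Modifies @words directly; there is no return value to capture.
$stemmer->stem_in_place( \@words );

# Per the DESCRIPTION, each of these forms should now be "hors".
print join( ' ', @words ), "\n";
```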

lang

my $lang = $stemmer->lang;
$stemmer->lang($iso_language_code);

Accessor/mutator for the lang parameter. If there is no stemmer for the supplied ISO code, the language is not changed (but $@ is set).
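
The following sketch shows one way to detect a rejected language change. The code 'xx' is a hypothetical value chosen because it is assumed not to match any stemmer; $@ is cleared first so that a stale error is not mistaken for a new one.

```perl
use strict;
use warnings;
use Lingua::Stem::Snowball;

my $stemmer = Lingua::Stem::Snowball->new(
    lang     => 'en',
    encoding => 'UTF-8',
);
die $@ if $@;

$@ = '';                 # clear any earlier error before the call
$stemmer->lang('xx');    # 'xx' is a made-up, unsupported code
my $error = $@;

warn "Language change rejected: $error\n" if $error;

# The previous language is still in effect after a failed change.
my $current = $stemmer->lang;
print "Current language: $current\n";
```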

encoding

my $encoding = $stemmer->encoding;
$stemmer->encoding($encoding);

Accessor/mutator for the encoding parameter.

stemmers

my @iso_codes = stemmers();
my @iso_codes = $stemmer->stemmers();

Returns a list of all valid language codes.
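
One use for stemmers() is to validate a language code before constructing a stemmer. This is a minimal sketch: the fully qualified function name is used so that it does not depend on what the module exports, and $wanted is an illustrative variable.

```perl
use strict;
use warnings;
use Lingua::Stem::Snowball;

# Build a lookup table of every supported ISO code.
my %supported = map { $_ => 1 } Lingua::Stem::Snowball::stemmers();

my $wanted = 'en';    # illustrative value
if ( $supported{$wanted} ) {
    print "A stemmer is available for '$wanted'\n";
}
else {
    print "No stemmer for '$wanted'; choose from: ",
        join( ', ', sort keys %supported ), "\n";
}
```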

REQUESTS & BUGS

Please report any requests, suggestions, or bugs via the RT bug-tracking system at http://rt.cpan.org/ or by email to bug-Lingua-Stem-Snowball@rt.cpan.org.

http://rt.cpan.org/NoAuth/Bugs.html?Dist=Lingua-Stem-Snowball is the RT queue for Lingua::Stem::Snowball. Please check to see if your bug has already been reported.