Stemming

The following steps implement stemming in Nutch. Howie Wang is credited with the original implementation for version 0.7.2; I updated the process for version 0.8, which is described below. - Matthew Holt

*** YOU MUST DISABLE THE QUERY-BASIC PLUGIN IN ORDER FOR THIS TO WORK (this replaces all query-basic functionality)***

"I've gotten a couple of questions off-list about stemming, so I thought I'd post my changes here. Sorry that some of the changes are in the main code and not in a plugin; it seemed more efficient to put them in the main analyzer. It would be nice if later releases added support for plugging in a custom stemmer/analyzer."

(Note by AlessandroGasparini) In version 0.8.1 you can enable stemming through the multi-language support facilities, without touching the code. (You may have to write a plugin for your specific language, but that is much simpler.) See for yourself: MultiLingualSupport

The rest is a new QueryFilter plugin that I'm calling query-stemmer. Here's the full source for the Java file. You can copy build.xml and plugin.xml from query-basic and change the names to query-stemmer.
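Stemming only pays off when indexed terms and query terms are reduced the same way, which is presumably why the changes span both the document analyzer and a query filter. The sketch below illustrates that idea in plain Java. It assumes a hypothetical toy stemmer `stem` and a hypothetical analyzer `analyze`; neither is Nutch's QueryFilter API or the Porter algorithm, and the suffix rules are deliberately minimal.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StemmingSketch {

    // Hypothetical toy stemmer: a few suffix rules only.
    // This is NOT the Porter algorithm and NOT a Nutch or Lucene class.
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("ies") && w.length() > 4) {
            return w.substring(0, w.length() - 3) + "y";   // queries -> query
        }
        if (w.endsWith("sses")) {
            return w.substring(0, w.length() - 2);         // classes -> class
        }
        if (w.endsWith("s") && !w.endsWith("ss") && !w.endsWith("us") && w.length() > 3) {
            return w.substring(0, w.length() - 1);         // pages -> page
        }
        if (w.endsWith("ing") && w.length() > 5) {
            return w.substring(0, w.length() - 3);         // crawling -> crawl
        }
        return w;
    }

    // Hypothetical analyzer: split on runs of non-letters, then stem every token.
    // The same method is used for documents and for queries.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(stem(raw));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "Index" a document by storing its stemmed terms.
        Set<String> indexTerms = new HashSet<>(analyze("Nutch crawls pages and answers queries"));
        // Stem the query with the same analyzer before looking terms up.
        for (String term : analyze("crawling page query")) {
            System.out.println(term + " -> " + (indexTerms.contains(term) ? "match" : "no match"));
        }
    }
}
```

Running it prints `match` for all three query terms. If the query side skipped stemming, `crawling` would not match the indexed term `crawl`.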

Version 0.8


*** YOU MUST DISABLE THE QUERY-BASIC PLUGIN IN ORDER FOR THIS TO WORK (this replaces all query-basic functionality) ***

The first change I made is in NutchDocumentAnalyzer.java.
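An analyzer change like this presumably inserts a stemming step into the analyzer's token chain. Lucene expresses such steps as TokenFilter classes that wrap another token stream (Lucene ships a PorterStemFilter for English). The sketch below imitates that decorator shape using plain `java.util.Iterator`. `LowerCaseFilter`, `StemmingFilter`, `stem`, and `tokenStream` are hypothetical names, not the actual NutchDocumentAnalyzer code, and the plural-only `stem` rule stands in for a real stemmer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

public class AnalyzerChainSketch {

    // Hypothetical lowercasing stage that wraps another token iterator.
    static final class LowerCaseFilter implements Iterator<String> {
        private final Iterator<String> input;
        LowerCaseFilter(Iterator<String> input) { this.input = input; }
        @Override public boolean hasNext() { return input.hasNext(); }
        @Override public String next() { return input.next().toLowerCase(Locale.ROOT); }
    }

    // Hypothetical stemming stage; stands in for a real stemmer such as Porter.
    static final class StemmingFilter implements Iterator<String> {
        private final Iterator<String> input;
        StemmingFilter(Iterator<String> input) { this.input = input; }
        @Override public boolean hasNext() { return input.hasNext(); }
        @Override public String next() { return stem(input.next()); }
    }

    // Placeholder rule: strip a plural "s" (but not "ss" or "us") from longer words.
    static String stem(String w) {
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss") && !w.endsWith("us")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Tokenizer -> lowercase -> stem, mirroring how an analyzer chains filters.
    static Iterator<String> tokenStream(String text) {
        Iterator<String> tokens = Arrays.stream(text.split("\\s+"))
                .filter(t -> !t.isEmpty())
                .iterator();
        return new StemmingFilter(new LowerCaseFilter(tokens));
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        tokenStream("Nutch Crawlers fetch Pages").forEachRemaining(out::add);
        System.out.println(out); // [nutch, crawler, fetch, page]
    }
}
```

Keeping each stage as a wrapper means the stemmer can be swapped or removed without touching the tokenizer, which is the kind of pluggability the quoted note above wishes Nutch offered.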

The rest is a new QueryFilter plugin that I'm calling query-stemmer. Here's the full source for the Java file. You can copy build.xml and plugin.xml from query-basic and change the names to query-stemmer.