Jan Høydahl
added a comment - 24/Feb/11 16:52 We have two choices, suggested by Yonik:
update the wiki with the missing analysis components or
think about switching strategies and pointing at the generated Javadoc instead

Jan Høydahl
added a comment - 25/Feb/11 08:56 Agree. That has the benefit of improving JavaDoc quality as well for a lot of classes.
An example of excellent JavaDoc is the Similarity class: http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Similarity.html
Browsing through the analyzers, I find them very sparse on Javadoc.
ASIDE: And what about finally making the move to the Confluence Wiki as well ( https://cwiki.apache.org/SOLRxSITE/ ). Then we could simply include Javadoc inline in pages through the javadoc plugin https://plugins.atlassian.com/plugin/details/11120 , and also get auto linking to Jira issues.

Jan Høydahl
added a comment - 25/Feb/11 13:44 Few of the TokenFilterFactories are documented at all. Some of them have a simple XML config snippet example. Take the StandardTokenFilterFactory. It had no class JavaDoc until two days ago, when Koji and yourself added an xml snippet.
But should the documentation be on the Factory or on the Filter? The WordDelimiterFilterFactory is not documented but the Filter itself is (although it is not correctly HTML formatted so it looks broken in the browser).
I think a reasonable goal, at least for these plugin type of classes, is to use the JavaDoc as the official main doc and point from Wiki to there. But then the Class-level JavaDoc must give a short introduction to what the filter does, when it is typically used along with a list of all valid parameters and their values.

Robert Muir
added a comment - 25/Feb/11 14:01
Few of the TokenFilterFactories are documented at all. Some of them have a simple XML config snippet example. Take the StandardTokenFilterFactory. It had no class JavaDoc until two days ago, when Koji and yourself added an xml snippet.
That's not really correct: typically they have a link to the TokenFilter. For example, the Javadoc for ThaiWordFilterFactory reads "Factory for {@link ThaiWordFilter}". If they take arguments, they typically describe what those arguments do.
This is enough, because someone can click the ThaiWordFilter and get all the details there.
The javadocs for the factory need only document the factory.
I think a reasonable goal, at least for these plugin type of classes, is to use the JavaDoc as the official main doc and point from Wiki to there. But then the Class-level JavaDoc must give a short introduction to what the filter does, when it is typically used along with a list of all valid parameters and their values.
I really don't think we should duplicate documentation from any Tokenizers/Filters into the factories. The factory should just have a javadoc ref to what it produces, and explain its various parameters. In other words, it need only document itself.
Any other documentation is redundant and problematic: as long as this Javadoc exists, a second copy increases the maintenance load around here with no benefit to the user at all.
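
To illustrate the convention described above, here is a minimal sketch. ExampleTruncateFilter and ExampleTruncateFilterFactory are hypothetical classes invented for this example; they are not part of Lucene or Solr and do not use the real TokenFilter or factory base classes. The point is only the Javadoc layout: the filter documents its behaviour and options in full, while the factory links to what it produces and describes its own parameters.

```java
import java.util.Map;

/**
 * Hypothetical filter, invented for this sketch (not a Lucene class):
 * truncates each token to at most {@code maxLength} characters.
 *
 * <p>The behaviour and the meaning of every option are documented here,
 * in one place, so plain Lucene users see them too.
 */
class ExampleTruncateFilter {
    /** Default maximum token length, used when no value is configured. */
    static final int DEFAULT_MAX_LENGTH = 5;

    private final int maxLength;

    /**
     * @param maxLength maximum number of characters kept per token; must be at least 1
     */
    ExampleTruncateFilter(int maxLength) {
        if (maxLength < 1) {
            throw new IllegalArgumentException("maxLength must be >= 1, got " + maxLength);
        }
        this.maxLength = maxLength;
    }

    /** Returns the token unchanged if it is short enough, otherwise its first {@code maxLength} characters. */
    String apply(String token) {
        return token.length() <= maxLength ? token : token.substring(0, maxLength);
    }
}

/**
 * Factory for {@link ExampleTruncateFilter}.
 *
 * <p>Parameters:
 * <ul>
 *   <li>{@code maxLength} (optional): see {@link ExampleTruncateFilter#ExampleTruncateFilter(int)};
 *       defaults to {@link ExampleTruncateFilter#DEFAULT_MAX_LENGTH}.</li>
 * </ul>
 */
class ExampleTruncateFilterFactory {
    private final int maxLength;

    /** Reads the factory's own parameters from string-valued configuration arguments. */
    ExampleTruncateFilterFactory(Map<String, String> args) {
        String value = args.get("maxLength");
        this.maxLength = (value == null)
                ? ExampleTruncateFilter.DEFAULT_MAX_LENGTH
                : Integer.parseInt(value);
    }

    /** Creates the filter described in the class comment. */
    ExampleTruncateFilter create() {
        return new ExampleTruncateFilter(maxLength);
    }
}
```

The factory comment stays short because the {@link} references carry the reader to the filter, so the parameter semantics are written down only once.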

Jan Høydahl
added a comment - 25/Feb/11 14:25 About where to document, that was a question. Linking from Factory to Filter is a good practice.
It looks like there are a lot of JavaDoc improvements in 3.1, so once that's out the door, it should be possible to rework and slim down the analysis wiki page quite a bit.

Robert Muir
added a comment - 25/Feb/11 14:40 I agree, if you find factories in trunk/branch_3x that do not @link the filter/tokenizer they create, and don't describe how to use the factory (e.g. set parameters), I think we should fix those.
I did a quick check for the former, and I think all factories link to the filter/tokenizer they create.
In general I think it's best if any parameters/options are described in the filters themselves too, so that Lucene users see this documentation, and so we can describe in full detail what these parameters do, all in one place (to reduce confusion).
Then any factories can simply link to the original documentation for the parameter values, too. Here's an example: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/spelling/DirectSolrSpellChecker.java

Dawid Weiss
added a comment - 26/Feb/11 08:34 I realize such things are a revolutionary refactoring, but in light of this perhaps it would be beneficial to switch from JavaDoc to Google's Doclava. Among other things, it supports embedding real code snippets, so the Javadoc stays consistent with the code.
Note: I haven't used it yet; I just know it supports such things. See the @sample tag here:
http://code.google.com/p/doclava/wiki/JavadocTags

Yonik Seeley
added a comment - 26/Feb/11 15:34 ASIDE: And what about finally making the move to the Confluence Wiki as well
Unfortunately, it looks like Confluence is going away at the ASF:
http://www.apache.org/dev/cms.html
Score 1 for procrastination.

Mark Miller
added a comment - 26/Feb/11 15:43 That almost looks like it could be going away for the projects' main website stuff ... but that is separate from the wiki, is it not?
It seems like we would stick to real wiki software for the wiki portion of the site at Apache?
In which case, I think Confluence is a nice improvement over MoinMoin.
As another aside, our website and wiki look ancient.

Yonik Seeley
added a comment - 26/Feb/11 15:50 Re-reading the link, I think you're right: it's only support for Confluence-backed sites that is being phased out.
I've long been in favor of a move to Confluence, but have had no real time to do it myself.

Hoss Man
added a comment - 28/Feb/11 00:20 This renewed interest in adding more focus and attention to the Javadocs as user-visible documentation has me thinking that maybe the time has come to revive SOLR-555.
I lost steam on it back in the day because I was having trouble drumming up interest from other people to help get the Javadocs of all the various plugin instances to the state where the output would be useful for non-Java users (most people seemed content to just use the wiki), and it seemed better to ship no docs than to ship bad docs.