Description

I've come up with a use for DisjunctionMaxQuery, but not the dismax parser. I note that the toString() method on that item proposes a syntax with vertical bars. Is there any sympathy for a patch that added this to the standard parser, or some other syntax?

Robert Muir
added a comment - 22/Apr/12 19:57 I'm +1 to adding syntax for this to the classic/flexible QPs as well.
But I like your idea of adding it to the xml one first. Maybe it's an easier iteration, and we can then boil down the best
syntax for the other ones (which is going to be more difficult just because they are hairier).

Benson Margulies
added a comment - 22/Apr/12 20:11 Indeed. I don't quite know what sort of syntax would fly for the tieBreaker; I'm off to study the jj file for the classic parser and see if anything analogous presents itself.
Meanwhile, do you want to do XML in a different JIRA from the hard ones, or should I just stack up the patches here?
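
For readers following along, here is a minimal sketch of why the tieBreaker needs a home in any syntax. It assumes the usual description of DisjunctionMaxQuery scoring (the best disjunct's score, plus tieBreaker times the sum of the other disjuncts' scores) and it renders disjuncts in the vertical-bar style mentioned in the description. The class and method names are hypothetical, nothing here calls Lucene, and the rendering is only modeled on that style rather than copied from toString().

```java
// Illustrative only: plain Java, no Lucene dependency. Names are hypothetical.
class DisMaxScoreSketch {

    /**
     * Combines per-disjunct scores: the maximum score, plus tieBreaker
     * times the sum of the remaining scores. Returns 0 for no disjuncts.
     */
    static double combine(double[] disjunctScores, double tieBreaker) {
        if (disjunctScores.length == 0) {
            return 0.0;
        }
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double s : disjunctScores) {
            max = Math.max(max, s);
            sum += s;
        }
        // (sum - max) is the total contributed by the non-winning disjuncts.
        return max + tieBreaker * (sum - max);
    }

    /** Renders disjuncts with vertical bars; appends ~tieBreaker when non-zero. */
    static String render(String[] disjuncts, double tieBreaker) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < disjuncts.length; i++) {
            if (i > 0) {
                sb.append(" | ");
            }
            sb.append(disjuncts[i]);
        }
        sb.append(')');
        if (tieBreaker != 0.0) {
            sb.append('~').append(tieBreaker);
        }
        return sb.toString();
    }
}
```

With a tieBreaker of 0 only the best disjunct counts, and with 1 the result is a plain sum, so whatever syntax is chosen has to carry that one extra number alongside the list of sub-queries.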

Benson Margulies
added a comment - 22/Apr/12 20:33 Using the syntax ( q | q | q ) might be doable in javacc, but I worry that it's undesirable.
Right now, what's in parens are boolean clauses (with +/-). The disjuncts aren't boolean clauses; they are queries. This could be pretty confusing all around. It would really be better to introduce some syntax that allows for various sorts of grouping, but I don't want to step on the Solr parser's use of {}. Further, DisjunctionMaxQuery is just one thing, and using up | for it seems ill-advised.
% isn't doing anything for a living, so how about that as an option?
(%disjunctionmax q q q q q ) would serve, and would also open a door to supporting other things.
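
To make the proposal concrete, here is a rough sketch of how a `(%name q q q)` group could be recognized. Everything in it is hypothetical: the syntax was only floated in this thread, the class and field names are invented for illustration, and it handles a single flat group of whitespace-separated sub-queries with no nesting, quoting, or tieBreaker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "(%operator q q q)" idea; not part of any Lucene parser.
class PercentGroupSketch {

    /** Parsed form: the operator name after '%' and the raw sub-query strings. */
    static final class Group {
        final String operator;
        final List<String> subQueries;

        Group(String operator, List<String> subQueries) {
            this.operator = operator;
            this.subQueries = subQueries;
        }
    }

    /** Parses one flat group such as "(%disjunctionmax a b c)". */
    static Group parse(String input) {
        String s = input.trim();
        if (!s.startsWith("(%") || !s.endsWith(")")) {
            throw new IllegalArgumentException("expected (%name ...): " + input);
        }
        // Strip the leading "(%" and the trailing ")", then split on whitespace.
        String[] tokens = s.substring(2, s.length() - 1).trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("missing operator name: " + input);
        }
        List<String> subQueries =
            new ArrayList<>(Arrays.asList(tokens).subList(1, tokens.length));
        return new Group(tokens[0], subQueries);
    }
}
```

Because the operator is just a name after `%`, the same shape could dispatch to other query types later, which is the "open a door" point above.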

Robert Muir
added a comment - 22/Apr/12 20:36
Further, DisjunctionMaxQuery is just one thing, and using up | for it seems ill-advised.
I think we allow this for OR as well anyway, so it would be ambiguous...?

Robert Muir
added a comment - 22/Apr/12 20:56 Usually yes; however, at this stage we have just released 3.6 (intended to be the last minor release in the 3.x series).
So currently we have not yet cut a branch_4x for stable 4.0 and are only working on trunk.
(Separately, we also have http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/ open for bugfixes, to backport any nasty bugs for 3.6.1, 3.6.2, etc. Basically, 3.x is intended to be in maintenance mode like 2.9.x was.)

Benson Margulies
added a comment - 22/Apr/12 21:01 At the risk of drifting off topic ...
From where I sit, 4.0 looks a long way from a release. Lots of things with no javadoc, with the experimental tag, etc. (I've mostly been reading in Solr.) Am I underestimating the speed at which the entire lodge of beavers pulls a major release together?

Robert Muir
added a comment - 22/Apr/12 21:14 What can I say: documentation is always a weakness.
From the Solr side, I think most of the documentation tends to be higher level on the wiki, and javadocs
don't get as much attention since it's a smaller population hacking on the APIs.
From the Lucene side, things tend to be under-documented or out of date.
In any case, just throw up patches and let's get 'em in. Nobody gets too excited to work on docs around here, unfortunately, probably because there is no momentum and the existing docs are all out of date and not very good.
On the other hand, just in the past weekend we've made some serious progress on the Lucene documentation already: nuked the old out-of-date Forrest stuff, fixed a ton of broken links, added a broken-link checker, linked the javadocs of various modules to each other, brought all the existing docs (minus fileformats) up to speed with 4.0, generated htmlized versions of the .txt documents with markdown, etc.
I know there is a lot of underdocumented stuff in Lucene, but currently, from my own perspective, I am working to correct the broken stuff.
For some reason, it's definitely harder to fix the old documentation up to make sense than I would have ever thought. I spent most of the day just bringing http://lucene.apache.org/core/3_6_0/scoring.html up to speed and integrating it into the o.a.l.search package javadocs.
For the stuff you see that's experimental with no javadoc tag... this is really just as bad of a problem; just open issues if you want to help out. We are pretty overwhelmed with things to fix on the documentation side, so any help would be appreciated.

Robert Muir
added a comment - 22/Apr/12 21:29 Committed revision 1328981 for the xml queryparser support. Thanks again.
We can either keep this issue open for the other QPs, or spin off new issues,
whichever you prefer.

Benson Margulies
added a comment - 22/Apr/12 23:27 - edited Please see https://issues.apache.org/jira/browse/LUCENE-4012 for an alternative to adding syntax to any of the existing end-user-facing parsers. I think it makes more sense, myself, but if others see value in continuing the line in here I'm game.