the 'experimental' label is a flag to say that the behavior will likely change in the future – since back compatibility is taken so seriously, this allows a way to add features before they are 100% cooked.

Ryan McKinley
added a comment - 17/Feb/11 02:24 the 'experimental' label is a flag to say that the behavior will likely change in the future – since back compatibility is taken so seriously, this allows a way to add features before they are 100% cooked.

I'm confused about why this cool query parser I've been using is "experimental"

because some of it's current "default" behavior is less then ideal, particularly for people migrating from dismax (ie: see comments about making field queries configurable) and in a few cases even "broken" compared to how it worked when the patch was initially commited (see recent comments about foo:bar when foo is not a field)

in general, marking it experimental is a way to allow us to leave it in the 3.1 release but still have the flexibility to modify the "default" behavior moving forward.

Hoss Man
added a comment - 16/Feb/11 23:52 I'm confused about why this cool query parser I've been using is "experimental"
because some of it's current "default" behavior is less then ideal, particularly for people migrating from dismax (ie: see comments about making field queries configurable) and in a few cases even "broken" compared to how it worked when the patch was initially commited (see recent comments about foo:bar when foo is not a field)
in general, marking it experimental is a way to allow us to leave it in the 3.1 release but still have the flexibility to modify the "default" behavior moving forward.

I'm confused about why this cool query parser I've been using is "experimental". Sure, there are opportunities for improvement, but it's already better than the original dismax which this makes obsolete. No?

David Smiley
added a comment - 16/Feb/11 22:38 I'm confused about why this cool query parser I've been using is "experimental". Sure, there are opportunities for improvement, but it's already better than the original dismax which this makes obsolete. No?

Hoss Man
added a comment - 16/Feb/11 02:52 I'll keep the issue open in 3.1 for a few more days as discussed, then i'm moving it out.
it would be less confusing to just resolve it as fixed, and open new issues to track the outstanding problems/bugs/questions.

The innards of how this works it totally greek, but tried finding somethign to fix hoss' unescaped patch. It seems that the root of the problem is that QueryParserBase.parse( String ) will return a BooleanQuery with no clauses for the invalid field query.

Then the edismax just checks if the parsedQuery is null to see if it is valid.

I tried just returning null from the QueryParserBase, but that (not surprisingly) breaks other tests like TestMultiFieldQueryParser. I imagine somethign changed here for why it used to work, and now "mysteriously" does not.

Ryan McKinley
added a comment - 14/Feb/11 19:42 +1 to mark as experimental in 3.1
The innards of how this works it totally greek, but tried finding somethign to fix hoss' unescaped patch. It seems that the root of the problem is that QueryParserBase.parse( String ) will return a BooleanQuery with no clauses for the invalid field query.
Query res = TopLevelQuery(field);
return res!= null ? res : newBooleanQuery( false );
Then the edismax just checks if the parsedQuery is null to see if it is valid.
I tried just returning null from the QueryParserBase, but that (not surprisingly) breaks other tests like TestMultiFieldQueryParser. I imagine somethign changed here for why it used to work, and now "mysteriously" does not.
Adding a check for empty BooleanQuery fixes this in edismax though:
if ( parsedUserQuery instanceof BooleanQuery ) {
if ( ((BooleanQuery)parsedUserQuery).getClauses().length < 1 ) {
parsedUserQuery = null ;
}
}
All tests pass... but can someone who knows what the ramifications of this change means take a look?

Robert Muir
added a comment - 13/Feb/11 10:23 Hi, can we just mark this stuff experimental for 3.1?
If nobody takes action on this issue within 7 days,
I plan to add the text to README.txt and move the issue out of 3.1.

Yonik Seeley
added a comment - 26/Oct/10 20:19 Users do not expect the first query to remove all instances of "foo" from their results. " " meaning "forbidden" should only take effect when the " " is followed by a non-whitespace character.
Graham, this has just been fixed in trunk: LUCENE-2566

Gunnar: this is precisely what i was attempting with the edismax.userFields.patch file but it doesn't work at the moment, probably because of the problem i identified in edismax.unescapedcolon.bug.test.patch (see previous comments, particularly discussion with Jonathan.

If you've got some time/energy to dig into those patches and help get them working that would be great

Hoss Man
added a comment - 15/Oct/10 22:33 Gunnar: this is precisely what i was attempting with the edismax.userFields.patch file but it doesn't work at the moment, probably because of the problem i identified in edismax.unescapedcolon.bug.test.patch (see previous comments, particularly discussion with Jonathan.
If you've got some time/energy to dig into those patches and help get them working that would be great

I was wonder if it would be possible to add support for explicitly limiting the allowed fields? For example, I'd like to be able to not allow queries in internal fields but only in a defined set of "public" fields (eg. GMail recognizes "to", "subject", etc. but really only those).

Another nice feature would be alias definitions for fields. This would allow to support localized field names in queries which translate to the real field name. (eg. "an" -> "to", "Betreff" -> "subject", "betreff" -> "subject").

Gunnar Wagenknecht
added a comment - 15/Oct/10 07:36 - edited Re: field-based queries (field:value)
I was wonder if it would be possible to add support for explicitly limiting the allowed fields? For example, I'd like to be able to not allow queries in internal fields but only in a defined set of "public" fields (eg. GMail recognizes "to", "subject", etc. but really only those).
Another nice feature would be alias definitions for fields. This would allow to support localized field names in queries which translate to the real field name. (eg. "an" -> "to", "Betreff" -> "subject", "betreff" -> "subject").
Thoughts?

To handle the upper/lower case problem (which extends to character set folding), text and string fields could specify a CharFilter for indexing. Which promptly interacts with faceting because facets come from indexed values, not stored values.

Lance Norskog
added a comment - 09/Sep/10 21:34 Does edismax handle wildcards, and should it?
To handle the upper/lower case problem (which extends to character set folding), text and string fields could specify a CharFilter for indexing. Which promptly interacts with faceting because facets come from indexed values, not stored values.

Ron Mayer
added a comment - 19/Aug/10 20:47 I very much like edismax - The "pf2" parameter in particular is doing wonders for getting my most relevant documents to the very top of the list in one of my apps.

Yonik Seeley
added a comment - 26/Jul/10 15:45 Interesting... edismax does currently treat (bar - foo) like (bar -foo).
I didn't code that, so it must be the Lucene query parser.
I wonder if it has done that forever, and if it was intended?

Please ensure that the the edismax does not have old DisMax bug where a query for

bar - foo

gets treated like

bar -foo

Users do not expect the first query to remove all instances of "foo" from their results. "-" meaning "forbidden" should only take effect when the "-" is followed by a non-whitespace character. Similarly for +. Same problem as SOLR-490, but I doubt the necessity of interpreting the original query as a phrase query "bar - foo" in order to fix the problem.

Graham P
added a comment - 26/Jul/10 10:36 - edited Please ensure that the the edismax does not have old DisMax bug where a query for
bar - foo
gets treated like
bar -foo
Users do not expect the first query to remove all instances of "foo" from their results. "-" meaning "forbidden" should only take effect when the "-" is followed by a non-whitespace character. Similarly for +. Same problem as SOLR-490 , but I doubt the necessity of interpreting the original query as a phrase query "bar - foo" in order to fix the problem.

Hoss Man
added a comment - 28/May/10 00:10
Correcting Fix Version based on CHANGES.txt, see this thread for more details...
http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

Hoss Man
added a comment - 27/May/10 23:04 Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...
http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E
Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.
A unique token for finding these 240 issues in the future: hossversioncleanup20100527

Hoss Man
added a comment - 25/May/10 17:58 Another bug noticed on the user list, edismax doesn't seem to respect MM all the time – in particular when there are negated clauses...
Compare...
http://localhost:8983/solr/select?debugQuery=true&defType=dismax&qf=text&q=xxx+yyy+zzz+-1234&mm=2
http://localhost:8983/solr/select?debugQuery=true&defType=edismax&qf=text&q=xxx+yyy+zzz+-1234&mm=2

Jonathan: looking at the code, it seems completely plausible, ant that's the direction i was going down with that previous patch – but i got hung up on the fact that for reasons i couldn't identify, clauses refering to fieldnames that don't exist in the schema are getting dropped – need to track down where that is happening and stop it, so the new code can look at those "field names" and treat them as aliases (to resolve to other fields)

I just need to find time to dig into it more – but if you want to take a swing at fixing edismax.unescapedcolon.bug.test.patch and then improving on edismax.userFields.patch, by all means be my guest.

Hoss Man
added a comment - 05/Apr/10 18:28 Jonathan: looking at the code, it seems completely plausible, ant that's the direction i was going down with that previous patch – but i got hung up on the fact that for reasons i couldn't identify, clauses refering to fieldnames that don't exist in the schema are getting dropped – need to track down where that is happening and stop it, so the new code can look at those "field names" and treat them as aliases (to resolve to other fields)
I just need to find time to dig into it more – but if you want to take a swing at fixing edismax.unescapedcolon.bug.test.patch and then improving on edismax.userFields.patch, by all means be my guest.

Hoss, I would be EXTREMELY interested in the "uf" ability to specify only certain fields that may be user-specified (with other field-spec-looking things just escaped, I guess?), and also in another comment you made earlier:

"one really nice thing about the field query support though: it looks like it would really be easy to add support for arbitrary field name aliasing with something like f.someFieldAlias.qf=realFieldA^3+realFieldB^4"

That would be amazing, and make eDismax meet all my needs. Are you still thinking of that, and does your work with the code make it seem any more or less plausible?

Jonathan Rochkind
added a comment - 31/Mar/10 03:06 Hoss, I would be EXTREMELY interested in the "uf" ability to specify only certain fields that may be user-specified (with other field-spec-looking things just escaped, I guess?), and also in another comment you made earlier:
"one really nice thing about the field query support though: it looks like it would really be easy to add support for arbitrary field name aliasing with something like f.someFieldAlias.qf=realFieldA^3+realFieldB^4"
That would be amazing, and make eDismax meet all my needs. Are you still thinking of that, and does your work with the code make it seem any more or less plausible?

Yonik Seeley
added a comment - 18/Mar/10 03:00 Hmmm, the intention was to try and detect when a ':' meant a fielded query, or if it was a literal.
So IIRC, it does look like a bug if it's not working.
A "uf" param sounds good.

Hoss Man
added a comment - 01/Mar/10 23:21 What does "u" in "uf" stand for?
"user fields" ... as in "field names a user may refer to" ... but it's not something i though through to hard, as i said: work in progress.

FWIW: initial steps towards adding a "uf" param to let users specify what field names can be specified explicitly in the query string, with optional default boosts to apply to those clauses ... not finished.

Hoss Man
added a comment - 01/Mar/10 22:33 FWIW: initial steps towards adding a "uf" param to let users specify what field names can be specified explicitly in the query string, with optional default boosts to apply to those clauses ... not finished.

On the train this past weekend i started trying to tackle the issue of making support for field based queries (ie: "fieldA:valueB") configurable so that it could be turned on/off for certain fields (or left off completely for back-compat with dismax)

Based on yonik's description of edismax, and my initial reading of the code (particularly the use of clause.field and getFieldName in ExtendedDismaxQParser) i was under the impression that if a "clause" consisting of FOO:BAR was encountered, and FOO was not a known field, that the clause would be treated as a literal, and the colon would be escaped before passing it on to ExtendedSolrQueryParser ... essentially that FOO:BAR and FOO\:BAR would be equivalent if FOO is not the name of a real field according to the IndexSchema.

For reasons I don't fully understand yet, this isn't the case – as the attached test shows, the queries are parsed differently, and (evidently) FOO:BAR is parsed as an empty query if FOO is not a real field.

Before I try digging into this too much, I wanted to sanity check:

is this expected? ... was this done intentionally?

is this desired? ... is this logical default behavior to have if the field isn't defined? should we have tests to assert this before i start adding more config options to change the behavior?

Hoss Man
added a comment - 01/Mar/10 22:27 On the train this past weekend i started trying to tackle the issue of making support for field based queries (ie: "fieldA:valueB") configurable so that it could be turned on/off for certain fields (or left off completely for back-compat with dismax)
Based on yonik's description of edismax, and my initial reading of the code (particularly the use of clause.field and getFieldName in ExtendedDismaxQParser) i was under the impression that if a "clause" consisting of FOO:BAR was encountered, and FOO was not a known field, that the clause would be treated as a literal, and the colon would be escaped before passing it on to ExtendedSolrQueryParser ... essentially that FOO:BAR and FOO\:BAR would be equivalent if FOO is not the name of a real field according to the IndexSchema.
For reasons I don't fully understand yet, this isn't the case – as the attached test shows, the queries are parsed differently, and (evidently) FOO:BAR is parsed as an empty query if FOO is not a real field.
Before I try digging into this too much, I wanted to sanity check:
is this expected? ... was this done intentionally?
is this desired? ... is this logical default behavior to have if the field isn't defined? should we have tests to assert this before i start adding more config options to change the behavior?

Committed revision 901342.
...
this was the same as my SOLR-1553.pf-refactor.patch with the one addition of restoring the use of DisjunctionMaxQuery for the pf* params (per yonik's comment that he couldn't remember why he changed it)

Hoss Man
added a comment - 20/Jan/10 19:26 Committed revision 901342.
...
this was the same as my SOLR-1553 .pf-refactor.patch with the one addition of restoring the use of DisjunctionMaxQuery for the pf* params (per yonik's comment that he couldn't remember why he changed it)
if we figure out his reason (i'm sure he had one) we can re-evaluate.

Yonik (or someone else I guess), would you mind commenting on this claim?:
"Improved proximity boosting via word bigrams... this prevents the problem of needing 100% of the words in the document to get any boost, as well as having all of the words in a single field."
I looked through the source and saw the bigram addition using "SHOULD"... cool... but I am unaware of the two problems you speak of that this addresses.

David Smiley
added a comment - 16/Jan/10 06:20 Yonik (or someone else I guess), would you mind commenting on this claim?:
"Improved proximity boosting via word bigrams... this prevents the problem of needing 100% of the words in the document to get any boost, as well as having all of the words in a single field."
I looked through the source and saw the bigram addition using "SHOULD"... cool... but I am unaware of the two problems you speak of that this addresses.
By the way... this thing is slick. You rock.

Hoss Man
added a comment - 01/Jan/10 08:55 incremental patch (against trunk) that i wrote on the plane a few days ago...
refactors the pf logic and adds comments: pf is restored to it's original meaning in dismax, pf3 does what it did in edismax, and pf2 is added to do what edismax was previously doing with pf.
clear as mud right? ... fortunately the code is much cleaner then the explanation.

why is "TO" listed as an operator when building up the phrase boost fields? (line 296) ... if range queries are supported, then shouldn't the upper/lower bounds also be striped out of the clauses list?

It seemed incrementally better than leaving it out.
Range queries aren't actually supported in the fallback mode. The phrase boosting is also relatively rudimentary. We could try to recognize range queries and removing the whole thing.

i see that the boost queries built from the pf and pf3 fields are put in BooleanQueries instead of DisjunctionMaxQueries ... but why?

I honestly don't recall (this was written a while ago).

ExtendedAnalyzer feels like a really big hack ... i'm not certain, but i don't think it works correctly if a CharFilter is declared.

Probably not - CharFilter didn't exist when this was written.
Should we use an alternative method for signaling that stopwords are optional? The nice thing about putting the stopword filter in the query analyzer (and having edismax detect it) is that something like the lucene query parser works as it did in the past... stopwords are removed. But I could see the other side of the argument too.

Anyway, I think I agree with pretty much everything you say - just haven't had time to do anything about it.

Yonik Seeley
added a comment - 22/Dec/09 22:20 why is "TO" listed as an operator when building up the phrase boost fields? (line 296) ... if range queries are supported, then shouldn't the upper/lower bounds also be striped out of the clauses list?
It seemed incrementally better than leaving it out.
Range queries aren't actually supported in the fallback mode. The phrase boosting is also relatively rudimentary. We could try to recognize range queries and removing the whole thing.
i see that the boost queries built from the pf and pf3 fields are put in BooleanQueries instead of DisjunctionMaxQueries ... but why?
I honestly don't recall (this was written a while ago).
ExtendedAnalyzer feels like a really big hack ... i'm not certain, but i don't think it works correctly if a CharFilter is declared.
Probably not - CharFilter didn't exist when this was written.
Should we use an alternative method for signaling that stopwords are optional? The nice thing about putting the stopword filter in the query analyzer (and having edismax detect it) is that something like the lucene query parser works as it did in the past... stopwords are removed. But I could see the other side of the argument too.
Anyway, I think I agree with pretty much everything you say - just haven't had time to do anything about it.

the code is kind of hard to read ... there's a serious dirth of comments

reads very kludgy, clearly a hacked up version of DisMax ... probably want to refactor some helper functions (that can then be documented)

the clause.field and getFieldName functionality is dangerous for people migrating from edismax->dismax (users guessing field names can query on fields the solr admin doesn't want them to query on) ... we need an option to turn that off.

one really nice thing about the field query support though: it looks like it would really be easy to add support for arbitrary field name aliasing with something like f.someFieldAlias.qf=realFieldA^3+realFieldB^4

perhaps getFieldName should only work for fields explicitly enumerated in a param?

why is "TO" listed as an operator when building up the phrase boost fields? (line 296) ... if range queries are supported, then shouldn't the upper/lower bounds also be striped out of the clauses list?

accepting range queries also seems like something that people should be able to disable

apparently "pf" was changed to iteratively build boosting phrase queries for every 'pair' of words, and "pf3" is a new param to build boosting phrase queries for every 'triple' of words in the input. while this certainly seems useful, it's not back-compatable .. why not restore 'pf' to it's original purpose, and add "pf2" for hte pairs?

what is the motivation for ExtendedSolrQueryParser.makeDismax? ... i see that the boost queries built from the pf and pf3 fields are put in BooleanQueries instead of DisjunctionMaxQueries ... but why? (if the user searches for a phrase that's common in many fields of one document, that document is going to get a huge score boost regardless of the "tie" value, which kind of defeats the point of what the dismax parser is trying to do)

we should remove the extremely legacy "/* legacy logic */" for dealing with "bq" ... almost no one should care about that, we really don't need to carry it forward in a new parser.

there are a lot of empty catch blocks that seem like they should at least log a warning or debug message.

ExtendedAnalyzer feels like a really big hack ... i'm not certain, but i don't think it works correctly if a CharFilter is declared.

we need to document all these new params ("pf3", "lowercaseOperators", "boost",

Thoughts while testing it out on some really hairy edge cases that break the old dismax parser...

this is really cool

this is really freaking cool.

still has a problem with search strings like "foo &&" and "foo ||" ... i suspect it would be an easy fix to recognize these just like AND/OR are recognized and escaped.

once we fix some of hte issues mentioned above, we should absolutely register this using the name "dismax" by default, and register the old one as "oldDismax" with a note in CHANGES.txt telling people to use defType=oldDismax if they really need it.

Hoss Man
added a comment - 07/Dec/09 18:08 Thoughts while reading the code...
the code is kind of hard to read ... there's a serious dirth of comments
reads very kludgy, clearly a hacked up version of DisMax ... probably want to refactor some helper functions (that can then be documented)
the clause.field and getFieldName functionality is dangerous for people migrating from edismax->dismax (users guessing field names can query on fields the solr admin doesn't want them to query on) ... we need an option to turn that off.
one really nice thing about the field query support though: it looks like it would really be easy to add support for arbitrary field name aliasing with something like f.someFieldAlias.qf=realFieldA^3+realFieldB^4
perhaps getFieldName should only work for fields explicitly enumerated in a param?
why is "TO" listed as an operator when building up the phrase boost fields? (line 296) ... if range queries are supported, then shouldn't the upper/lower bounds also be striped out of the clauses list?
accepting range queries also seems like something that people should be able to disable
apparently "pf" was changed to iteratively build boosting phrase queries for every 'pair' of words, and "pf3" is a new param to build boosting phrase queries for every 'triple' of words in the input. while this certainly seems useful, it's not back-compatable .. why not restore 'pf' to it's original purpose, and add "pf2" for hte pairs?
what is the motivation for ExtendedSolrQueryParser.makeDismax? ... i see that the boost queries built from the pf and pf3 fields are put in BooleanQueries instead of DisjunctionMaxQueries ... but why? (if the user searches for a phrase that's common in many fields of one document, that document is going to get a huge score boost regardless of the "tie" value, which kind of defeats the point of what the dismax parser is trying to do)
we should remove the extremely legacy "/* legacy logic */" for dealing with "bq" ... almost no one should care about that, we really don't need to carry it forward in a new parser.
there are a lot of empty catch blocks that seem like they should at least log a warning or debug message.
ExtendedAnalyzer feels like a really big hack ... i'm not certain, but i don't think it works correctly if a CharFilter is declared.
we need to document all these new params ("pf3", "lowercaseOperators", "boost",
Thoughts while testing it out on some really hairy edge cases that break the old dismax parser...
this is really cool
this is really freaking cool.
still has a problem with search strings like "foo &&" and "foo ||" ... i suspect it would be an easy fix to recognize these just like AND/OR are recognized and escaped.
once we fix some of hte issues mentioned above, we should absolutely register this using the name "dismax" by default, and register the old one as "oldDismax" with a note in CHANGES.txt telling people to use defType=oldDismax if they really need it.

You can't possibly have written it any more quickly then i did the original dismax parser ... i mean jusus: was so rushed i forgot to test "AND" in uppercase, and as a result didn't request the time i needed to do it all the right way.

i think we've all suffered from my poor initial implementation long enough.

Hoss Man
added a comment - 13/Nov/09 01:38 Errr, well... I wrote it pretty quickly.
You can't possibly have written it any more quickly then i did the original dismax parser ... i mean jusus: was so rushed i forgot to test "AND" in uppercase, and as a result didn't request the time i needed to do it all the right way.
i think we've all suffered from my poor initial implementation long enough.

Yonik Seeley
added a comment - 13/Nov/09 00:59 Assuming the new parser is well written (and i would expect nothing less from all you lucid folks)
Errr, well... I wrote it pretty quickly. Someone was working on a better query parser and this one was more of a "we need something now" type of thing.
But yes, it should be easy to turn off the extra features.

I haven't looked at the patch, and i hope to find some time to do so prior to year with a "3" in it, but in the meantime...

I guess one of the main questions is if this should remain separate from dismax, or if it should eventually replace dismax?

I think the right way to handle this is to let both classes coexist in the code base, but to start encouraging people to use the new/improved version by changing the default used when defType=dismax ... a note in CHANGES.txt that people who really want the old class should use defType=oldDismax, or explicitly declare the old parsing using the name "dismax" seems totally suitable (much as we did with LegacyDateField, and lucenePlusSort.

Having the capability to do field selections from the query string can often be undesirable, if not even a security issue in some cases.

Assuming the new parser is well written (and i would expect nothing less from all you lucid folks) i imagine it wouldn't take too much work to make all of those various options controllable by params. if nothing else, an "escapeTheseChars" type option would be nice brute force way to disable some stuff.

Hoss Man
added a comment - 13/Nov/09 00:00 I haven't looked at the patch, and i hope to find some time to do so prior to year with a "3" in it, but in the meantime...
I guess one of the main questions is if this should remain separate from dismax, or if it should eventually replace dismax?
I think the right way to handle this is to let both classes coexist in the code base, but to start encouraging people to use the new/improved version by changing the default used when defType=dismax ... a note in CHANGES.txt that people who really want the old class should use defType=oldDismax, or explicitly declare the old parsing using the name "dismax" seems totally suitable (much as we did with LegacyDateField, and lucenePlusSort.
Having the capability to do field selections from the query string can often be undesirable, if not even a security issue in some cases.
Assuming the new parser is well written (and i would expect nothing less from all you lucid folks) i imagine it wouldn't take too much work to make all of those various options controllable by params. if nothing else, an "escapeTheseChars" type option would be nice brute force way to disable some stuff.

Erik Hatcher
added a comment - 12/Nov/09 14:01 Personally I think this should live independently from dismax. Having the capability to do field selections from the query string can often be undesirable, if not even a security issue in some cases.
+1 to keeping them separate, or at least controllable whether the field selection and other crazy query syntax is available or not.

Otis Gospodnetic
added a comment - 11/Nov/09 21:14 I think you need to click on "Issue Links" link, delete, and re-link.
I have a feeling once this is in, people won't need the original dismax.
Yonik, did you mean to attach a patch, but forgot?

I guess one of the main questions is if this should remain separate from dismax, or if it should eventually replace dismax? If the latter, they could either co-exist for some time and eventually replace, or remain uncommitted before it can replace...

Incompatibilities off the top of my head:

"pf" creates word bigrams for proximity boosting... we could perhaps change this to "pf2" and add in support for the dismax-style pf

fielded queries - dismax doesn't have them, and they may not be desired by all clients

full lucene syntax - same as above - may not be desired by all clients

I think the other incompatibilities would mostly fall in the category of bugs, or things that weren't well specified or well behaved anyway (e.g. I don't really see a downside to allowing nested pure negative queries to work)

Yonik Seeley
added a comment - 11/Nov/09 16:04 I guess one of the main questions is if this should remain separate from dismax, or if it should eventually replace dismax? If the latter, they could either co-exist for some time and eventually replace, or remain uncommitted before it can replace...
Incompatibilities off the top of my head:
"pf" creates word bigrams for proximity boosting... we could perhaps change this to "pf2" and add in support for the dismax-style pf
fielded queries - dismax doesn't have them, and they may not be desired by all clients
full lucene syntax - same as above - may not be desired by all clients
I think the other incompatibilities would mostly fall in the category of bugs, or things that weren't well specified or well behaved anyway (e.g. I don't really see a downside to allowing nested pure negative queries to work)

Lucid Imagination is contributing back this query parser currently being used at search.lucidimagination.com

The extended dismax parser was based on the original Solr dismax parser.

Supports full lucene query syntax in the absence of syntax errors

supports "and"/"or" to mean "AND"/"OR" in lucene syntax mode

When there are syntax errors, improved smart partial escaping of special characters is done to prevent them... in this mode, fielded queries, +/-, and phrase queries are still supported.

Improved proximity boosting via word bigrams... this prevents the problem of needing 100% of the words in the document to get any boost, as well as having all of the words in a single field.

advanced stopword handling... stopwords are not required in the mandatory part of the query but are still used (if indexed) in the proximity boosting part. If a query consists of all stopwords (e.g. to be or not to be) then all will be required.

Supports the "boost" parameter.. like the dismax bf param, but multiplies the function query instead of adding it in

Supports pure negative nested queries... so a query like +foo (-foo) will match all documents

Some examples of queries that currently give dismax fits that now work with extended dismax:
OR
AND
NOT
-
"

Yonik Seeley
added a comment - 10/Nov/09 22:47 Lucid Imagination is contributing back this query parser currently being used at search.lucidimagination.com
The extended dismax parser was based on the original Solr dismax parser.
Supports full lucene query syntax in the absence of syntax errors
supports "and"/"or" to mean "AND"/"OR" in lucene syntax mode
When there are syntax errors, improved smart partial escaping of special characters is done to prevent them... in this mode, fielded queries, +/-, and phrase queries are still supported.
Improved proximity boosting via word bigrams... this prevents the problem of needing 100% of the words in the document to get any boost, as well as having all of the words in a single field.
advanced stopword handling... stopwords are not required in the mandatory part of the query but are still used (if indexed) in the proximity boosting part. If a query consists of all stopwords (e.g. to be or not to be) then all will be required.
Supports the "boost" parameter.. like the dismax bf param, but multiplies the function query instead of adding it in
Supports pure negative nested queries... so a query like +foo (-foo) will match all documents
Some examples of queries that currently give dismax fits that now work with extended dismax:
OR
AND
NOT
-
"
Example queries:
http://localhost:8983/solr/select?defType=edismax&q=big+blue+house&pf=text&qf=text&debugQuery=true
+(((text:big) (text:blue) (text:hous))~3) ((text:"big blue") (text:"blue hous"))
http://localhost:8983/solr/select?defType=edismax&q=hello&pf=text&qf=text&boost=popularity&debugQuery=true
boost(+(text:hello),int(popularity))
And if the text field were configured with the stopfilter only on the query analyzer, then
http://localhost:8983/solr/select?defType=edismax&q=this+is+the+end&pf=text&qf=text&debugQuery=true
+(((text:end))~1) ((text:"this is") (text:"is the") (text:"the end"))
http://localhost:8983/solr/select?defType=edismax&q=how+now+ "brown+cow"+popularity:100&pf=text&qf=text&debugQuery=true
+(((text:how) (text:now) (text:"brown cow") popularity:100)~4) (text:"how now")