Enhance DisMaxQParserPlugin to support full Solr syntax and alternate escaping strategies.

Details

Description

The DisMaxQParserPlugin has a variety of nice features; chief among them is that it uses the DisjunctionMaxQueryParser. However, it imposes limitations on the query syntax.

I've enhanced the DisMax QParser plugin to use a pluggable query-string rewriter (via subclass extension) instead of the hard-coded logic currently embedded in it (i.e. the escape-nearly-everything logic). Additionally, I've given this QParser a notion of a "simple" syntax (the default) versus a non-simple one, in which case some of the logic in this QParser is skipped because it's irrelevant (phrase boosting and min-should-match in particular). As part of this work I moved the code around significantly to make it clearer and more extensible. I also chose to rename it to suggest its role as a parser for user queries.
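
As a rough illustration of the "pluggable rewriter via subclass extension" idea, here is a minimal sketch in plain Java. It is not the attached source: the class names, the `rewriteQueryString` hook, and the exact set of escaped characters are all hypothetical, and no Solr classes are used.

```java
// Hypothetical sketch only: none of these names come from the attached source.
class UserQueryParserSketch {
    // Assumed set of characters to escape; +, - and double quotes are left alone.
    private static final String ESCAPED_CHARS = "\\!():^[]{}~*?";

    /** Overridable hook. Default strategy: backslash-escape most special characters. */
    protected String rewriteQueryString(String userQuery) {
        StringBuilder sb = new StringBuilder();
        for (char c : userQuery.toCharArray()) {
            if (ESCAPED_CHARS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Trims the input, then applies whichever rewrite strategy the subclass provides. */
    final String prepare(String userQuery) {
        return rewriteQueryString(userQuery.trim());
    }
}

// A subclass that swaps in a different strategy: pass the query through untouched,
// roughly what a "non-simple" (full syntax) mode would want.
class PassThroughParserSketch extends UserQueryParserSketch {
    @Override
    protected String rewriteQueryString(String userQuery) {
        return userQuery;
    }
}
```

With this shape, `new UserQueryParserSketch().prepare("title:solr*")` escapes the colon and the asterisk, while `new PassThroughParserSketch().prepare("title:solr*")` leaves the string as typed.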

Activity

David Smiley
added a comment - 05/Sep/08 20:56 I am contributing source files to this issue instead of patches because the code was significantly reworked.
Note that this patch depends strongly on SOLR-756 and mildly on SOLR-757, which I've contributed separately. Both need to be applied for this to compile. Even if you don't get those patches, you can read the source anyway to see what it does.

Simon Lachinger
added a comment - 18/Sep/09 11:06 First of all, thanks for providing wildcard matching for the dismax query handler; that is exactly what I need. However, the WILDCARD_STRIP_CHARS regex in UserQParser.java does not work with umlauts, which makes the patch useless for languages such as German.
I will attach a diff file with the changes I have made to get it working with umlauts.
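
The kind of failure described here can be illustrated with a small sketch. The actual WILDCARD_STRIP_CHARS pattern is not quoted in this issue, so both patterns below are assumptions: one ASCII-only character class that would drop umlauts, and one Unicode-aware alternative that keeps any letter or digit. The class and field names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the real WILDCARD_STRIP_CHARS pattern is not shown in this issue.
final class WildcardStripSketch {
    // Assumed ASCII-only pattern: strips everything except a-z, A-Z, 0-9, '*' and '?'.
    static final Pattern ASCII_ONLY = Pattern.compile("[^a-zA-Z0-9*?]");

    // Unicode-aware alternative: keeps any letter (\p{L}) or number (\p{N}),
    // plus the wildcard characters '*' and '?'.
    static final Pattern UNICODE_AWARE = Pattern.compile("[^\\p{L}\\p{N}*?]");

    /** Removes every character matched by the given strip pattern. */
    static String strip(Pattern stripChars, String term) {
        return stripChars.matcher(term).replaceAll("");
    }
}
```

Under these assumptions, stripping `"m\u00fcll*"` with `ASCII_ONLY` loses the umlaut, while `UNICODE_AWARE` leaves the term unchanged. Note that `\p{L}` covers precomposed characters such as U+00FC; a decomposed umlaut (base letter plus combining mark) would additionally need `\p{M}`.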

David Smiley
added a comment - 17/Jan/10 05:05 If the use case is unrestricted Lucene syntax with dismax, then Enhanced Dismax is the way to go. What I'm shooting for in this issue is a more extensible query parser. E-Dismax is cool, but it doesn't look particularly extensible.
For example, in an app I support, I use this patch to do several things:
1. Check whether the query appears to use fancy Lucene syntax and, if so, treat it as such, but with dismax still applied to non-fielded clauses via SOLR-756.
2. If there is one clause, rewrite the query to: clause clause*^0.5, i.e. search for the clause and also include partial matches. For the small index I have this is fine, but I can use n-grams some day if I need to.
3. If there are multiple clauses, rewrite the query to: clauseA clauseB clauseC clauseC*^0.5 (clauseC is the last clause).
What I'm hoping for is for Solr to offer better query parsing infrastructure so that I can implement my parsing needs by reusing and plugging into as much of what already exists as possible. Committing SOLR-756 is one step there, but there is also useful capability in DismaxQParser such as boost queries, boost functions, and q.alt. min-should-match is relatively reusable since it stands alone.
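
The three rewrite rules listed above can be sketched in plain Java as follows. This is an illustration, not the code from the patch: the class name, the `FANCY_SYNTAX` heuristic for detecting Lucene syntax, and whitespace-based clause splitting are all assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the rewrite rules; not taken from the attached source.
final class UserQueryRewriteSketch {
    // Assumed heuristic: any of these characters, or an upper-case boolean
    // operator, means the user is writing Lucene syntax on purpose.
    private static final Pattern FANCY_SYNTAX =
            Pattern.compile("[:\"+\\-()\\[\\]{}^~*?]|\\b(?:AND|OR|NOT)\\b");

    static String rewrite(String userQuery) {
        String q = userQuery.trim();
        // Rule 1: fancy syntax (or an empty query) is passed through unchanged.
        if (q.isEmpty() || FANCY_SYNTAX.matcher(q).find()) {
            return q;
        }
        // Rules 2 and 3: keep every clause and append the last clause again
        // as a down-weighted prefix query so partial matches are included.
        String[] clauses = q.split("\\s+");
        String last = clauses[clauses.length - 1];
        return String.join(" ", clauses) + " " + last + "*^0.5";
    }
}
```

Rules 2 and 3 collapse into one branch here because the single-clause case is just the multi-clause case with one element. The resulting string would still need to be handed to a parser that understands wildcards and boosts.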

Hoss Man
added a comment - 27/May/10 22:09 Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...
http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E
The selection criteria were "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. Email notifications were suppressed.
A unique token for finding these 240 issues in the future: hossversioncleanup20100527

Hoss Man
added a comment - 21/Mar/12 18:08 Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.
Email notification suppressed to prevent mass spam.
Pseudo-unique token identifying these issues: hoss20120321nofix36