Ryan McKinley
added a comment - 02/Jul/07 00:33 This is a quick - poorly documented / no testing - implementation that mimics the current Standard query.
It is mostly intended to flush out design issues.
This works by sticking a 'ResponseBuilder' in the context and sharing that between each component.

I really like this modular approach to handling search requests, it will greatly simplify the process of adding new functionality (e.g. collapsing, faceting, more-like-this) to existing handlers without the need for unnecessary code replication. My primary goal is to extend the more-like-this handler capabilities and make them available to other handlers (such as dismax), and I think the proposed solution is a good approach.

Some issues that I can forsee though are:

1) Ordering: its fairly obvious that certain handlers need to be called before others (e.g. standard / dismax query parsing before faceting / highlighting) however there may be cases where the required sequence of events may be more subtle (e.g. faceting the results of a more-like-this query). There probably needs to be some mechanism to determine the order in which the components are prepared / processed.

2) Dependancy: a situation may arise where a component depends on operations performed by another component (e.g. more-like-this may take advantage of the dismax 'bq' parameter), perhaps there needs to be some method of specifying component dependency so that the SearchHandler can load and process required components automatically?

I hope this make sense, I'm fairly new to Solr development so I'm afraid my contributions to this issue would be mostly limited to (hopefully helpful) ideas and suggestions however I'm happy to tinker with the patched code from above and help test this new component framework as it is developed.

Pieter Berkel
added a comment - 11/Jul/07 07:19 I really like this modular approach to handling search requests, it will greatly simplify the process of adding new functionality (e.g. collapsing, faceting, more-like-this) to existing handlers without the need for unnecessary code replication. My primary goal is to extend the more-like-this handler capabilities and make them available to other handlers (such as dismax), and I think the proposed solution is a good approach.
Some issues that I can forsee though are:
1) Ordering: its fairly obvious that certain handlers need to be called before others (e.g. standard / dismax query parsing before faceting / highlighting) however there may be cases where the required sequence of events may be more subtle (e.g. faceting the results of a more-like-this query). There probably needs to be some mechanism to determine the order in which the components are prepared / processed.
2) Dependancy: a situation may arise where a component depends on operations performed by another component (e.g. more-like-this may take advantage of the dismax 'bq' parameter), perhaps there needs to be some method of specifying component dependency so that the SearchHandler can load and process required components automatically?
I hope this make sense, I'm fairly new to Solr development so I'm afraid my contributions to this issue would be mostly limited to (hopefully helpful) ideas and suggestions however I'm happy to tinker with the patched code from above and help test this new component framework as it is developed.
cheers,
Pieter

Pieter Berkel
added a comment - 17/Aug/07 01:49 I just tried this patch on svn trunk (r566899) and got the following failures:
$ patch -p0 < ../ SOLR-281 -SearchComponents.patch
...
patching file src/java/org/apache/solr/handler/StandardRequestHandler.java
Hunk #1 succeeded at 17 with fuzz 1.
Hunk #2 FAILED at 45.
1 out of 2 hunks FAILED – saving rejects to file src/java/org/apache/solr/handler/StandardRequestHandler.java.rej
...
patching file src/java/org/apache/solr/handler/DisMaxRequestHandler.java
Hunk #1 FAILED at 17.
1 out of 1 hunk FAILED – saving rejects to file src/java/org/apache/solr/handler/DisMaxRequestHandler.java.rej
I suspect it is the changes made by SOLR-326 that is causing the these problems, would it be possible for you to create a new patch?
thanks,
Piete

Ryan McKinley
added a comment - 08/Sep/07 23:01 Thanks Sharad! this is looking good.
I just updated it to work with trunk and cleaned up a few things.
made SearchComponent an abstract class rather then an interface. This will make adding features in the future easier.
made SearchComponent implement SolrInfoMBean
added lots of ASF headers and comment stubs
Yonik and/or Hoss... if you have time to check this out, that would be great.

It also looks like the additions to solrconfig.xml have not been included in the latest patch either. I was also going to suggest that it might be a good idea to support class shorthand notation, so org.apache.solr.handler.component.* can be written solr.component.* in solrconfig.xml.

Pieter Berkel
added a comment - 17/Sep/07 02:52 I'm having trouble applying the latest patch to trunk (r575809) again:
$ patch -p0 < ../ SOLR-281 -SearchComponents.patch
...
patching file src/java/org/apache/solr/handler/StandardRequestHandler.java
Hunk #1 FAILED at 17.
Hunk #2 FAILED at 45.
2 out of 2 hunks FAILED – saving rejects to file src/java/org/apache/solr/handler/StandardRequestHandler.java.rej
patching file src/java/org/apache/solr/handler/DisMaxRequestHandler.java
Hunk #2 FAILED at 118.
1 out of 2 hunks FAILED – saving rejects to file src/java/org/apache/solr/handler/DisMaxRequestHandler.java.rej
It also looks like the additions to solrconfig.xml have not been included in the latest patch either. I was also going to suggest that it might be a good idea to support class shorthand notation, so org.apache.solr.handler.component.* can be written solr.component.* in solrconfig.xml.

Re class aliases, it should work fine to put:
solr.component.*
If we add "org.apache.solr.handler.component." to the base list, you could just configure: org.apache.solr.handler.component.QueryComponent

DisMax request handler is now just the standard handler with defType=dismax added as a default

removed variable RequestBuilder class logic since it seems unnecessary... if two non-standard components want to communicate something, they can use the Context or the Response. (any reason I'm missing why it should stay?)

Thoughts on these changes?

We need to think through all the members of ResponseBuilder carefully and decide what component sets/reads them in what phase (and if that makes the most sense).

Should ResponseBuilder have methods instead of members? If so, that would allow a component to perhaps even replace the ResponseBuilder and delegate to the original.

How will a users custom component get configuration? Should components be a full plugins with an init() for configuration?

Yonik Seeley
added a comment - 24/Oct/07 13:59 Here's a new (smaller) patch that utilizes pluggable query parsers, and
removes DisMax specific modules since dismax specific logic is limited to query construction
DisMax request handler is now just the standard handler with defType=dismax added as a default
removed variable RequestBuilder class logic since it seems unnecessary... if two non-standard components want to communicate something, they can use the Context or the Response. (any reason I'm missing why it should stay?)
Thoughts on these changes?
We need to think through all the members of ResponseBuilder carefully and decide what component sets/reads them in what phase (and if that makes the most sense).
Should ResponseBuilder have methods instead of members? If so, that would allow a component to perhaps even replace the ResponseBuilder and delegate to the original.
How will a users custom component get configuration? Should components be a full plugins with an init() for configuration?

Yonik, i could not get your last patch to apply cleanly to trunk. This is my best attempt to resolve the conflicts and fix a few things.

1. remove the call to getResponseBuilder() in SearchComponent
2. the timing info was commented out of debugging

- - - -

The dismax changes look great - i like the new query parser stuff!

> - removed variable RequestBuilder class logic since it seems unnecessary...
> if two non-standard components want to communicate something, they can
> use the Context or the Response. (any reason I'm missing why it should stay?)

If at all possible, I think we should do our best to avoid depending on Map<String> for an interface.

>
> We need to think through all the members of ResponseBuilder carefully and decide what component sets/reads them in what phase (and if that makes the most sense).
>

yes

>
> Should ResponseBuilder have methods instead of members? If so, that would allow a component to perhaps even replace the ResponseBuilder and delegate to the original.
>

yes, that makes sense. How would a component replace the ResponseBuilder? If it could do that, there is obviously no need for the variable RequestBuilder class logic

>
> How will a users custom component get configuration? Should components be a full plugins with an init() for configuration?
>

As is, they are defined in a "components" section for SearchHandler (from the example solrconfig.xml):

Ryan McKinley
added a comment - 25/Oct/07 15:04 Yonik, i could not get your last patch to apply cleanly to trunk. This is my best attempt to resolve the conflicts and fix a few things.
1. remove the call to getResponseBuilder() in SearchComponent
2. the timing info was commented out of debugging
- - - -
The dismax changes look great - i like the new query parser stuff!
> - removed variable RequestBuilder class logic since it seems unnecessary...
> if two non-standard components want to communicate something, they can
> use the Context or the Response. (any reason I'm missing why it should stay?)
If at all possible, I think we should do our best to avoid depending on Map<String> for an interface.
>
> We need to think through all the members of ResponseBuilder carefully and decide what component sets/reads them in what phase (and if that makes the most sense).
>
yes
>
> Should ResponseBuilder have methods instead of members? If so, that would allow a component to perhaps even replace the ResponseBuilder and delegate to the original.
>
yes, that makes sense. How would a component replace the ResponseBuilder? If it could do that, there is obviously no need for the variable RequestBuilder class logic
>
> How will a users custom component get configuration? Should components be a full plugins with an init() for configuration?
>
As is, they are defined in a "components" section for SearchHandler (from the example solrconfig.xml):
<requestHandler name="/search" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
</lst>
<arr name="components">
<str>org.apache.solr.handler.component.QueryComponent</str>
<str>org.apache.solr.handler.component.FacetComponent</str>
<str>org.apache.solr.handler.component.MoreLikeThisComponent</str>
<str>org.apache.solr.handler.component.HighlightComponent</str>
<str>org.apache.solr.handler.component.DebugComponent</str>
</arr>
</requestHandler>
Perhaps the components should be passed the init params?
Unless there is a compelling reason, I don't think components need to be shared across request handlers thus justifying a top level component registry.

That's OK, the one you produced fails for me on a clean checkout too.... looks like maybe we hit a little corner case with patch/diff.
Should we perhaps commit this working version, marking ResponseBuilder as in-flux, and then continue generating patches from that???

Yonik Seeley
added a comment - 25/Oct/07 16:01 Yonik, i could not get your last patch to apply cleanly to trunk.
That's OK, the one you produced fails for me on a clean checkout too.... looks like maybe we hit a little corner case with patch/diff.
Should we perhaps commit this working version, marking ResponseBuilder as in-flux, and then continue generating patches from that???

Ryan McKinley
added a comment - 25/Oct/07 21:38 ok, that one works (or close enough)
>
> Should we perhaps commit this working version, marking ResponseBuilder as in-flux, and then continue generating patches from that???
>
If you feel comfortable with the big picture, yes, I think we should commit this and refine the ResponseBuilder and perhaps the configuration options in smaller patches.

Ryan McKinley
added a comment - 11/Nov/07 18:47 Here is an updated version of your patch that converts the ResponseBuilder public variables to private vars with get/set methods.
How do you feel about committing this and working out the ResponseBuilder details in subsequent smaller patches?

Yonik Seeley
added a comment - 12/Nov/07 14:31 - edited > How do you feel about committing this and working out the ResponseBuilder details in subsequent smaller patches?
+1
In this case I think the current patch represents the right direction, seems to work fine, and committing now will save effort of people working on other patches.

The SearchComponent framework needs to allow for custom component initialization – I previously believed this could be done from within the RequestHandler NamedList args. In trying to use this for a custom component that needs to load a resource file, I found it is woefully inadequate.

As Yonik suggested a while back (and I disagreed with), I now think components should be loaded and initialized the same way as RequestHandlers. That is, a top level name based registry in SolrCore that SearchHandlers can share. For example, solrconfig.xml could declare:

Also, there should be a way to configure components to run before or after the standard component list without having to know and maintain what the "standard" list is. For example, when we add field collapsing to the 'standard' options, it should not require everyone to update their solrconfig.xml

When SOLR-414 is committed, I will attach a patch using this strategy.

Ryan McKinley
added a comment - 22/Nov/07 01:33 The SearchComponent framework needs to allow for custom component initialization – I previously believed this could be done from within the RequestHandler NamedList args. In trying to use this for a custom component that needs to load a resource file, I found it is woefully inadequate.
As Yonik suggested a while back (and I disagreed with), I now think components should be loaded and initialized the same way as RequestHandlers. That is, a top level name based registry in SolrCore that SearchHandlers can share. For example, solrconfig.xml could declare:
<searchComponent name= "query" class= "org.apache.solr.handler.component.QueryComponent" />
<searchComponent name= "facet" class= "org.apache.solr.handler.component.FacetComponent" />
<searchComponent name= "mlt" class= "org.apache.solr.handler.component.MoreLikeThisComponent" />
<searchComponent name= "highlight" class= "org.apache.solr.handler.component.HighlightComponent" />
<searchComponent name= "debug" class= "org.apache.solr.handler.component.DebugComponent" />
and a SearchHandler can pick and choose what components are used with:
<requestHandler name= "/search" class= "solr.SearchHandler" >
<arr name= "components" >
<str> query </str>
<str> facet </str>
<str> mlt </str>
<str> highlight </str>
<str> debug </str>
</arr>
</requestHandler>
Also, there should be a way to configure components to run before or after the standard component list without having to know and maintain what the "standard" list is. For example, when we add field collapsing to the 'standard' options, it should not require everyone to update their solrconfig.xml
When SOLR-414 is committed, I will attach a patch using this strategy.

This makes SearchComponents a top level plugin (just like RequestHandlers) that get registered to a unique name in SolrCore. A SearchHandler can configure the component chain using:

<requestHandler name="/search" class="solr.SearchHandler"><lst name="defaults"><str name="echoParams">explicit</str></lst>
<!--
By default, this will register the following components:
<arr name="components"><str>query</str><str>facet</str><str>mlt</str><str>highlight</str><str>debug</str></arr>
To insert handlers before or after the 'standard' components, use:
<arr name="first-components"><str>first</str></arr><arr name="last-components"><str>last</str></arr>
-->
</requestHandler>

Yonik Seeley
added a comment - 27/Nov/07 16:54 The other idea I had was numeric positioning... query=1000, facet=2000, etc, and so a user could add their component at any point. Perhaps your first/last is sufficient though.

The problem first/last is trying to solve is to let a custom handler automatically incorporate default changes to the standard components without editing solrconfig.xml.

Numeric positioning seems a bit brittle unless the 'standard' components have a locked position. Without opening the door to legacy problems, I don't see any good way to insert a component between "facet" and "mlt" without specifying the whole chain.

Ryan McKinley
added a comment - 27/Nov/07 17:37 The problem first/last is trying to solve is to let a custom handler automatically incorporate default changes to the standard components without editing solrconfig.xml.
Numeric positioning seems a bit brittle unless the 'standard' components have a locked position. Without opening the door to legacy problems, I don't see any good way to insert a component between "facet" and "mlt" without specifying the whole chain.

Ryan McKinley
added a comment - 12/Dec/07 18:34 Sabyasachi - check that your patch does not comment out the loading line...
the committed version of this patch is not commented out:
http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/core/SolrCore.java

This is great; decomposing the handler and allowing the components to be wired up in the config really helps development (and maintenance of those changes).

For my purposes, I needed to make a change to the way the dismax query was being generated. Using the DisMaxQParserPlugin as a template, I created my own QParser and associated QParserPlugin; changed the relevant bits; added a <queryParser...> entry in solrconfig.xml; added the 'defType' parameter to the wanted SearchHandler configuration...and...all works well.

Just a few comments:

I had to make the QParser parse() method public (as the new query parser may still need to use the existing query parsers (backup lucene parser, boost parser, function parser, etc).

The QParserPlugin class seems unnecessary: all it does is implement init() and add a createParser method. Why not just have the parser constructor take those arguments...or, if that can't be done, create an interface to allow the parser itself implement both init() and createParser() (or create()). It then avoids having to create 2 classes (in the case of DisMax, in the same file...which is not pretty).

Michael Dodsworth
added a comment - 31/Jan/08 18:32 This is great; decomposing the handler and allowing the components to be wired up in the config really helps development (and maintenance of those changes).
For my purposes, I needed to make a change to the way the dismax query was being generated. Using the DisMaxQParserPlugin as a template, I created my own QParser and associated QParserPlugin; changed the relevant bits; added a <queryParser...> entry in solrconfig.xml; added the 'defType' parameter to the wanted SearchHandler configuration...and...all works well.
Just a few comments:
I had to make the QParser parse() method public (as the new query parser may still need to use the existing query parsers (backup lucene parser, boost parser, function parser, etc).
The QParserPlugin class seems unnecessary: all it does is implement init() and add a createParser method. Why not just have the parser constructor take those arguments...or, if that can't be done, create an interface to allow the parser itself implement both init() and createParser() (or create()). It then avoids having to create 2 classes (in the case of DisMax, in the same file...which is not pretty).

The QParserPlugin class seems unnecessary: all it does is implement init() and add a createParser method. Why not just have the parser constructor take those arguments...

That would require instantiation with reflection I think.

or, if that can't be done, create an interface to allow the parser itself implement both init() and createParser() (or create()). It then avoids having to create 2 classes (in the case of DisMax, in the same file...which is not pretty).

QParserPlugin is that interface essentially (except that its an class instead of an interface). For library maintainers an abstract class is preferred over an interface for things that a user will extend... that way signature changes can be made in a backward compatible manner.

Yonik Seeley
added a comment - 31/Jan/08 18:48 The QParserPlugin class seems unnecessary: all it does is implement init() and add a createParser method. Why not just have the parser constructor take those arguments...
That would require instantiation with reflection I think.
or, if that can't be done, create an interface to allow the parser itself implement both init() and createParser() (or create()). It then avoids having to create 2 classes (in the case of DisMax, in the same file...which is not pretty).
QParserPlugin is that interface essentially (except that its an class instead of an interface). For library maintainers an abstract class is preferred over an interface for things that a user will extend... that way signature changes can be made in a backward compatible manner.

Reflection is already being used to create the QParserPlugins (SolrCore:1027 and AbstractPluginLoader:83) - I'm guessing the reason for the plugin is just to avoid creating instances through reflection on every parse (as you could keep hold of the QParser class and call newInstance). The second point is moot, once you take away the need for createParser(...).

It's really not that big-a-deal, in the scheme of things.

QParserPlugin is that interface essentially (except that its an class instead of an interface). For library maintainers an abstract class is preferred over an interface for things that a user will extend... that way signature changes can be made in a backward compatible manner.

As an aside, method signature changes are usually trivial to fix; personally, the pain of those fixes is favourable to extending an abstract class unnecessarily.
Are there any architectural reworking projects on the roadmap? I'm sure backward compatibility is a massive concern; perhaps with the more modular plugin design route Solr is going down, those concerns can be addressed. If there's a chance of being accepted, I would love to contribute a move towards using Spring.

Michael Dodsworth
added a comment - 01/Feb/08 12:39
That would require instantiation with reflection I think.
Reflection is already being used to create the QParserPlugins (SolrCore:1027 and AbstractPluginLoader:83) - I'm guessing the reason for the plugin is just to avoid creating instances through reflection on every parse (as you could keep hold of the QParser class and call newInstance). The second point is moot, once you take away the need for createParser(...).
It's really not that big-a-deal, in the scheme of things.
QParserPlugin is that interface essentially (except that its an class instead of an interface). For library maintainers an abstract class is preferred over an interface for things that a user will extend... that way signature changes can be made in a backward compatible manner.
As an aside, method signature changes are usually trivial to fix; personally, the pain of those fixes is favourable to extending an abstract class unnecessarily.
Are there any architectural reworking projects on the roadmap? I'm sure backward compatibility is a massive concern; perhaps with the more modular plugin design route Solr is going down, those concerns can be addressed. If there's a chance of being accepted, I would love to contribute a move towards using Spring.

Yonik Seeley
added a comment - 01/Feb/08 18:11 Followed up on solr-dev to avoid stealing more of this JIRA isse:
http://www.nabble.com/Re%3A--jira--Commented%3A-Search-Components-%28plugins%29-to15227648.html#a15227648

Ryan McKinley
added a comment - 25/May/08 03:08 I'll mark this issue as "fixed" – I'm not sure why it was still open, the reason it was repoened has been resolved. Any new issues should get their own JIRA issue