Sub-Tasks

Activity

One issue here is the different purposes for lucene and solr function queries.
Solr's function queries have always evolved at a rapid pace (and are continuing to evolve) to support higher level features and interfaces in Solr. They are able to evolve rapidly because they are seen more as an implementation detail rather than interface classes, and I'd hate to lose that. So if we do try to make Solr's function queries more accessible to lucene users (again), it should be as a Solr module. As we can see from history and usage, function queries are critically important to Solr, but are obviously not to Lucene.

Yonik Seeley
added a comment - 25/Jan/11 16:30 One issue here is the different purposes for lucene and solr function queries.
Solr's function queries have always evolved at a rapid pace (and are continuing to evolve) to support higher level features and interfaces in Solr. They are able to evolve rapidly because they are seen more as an implementation detail rather than interface classes, and I'd hate to lose that. So if we do try to make Solr's function queries more accessible to lucene users (again), it should be as a Solr module. As we can see from history and usage, function queries are critically important to Solr, but are obviously not to Lucene.

One issue here is the different purposes for lucene and solr function queries.

Yonik, if that is your only issue then we are good to go. I don't think that moving stuff to modules changes anything how we develop software. Modularization, decoupling, interfaces etc. you know how to work with those ey? so hey what is really the point here, this modularization is a key point of merging development with lucene and everytime somebody proposes something like this you fear that that monolithic thing under /solr could become more modular and decoupled. I don't know why this is the case but we should and will move on with modularization. Folks will use it once its there, thats for sure. Same is true for faceting, replication, queryparsers, functionparser... those are on the list!

Simon Willnauer
added a comment - 25/Jan/11 17:23 One issue here is the different purposes for lucene and solr function queries.
Yonik, if that is your only issue then we are good to go. I don't think that moving stuff to modules changes anything how we develop software. Modularization, decoupling, interfaces etc. you know how to work with those ey? so hey what is really the point here, this modularization is a key point of merging development with lucene and everytime somebody proposes something like this you fear that that monolithic thing under /solr could become more modular and decoupled. I don't know why this is the case but we should and will move on with modularization. Folks will use it once its there, thats for sure. Same is true for faceting, replication, queryparsers, functionparser... those are on the list!

Michael McCandless
added a comment - 25/Jan/11 18:04 Can't we consolidate them under a new toplevel module? modules/queries?
We can mark the classes as lucene.experimental? Then we are free to iterate quickly. Does that address your concern Yonik?

Not sure if I communicated the issue clearly: taking what is essentially implementation and trying to make it interface clearly has a cost.
Function queries and the solr qparser architecture are constantly evolving, and wind all through solr.

If we attempt to make this easier to use by lucene users by moving it out to a module then:

it should be a solr module... keep the solr package names and make it clear that it's primary purpose is supporting higher level features in solr

we should make it such that java interface back compatibility is not a requirement, even for point releases

The other approach is to make a Lucene function query module (actually, we already have that), try to update it with stuff from solr, but make it's primary purpose to support the Java interfaces.

Yonik Seeley
added a comment - 25/Jan/11 18:08 Not sure if I communicated the issue clearly: taking what is essentially implementation and trying to make it interface clearly has a cost.
Function queries and the solr qparser architecture are constantly evolving, and wind all through solr.
If we attempt to make this easier to use by lucene users by moving it out to a module then:
it should be a solr module... keep the solr package names and make it clear that it's primary purpose is supporting higher level features in solr
we should make it such that java interface back compatibility is not a requirement, even for point releases
The other approach is to make a Lucene function query module (actually, we already have that), try to update it with stuff from solr, but make it's primary purpose to support the Java interfaces.

Can't we consolidate them under a new toplevel module? modules/queries?

This was the entire plan for this issue.

it should be a solr module... keep the solr package names and make it clear that it's primary purpose is supporting higher level features in solr

Are you serious? If you need to change something or make additions to such an interface / abstract class or whatever you should be able to do this. But I am totally against it if you say this is you internal playground and If I decide to change it entirely I will do so. I start getting the impression that this anti-modularization thing has nothing todo with code and development.. Lots of software has been evolving even with bwcompat etc. but hey this is not core - deprecation and changes should not be super hard to do and we always decided case by case. if it was easy / doable with bw compat we did it - if not we broke it so what it the big deal again? Seriously, I don't want to have this discussion everything somebody comes up with a /modules suggestion - the is the way to go, we agreed and voted on it and it passed.

Simon Willnauer
added a comment - 26/Jan/11 07:15 Can't we consolidate them under a new toplevel module? modules/queries?
This was the entire plan for this issue.
it should be a solr module... keep the solr package names and make it clear that it's primary purpose is supporting higher level features in solr
Are you serious? If you need to change something or make additions to such an interface / abstract class or whatever you should be able to do this. But I am totally against it if you say this is you internal playground and If I decide to change it entirely I will do so. I start getting the impression that this anti-modularization thing has nothing todo with code and development.. Lots of software has been evolving even with bwcompat etc. but hey this is not core - deprecation and changes should not be super hard to do and we always decided case by case. if it was easy / doable with bw compat we did it - if not we broke it so what it the big deal again? Seriously, I don't want to have this discussion everything somebody comes up with a /modules suggestion - the is the way to go, we agreed and voted on it and it passed.

Michael Busch
added a comment - 26/Jan/11 17:16 Seriously, I don't want to have this discussion everything somebody comes up with a /modules suggestion - the is the way to go, we agreed and voted on it and it passed.
+1. I agree with Simon and Robert.

I'm trying to follow this thread. If I extract the somewhat off topic bitterness I seem to see:

1. There is interest in trying to unify function queries again.

2. There is a concern - currently the Solr function queries are part of Solr - java code in Solr is changed on a whim. In Lucene, the bar is higher - java code is generally considered an impl detail in Solr, unless it's a plugin api. This has been an attractive quality in the past. Moving it to Lucene changes it to more of a supported API. Valid point if you ask me.

3. A response was made about marking it experimental.

4. A response was made about perhaps internal.

The rest does not add much. We have not voted on this issue - we voted on broad topics - to be sorted out as we go. Lets discuss it for a bit before everyone just gets pissed off.

Mark Miller
added a comment - 26/Jan/11 17:57 I'm trying to follow this thread. If I extract the somewhat off topic bitterness I seem to see:
1. There is interest in trying to unify function queries again.
2. There is a concern - currently the Solr function queries are part of Solr - java code in Solr is changed on a whim. In Lucene, the bar is higher - java code is generally considered an impl detail in Solr, unless it's a plugin api. This has been an attractive quality in the past. Moving it to Lucene changes it to more of a supported API. Valid point if you ask me.
3. A response was made about marking it experimental.
4. A response was made about perhaps internal.
The rest does not add much. We have not voted on this issue - we voted on broad topics - to be sorted out as we go. Lets discuss it for a bit before everyone just gets pissed off.

One concern some people have had is that lucene not become too tailored just for solr - a very valid point, and I've tried hard not to add anything to lucene that's solr specific or that only solr would benefit from. Nevertheless, there have been a number of occasions (normally on IRC) where there has been disagreement/debate and I've had statements of the form "solr's not the only lucene user you know" type of arguments thrown up.

I'm not against modularizing some solr functionality so lucene users can use it, but hopefully you understand why I think it should retain the solr package namespace and be a solr module (or at a minimum perhaps a lucene/solr module). It just wouldn't be fair to downgrade solr support for such a large core piece of solr, and have people cry foul when something is added that only solr can benefit from (as is done fairly regularly today since it's just part of solr).

Yonik Seeley
added a comment - 26/Jan/11 18:33 One concern some people have had is that lucene not become too tailored just for solr - a very valid point, and I've tried hard not to add anything to lucene that's solr specific or that only solr would benefit from. Nevertheless, there have been a number of occasions (normally on IRC) where there has been disagreement/debate and I've had statements of the form "solr's not the only lucene user you know" type of arguments thrown up.
I'm not against modularizing some solr functionality so lucene users can use it, but hopefully you understand why I think it should retain the solr package namespace and be a solr module (or at a minimum perhaps a lucene/solr module). It just wouldn't be fair to downgrade solr support for such a large core piece of solr, and have people cry foul when something is added that only solr can benefit from (as is done fairly regularly today since it's just part of solr).

I'm not against modularizing some solr functionality so lucene users can use it, but hopefully you understand why I think it should retain the solr package namespace and be a solr module (or at a minimum perhaps a lucene/solr module). It just wouldn't be fair to downgrade solr support for such a large core piece of solr, and have people cry foul when something is added that only solr can benefit from (as is done fairly regularly today since it's just part of solr).

I don't give a sh.. about the package name - /modules is shared space - some is more luceneish some is more solrish. I also think that bwcomap is not a super big deal with modules either. Why don't we get a proposal patch wise and see how that goes... Still - the concerns are super minor IMO and should not keep us from making progress on the modularization side of things.

Simon Willnauer
added a comment - 26/Jan/11 18:52
I'm not against modularizing some solr functionality so lucene users can use it, but hopefully you understand why I think it should retain the solr package namespace and be a solr module (or at a minimum perhaps a lucene/solr module). It just wouldn't be fair to downgrade solr support for such a large core piece of solr, and have people cry foul when something is added that only solr can benefit from (as is done fairly regularly today since it's just part of solr).
I don't give a sh.. about the package name - /modules is shared space - some is more luceneish some is more solrish. I also think that bwcomap is not a super big deal with modules either. Why don't we get a proposal patch wise and see how that goes... Still - the concerns are super minor IMO and should not keep us from making progress on the modularization side of things.

Yonik Seeley
added a comment - 26/Jan/11 19:47 - edited Let me try to recap my concerns here (2 main ones):
1) changing implementation to interface
a different back compat policy for the module (i.e. the java interfaces only) could partially address this.
marking as lucene.internal could mostly address, but that's prob not what direct module users would prefer.
2) downgrading of solr support for this core piece of solr
making it a solr module (retaining the solr package naming, etc) should mostly handle this issue.

I'm just trying to understand what you mean by 2). How is Solr's support of function queries downgraded if they dont have the Solr package name?

I can appreciate the need to be able to make changes easily but with function queries becoming so widely used in Solr, Solr users and probably Lucene users, I think having a stable interface is a good thing.

Chris Male
added a comment - 26/Jan/11 20:15 Hey Yonik,
I'm just trying to understand what you mean by 2). How is Solr's support of function queries downgraded if they dont have the Solr package name?
I can appreciate the need to be able to make changes easily but with function queries becoming so widely used in Solr, Solr users and probably Lucene users, I think having a stable interface is a good thing.

Really back in the days where stuff like that was moved to solr there was a good reason for duplicating the func - lucene was on a totally different release schedule, bw. compat etc. But now since we are aligned with releases and we have the modules area there is really no good reason for stuff like that. If solr needs some feature in FuncQuery & friends we are going to add it. If we can fix it without breaking BWCompat we should do it - if not we discuss case by case! I have actually written some FuncQuery related code for customers and they would not be happy if we break bwcompat codewise (I exclude package naming here) - just saying that keeping it internal so that we can hack it as we like is not an option anyway!!

Simon Willnauer
added a comment - 26/Jan/11 22:28 Really back in the days where stuff like that was moved to solr there was a good reason for duplicating the func - lucene was on a totally different release schedule, bw. compat etc. But now since we are aligned with releases and we have the modules area there is really no good reason for stuff like that. If solr needs some feature in FuncQuery & friends we are going to add it. If we can fix it without breaking BWCompat we should do it - if not we discuss case by case! I have actually written some FuncQuery related code for customers and they would not be happy if we break bwcompat codewise (I exclude package naming here) - just saying that keeping it internal so that we can hack it as we like is not an option anyway!!
just my $0.05
simon

I think it makes sense to make function queries a module and expose them more to Lucene users for a number of reasons.

First off, though, I get the concern about implementation detail versus interface, but I think the tradeoff is worth it for several reasons.

Function queries (FQs) already have a pretty stable interface and we have already worked hard to maintain BWC for them since they are one of the more common areas that people actually do extend in Solr. In other words, FQs are already more interface than implementation.

I've always been stumped as to why more Lucene users don't use them and instead re-invent the wheel. The fact that it has taken this long frankly mystifies me

As a corollary of that, more light on them will make them better, improve performance, etc.

I agree with most here that the ability to do this kind of thing is exactly why we merged.

I don't think that is the concern. I think the concern is that if one only has to worry about implementation details and not about backwards compat, public API, etc. then it is a lot easier to innovate. It's the equivalent of refactoring IndexWriter public methods versus the underlying DocumentsWriter. Remember, in Solr, 99% of the public API is in the Request parameters, not in Java interface. As long as we maintain BWC for the Request interface (i.e. &start=0&rows=10&q=foo), we are more or less free to do what we want in the implementation layer.

In the end, I'm +1 on moving it, but with the distinction that changes in modules need to account for Solr as well as Lucene.

Grant Ingersoll
added a comment - 27/Jan/11 13:23 I think it makes sense to make function queries a module and expose them more to Lucene users for a number of reasons.
First off, though, I get the concern about implementation detail versus interface, but I think the tradeoff is worth it for several reasons.
Function queries (FQs) already have a pretty stable interface and we have already worked hard to maintain BWC for them since they are one of the more common areas that people actually do extend in Solr. In other words, FQs are already more interface than implementation.
I've always been stumped as to why more Lucene users don't use them and instead re-invent the wheel. The fact that it has taken this long frankly mystifies me
As a corollary of that, more light on them will make them better, improve performance, etc.
I agree with most here that the ability to do this kind of thing is exactly why we merged.
Hmm I don't really understand that part either. You're concerned that you'll lose the freedom to add functionality that'd benefit Solr only?
I don't think that is the concern. I think the concern is that if one only has to worry about implementation details and not about backwards compat, public API, etc. then it is a lot easier to innovate. It's the equivalent of refactoring IndexWriter public methods versus the underlying DocumentsWriter. Remember, in Solr, 99% of the public API is in the Request parameters, not in Java interface. As long as we maintain BWC for the Request interface (i.e. &start=0&rows=10&q=foo), we are more or less free to do what we want in the implementation layer.
In the end, I'm +1 on moving it, but with the distinction that changes in modules need to account for Solr as well as Lucene.

Function queries (FQs) already have a pretty stable interface and we have already worked hard to maintain BWC for them since they are one of the more common areas that people actually do extend in Solr. In other words, FQs are already more interface than implementation.

YES! +1

As a corollary of that, more light on them will make them better, improve performance, etc.

again +1

I think we can maintain the interface and innovations side by side - that said, lets look at a patch and discuss further once we see code / moving paths etc.

Simon Willnauer
added a comment - 27/Jan/11 13:32 Function queries (FQs) already have a pretty stable interface and we have already worked hard to maintain BWC for them since they are one of the more common areas that people actually do extend in Solr. In other words, FQs are already more interface than implementation.
YES! +1
As a corollary of that, more light on them will make them better, improve performance, etc.
again +1
I think we can maintain the interface and innovations side by side - that said, lets look at a patch and discuss further once we see code / moving paths etc.
simon

but with the distinction that changes in modules need to account for Solr as well as Lucene.

I thought this was understood about modules anyway, that we move the code there, all committers
have access and its shared code to support both projects.

hopefully we didnt screw this up in anyway for modules/analysis, I don't think Solr lost anything here.

Separately, for the record, we didnt merge ALL of Solr's analysis stuff into modules/analysis.
Some of this stuff really is solr-specific (e.g. ReverseWildCard support that needs some schema + qp integration).
This really solr-specific stuff that isn't generally useful stayed in Solr.

So I think we should always consider this for future modules, it doesn't have to be a binary decision,
but instead what is general and makes sense.

Robert Muir
added a comment - 27/Jan/11 13:39 but with the distinction that changes in modules need to account for Solr as well as Lucene.
I thought this was understood about modules anyway, that we move the code there, all committers
have access and its shared code to support both projects.
hopefully we didnt screw this up in anyway for modules/analysis, I don't think Solr lost anything here.
Separately, for the record, we didnt merge ALL of Solr's analysis stuff into modules/analysis.
Some of this stuff really is solr-specific (e.g. ReverseWildCard support that needs some schema + qp integration).
This really solr-specific stuff that isn't generally useful stayed in Solr.
So I think we should always consider this for future modules, it doesn't have to be a binary decision,
but instead what is general and makes sense.

Grant Ingersoll
added a comment - 27/Jan/11 14:25 I thought this was understood about modules anyway, that we move the code there, all committers
have access and its shared code to support both projects.
Cool. I was just re-iterating it. As such, I don't think it makes sense to mark it as lucene.internal, but experimental does make sense.

hopefully we didnt screw this up in anyway for modules/analysis, I don't think Solr lost anything here.

Right, modules/analysis never concerned me since the filters moved there were nice and self contained, they only already relied on the lucene analysis interfaces, would easy to make a solr-specific copy of one if really needed in the future, and left some of the solr specific stuff where it was. It all made sense and was a win-win.

We've hung a lot of features on function queries in Solr, and I've had plans for more to come - which is why it's not acceptable for it to suddenly not be a core part of Solr.

its shared code to support both projects.

The "status" and expectations for the code are one of the big issues here - it's not just about the technical issues of what a patch looks like. I would expect, for example, to be able to add a feature that probably only made sense for solr (as long as it didn't prevent lucene users from using function queries in general of course). Is that acceptable to everyone? I would not necessarily have the same expectation for other things in modules - it really depends on what it is. Does that make sense?

Yonik Seeley
added a comment - 27/Jan/11 15:07 hopefully we didnt screw this up in anyway for modules/analysis, I don't think Solr lost anything here.
Right, modules/analysis never concerned me since the filters moved there were nice and self contained, they only already relied on the lucene analysis interfaces, would easy to make a solr-specific copy of one if really needed in the future, and left some of the solr specific stuff where it was. It all made sense and was a win-win.
We've hung a lot of features on function queries in Solr, and I've had plans for more to come - which is why it's not acceptable for it to suddenly not be a core part of Solr.
its shared code to support both projects.
The "status" and expectations for the code are one of the big issues here - it's not just about the technical issues of what a patch looks like. I would expect, for example, to be able to add a feature that probably only made sense for solr (as long as it didn't prevent lucene users from using function queries in general of course). Is that acceptable to everyone? I would not necessarily have the same expectation for other things in modules - it really depends on what it is. Does that make sense?

I think it does. I think, as usual, it can be handled case by case. Just as Robert pointed out, some analysis things stayed in Solr, I don't think anyone is going to argue that we should put things where they most make sense. In general, though, I think the modules affords us a lot of opportunity through more exposure at the loss of a little bit of flexibility in Solr, which, as you said can still be handled by doing that little piece in just Solr.

Grant Ingersoll
added a comment - 27/Jan/11 15:27 Does that make sense?
I think it does. I think, as usual, it can be handled case by case. Just as Robert pointed out, some analysis things stayed in Solr, I don't think anyone is going to argue that we should put things where they most make sense. In general, though, I think the modules affords us a lot of opportunity through more exposure at the loss of a little bit of flexibility in Solr, which, as you said can still be handled by doing that little piece in just Solr.

The "status" and expectations for the code are one of the big issues here - it's not just about the technical issues of what a patch looks like. I would expect, for example, to be able to add a feature that probably only made sense for solr (as long as it didn't prevent lucene users from using function queries in general of course). Is that acceptable to everyone? I would not necessarily have the same expectation for other things in modules - it really depends on what it is. Does that make sense?

Do you have an example of what type of feature you are talking about? I think it would really clarify things.

Robert Muir
added a comment - 27/Jan/11 15:32
The "status" and expectations for the code are one of the big issues here - it's not just about the technical issues of what a patch looks like. I would expect, for example, to be able to add a feature that probably only made sense for solr (as long as it didn't prevent lucene users from using function queries in general of course). Is that acceptable to everyone? I would not necessarily have the same expectation for other things in modules - it really depends on what it is. Does that make sense?
Do you have an example of what type of feature you are talking about? I think it would really clarify things.

Chris Male
added a comment - 27/Jan/11 22:48 - edited I would expect, for example, to be able to add a feature that probably only made sense for solr (as long as it didn't prevent lucene users from using function queries in general of course).
Does any of these features you allude to rely on stuff that thats in Solr core? This impacts the dependencies this module will have.

Patch that factors out the core FunctionQuery stuff into a queries module. Theres alot of issues here but it does compile.

The following issues need to be addressed:

MutableValue & MutableFloatValue are used in the FunctionQuery code so I've pulled them into the module too. Should all the other Mutable*Value classes come too? Should they go into some other module?

What to return in ValueSource#getSortField which currently returns a SortField which implements SolrSortField. This is currently commented out so we can determine what best to do. Having this commented out breaks the Solr tests.

Many of the ValueSources and DocValues in Solr could be moved to the module, but not all of them. Some have dependencies on Solr dependencies / Solr core code.

Module isn't full integrated into the build.xmls and dev-tools.

Lucene core's FunctionQuery stuff needs to be removed.

I'll add a script that needs to be run before adding this patch shortly.

Chris Male
added a comment - 18/May/11 13:06 Patch that factors out the core FunctionQuery stuff into a queries module. Theres alot of issues here but it does compile.
The following issues need to be addressed:
MutableValue & MutableFloatValue are used in the FunctionQuery code so I've pulled them into the module too. Should all the other Mutable*Value classes come too? Should they go into some other module?
What to return in ValueSource#getSortField which currently returns a SortField which implements SolrSortField. This is currently commented out so we can determine what best to do. Having this commented out breaks the Solr tests.
Many of the ValueSources and DocValues in Solr could be moved to the module, but not all of them. Some have dependencies on Solr dependencies / Solr core code.
Module isn't full integrated into the build.xmls and dev-tools.
Lucene core's FunctionQuery stuff needs to be removed.
I'll add a script that needs to be run before adding this patch shortly.

I think the patch is a great start, ie, we will need the low level
infra used by FQs in the module.

MutableValue & MutableFloatValue are used in the FunctionQuery code so I've pulled them into the module too. Should all the other Mutable*Value classes come too? Should they go into some other module?

I think we should move Mutable* over? Grouping module will need all
of these, I think? (Ie if we want to allow users to group by
arbitrary typed field).

What to return in ValueSource#getSortField which currently returns a SortField which implements SolrSortField. This is currently commented out so we can determine what best to do. Having this commented out breaks the Solr tests.

Hmm good question. This looks to be related to sorting by FQ
(SOLR-1297) because some FQs need to be weighted. Not sure what to do
here yet... which FQs in particular require this?

Many of the ValueSources and DocValues in Solr could be moved to the module, but not all of them. Some have dependencies on Solr dependencies / Solr core code.

I think apply 90/10 rule here? Start with the easy-to-move queries?
We don't need initial go to be perfect... progress not perfection.

Lucene core's FunctionQuery stuff needs to be removed.

Do you have a sense of whether Solr's FQs are a superset of Lucene's?
Ie, is there anything Lucene's FQs can do that Solr's can't?

Probably, as a separate issue, we should also move contrib/queries ->
modules/queries. And I think the cool nested queries (LUCENE-2454)
would also go into this module...

Michael McCandless
added a comment - 18/May/11 18:37 Thanks Chris! The patch applies cleanly for me (after running the svn
commands) and everything compiles.
I think the patch is a great start, ie, we will need the low level
infra used by FQs in the module.
MutableValue & MutableFloatValue are used in the FunctionQuery code so I've pulled them into the module too. Should all the other Mutable*Value classes come too? Should they go into some other module?
I think we should move Mutable* over? Grouping module will need all
of these, I think? (Ie if we want to allow users to group by
arbitrary typed field).
What to return in ValueSource#getSortField which currently returns a SortField which implements SolrSortField. This is currently commented out so we can determine what best to do. Having this commented out breaks the Solr tests.
Hmm good question. This looks to be related to sorting by FQ
( SOLR-1297 ) because some FQs need to be weighted. Not sure what to do
here yet... which FQs in particular require this?
Many of the ValueSources and DocValues in Solr could be moved to the module, but not all of them. Some have dependencies on Solr dependencies / Solr core code.
I think apply 90/10 rule here? Start with the easy-to-move queries?
We don't need initial go to be perfect... progress not perfection.
Lucene core's FunctionQuery stuff needs to be removed.
Do you have a sense of whether Solr's FQs are a superset of Lucene's?
Ie, is there anything Lucene's FQs can do that Solr's can't?
Probably, as a separate issue, we should also move contrib/queries ->
modules/queries. And I think the cool nested queries ( LUCENE-2454 )
would also go into this module...

I think we should move Mutable* over? Grouping module will need all of these, I think? (Ie if we want to allow users to group by arbitrary typed field).

Yup okay.

Hmm good question. This looks to be related to sorting by FQ (SOLR-1297) because some FQs need to be weighted. Not sure what to do here yet... which FQs in particular require this?

Both all of them and not many of them (complicated). The sorting of FQ
functionality is necessary for all FQs in Solr since the user can sort
by any FQ. However the extension made by the SolrSortField is the ability
to create a 'weighted' SortField by passing in a IndexSearcher and having
it stored in a Map. The Map is then made available to any DocValues when
they create their values.

This is when the 'not many' comes into effect. Only a few DocValues implementations
use the contents of the Map. DocFreqValueSource for example uses the IndexSearcher
in the Map. But I suppose there could be many user implementations that do.

SolrSortField is currently used in SolrIndexSearcher to 'weight' the Sorts. I wonder
whether we can simplify this? When ValueSource#getSort is called (which is only in 1
place really), we can pass in the IndexSearcher, meaning the SortField can be 'weighted'
then and there.

Since SolrSortField is only used in this 1 place currently, we can then revisit dropping it?

Do you think its worth opening an issue to address this first?

I think apply 90/10 rule here? Start with the easy-to-move queries? We don't need initial go to be perfect... progress not perfection.

Could we sort the initial commit out and then I can move them over in batches?
Already have a 108k patch, I'd say moving what we can will push it towards 300k

Do you have a sense of whether Solr's FQs are a superset of Lucene's? Ie, is there anything Lucene's FQs can do that Solr's can't?

Solr FQs are hugely more advanced than the ValueSourceQuery based stuff in Lucene.
Its not a full 1 to 1 change since the APIs are slightly different, but I'd say
that we'd want users to use the FQ line of classes. I cant see anything in Lucene's
VSQs that you couldn't do using FQs.

Chris Male
added a comment - 19/May/11 04:33 - edited I think we should move Mutable* over? Grouping module will need all of these, I think? (Ie if we want to allow users to group by arbitrary typed field).
Yup okay.
Hmm good question. This looks to be related to sorting by FQ ( SOLR-1297 ) because some FQs need to be weighted. Not sure what to do here yet... which FQs in particular require this?
Both all of them and not many of them (complicated). The sorting of FQ
functionality is necessary for all FQs in Solr since the user can sort
by any FQ. However the extension made by the SolrSortField is the ability
to create a 'weighted' SortField by passing in a IndexSearcher and having
it stored in a Map. The Map is then made available to any DocValues when
they create their values.
This is when the 'not many' comes into effect. Only a few DocValues implementations
use the contents of the Map. DocFreqValueSource for example uses the IndexSearcher
in the Map. But I suppose there could be many user implementations that do.
SolrSortField is currently used in SolrIndexSearcher to 'weight' the Sorts. I wonder
whether we can simplify this? When ValueSource#getSort is called (which is only in 1
place really), we can pass in the IndexSearcher, meaning the SortField can be 'weighted'
then and there.
Since SolrSortField is only used in this 1 place currently, we can then revisit dropping it?
Do you think its worth opening an issue to address this first?
I think apply 90/10 rule here? Start with the easy-to-move queries? We don't need initial go to be perfect... progress not perfection.
Could we sort the initial commit out and then I can move them over in batches?
Already have a 108k patch, I'd say moving what we can will push it towards 300k
Do you have a sense of whether Solr's FQs are a superset of Lucene's? Ie, is there anything Lucene's FQs can do that Solr's can't?
Solr FQs are hugely more advanced than the ValueSourceQuery based stuff in Lucene.
Its not a full 1 to 1 change since the APIs are slightly different, but I'd say
that we'd want users to use the FQ line of classes. I cant see anything in Lucene's
VSQs that you couldn't do using FQs.

Hmm good question. This looks to be related to sorting by FQ (SOLR-1297) because some FQs need to be weighted. Not sure what to do here yet... which FQs in particular require this?

Both all of them and not many of them (complicated). The sorting of FQ
functionality is necessary for all FQs in Solr since the user can sort
by any FQ. However the extension made by the SolrSortField is the ability
to create a 'weighted' SortField by passing in a IndexSearcher and having
it stored in a Map. The Map is then made available to any DocValues when
they create their values.

So the goal here is to make the top-level searcher (IR) visible to the
FQ's getValues? I think this pre-dated the cutover to
AtomicReaderContext, which now provides the top reader? Maybe this
isn't needed anymore...?

This is when the 'not many' comes into effect. Only a few DocValues implementations
use the contents of the Map. DocFreqValueSource for example uses the IndexSearcher
in the Map. But I suppose there could be many user implementations that do.

EG DocFreqValueSource could pull docFreq from the top reader?

Though QueryValueSource needs a searcher (but, seems to make one, from
the top reader, if it wasn't provided one).

SolrSortField is currently used in SolrIndexSearcher to 'weight' the Sorts. I wonder
whether we can simplify this? When ValueSource#getSort is called (which is only in 1
place really), we can pass in the IndexSearcher, meaning the SortField can be 'weighted'
then and there.

Since SolrSortField is only used in this 1 place currently, we can then revisit dropping it?

That seems good too?

Do you think its worth opening an issue to address this first?

Yes can you do that, and mark this issue as depending in it?

I think apply 90/10 rule here? Start with the easy-to-move queries? We don't need initial go to be perfect... progress not perfection.

Could we sort the initial commit out and then I can move them over in batches?
Already have a 108k patch, I'd say moving what we can will push it towards 300k

Good question... maybe we can do this on a branch?

Do you have a sense of whether Solr's FQs are a superset of Lucene's? Ie, is there anything Lucene's FQs can do that Solr's can't?

Solr FQs are hugely more advanced than the ValueSourceQuery based stuff in Lucene.
Its not a full 1 to 1 change since the APIs are slightly different, but I'd say
that we'd want users to use the FQ line of classes. I cant see anything in Lucene's
VSQs that you couldn't do using FQs.

OK then once we finally have the "superset" moved into the module we
should remove Lucene's (deprecate on 3.x).

Michael McCandless
added a comment - 19/May/11 22:22
Hmm good question. This looks to be related to sorting by FQ ( SOLR-1297 ) because some FQs need to be weighted. Not sure what to do here yet... which FQs in particular require this?
Both all of them and not many of them (complicated). The sorting of FQ
functionality is necessary for all FQs in Solr since the user can sort
by any FQ. However the extension made by the SolrSortField is the ability
to create a 'weighted' SortField by passing in a IndexSearcher and having
it stored in a Map. The Map is then made available to any DocValues when
they create their values.
So the goal here is to make the top-level searcher (IR) visible to the
FQ's getValues? I think this pre-dated the cutover to
AtomicReaderContext, which now provides the top reader? Maybe this
isn't needed anymore...?
This is when the 'not many' comes into effect. Only a few DocValues implementations
use the contents of the Map. DocFreqValueSource for example uses the IndexSearcher
in the Map. But I suppose there could be many user implementations that do.
EG DocFreqValueSource could pull docFreq from the top reader?
Though QueryValueSource needs a searcher (but, seems to make one, from
the top reader, if it wasn't provided one).
SolrSortField is currently used in SolrIndexSearcher to 'weight' the Sorts. I wonder
whether we can simplify this? When ValueSource#getSort is called (which is only in 1
place really), we can pass in the IndexSearcher, meaning the SortField can be 'weighted'
then and there.
Since SolrSortField is only used in this 1 place currently, we can then revisit dropping it?
That seems good too?
Do you think its worth opening an issue to address this first?
Yes can you do that, and mark this issue as depending in it?
I think apply 90/10 rule here? Start with the easy-to-move queries? We don't need initial go to be perfect... progress not perfection.
Could we sort the initial commit out and then I can move them over in batches?
Already have a 108k patch, I'd say moving what we can will push it towards 300k
Good question... maybe we can do this on a branch?
Do you have a sense of whether Solr's FQs are a superset of Lucene's? Ie, is there anything Lucene's FQs can do that Solr's can't?
Solr FQs are hugely more advanced than the ValueSourceQuery based stuff in Lucene.
Its not a full 1 to 1 change since the APIs are slightly different, but I'd say
that we'd want users to use the FQ line of classes. I cant see anything in Lucene's
VSQs that you couldn't do using FQs.
OK then once we finally have the "superset" moved into the module we
should remove Lucene's (deprecate on 3.x).

Yonik Seeley
added a comment - 19/May/11 22:38 Regarding weighting - function queries can contain normal queries, so anywhere a function query is used, it must be weighted first.
When ValueSource#getSort is called (which is only in 1 place really), we can pass in the IndexSearcher, meaning the SortField can be 'weighted' then and there.
Sort instances are like Query instances, and for many reasons should not be bound to any particular searcher.

So the goal here is to make the top-level searcher (IR) visible to the
FQ's getValues? I think this pre-dated the cutover to
AtomicReaderContext, which now provides the top reader? Maybe this
isn't needed anymore...?

Its very intentional (as Yonik has pointed out). It allows any Querys in the ValueSources to be weighted.

Though QueryValueSource needs a searcher (but, seems to make one, from
the top reader, if it wasn't provided one).

Is this desirable? IndexSearcher is pretty thin I know but is it fast enough to create that it has a nominal effect? If its faster than passing down the IndexSearcher then maybe its a good idea for anybody wanting an IndexSearcher to do this.

Chris Male
added a comment - 20/May/11 03:22
So the goal here is to make the top-level searcher (IR) visible to the
FQ's getValues? I think this pre-dated the cutover to
AtomicReaderContext, which now provides the top reader? Maybe this
isn't needed anymore...?
Its very intentional (as Yonik has pointed out). It allows any Querys in the ValueSources to be weighted.
Though QueryValueSource needs a searcher (but, seems to make one, from
the top reader, if it wasn't provided one).
Is this desirable? IndexSearcher is pretty thin I know but is it fast enough to create that it has a nominal effect? If its faster than passing down the IndexSearcher then maybe its a good idea for anybody wanting an IndexSearcher to do this.
Good question... maybe we can do this on a branch?
Absolutely, can you create one?

Regarding weighting - function queries can contain normal queries, so anywhere a function query is used, it must be weighted first.

Yup I've come to understand that. So the challenge is how to do this when a FunctionQuery is used to sort and not Query? Okay. I'm going to open an issue to see if we can address this better, maybe by extending SortField or something.

Sort instances are like Query instances, and for many reasons should not be bound to any particular searcher.

Yeah that is true. But ValueSource#getSort actually returns a SortField. Does the same apply to SortField instances? Also, ValueSource#weight(IndexSearcher) returns a new SortField as well, with a context containing the IndexSearcher. Consequently the new SortField is bound to that particular searcher.

Chris Male
added a comment - 20/May/11 03:34 Hey Yonik,
Its super to hear from you on this, it'll be a real help.
Regarding weighting - function queries can contain normal queries, so anywhere a function query is used, it must be weighted first.
Yup I've come to understand that. So the challenge is how to do this when a FunctionQuery is used to sort and not Query? Okay. I'm going to open an issue to see if we can address this better, maybe by extending SortField or something.
Sort instances are like Query instances, and for many reasons should not be bound to any particular searcher.
Yeah that is true. But ValueSource#getSort actually returns a SortField. Does the same apply to SortField instances? Also, ValueSource#weight(IndexSearcher) returns a new SortField as well, with a context containing the IndexSearcher. Consequently the new SortField is bound to that particular searcher.

Is this desirable? IndexSearcher is pretty thin I know but is it fast enough to create that it has a nominal effect? If its faster than passing down the IndexSearcher then maybe its a good idea for anybody wanting an IndexSearcher to do this.

Actually one flaw in this is that SolrIndexSearcher has overridden some functionality in IndexSearcher, such as SimilarityProvider, which is used NormValueSource.

Chris Male
added a comment - 20/May/11 04:09
Is this desirable? IndexSearcher is pretty thin I know but is it fast enough to create that it has a nominal effect? If its faster than passing down the IndexSearcher then maybe its a good idea for anybody wanting an IndexSearcher to do this.
Actually one flaw in this is that SolrIndexSearcher has overridden some functionality in IndexSearcher, such as SimilarityProvider, which is used NormValueSource.

> Sort instances are like Query instances, and for many reasons should not be bound to any particular searcher.

Yeah that is true. But ValueSource#getSort actually returns a SortField. Does the same apply to SortField instances?

Yeah, that's part of a Sort.

Also, ValueSource#weight(IndexSearcher) returns a new SortField as well, with a context containing the IndexSearcher. Consequently the new SortField is bound to that particular searcher.

ValueSourceSortField.weight(IndexSearcher) - right. That's sort of how queries can be re-written too... and the returned temporary SortField is never retained. I guess I should have said that it should always be possible for Query and Sort instances to be independent of searchers and readers... but temporary ones can be created for the purposes of actually executing the query.

Yonik Seeley
added a comment - 20/May/11 16:42 > Sort instances are like Query instances, and for many reasons should not be bound to any particular searcher.
Yeah that is true. But ValueSource#getSort actually returns a SortField. Does the same apply to SortField instances?
Yeah, that's part of a Sort.
Also, ValueSource#weight(IndexSearcher) returns a new SortField as well, with a context containing the IndexSearcher. Consequently the new SortField is bound to that particular searcher.
ValueSourceSortField.weight(IndexSearcher) - right. That's sort of how queries can be re-written too... and the returned temporary SortField is never retained. I guess I should have said that it should always be possible for Query and Sort instances to be independent of searchers and readers... but temporary ones can be created for the purposes of actually executing the query.
Note that Solr's query cache uses Sort as part of the cache key too.