Activity

Erik Hatcher
added a comment - 21/Jul/09 14:52 I would only add that we probably don't want to add this to the actual response document returned, but rather attach it to an additional section, the way highlighting works. The same really goes for score, too, but I digress.

Andrzej Bialecki
added a comment - 21/Jul/09 15:02 Not sure about not adding it - what fields are returned is selectable, right? And it's not possible to obtain this information otherwise. Some time ago I implemented this for a client - it was before SOLR-243, but I used the same idea, i.e. to use a subclass of IndexReader that returns documents with added function fields (and score).

Chris Male
added a comment - 21/Jul/09 15:16 In my patch in SOLR-773, I tackled this issue by creating the idea of a FieldValueSource, which mapped a name of a pseudo-field to an arbitrary source of data which could be computed at runtime. For me it was distances, but it could also be the results of a FunctionQuery. Since there was a mapping of name to data, it was possible to include or exclude the FieldValueSources from adding their information to the search results through the fl parameter.

Grant Ingersoll
added a comment - 13/Dec/09 11:45 Dang, you know it's bad when you wake up in the morning and the first thing that comes into your head is what the interface should look like for some new feature in Solr.
Alas, having just finished SOLR-1297, I think we should simply make the &fl parameter be able to parse functions and, if need be, they can be materialized/executed as they are being retrieved by the Writer (using SOLR-1650 if implemented).
Thus, the interface for this would be:
&fl=sum(x, y),id,a,b,c,score
or
&fl=id,sum(x, y),score
&fl=*,sum(x, y),score
So, the output would be:
...
<str name="id">foo</str>
<float name="sum(x,y)">40</float>
<float name="score">0.343</float>
...
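
The proposed `fl=sum(x, y),id,a,b,c,score` form means the field list can no longer be split on every comma, because function arguments contain commas too. A minimal sketch of one way to split it, written in Python rather than Solr's Java; `split_fl` and `is_function` are hypothetical helper names for illustration, not Solr APIs:

```python
from __future__ import annotations


def split_fl(fl: str) -> list[str]:
    """Split an fl value on top-level commas only.

    Commas nested inside parentheses belong to a function call's
    argument list and must not end the current entry.
    """
    entries: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in fl:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            entries.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    entries.append("".join(current).strip())
    # Drop empty entries produced by stray commas.
    return [e for e in entries if e]


def is_function(entry: str) -> bool:
    """Crude check (an assumption of this sketch): name(...) is a function."""
    return "(" in entry and entry.endswith(")")
```

With this, `split_fl("*,sum(x, y),score")` yields the three entries `*`, `sum(x, y)` and `score`, and only the middle one is treated as a function.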

Grant Ingersoll
added a comment - 13/Dec/09 11:47 We should also let search components add extra fields to the document.
I think we could handle this via the ResponseBuilder by storing an <id, <name, value>> pairing in a map that the ResponseWriter could then consult when it needs it as it's streaming out the results. Tricky part is what to do when there are no ids, I suppose.
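
The `<id, <name, value>>` map can be sketched as follows, in Python for brevity. `ExtraFields` and `stream_docs` are hypothetical names standing in for whatever the ResponseBuilder and ResponseWriter would actually hold; the sketch simply passes documents without an id through unchanged, which leaves the "no ids" question open rather than answering it:

```python
from collections import defaultdict


class ExtraFields:
    """Holds per-document extra values, keyed by document id then field name."""

    def __init__(self):
        self._by_id = defaultdict(dict)

    def add(self, doc_id, name, value):
        self._by_id[doc_id][name] = value

    def for_doc(self, doc_id):
        # .get() avoids creating empty entries for ids that have no extras.
        return self._by_id.get(doc_id, {})


def stream_docs(docs, extras, id_field="id"):
    """Yield documents one at a time, merging in any registered extra fields."""
    for doc in docs:
        merged = dict(doc)
        doc_id = doc.get(id_field)
        if doc_id is not None:
            merged.update(extras.for_doc(doc_id))
        yield merged
```

Because `stream_docs` is a generator, the writer can still emit one document at a time instead of materializing the whole result list.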

Chris Male
added a comment - 13/Dec/09 12:44 Hi Grant,
I certainly can. I hadn't thought about having a function as an fl parameter value, but that makes a lot of sense and I can support that through my work as well. I'll work on extracting the code today and will get a patch here ASAP.

Uri Boness
added a comment - 13/Dec/09 13:07
I certainly can. I hadn't thought about having a function as an fl parameter value, but that makes a lot of sense and I can support that through my work as well. I'll work on extracting the code today and will get a patch here ASAP.
As far as I recall, specifying the functions in the fl parameter should still work with the FieldValueSource as it is at the moment. The registry lets you register any value for any string key; in this case the string key is the function.

Chris Male
added a comment - 13/Dec/09 13:17 Hi Uri,
Yup, functions as fl parameters work straight away with the FieldValueSource, so no changes are required there. I will first chuck up a patch without SOLR-1644 so that it can be immediately reviewed, then I'll dive into how to update it to 1644 and will create another patch then.

Chris Male
added a comment - 13/Dec/09 13:39 Attaching the patch taken from my SOLR-773 patch. Adds in a FieldValueSource and FieldValueSourceRegistry, changes the SolrIndexSearcher to use FieldValueSources when building a document, and hooks this process into the ResponseWriters.

Chris Male
added a comment - 13/Dec/09 14:21 Attached new patch which changes the names from FieldValueSource to FieldValues, and FieldValueSourceRegistry to FieldValuesRegistry, to avoid confusion with ValueSource.

Yonik Seeley
added a comment - 13/Dec/09 15:11 A few comments and random thoughts on this feature in general:
Think scalability... there should be a way to keep things streamable. Some people will want to retrieve values for many documents (10K, 100K, or their whole index). But of course there should be a way for a component to simply add values calculated all at once too.
For performance, providers of field values should be able to operate on multiple documents at once. For example, providers may want to sort big blocks of docids and access in docid order for better performance (important for anything that accesses the index). A value provider that needs to access another system would want to send multiple IDs in a batch.
Field value providers should be given context, including optionally the set of fields for the current document, and probably the request and response objects.
Perhaps this should be more generalized in that the value provider be a document mutator - it should be able to also change or remove other fields. I believe this would also allow stuff like per-field security. Field value providers should also be able to add multiple fields - it may not know ahead of time what extra fields a document has.
should work with highlighting... this way people don't have to store large text fields if they already have them in another system.
Keep in mind that some people believe that derived fields (or meta fields) don't belong in the same place as other stored fields. I think it probably depends on the exact use case though.
I'm not sure if SolrIndexSearcher is the right place for this or not though - perhaps its document() method should stick to just the stored fields?
Think about how to give these fields nicer names... perhaps this could even include the "select as" ability to rename fields.
One thought: use an optional '=' or use the "AS" syntax
fl=foo=bar,dist=gdist(10,20,loc) or
fl=foo AS bar, gdist(10,20,loc) AS dist (more familiar to DB people?)
Another option for providing names that would only work with queries/function queries would be local params:
fl={!key=dist}gdist(10,20,loc)
but that only works for queries so it's not as flexible
If we use the fl syntax for including function queries, then we should consider providing the ability to use multiple "fl" params. This will make it easier for clients who want to tack something on w/o modifying other params.
If we provide multiple fl params, then an alternate way to specify aliases could be:
fl.dist=gdist(10,20,loc)
fl=foo is ambiguous... do we mean a function query or the field?... perhaps if it's a bare field name, then we treat it as a field unless it has localparams?
fl={!func}foo
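
The three naming options floated above (`dist=gdist(10,20,loc)`, `gdist(10,20,loc) AS dist`, and `{!key=dist}gdist(10,20,loc)`) can all be reduced to a (name, source) pair. A minimal sketch in Python, assuming the entry has already been separated from the rest of the fl value; `parse_alias` is a hypothetical helper, not part of any patch on this issue:

```python
import re

# Local-params style: {!key=dist}gdist(10,20,loc)
_LOCAL_KEY = re.compile(r"^\{!key=([^}\s]+)\}(.+)$")
# SQL style: gdist(10,20,loc) AS dist
_AS = re.compile(r"^(.+?)\s+AS\s+(\w+)$")
# Assignment style: dist=gdist(10,20,loc)
_EQ = re.compile(r"^(\w+)=(.+)$")


def parse_alias(entry):
    """Return (output_name, source) for a single fl entry.

    An entry without an alias is returned under its own text.
    """
    entry = entry.strip()
    m = _LOCAL_KEY.match(entry)
    if m:
        return m.group(1), m.group(2).strip()
    m = _AS.match(entry)
    if m:
        return m.group(2), m.group(1).strip()
    m = _EQ.match(entry)
    if m:
        return m.group(1), m.group(2).strip()
    return entry, entry
```

All three spellings of the distance example come back as `("dist", "gdist(10,20,loc)")`, while a bare `id` comes back as `("id", "id")`.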

Uri Boness
added a comment - 13/Dec/09 15:46 I like the idea of giving the providers a broader context (document, request, response). This will also allow them to operate on multiple documents in the response (whether it's the docset or the doclist).
One thing to take into consideration here is that once you introduce dependencies between the fields, there must be a way to determine the ordering of the providers (as one provider might depend on fields generated by another provider).
As for the "<field> AS <alias>" syntax, I think this should be consistent with the work in SOLR-1351, which is currently based on localparams. Perhaps there should be a common approach to handling aliases in requests.
I think that the proper approach is to separate the stored fields from other "fields"... perhaps even put it in a separate "meta-data" section under the document. But once you do that, again, for the sake of consistency, it would also be wise not to include these fields/functions in the "fl" parameter. So the "fl" parameter will refer to fields, and another parameter "meta" will refer to meta-data values.
fl={!func}foo
+1 or even func:foo. Then you can have things like "url:<url>" or "file:<file path>" or even "db:<db alias + field>"
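
The ordering concern raised above is a dependency-sorting problem. A minimal Python sketch, assuming each provider declares which other providers' fields it reads; the provider names and the `depends_on` mapping are invented for illustration and do not come from any patch here:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def order_providers(depends_on):
    """Return provider names so that every provider follows its dependencies.

    depends_on maps a provider name to the set of providers whose output
    it needs. A cyclic dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(depends_on).static_order())


def apply_providers(doc, providers, depends_on):
    """Run each provider in dependency order, storing its value under its name."""
    for name in order_providers(depends_on):
        doc[name] = providers[name](doc)
    return doc
```

For example, a hypothetical `price_per_km` provider that reads the `distance` value is guaranteed to run after the `distance` provider, whatever order the two were registered in.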

Noble Paul
added a comment - 13/Dec/09 16:10 perhaps even put it in a separate "meta-data" section under the document
This has been discussed earlier. The meta section is not a clean idea. We should put them as normal fields.
Instead of inventing a new syntax, let us use the local params syntax. We do not have to try to have any similarity with SQL.

Grant Ingersoll
added a comment - 23/Dec/09 15:19 fl=foo is ambiguous... do we mean a function query or the field?... perhaps if it's a bare field name, then we treat it as a field unless it has localparams?
fl={!func}foo
It is and isn't ambiguous, right? The result should be the same in that the value for that field is loaded (although I suppose it has implications that it would now be possible to load non-stored, single valued fields if we treat it as a function). FWIW, for the sort by function case, I checked to see if it is a field first, then a function, then puke.
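
The "field first, then function, then fail" rule described for the sort-by-function case could be applied to fl entries along these lines. This is a Python sketch under stated assumptions: `schema_fields` and `functions` are plain sets standing in for the schema and the function registry, and `resolve_fl_entry` is a hypothetical name:

```python
FUNC_PREFIX = "{!func}"


def resolve_fl_entry(entry, schema_fields, functions):
    """Classify one fl entry as a field or a function.

    Order of checks: explicit {!func} prefix, then schema field,
    then function call, otherwise an error.
    """
    name = entry.strip()
    if name.startswith(FUNC_PREFIX):
        # Explicit local params remove the ambiguity of a bare name.
        return ("function", name[len(FUNC_PREFIX):])
    if name in schema_fields:
        return ("field", name)
    func_name = name.split("(", 1)[0]
    if "(" in name and name.endswith(")") and func_name in functions:
        return ("function", name)
    raise ValueError(f"fl entry is neither a known field nor a function: {name!r}")
```

Under this rule a bare `foo` resolves to the stored field, and `{!func}foo` is the only way to ask for the function-query value of the same name.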

Grant Ingersoll
added a comment - 23/Dec/09 15:30 I think that the proper approach is to separate the stored fields from other "fields"... perhaps even put it in a separate "meta-data" section under the document. But once you do that, again, for the sake of consistency, it would also be wise not to include these fields/functions in the "fl" parameter. So the "fl" parameter will refer to fields, and another parameter "meta" will refer to meta-data values.
I think they should be inline, as they are just values associated with a document. I think putting it in some other list is sticking too literally to what Lucene calls a field, which I don't think Solr has to do. One could easily imagine a Solr component that brought in a database or other storage repository for supplementary fields and it should all be seamless to the client.
If we step back and think about the use case for this functionality it is that one wants the output of the function closely associated with the document. I don't want to have to go look it up in some other list while I am iterating over my results when all the other values I'm displaying/using are right there associated with the document. That being said, it could be useful to add an attribute that indicates it is a generated name, but in reality, that is inferred by the field name anyway, as in:
<doc>
<field name="pow(foo,2)">64</field>
<field name="foo">8</field>
</doc>
I'd even argue that highlighter results should be inline, too, but that is a different issue and a bigger can of worms since it has a well used API already.

Chris Male
added a comment - 23/Dec/09 15:45 Just a couple of thoughts about the implementation of this:
should work with highlighting... this way people don't have to store large text fields if they already have them in another system. I'm not sure if SolrIndexSearcher is the right place for this or not though - perhaps its document() method should stick to just the stored fields?
From what I can see, the Highlighter also pulls documents from the SolrIndexSearcher through document(), so the patch should already support highlighting. If we move the process away from the SolrIndexSearcher, which I understand the case for, then we need to move all components away from using document(); otherwise the same document could be represented in different ways depending on whether it's retrieved via document() or via whatever mechanism we build. Equally, custom components would need to do the same.
I do like the idea of changing to a DocumentMutator which is given a context and is able to add/remove fields. This will then work seamlessly with having the values inline with the documents.
Should I go ahead and mock up a patch for something like this?
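
For discussion, the DocumentMutator idea might look roughly like the following. This is a Python sketch, not the Java interface any patch on this issue defines; `MutatorContext`, `AddSquare`, `HideFields` and `build_document` are all invented names, and the context is reduced to plain dictionaries:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Protocol, Set


@dataclass
class MutatorContext:
    """Stand-in for the request/response objects a mutator would be given."""
    request_params: Dict[str, str]
    response: Dict[str, Any] = field(default_factory=dict)


class DocumentMutator(Protocol):
    def mutate(self, doc: Dict[str, Any], ctx: MutatorContext) -> None:
        """Add, change or remove fields on doc in place."""


class AddSquare:
    """Adds a computed pseudo-field named like the function call."""

    def __init__(self, source: str):
        self.source = source

    def mutate(self, doc, ctx):
        if self.source in doc:
            doc[f"pow({self.source},2)"] = doc[self.source] ** 2


class HideFields:
    """Removes fields the caller may not see (per-field security)."""

    def __init__(self, hidden: Set[str]):
        self.hidden = hidden

    def mutate(self, doc, ctx):
        for name in self.hidden:
            doc.pop(name, None)


def build_document(stored: Dict[str, Any],
                   mutators: Iterable[DocumentMutator],
                   ctx: MutatorContext) -> Dict[str, Any]:
    """Start from the stored fields and let each mutator adjust the result."""
    doc = dict(stored)
    for mutator in mutators:
        mutator.mutate(doc, ctx)
    return doc
```

Keeping the stored fields as the input to `build_document` leaves open whether this lives inside SolrIndexSearcher or in a layer above it, which is the question still being debated in this thread.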

Grant Ingersoll
added a comment - 23/Dec/09 17:13 I'm not sure if SolrIndexSearcher is the right place for this or not though - perhaps its document() method should stick to just the stored fields?
Both Chris' patch here and Noble's on SOLR-1566 take the approach of modifying SolrIndexSearcher.doc() for part of the solution. Not saying this is right or wrong, but I think it would be useful to document here the rationale about why not to do it. Is it just b/c that method is expected to do, more or less, what the Lucene IndexSearcher does?

Uri Boness
added a comment - 23/Dec/09 22:32 I think they should be inline, as they are just values associated with a document. I think putting it in some other list is sticking too literally to what Lucene calls a field, which I don't think Solr has to do that. One could easily imagine a Solr component that brought in a database or other storage repository for supplementary fields and it should all be seamless to the client.
I definitely agree that one shouldn't see a field in Solr as a field in Lucene. That said, I think we do have a tendency to see a field in Solr as somehow bound to the Solr schema.
One thing to notice is that eventually we end up with the same discussion regarding this feature in the context of different issues, be it highlighting or field collapsing. In some cases it feels just "right" to return the data as a field in a document; in other places it feels "right" to have it as something else. It is true that when you interact with Solr directly (especially if you do it manually) you certainly know what queries you send, what functions you request and what you should expect in the result. But from experience, a lot of times you try to automate things a bit, and creating well-structured and descriptive protocols is the safe way to enable that.
I don't want to have to go look it up in some other list while I am iterating over my results when all the other values I'm displaying/using are right there associated with the document.
Having a sub-section under each document still associates it with the document. The way I see it, it's like OOP... you can have a Person class that holds all the information of the person in it as primitive fields, or you can group related data, like address info, in a separate Address class.
That being said, it could be useful to add an attribute that indicates it is a generated name
That's one way to group fields together, but if you're already doing that, then why not go all the way? If you need to distinguish between generated and non-generated names, why not make it simpler and just separate the two in a different list? (To continue the line of analogies I started above) it's like XML: you can have a single-level hierarchy where each element defines attributes to relate it to other elements, but a more suitable solution would just be to group all related elements under one parent element.
I'd even argue that highlighter results should be inline, too, but that is a different issue and a bigger can of worms since it has a well used API already.
In some cases it might be (well, it just is) more appropriate to have the highlighting inlined. In other cases it might not be possible, especially with some of the latest requests to have highlighting functionality available for arbitrary text loaded from anywhere (which I believe will lead to a highlighting component/requestHandler that will be independent of the query component).
Not saying this is right or wrong, but I think it would be useful to document here the rationale about why not to do it. Is it just b/c that method is expected to do, more or less, what the Lucene IndexSearcher does?
I guess so... I guess SolrIndexSearcher is in fact a Lucene IndexSearcher, which is the source of this association. In some ways I think it also relates a bit to the response structure (not directly, but conceptually)... if the IndexSearcher represents Lucene and the document contains fields coming from other sources as well, perhaps this functionality of gathering all these fields (/metadata) should be done at a higher level, where SolrIndexSearcher just serves as one "field source". The main reason why Chris's patch puts this functionality in the doc() method of the SolrIndexSearcher is simply because it's the easiest and the simplest solution right now... and I don't think there's anything wrong with that... simple is good! Even with this solution as it is, the "field sources" are still abstracted away in the form of a "FieldValues" or "DocumentMutator", so architecture-wise I don't see that leaving it as is will compromise anything.

Noble Paul
added a comment - 24/Dec/09 04:16 Both Chris and I are trying to achieve more or less the same thing. It's just that SOLR-1566 is a bit more ambitious in scope.
@Uri, I would request you to take a look at the patch in SOLR-1566 also.
I guess all of us agree that we need a generic way to add non-Lucene fields to a SolrDocument. The fields could be single-valued or multivalued. Say a function returns a List<int>; that should be valid too. (I'd even say it could be a NamedList.) This is useful for a lot of use cases.
As long as we can achieve this functionality in a performant way, it is fine. Let us converge our efforts and bring this to a resolution ASAP.

Hoss Man
added a comment - 27/May/10 23:05 Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...
http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E
Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.
A unique token for finding these 240 issues in the future: hossversioncleanup20100527

Koji Sekiguchi
added a comment - 22/Jun/11 13:27 Hi, I'm using solr example data on trunk.
If I post q=ipod&fl=score,price, Solr returns score and price as expected.
But if I post q=ipod&fl=score,log(price), Solr returns score, the value of log(price) and all the remaining fields.