Yonik Seeley
added a comment - 15/Mar/11 19:39
The annoying part here is we need more metadata than just "Query" that we use now for a filter.
Unfortunately, SolrIndexSearcher uses List<Query> everywhere.

We could create something like a SolrQuery extends Query that wrapped a normal query and added additional metadata (like cache options). That's a bit messier since we'd have instanceof checks and casts everywhere though.

Another option is to create a SolrQuery class that does not extend Query - hence methods taking List<Query> would now need to take List<SolrQuery>

class SolrQuery {
  Query q;
  QParser qparser;
  boolean cache;
  ...
}

Thoughts?

Hoss Man
added a comment - 15/Mar/11 21:55
Why not extend Query? ... it could actually rewrite to the Query it wraps, giving us the best of both worlds.

FWIW: it also seems like it would make sense for this type of syntax/decoration to work with the "q" param (skipping the queryResultCache).
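
A rough sketch of the "extend Query and rewrite to the wrapped query" idea, not code from any patch on this issue. The Query interface and TermQuery below are simplified stand-ins for the Lucene types (the real rewrite takes an IndexReader), and CacheAwareQuery and shouldCache are hypothetical names chosen only for illustration:

```java
import java.util.Objects;

public class CacheAwareQueryDemo {
    // Stand-in for Lucene's Query; only the rewrite step is modeled (assumption).
    interface Query {
        default Query rewrite() { return this; }
    }

    // Trivial leaf query used for the demonstration (stand-in, not Lucene's class).
    static final class TermQuery implements Query {
        final String field;
        final String text;
        TermQuery(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return field + ":" + text; }
    }

    // Hypothetical wrapper: carries cache metadata and rewrites to the query it wraps.
    static final class CacheAwareQuery implements Query {
        private final Query delegate;
        private final boolean cache;
        CacheAwareQuery(Query delegate, boolean cache) {
            this.delegate = Objects.requireNonNull(delegate);
            this.cache = cache;
        }
        boolean getCache() { return cache; }
        @Override public Query rewrite() { return delegate.rewrite(); }
    }

    // The single instanceof check: plain queries keep the default (cached) behavior.
    static boolean shouldCache(Query q) {
        return !(q instanceof CacheAwareQuery) || ((CacheAwareQuery) q).getCache();
    }

    public static void main(String[] args) {
        Query q = new CacheAwareQuery(new TermQuery("inStock", "true"), false);
        System.out.println("use cache: " + shouldCache(q));   // use cache: false
        System.out.println("executes as: " + q.rewrite());    // executes as: inStock:true
    }
}
```

Because the wrapper is itself a Query, methods taking List<Query> would not need to change; only code that decides whether to consult a cache needs the instanceof check.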

Ryan McKinley
added a comment - 15/Mar/11 23:08
I'm not sure this is related (it could be), but I'm looking at writing a custom query from:

@Override
public Query getFieldQuery(QParser parser, SchemaField field, String externalVal)

and it would be great to know whether this is used as a filter or not: should it include scoring? Are there ways to build the query where some parts are cached and others are not?

David Smiley
added a comment - 16/Mar/11 03:58
Heh, me too! I was pondering this last night; I know specific queries will needlessly pollute the cache. I was imagining a syntax such as this:
fq={!cache=no}queryhere

Yonik Seeley
added a comment - 24/Jun/11 02:43
Here's a patch that allows one to add cache=false to top-level queries (main queries, filter queries, facet queries, etc).

Currently (without this patch) Solr generates the set of documents that match each filter individually (this is so they can be cached and reused).

Adding cache=false to the main query prevents lookup/storing in the query cache. Adding cache=false to any filter query causes the filterCache to not be used. Further, the filter query is actually run in parallel with the main query and any other non-cached filter queries (which can speed things up if the base query or other filter queries are relatively sparse).

There is also an optional "cost" parameter that controls the order in which non-cached filter queries are evaluated, so knowledgeable users can order less expensive non-cached filters before expensive non-cached filters.

As an additional feature for very high cost filters, if cache=false and cost>=100 and the query implements the PostFilter interface, a Collector will be requested from that query and used to filter documents after they have matched the main query and all other filter queries. There can be multiple post filters, and they are also ordered by cost.

The frange query (a range over function queries; background here: http://www.lucidimagination.com/blog/2009/07/06/ranges-over-functions-in-solr-14/) also now implements PostFilter.

Examples:
// normal function range query used as a filter, all matching documents generated up front and cached
fq={!frange l=10 u=100}mul(popularity,price)
// function range query run in parallel with the main query like a traditional lucene filter
fq={!frange l=10 u=100 cache=false}mul(popularity,price)
// function range query checked after each document that already matches the query and all other filters. Good for really expensive function queries.
fq={!frange l=10 u=100 cache=false cost=100}mul(popularity,price)
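
A rough sketch of the filter classification and cost ordering described above, not taken from the patch. The Filter and Plan classes and the plan method are hypothetical stand-ins; in particular, postFilterCapable stands in for "the query implements the PostFilter interface", and the costs are made-up values:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FilterPlanDemo {
    // Hypothetical stand-in for a parsed fq together with its local params.
    static final class Filter {
        final String name;
        final boolean cache;              // cache=false disables the filterCache
        final int cost;                   // optional "cost" local param
        final boolean postFilterCapable;  // stands in for "implements PostFilter"
        Filter(String name, boolean cache, int cost, boolean postFilterCapable) {
            this.name = name;
            this.cache = cache;
            this.cost = cost;
            this.postFilterCapable = postFilterCapable;
        }
        @Override public String toString() { return name + "(cost=" + cost + ")"; }
    }

    // The three groups of filters described in the comment above.
    static final class Plan {
        final List<Filter> cached = new ArrayList<>();
        final List<Filter> nonCached = new ArrayList<>();
        final List<Filter> post = new ArrayList<>();
    }

    static Plan plan(List<Filter> filters) {
        Plan p = new Plan();
        for (Filter f : filters) {
            if (f.cache) {
                p.cached.add(f);       // doc set generated up front and cached
            } else if (f.cost >= 100 && f.postFilterCapable) {
                p.post.add(f);         // checked after the main query and other filters match
            } else {
                p.nonCached.add(f);    // run alongside the main query
            }
        }
        Comparator<Filter> byCost = Comparator.comparingInt(f -> f.cost);
        p.nonCached.sort(byCost);      // cheaper non-cached filters first
        p.post.sort(byCost);           // post filters are ordered by cost as well
        return p;
    }

    public static void main(String[] args) {
        Plan p = plan(Arrays.asList(
                new Filter("inStock", true, 0, false),
                new Filter("slowFilter", false, 50, false),
                new Filter("cheapFilter", false, 10, false),
                new Filter("frange", false, 100, true)));
        System.out.println("cached:     " + p.cached);
        System.out.println("non-cached: " + p.nonCached);
        System.out.println("post:       " + p.post);
    }
}
```

Note that in this sketch a filter with cost>=100 that is not post-filter capable simply falls back to the non-cached group, which matches the three conditions listed above but is otherwise an assumption.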