Activity

Gilad Barkai
added a comment - 09/Oct/12 06:03

Nice catch! It took a while to pinpoint the reason - lines 173-181 of StandardFacetsAccumulator. In those lines, a 'merge' is performed over categories which matched the request but reside on different partitions.

Partitions are an optimization that limits the RAM required per query to a constant, rather than linear in the taxonomy size (which could be millions of categories). The taxonomy is virtually split into partitions of constant size, a top-k is heaped from each partition, and all those per-partition top-k results are merged into a global top-k list.

The proposed solution of changing hashCode and equals so that the same request has two hash codes and is not equal to itself is very likely to break other parts of the code.

Perhaps such cases could be prevented altogether, e.g. by throwing an exception when the (exact) same request is added twice. Is that a reasonable solution? Are there cases where it is necessary to request the same path twice?
Please note that a different count, depth, path, etc. makes a different request, so requesting "author" with count 10 and with count 11 makes two different requests - and those are handled correctly in current versions.
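The per-partition top-k scheme described above can be sketched without any of the Lucene classes. The following is an illustrative stand-in, not the actual StandardFacetsAccumulator code (the names `topKOfPartition` and `mergeTopK` are invented for this sketch): each partition keeps a bounded min-heap of its best counts, and the partial results are then merged into one global top-k, so memory stays proportional to k times the number of partitions rather than to the taxonomy size.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PartitionTopK {

    // Keep only the k largest counts seen in one partition, using a min-heap.
    static PriorityQueue<Integer> topKOfPartition(int[] counts, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(k);
        for (int c : counts) {
            heap.offer(c);
            if (heap.size() > k) {
                heap.poll(); // evict the current minimum
            }
        }
        return heap;
    }

    // Merge the per-partition heaps into one global top-k list, largest first.
    static List<Integer> mergeTopK(List<PriorityQueue<Integer>> partials, int k) {
        PriorityQueue<Integer> global = new PriorityQueue<>(k);
        for (PriorityQueue<Integer> partial : partials) {
            for (int c : partial) {
                global.offer(c);
                if (global.size() > k) {
                    global.poll();
                }
            }
        }
        List<Integer> result = new ArrayList<>(global);
        result.sort(Comparator.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<PriorityQueue<Integer>> partials = new ArrayList<>();
        partials.add(topKOfPartition(new int[] {5, 1, 9}, 2)); // partition 1
        partials.add(topKOfPartition(new int[] {7, 3, 8}, 2)); // partition 2
        System.out.println(mergeTopK(partials, 2)); // [9, 8]
    }
}
```

The bug under discussion arises in the merge step: if the exact same request object appears twice, the merge sees it twice and the partial results get combined incorrectly.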

Rodrigo Vega
added a comment - 09/Oct/12 11:42

In my case the end user can specify his/her default query to show once he/she is logged into the system. This issue came up when somebody defined the same facet request twice - one as a filter and the other only as a pure count request.

I did not find a clean way to fix it looking through the code, but I'm trying the current "ugly" solution, where "counter" is the request index.
public class CustomCountFacetRequest extends CountFacetRequest {

    private int hashCode;

    public CustomCountFacetRequest(CategoryPath path, int num) {
        this(path, num, 0);
    }

    public CustomCountFacetRequest(CategoryPath path, int num, int counter) {
        super(path, num);
        hashCode = super.hashCode() * counter;
    }

    @Override
    public int hashCode() {
        return hashCode;
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof CustomCountFacetRequest) {
            CustomCountFacetRequest that = (CustomCountFacetRequest) o;
            return that.hashCode == this.hashCode && super.equals(o);
        }
        return false;
    }
}
I haven't found any side effects of this solution yet; however, I'm worried about your comments on breaking other parts of the code.
I'm not sure throwing an exception is the best solution, but at least the response would be consistent.
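The warning above about breaking other code stems from the equals/hashCode contract that hash-based collections rely on. A minimal stand-in (no Lucene classes; the `Key` class and its fields are invented for this sketch) shows what happens when equals ignores a field that hashCode mixes in: a HashSet ends up holding two elements that report themselves equal.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative stand-in for the hazard: equals() looks only at the path,
// while hashCode() also mixes in a "counter" field. That violates the
// equals/hashCode contract, so hash-based collections misbehave.
public class ContractDemo {

    static final class Key {
        final String path;
        final int counter;

        Key(String path, int counter) {
            this.path = path;
            this.counter = counter;
        }

        @Override
        public boolean equals(Object o) {
            // equals considers only the path...
            return o instanceof Key && ((Key) o).path.equals(path);
        }

        @Override
        public int hashCode() {
            // ...but hashCode varies with the counter: contract broken.
            return path.hashCode() * counter;
        }
    }

    public static void main(String[] args) {
        Set<Key> set = new HashSet<>();
        set.add(new Key("author", 1));
        // Equal by equals(), but the differing hash keeps both elements,
        // so the set now contains two "equal" keys.
        set.add(new Key("author", 2));
        System.out.println(set.size()); // 2
    }
}
```

For the "ugly" workaround in the thread this duplication is exactly the desired behavior, which is why it works, but any other code assuming the standard contract may break in the same way.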

Gilad Barkai
added a comment - 09/Oct/12 14:16 - edited

The same category can be set both as a filter and as a request without them colliding - a filter is neither correlated with nor dependent on a facet request.
Facet filters are applied at the query level, which affects the result set, while the FacetRequest defines which categories to retrieve out of that result set.
I'm probably missing something here.
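The filter/request distinction can be made concrete with a toy model that involves no Lucene at all (the class and method names here are invented for illustration): a filter narrows the result set itself, while a count request only reports counts over whatever result set remains, so the same category can serve both roles independently.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual stand-in: documents are just lists of category strings.
// "filter" plays the role of a drill-down query (shrinks the result set);
// "count" plays the role of a pure count request (leaves it untouched).
public class FilterVsRequest {

    static List<List<String>> filter(List<List<String>> docs, String category) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> doc : docs) {
            if (doc.contains(category)) {
                kept.add(doc); // drill-down: keep only matching documents
            }
        }
        return kept;
    }

    static int count(List<List<String>> docs, String category) {
        int n = 0;
        for (List<String> doc : docs) {
            if (doc.contains(category)) {
                n++; // count request: the result set itself is unchanged
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("author/king", "genre/horror"),
            List.of("author/king", "genre/fantasy"),
            List.of("author/tolkien", "genre/fantasy"));

        // Filtering on a category and counting the same category coexist fine.
        List<List<String>> hits = filter(docs, "author/king");
        System.out.println(hits.size());                   // 2
        System.out.println(count(hits, "author/king"));    // 2
        System.out.println(count(hits, "genre/fantasy"));  // 1
    }
}
```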

Rodrigo Vega
added a comment - 09/Oct/12 14:38

Yes, I know that, but I need a generic way to handle the facet requests during the search preparation stage. For that reason I have a FacetParams object with the following properties: int limit, String[] path, boolean filter.

Then I can have a piece of code like this:

// Query is a bean that wraps a Lucene query, just to ease the integration
// with our UI. The real Lucene query is held by "q".
if (query.isFaceted()) {
    for (FacetParams facet : query.getFacet()) {
        CategoryPath c = new CategoryPath(facet.getPath());
        facetSearchParams.addFacetRequest(new CountFacetRequest(c, facet.getLimit()));
        if (facet.isFilter()) {
            q = DrillDown.query(q, c);
        }
    }
}
... prepare collectors and finally search

This is because the user can create their own query using some kind of wizard. So I have two options:
- Support this feature.
- Avoid configurations with this kind of case.
It is not a really big problem; it is mostly about keeping my code as neat as I can. I can't ask you to solve my code issues.

Gilad Barkai
added a comment - 09/Oct/12 14:48

Well, solve your code issues - no. But the current code is indeed broken by the issue you raised, and that should be fixed.
I re-examined the code, and I think the different hashCode approach you presented will work - though please note it will consume some extra CPU, as the same request will be handled twice (that is, the heap work to figure out the request's top-k) in order to create separate FacetResults for each request.

Rodrigo Vega
added a comment - 09/Oct/12 14:58

I thought there was some kind of cache to avoid doing the same work twice, or at least part of it. However, I think that is a different issue/feature/discussion. Thanks for your time.

Gilad Barkai
added a comment - 11/Dec/12 19:54

Proposed fix - in StandardFacetsAccumulator, guard against handling and merging the same request more than once.
A matching test is also introduced, inspired by the previous patch.
Thanks Rodrigo!
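The guard in the proposed fix can be sketched generically. This is an illustrative stand-in, not the actual StandardFacetsAccumulator patch (the `Request` record and `handleOnce` method are invented for this sketch): a HashSet of already-seen requests lets the accumulator skip exact duplicates so each request is handled and merged only once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the "handle each request once" guard from the proposed fix.
public class DuplicateRequestGuard {

    // Stand-in for a facet request: requests with the same path and count
    // are equal (records supply equals/hashCode over their components).
    record Request(String path, int count) {}

    // Process each distinct request once, silently skipping exact duplicates.
    static List<Request> handleOnce(List<Request> requests) {
        Set<Request> seen = new HashSet<>();
        List<Request> handled = new ArrayList<>();
        for (Request r : requests) {
            if (seen.add(r)) { // add() returns false for a duplicate
                handled.add(r);
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        List<Request> requests = List.of(
            new Request("author", 10),
            new Request("author", 10), // exact duplicate: handled only once
            new Request("author", 11)  // different count: a distinct request
        );
        System.out.println(handleOnce(requests).size()); // 2
    }
}
```

Note how this matches the thread: "author" with count 10 and count 11 are two distinct requests and both survive, while the exact duplicate is dropped instead of corrupting the partition merge.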

Shai Erera
added a comment - 16/Dec/12 19:15

Thanks for the fix, Gilad. I did the following:
- Fixed a couple of typos.
- Updated the patch following the recent changes to FacetSearchParams (no addRequest anymore).
- Made the test assert that the results' toString() values match, rather than just numValidDecendants.
- Added a CHANGES entry.
Committed to trunk and 4x.