'facet.prefix' Based Drill Down

For simple categories, facet.field works fine as-is. However, categories are frequently organized hierarchically, and users interact with such a taxonomy by drilling down: starting at the top (more general) and whittling their way down (more specific).

The facet.prefix approach is a basic one that works well for most use cases. It relies only on standard Solr faceting parameters, because the hierarchy is encoded into the facet terms at index time.

Flattened Data “breadcrumbs”

In this example, some documents are associated with multiple categories, like Doc#3, and some are mapped to internal (non-leaf) nodes, like Doc#2.

You must perform some index-time processing on this flattened data to create the tokens needed for the facet.prefix approach. When indexing, we create a specially formatted term for each node that appears in a path: the term encodes the node's depth, followed by the path from the top level down to that node, with levels joined by a common separator ("depth/first level term/second level term/etc"). We also add terms for every ancestor of each category in the original data.
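A minimal Python sketch of this index-time step, assuming breadcrumbs arrive as strings such as "NonFic > Sci > Phys", "/" is the separator, and depth is counted from 0 at the top level. The function name facet_prefix_terms is hypothetical; in practice this logic would live in whatever client or pipeline feeds your documents to Solr.

```python
def facet_prefix_terms(breadcrumbs: list[str], sep: str = "/") -> list[str]:
    """Build depth-prefixed facet terms for every node on every breadcrumb path.

    Each breadcrumb is a string like "NonFic > Sci > Phys" (assumed input format).
    Ancestors are emitted too, and a set removes duplicates shared between paths.
    """
    terms: set[str] = set()
    for crumb in breadcrumbs:
        path = [part.strip() for part in crumb.split(">")]
        for depth in range(len(path)):
            # "<depth>/<top level>/.../<node at this depth>"
            terms.add(sep.join([str(depth), *path[: depth + 1]]))
    return sorted(terms)


# A hypothetical document filed under two categories.
print(facet_prefix_terms(["NonFic > Law", "NonFic > Sci > Phys"]))
# ['0/NonFic', '1/NonFic/Law', '1/NonFic/Sci', '2/NonFic/Sci/Phys']
```

The shared ancestor 0/NonFic is emitted only once per document, so each document contributes at most one count to every node.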

Indexed Terms

Terms That Begin With Another Term

If you are indexing terms that may begin with another complete term (such as "Book" and "BookPart"), adding a separator at the end of each term helps distinguish them:

Doc#1: 0/Books/, 1/Books/Book/
Doc#2: 0/Books/, 1/Books/BookPart/

Then always include the trailing slash in the query, e.g. facet.prefix=1/Books/Book/, to avoid also matching "1/Books/BookPart/".
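The effect of the trailing slash can be checked with a small in-memory model. This Python sketch is an illustration, not Solr: str.startswith stands in for the way facet.prefix keeps only terms beginning with the given string, and the helper names are hypothetical.

```python
def terms_with_trailing_sep(path: list[str], sep: str = "/") -> list[str]:
    """Depth-prefixed terms for one path, each ending with the separator."""
    return [sep.join([str(d), *path[: d + 1]]) + sep for d in range(len(path))]


# Doc#1 and Doc#2 from the example above, merged into one sorted term list.
indexed = sorted(set(terms_with_trailing_sep(["Books", "Book"])
                     + terms_with_trailing_sep(["Books", "BookPart"])))


def prefix_matches(prefix: str) -> list[str]:
    """Terms a facet.prefix filter would keep: those starting with `prefix`."""
    return [t for t in indexed if t.startswith(prefix)]


print(prefix_matches("1/Books/Book"))   # ['1/Books/Book/', '1/Books/BookPart/']
print(prefix_matches("1/Books/Book/"))  # ['1/Books/Book/']
```

Without the trailing slash, the prefix for Book also captures BookPart; with it, only the intended term matches.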

Initial Query

With the terms indexed this way, we can query them to drill down. Initially, we facet on the category field with facet.prefix=1/NonFic, which restricts the facet results to terms for the children of NonFic, i.e. the nodes at depth 1 under NonFic.
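For illustration, a Python sketch of the initial query. It assumes a field named category holding the depth-prefixed terms, and uses the breadcrumbs NonFic > Law, NonFic > Sci and NonFic > Sci > Phys as hypothetical documents. The request parameters (q, rows, facet, facet.field, facet.prefix, facet.mincount) are standard Solr faceting parameters; facet_counts is a hypothetical in-memory stand-in that mimics counting documents per matching term, not a call into Solr.

```python
from collections import Counter
from urllib.parse import urlencode

# Hypothetical indexed "category" terms per document (each term listed once per doc).
docs = {
    "doc1": ["0/NonFic", "1/NonFic/Law"],
    "doc2": ["0/NonFic", "1/NonFic/Sci"],
    "doc3": ["0/NonFic", "1/NonFic/Sci", "2/NonFic/Sci/Phys"],
}


def facet_counts(prefix: str) -> dict[str, int]:
    """Document count per term that starts with `prefix`, like facet.prefix."""
    counts = Counter(t for terms in docs.values() for t in terms if t.startswith(prefix))
    return dict(sorted(counts.items()))


# Request parameters for the initial drill-down query.
params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.field": "category",
    "facet.prefix": "1/NonFic",
    "facet.mincount": 1,
}
print(urlencode(params))
print(facet_counts("1/NonFic"))  # {'1/NonFic/Law': 1, '1/NonFic/Sci': 2}
```

Drilling down one more level is just a change of prefix: facet_counts("2/NonFic/Sci") yields {'2/NonFic/Sci/Phys': 1}.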

Pivot Facets

Pivot facets are computed at query time and return facet counts nested across arbitrary combinations of fields. They are flexible but can be expensive, so use them judiciously to avoid performance bottlenecks.

You can think of pivot faceting as "decision tree faceting": it tells you in advance what the next set of facet results for a field would be if you applied a constraint from the current facet results. For example: "for facet A the constraints/counts are X/N and Y/M, and if you were to constrain A by X, the constraint counts for B would be S/P, T/Q, etc." Another way to think of it is that each field is treated as a vector of constraint counts, and taking the "cross product" of these vectors produces an N-dimensional matrix with the count for each combination of values.
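The decision-tree idea can be modeled for two fields in a few lines of Python. This is a conceptual sketch of what a pivot computes, assuming single-valued fields; Solr's actual facet.pivot response format differs, and pivot_counts is a hypothetical name. Here X and Y play the role of A's constraints, the outer counts correspond to N and M, and the nested counts are the S/T counts for B within each value of A.

```python
from collections import Counter, defaultdict


def pivot_counts(docs: list[dict[str, str]], outer: str, inner: str) -> dict[str, tuple[int, dict[str, int]]]:
    """Count values of `outer`, and for each one the counts of `inner` among matching docs.

    Assumes single-valued fields; docs missing a field are skipped for that field.
    """
    outer_counts = Counter(d[outer] for d in docs if outer in d)
    inner_counts = defaultdict(Counter)
    for d in docs:
        if outer in d and inner in d:
            # Constraining `outer` to d[outer] leaves this doc's `inner` value in play.
            inner_counts[d[outer]][d[inner]] += 1
    return {value: (n, dict(inner_counts[value])) for value, n in outer_counts.items()}


docs = [{"A": "X", "B": "S"}, {"A": "X", "B": "T"}, {"A": "X", "B": "S"}, {"A": "Y", "B": "T"}]
print(pivot_counts(docs, "A", "B"))
# {'X': (3, {'S': 2, 'T': 1}), 'Y': (1, {'T': 1})}
```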

Pivot faceting can easily be applied to hierarchical facets in some cases, particularly when each document appears at only one point in the taxonomy.

Flattened Data “breadcrumbs”

Doc#1: NonFic > Law
Doc#2: NonFic > Sci
Doc#3: NonFic > Sci > Phys

At index time, we split the data into a separate field for each level of the hierarchy.
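A Python sketch of this split, assuming hypothetical field names level0, level1, level2 and the " > " breadcrumb format shown above. facet.pivot is the standard Solr parameter that takes a comma-separated list of fields to nest.

```python
from urllib.parse import urlencode


def level_fields(breadcrumb: str, field_prefix: str = "level") -> dict[str, str]:
    """Split "A > B > C" into one field per hierarchy level: level0, level1, ..."""
    parts = [part.strip() for part in breadcrumb.split(">")]
    return {f"{field_prefix}{i}": part for i, part in enumerate(parts)}


# The three documents above, one (hypothetical) level field per depth.
docs = [
    {"id": "1", **level_fields("NonFic > Law")},
    {"id": "2", **level_fields("NonFic > Sci")},
    {"id": "3", **level_fields("NonFic > Sci > Phys")},
]
print(docs[2])  # {'id': '3', 'level0': 'NonFic', 'level1': 'Sci', 'level2': 'Phys'}

# Ask for nested counts level0 -> level1 -> level2.
params = {"q": "*:*", "rows": 0, "facet": "true", "facet.pivot": "level0,level1,level2"}
print(urlencode(params))
```

This layout is why pivots suit documents that appear at a single point in the taxonomy: a document with two paths would carry two values in some level fields, and the pivot would pair them in combinations that are not real paths.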

Multipath hierarchical faceting

This approach provides hierarchical faceting with low memory overhead and fast response times, at the cost of slow startup. Compared to SOLR-64 and SOLR-792, its distinguishing features are:

Multiple paths per document

Query-time analysis of the facet field; no special indexing requirements besides retaining separator characters in the terms used for faceting

Optional custom sorting of tag values

Recursive counting of references to tags at all levels of the output

This is a shell around LUCENE-2369 that makes it work with the Solr API. The underlying principle is to reference terms by their ordinals and build an index-wide document-to-tags map, augmented with a compressed representation of the hierarchical levels.
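The principle can be modeled in Python as a conceptual sketch, not the LUCENE-2369 code: plain lists and dicts stand in for its compressed structures, and the levels are recovered by splitting each tag on the separator at query time instead of being stored in a precomputed level representation. The "/" separator, the helper names, and the reading of "recursive counting" as "documents referencing a node or any of its descendants" are assumptions.

```python
from collections import Counter

SEP = "/"  # separator assumed to be retained inside the indexed tags


def build_index(doc_tags: dict[str, list[str]]) -> tuple[list[str], dict[str, list[int]]]:
    """Give each distinct tag an ordinal (its sorted position) and map docs to ordinals."""
    ordinals = sorted({tag for tags in doc_tags.values() for tag in tags})
    position = {tag: i for i, tag in enumerate(ordinals)}
    doc_to_ords = {doc: sorted(position[t] for t in tags) for doc, tags in doc_tags.items()}
    return ordinals, doc_to_ords


def recursive_counts(ordinals: list[str], doc_to_ords: dict[str, list[int]]) -> dict[str, int]:
    """Count, for every level of every path, the documents that reference it or a descendant.

    Each document counts at most once per node, even when it has several paths.
    """
    counts = Counter()
    for ords in doc_to_ords.values():
        nodes = set()
        for o in ords:
            parts = ordinals[o].split(SEP)
            for depth in range(1, len(parts) + 1):
                nodes.add(SEP.join(parts[:depth]))
        counts.update(nodes)
    return dict(sorted(counts.items()))


# Hypothetical tags; doc3 has two paths.
doc_tags = {
    "doc1": ["NonFic/Law"],
    "doc2": ["NonFic/Sci"],
    "doc3": ["NonFic/Sci/Phys", "NonFic/Law"],
}
ordinals, doc_to_ords = build_index(doc_tags)
print(doc_to_ords)  # {'doc1': [0], 'doc2': [1], 'doc3': [0, 2]}
print(recursive_counts(ordinals, doc_to_ords))
# {'NonFic': 3, 'NonFic/Law': 2, 'NonFic/Sci': 2, 'NonFic/Sci/Phys': 1}
```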