Hierarchical faceted search example with Solr

Question:

Where can I find a complete example that shows how hierarchical faceted search works from indexing the documents to retrieving search results?

My research so far

Stack Overflow has a few posts, but each of them addresses only certain aspects of hierarchical faceted search, so I wouldn't consider them duplicates. I'm looking for a complete example to understand it. What I keep missing is the final query in which the aggregations work.

This tags list is still flat, but I would at least expect location/europe = 4 to be aggregated correctly, and currently it is not. I keep getting location/europe = 1 because it is only set for Alice, and Bob's Norway and Sweden are not aggregated to also count towards Europe.

Ideas

I might need to use facet.pivot, but I don't know how.

I might need to use facet.prefix, but I don't know how.

Versions

Solr 5.1.0

Windows 7

Answer:

You can get all of your aggregations to be populated if you push them into the index in stages. If Bob is from Norway, you might populate up to three values in your facet field:

location
location/Europe
location/Europe/Norway

(As an alternate design, you might have a hair color field separate from the location field, and then "location" would never need to be populated in the field itself.)
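The staged values above can be generated at indexing time rather than typed by hand. The following is a minimal sketch, assuming the leaf value is a /-separated path; the function names and the "tags" field name are hypothetical and not part of any Solr API.

```python
def ancestor_paths(path, sep="/"):
    """Return the path together with all of its ancestors, shortest first."""
    parts = path.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


def facet_values(leaf_paths, sep="/"):
    """Expand several leaf paths and drop repeated ancestors, keeping order."""
    seen = []
    for leaf in leaf_paths:
        for value in ancestor_paths(leaf, sep):
            if value not in seen:
                seen.append(value)
    return seen


# Hypothetical document: "tags" stands in for your multivalued facet field.
bob = {"id": "bob", "tags": facet_values(["location/Europe/Norway"])}
print(bob["tags"])
```

Because each ancestor is stored once per document, a person tied to two European countries would still contribute a single document to location/Europe.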

Then your results are still flat but your aggregated totals are present. At that point, you will need to do some programmatic work with the result set to create a nested data structure built by splitting all of the values on your separator character (/ in this case). Once you have a nested data structure, then displaying it hierarchically should be manageable. It's hard to go into detail about this part of the implementation because your nested data structure and display will depend heavily on your development environment.
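As one possible illustration of that programmatic step, here is a rough Python sketch. It assumes the facet counts arrive as a flat list of alternating values and counts, which is how Solr's JSON response writer lays out facet fields by default; the sample counts and the function names are made up for the example.

```python
def pairs_to_dict(flat):
    """Convert [value, count, value, count, ...] into {value: count}."""
    return dict(zip(flat[0::2], flat[1::2]))


def build_tree(counts, sep="/"):
    """Nest {"a/b": n} into {"a": {"count": .., "children": {"b": {...}}}}."""
    root = {}
    for path, count in counts.items():
        children = root
        node = None
        for part in path.split(sep):
            # Create intermediate nodes on demand so input order does not matter.
            node = children.setdefault(part, {"count": 0, "children": {}})
            children = node["children"]
        node["count"] = count
    return root


# Made-up facet counts in the flat value/count layout.
flat = ["location", 4, "location/Europe", 4,
        "location/Europe/Norway", 2, "location/Europe/Sweden", 2]
tree = build_tree(pairs_to_dict(flat))
print(tree["location"]["children"]["Europe"]["count"])
```

A dictionary of dictionaries is only one choice; whatever structure your templating or UI layer prefers would work equally well.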

Another, somewhat risky, option to avoid adding repetitive entries into your Solr facet field is to add only the value you're using now (e.g. location/Europe/Norway), but to sum the leaf totals as you iterate through the facet list and build your nested data structure. The risk there is that if a person is genuinely associated with multiple countries in Europe, then you might get an inflated total for the higher level location/Europe. I have chosen in my own projects to populate the separate values, as above. Even though they seem redundant, the aggregate totals end up being more accurate.
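To make the inflation risk concrete, here is a small sketch of the leaf-summing approach. The function name and the sample counts are hypothetical; the sample imagines a single person indexed with both Norway and Sweden.

```python
from collections import Counter


def rollup_leaf_counts(leaf_counts, sep="/"):
    """Add each leaf count to the leaf itself and to every ancestor path."""
    totals = Counter()
    for path, count in leaf_counts.items():
        parts = path.split(sep)
        for i in range(1, len(parts) + 1):
            totals[sep.join(parts[:i])] += count
    return dict(totals)


# One person associated with two countries yields two leaf counts of 1.
leaf_counts = {"location/Europe/Norway": 1, "location/Europe/Sweden": 1}
totals = rollup_leaf_counts(leaf_counts)
print(totals["location/Europe"])  # 2, although only one person is involved
```

The rolled-up figure counts country memberships rather than people, which is why indexing the ancestor values explicitly tends to give more trustworthy totals.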

(As usual in Solr, this is only one of quite a few ways of doing things. This model works best for systems with a manageable number of total leaves, where it makes sense to retrieve all of the facet values up front and not have to make additional drill-down queries.)

A pivoting option

Solr facet pivoting can return a hierarchically-structured result directly from Solr, but runs the risk of creating false connections between values in certain situations.

This way you will still get the mismatched pivot values, but you can choose to block any country-level facet values that don't have a prefix matching their continent. For this to be an issue, a multivalued field in the pivot must have values associated with values appearing later in the same pivot. If you are not expecting to have multiple values for these fields in a single record or if your values don't have a strong association (i.e. specific parentage), pivot facets can be an ideal solution. But in some cases, the pivot facet's disassociation between values in the included fields can create a prohibitive mess.
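A rough sketch of that filtering idea follows. It assumes two hypothetical fields, continent (e.g. Europe) and country (stored with its continent as a prefix, e.g. Europe/Norway), and a pivot result shaped like the entries Solr returns for facet.pivot, each with field, value, count and a nested pivot list. The sample entries are invented for illustration.

```python
from urllib.parse import urlencode

# Query string requesting a two-level pivot; the field names are assumptions.
params = urlencode({
    "q": "*:*",
    "rows": 0,
    "wt": "json",
    "facet": "true",
    "facet.pivot": "continent,country",
})


def drop_mismatched(pivot_entries, sep="/"):
    """Keep only child values whose prefix matches the parent value."""
    cleaned = []
    for parent in pivot_entries:
        prefix = parent["value"] + sep
        children = [child for child in parent.get("pivot", [])
                    if child["value"].startswith(prefix)]
        cleaned.append({**parent, "pivot": children})
    return cleaned


# Invented pivot entries: one record carries both Europe and Asia values,
# so Asia/Japan shows up under Europe as a false connection.
sample = [{
    "field": "continent", "value": "Europe", "count": 1,
    "pivot": [
        {"field": "country", "value": "Europe/Norway", "count": 1},
        {"field": "country", "value": "Asia/Japan", "count": 1},
    ],
}]
print(drop_mismatched(sample))
```

The filter only hides the mismatched children on the client side; the parent counts reported by Solr are left untouched.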
