Facets are annoying. Mostly because they tend to be used relatively rarely, but when they are used, they are used a lot, and in many interesting ways. Since they are fairly rare, let me explain what they are first.

We want to show users all the recent phones. We can do this using the following query:
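The query itself isn't reproduced in this excerpt, but conceptually it is just a filter on release date. A minimal stand-in sketch (plain Python over an in-memory list, not the actual RavenDB client API; all names and data are made up):

```python
from datetime import date

# Hypothetical sample data; the real documents live in RavenDB.
phones = [
    {"name": "Phone A", "brand": "Contoso", "released": date(2024, 11, 1)},
    {"name": "Phone B", "brand": "Fabrikam", "released": date(2022, 1, 15)},
]

def recent_phones(docs, cutoff):
    """All phones released on or after `cutoff`: the 'recent phones' query."""
    return [p for p in docs if p["released"] >= cutoff]

recent = recent_phones(phones, date(2024, 1, 1))
```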

So far, so good. But we want to do better than just show a potentially big list to the user. So we start faceting the results. We pull some interesting pieces of data from the results, and allow the user to quickly narrow down the details. Typical facets for a phone query would be:

Brand

Screen Size

Camera

Cost
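Conceptually, a facet query returns, for each of these facets, the distinct terms in the results and how many matching documents carry each one. A sketch of that result shape (all field names and numbers are illustrative):

```python
# Illustrative shape of a faceted result: for each facet, the distinct
# terms and a count of matching documents per term (numbers made up).
facet_results = {
    "Brand": {"Contoso": 120, "Fabrikam": 85, "AdventureWorks": 40},
    "Screen Size": {"5.x inch": 90, "6.x inch": 155},
    "Cost": {"under $300": 60, "$300-$600": 110, "over $600": 75},
}
```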

Now, we can certainly give the option to search for them individually, but then you end up with something like this:

That is neither friendly nor very useful, so UX designers came up with this idea instead:

This gives the user a much easier way to look at large data sets and quickly narrow the search down to just the things they want. It also gives the user some indication about the actual data: I can see how changing the facet selection would likely change the number of results I get. This is a great way to offer data exploration capabilities, and in any kind of data repository, that is crucial. Shopping sites are a natural fit, but content/document management systems, job sites, and information centers all commonly make use of this feature.

I’m sorry for the long intro, but I feel that this feature needs to be placed in context, to explain its history and the various iterations it has gone through.

We first got this feature in Sep 2011. It was a pull request from Matt Warren, apparently based on some code that I spiked. I intentionally go back there, because that is the first time we see this code, and it was implemented in the simplest manner that could possibly work, to showcase the feature and to make sure that it worked end to end. The underlying implementation looked something like this:

Get all the terms for the field (if this is Operating System, then that would be: Android, iOS, BlackBerry, Windows Phone)

For each of those terms, execute the original query with an added clause that limits the results to the current term and value.
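The two steps above can be sketched roughly like this (plain Python, with a hypothetical `run_query` standing in for an index query; an illustration of the approach, not RavenDB's actual code):

```python
def naive_facets(run_query, base_clauses, facet_terms):
    """Naive faceting as described above: one query per (facet, term) pair.

    facet_terms maps each faceted field to all of its terms, e.g.
    {"os": ["Android", "iOS", "BlackBerry", "Windows Phone"]}.
    run_query(clauses) returns the documents matching every clause.
    """
    results = {}
    for field, terms in facet_terms.items():
        results[field] = {}
        for term in terms:
            # Re-run the original query with an extra `field == term` clause.
            matches = run_query(base_clauses + [(field, term)])
            results[field][term] = len(matches)
    return results

# Toy stand-in for the index: a linear scan over in-memory documents.
docs = [{"os": "Android"}, {"os": "Android"}, {"os": "iOS"}]
def run_query(clauses):
    return [d for d in docs if all(d.get(f) == v for f, v in clauses)]

counts = naive_facets(run_query, [], {"os": ["Android", "iOS"]})
```

Note the cost: the full query re-executes once for every facet/term pair, which is exactly why more facets and more terms per facet made this progressively slower.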

As you can imagine, this has some… performance issues. The more facets we had, and the more terms per facet, the costlier this got. Surprisingly enough, this code lasted almost unmodified for about 10 months, and the next major change was simply to process all of this in parallel.

Shortly thereafter, we had more changes, moving from this trivial model to a much more efficient one, where we execute the query once (PR from Michael Weber, by the way) per facet, instead of once per facet/term. Dynamic aggregation in RavenDB is also done through facets, which put this on a much more common code path, so more changes started happening to the facets code. In particular, we started handling caching, and we reduced the run time considerably by restructuring things so we would execute only a single query, then use low-level index access to gather all the other information we needed in an optimized fashion.
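The difference from the naive model can be sketched like so (again plain Python with a hypothetical `run_query`, not the real implementation): the base query executes once, and the facet counts are derived from the matching documents rather than from per-term re-queries:

```python
from collections import Counter

def facets_one_query(run_query, base_clauses, facet_fields):
    """Execute the base query once, then derive every facet by counting
    terms over the matching documents; no per-term re-querying."""
    matches = run_query(base_clauses)  # a single query execution
    return {
        field: Counter(d[field] for d in matches if field in d)
        for field in facet_fields
    }

# Toy stand-in for the index, as before.
docs = [{"os": "Android", "brand": "Contoso"},
        {"os": "iOS", "brand": "Contoso"},
        {"os": "Android"}]
def run_query(clauses):
    return [d for d in docs if all(d.get(f) == v for f, v in clauses)]

counts = facets_one_query(run_query, [], ["os", "brand"])
```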

We got more customer usage samples, from the people who had 70,000 facet queries to the people who wanted facets over multiple millions of results. All of that meant that we put a lot of thought into how we could reduce the cost of faceted queries, and over time things became more and more complex, but much faster. In particular, we reordered the sequence of operations for computing the query, added prefetching support, and in general greatly reduced the amount of work needed to just get the query answered.

There is a lot of stuff going on around this code, but this is where we actually computed the facets. Note that this basically calls the lambda once for every matching term of the specified fields, for every document in the query.
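The code itself isn't shown in this excerpt, but the shape it describes (a callback invoked once per matching term of the specified fields, for every document in the query) can be sketched as:

```python
def for_each_matching_term(matching_docs, fields, callback):
    """Invoke callback(field, term) once for every matching term of every
    specified field, for every document in the query results; a sketch of
    the iteration pattern described above, not RavenDB's actual code."""
    for doc in matching_docs:
        for field in fields:
            for term in doc.get(field, ()):  # fields may be multi-valued
                callback(field, term)

seen = []
docs = [{"brand": ["Contoso"]},
        {"brand": ["Fabrikam"], "tags": ["5g", "oled"]}]
for_each_matching_term(docs, ["brand", "tags"],
                       lambda f, t: seen.append((f, t)))
```

The important property is the multiplication: the callback fires documents × fields × terms times, so whatever work it does per call happens an enormous number of times on large result sets.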

And the performance was pretty good for our test cases. We ran this through several customers who were heavy facet users, and they saw a marked improvement in query performance. But we recently got a customer who mentioned in passing that facet queries were expensive. As in, the 5 – 10 second range per query. That raised some red flags; we consider requests that take more than 100ms to be suspicious, so we asked for more information, and we realized that he was using facets slightly differently from most of our customers.

Usually, facets are used to… well, facet the data on a search. Typical performance issues with facets in the past involved a large number of facets on a search result (I wasn't kidding about the 70,000 facets). Because most of the customer feedback we got was around that, we focused on that issue primarily, to the point where even extreme cases like that behaved well. This customer, however, was using facets not for searching, but as a way to manage the data itself. That meant that they were running facets on the entire data set.

So instead of many facets on a medium-sized number of documents, we were using a small number of facets, but on the entire data set. Profiling this highlighted a few calls:

GetOrAdd

TryGetValue

They are very cheap calls on their own, but this piece of code was called literally tens of millions of times in this particular scenario. We spent some time with a profiler and started playing with the code, trying to see if we could remove the costly calls and optimize this code path.
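To see why "cheap" calls still dominate, here is a Python analogue of that hot path, with `setdefault`/`get` standing in for the .NET `GetOrAdd`/`TryGetValue` (a sketch, not the actual code): the per-call cost is tiny, but it is paid for every matching term of every document.

```python
def count_terms(matching_docs, facet_fields):
    """Accumulate facet counts, tracking how many hash-table operations
    the hot loop performs along the way."""
    counts = {}
    calls = 0
    for doc in matching_docs:
        for field in facet_fields:
            term = doc.get(field)
            if term is None:
                continue
            # Analogue of GetOrAdd: one hash lookup (and maybe an insert)...
            per_field = counts.setdefault(field, {})
            # ...and of TryGetValue: another lookup on the inner table.
            per_field[term] = per_field.get(term, 0) + 1
            calls += 2
    return counts, calls

counts, calls = count_terms([{"a": "x"}, {"a": "y", "b": "z"}], ["a", "b"])
```

With, say, a million matching documents and five faceted fields, that loop performs on the order of ten million hash-table operations. Shaving the per-call overhead buys little; the call count itself is the problem.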

The answer was… no. In fact, everything we did was in the range of a 5% – 7% change (in both directions, mind).

This sucked, because going back to the user and saying "that is the way it is" didn't sit well with us. We got together, closed the shutters, turned off the lights, and, solely by the light of the monitors, did a major hacking session. We completely changed the way we were handling facets, not just to avoid the problem with the calls to TryGetValue and GetOrAdd, but to change direction to a much more efficient mode of operation.

This post is getting pretty long, so I’ll discuss this in my next post. For now, can you suggest options for improvements?