Humans have counted things for a very long time. In database applications, COUNT() is frequently used in various contexts. The COUNT performance affects both application performance and user experience. Keeping this mind, Couchbase supported generalized COUNT() and has improved its performance in Couchbase 4.5 release.

Two Couchbase 4.5 Features

There are two features in Couchbase 4.5 helping the COUNT performance.

When the query is interested only in the COUNT of a range of data that’s indexed, the indexer does the counting itself. In other words, the query pushes the counting to the index. This reduces the amount of data exchanged between indexer and query, improving the query speed.

First, when all of the predicates can be pushed down to index scan and we know index scan won’t result in any false positives, the index can simply do the COUNT() of qualifying values without returning the qualifying data to the query engine. This improves the COUNT performance. When the COUNT() is pushed to the index scan, the optimizer will denote this by choosing the IndexCountScan method in the query plan. The qualifying queries benefit directly, without any effort from the application developer.

The predicate (name = 'Air Alaska') is pushed down to the index scan in addition to the COUNT() aggregate as well.

Second, the optimal execution of the MIN() aggregate when the query is covered and can be pushed down to index.

SELECT MIN(name)
FROM `travel-sample
WHERE prodname > 'Air Alaska';

For this query, the first name that’s greater than “Air Alaska” is determined by the index scan. Because we’re interested in the just one minimum value and therefore, in the query plan, we push the {limit:1} parameter to the index scan. This works very quickly compared to getting all the qualifying values from the index scan to determine the minimum value.

Use Cases

Let’s walk through common use cases and how these features work together to speed up many queries. The examples use travel-sample data set shipped along with Couchbase server.

There are some interesting uses with a little bit of help from the application developer. Since multiple customers are using this approach, it’s worth checking out if this is going to help you as well.

Let’s look at common questions and how to improve the performance with Couchbase 4.5.

To determine the number of distinct items, this plan will scan the whole bucket, which determines the distinct types. Let’s see how we can improve this.

The MIN() optimization in Couchbase 4.5 can be exploited to optimize this. We can evaluate if the field is the leading key of any index, as the index has pre-sorted data. Using this, we can write a small script to calculate the distinct values much faster.

The Query Plan shows that this is done by index scan only. The scan parameter "limit": "1" shows why we get the results so fast. Because the index stores data in a sorted order, the first item will be the lowest value.

Pro Tip

Use a prepared statement for the SELECT statement in the while loop to improve the performance even further.

Use Case 3

Grouping works closely with counting. It’s common to group the data by type, name, and date and count them. If you have a dataset with millions of documents, the index scan itself may take minutes to scan and retrieve all the data.

This is run on travel-sample with about 31,000 documents. The Python script itself runs about 10 SELECT statements to retrieve this result. As the number of documents in the dataset increases, the script approach will run much faster to a single statement. Further, using the prepared statement, you can save more CPU cycles.

Conclusion

The MIN() and COUNT() optimizations in 4.5 improves query performance. Using additional logic, you can use these optimizations to improve the performance of a variety of queries.