I am trying to create N1QL indexes for a couple of queries our application wants to perform. The issue is, I’m not able to design a single index which can be used efficiently for all of the queries even though I have this feeling I should be able to.

The setup we are using is this:

We are using document ID (meta().id) to encode some application specific type information. For example, a document ID “a:_role/X” would mean that there is a role X in the system. The “a:” part means the document is a reference document and only contains a reference ID for the real document.

Now, we have the following types of queries we want to perform (we have more, but this should be enough to explain the issue):

Find all the roles defined in the system.

Find all references to “c:YYY”.

We have currenly defined one index for this as follows:

CREATE INDEX alias_index on bucket(meta().id, ref) WHERE meta().id LIKE ‘a:%’;

Query for use-case (1):

_SELECT meta().id FROM bucket USE INDEX (alias_index USING GSI) WHERE meta().id LIKE ‘a:%’ AND meta().id LIKE 'a:role%’

Query for use-case (2):

SELECT meta().id FROM bucket USE INDEX (alias_index USING GSI) WHERE meta().id LIKE ‘a:%’ AND ref=‘c:YYY’

Both queries work, but are now as fast as they should be. I am currently testing with a bucket of ~ 200k documents, and both types of queries takes ~ 400-500 ms on my local development machine.

The issue is (I believe) the usage / non-usage of the first index key (meta().id). If I want to use the same index for both queries, meta().id needs to be part of the index definition (I get a ‘No index available on keyspace bucket that matches your query’ otherwise) because of the first query, but query (2) doesn’t really have a value for that key, so the best I can do there is a wildcard (%), which seems non-optimal.

I know I can get really fast indexes for these queries since using two different indexes (with only ref defined for index (2) and only meta().id defined for index (1)) give me timings in the range of ~ 5 ms.

I briefly tried looking at adaptive indexes as well, but don’t really know how to fit meta().id in there.

Would the better way here be to have separate indexes for these queries? I really want to avoid too many indexes since it can lead to a lot of overhead for different queries that are almost identical.

You are totally correct, escaping the ‘_’ char made the primary index query really fast. Many thanks!

Now… what would actually be the benefit of using a primary index here? I don’t really have a need for ad-hoc queries, and as long as I have to create an index, wouldn’t a specialised secondary index almost always be better? Also, as far as I understand primary indexes, they don’t really store any data (to “cover” queries), so as soon as I need to select data that isn’t meta information, it will have to retrieve it using the data service?

As an alternative, say I decide to instead use a secondary index for the query type to “find all roles”… the “_role” key prefix isn’t the only type of key prefix I’m gonna want to query on, as we also have prefixes like “_type” and “_workspace”. In order to not have to create three different secondary indexes, it seems I can create the index like this:

CREATE INDEX alias_index2 ONcmbucket(meta().id) where meta().id like 'a:\\_role%' OR meta().id like 'a:\\_type%' OR meta().id like 'a:\\_workspace%'

Example:

SELECT meta().id FROMcmbucketUSE INDEX (alias_index2 USING GSI) WHERE meta().id LIKE 'a:\\_workspace%' AND meta().id LIKE 'a:_workspace/admin-workspace-%'

Except for the need to to actually OR every type of document when creating the index like this, wouldn’t this really perform the same job as a primary index?

Queries for all three key prefix types seems to be fast with this index (5-10 ms). Now, I’m just trying to understand the benefit of a secondary index versus a primary index. Won’t a primary index grow very large since it encompasses every document?