Monitoring N1QL Queries

Monitoring and profiling N1QL queries, query service engines, and corresponding system resources is very important for smoother operational performance and efficiency of the system.
In fact, often it is vital for diagnosing and troubleshooting issues such as query performance, resource bottlenecks, and overloading of various services.

Couchbase Server 4.5 introduced system keyspaces to provide various monitoring details and statistics about individual queries and query service.
When running on a cluster with multiple query nodes, stats about all queries on all query nodes are collected in these system keyspaces.

For example, this can help identify:

the top 10 slow or fast queries running on a particular query engine or the cluster

resource usage statistics of the query service, or resources used for a particular query

details about the active, completed, and prepared queries

find long running queries that are running for more than 2 minutes

These system keyspaces are like virtual keyspaces that are transient in nature, and are not persisted to disk or permanent storage.
Hence, the information in the keyspaces pertains to the current instantiation of the query service.
You can access the keyspaces using any of the following:

N1QL language (from the cbq shell or Query Workbench)

REST API

Monitoring SDK

All the power of the N1QL query language can be applied on the keyspaces to obtain various insights.

Authentication and Client Privileges

Client applications must be authenticated with sufficient privileges to access system keyspaces.

An Administrator user can access all system keyspaces, and can grant or revoke privileges to and from other users.

Other non-Administrator users must have explicitly granted privilege called Query System Catalog to access the restricted system keyspaces.
For querying details of system keyspaces, see Getting System Information.

The following system keyspaces are considered open, that is, all users can access them without any special privileges:

system:dual

system:datastores

system:namespaces

Query Monitoring Settings

The monitoring settings can be set for each query engine (using the REST API) or for each query statement (using the cbq command line tool).
Both settings are actually set via REST endpoints: using the Admin REST API (/admin/settings endpoint) and the Query REST API (/query/service endpoint).

The cbq shell and Query Workbench use the Query REST API to set monitoring at the request level.
The Query Workbench automatically enables the profiling and timings.
It can be disabled using the Preferences option.
For more information refer to the Query Workbench section.

Use the following query parameters to enable, disable, and control the monitoring capabilities, and the level of monitoring and profiling details for each query or globally at a query engine level:

You can enable profile settings for each query engine.
These examples use local host IP address and default port numbers.
You need to provide correct credentials, IP address, and port details of your setup.

phaseCounts - Count of documents processed at selective phases involved in the query execution, such as authorise, indexscan, fetch, parse, plan, run etc.,

This value will be dynamic, depending on the documents processed by various phases up to this moment in time.

A new query on system:active requests will return different values.

"phaseCounts": {
"fetch": 16,
"indexScan": 187
}

phaseOperators - Indicates the number of each kind of query operators involved in different phases of the query processing.
For instance, this example, one non covering index path was taken, which involves 1 indxeScan and 1 fetch operators.

A join would have probably involved 2 fetches (1 per keyspace)

A union select would have twice as many operator counts (1 per each branch of the union).

This is in essence the count of all the operators in the executionTimings field.

"phaseOperators": {
"authorize": 1,
"fetch": 1,
"indexScan": 2
}

executionTimings - The execution details such as kernel and service execution times, number of documents processed at each query operator in each phase, and number of phase switches, for various phases involved in the query execution.

This is a new virtual attribute in document metadata introduced in Couchbase Server 5.0.

The meta().plan captures the whole query plan, and monitoring stats of various phases and involved query operators.
The meta().plan must be explicitly called in the SELECT query projection list.

The meta().plan attribute is enabled only for individual requests that are running (active_requests) or completed (completed_requests) when the profile is set to timings (profile ="timings") for each individual request.
If at the engine level, the profile is set to off and individual requests have been run with profile ="timings", then the system keyspaces will return the plan only for those requests.

Since there may be a combination of profile settings for all of the requests reported by the system keyspaces, not all requests returned will have a meta().plan attachment.

For the system:prepareds requests, the meta().plan is available at all times since the PREPARE statement is not dependant on the profile setting.

N1QL Cluster Monitoring

Description

Couchbase Server allows you to monitor many aspects of an active cluster, and version 5.0 now has more cluster-aware operations, further diagnostics and more system keyspaces features that cover multiple nodes.
New functionalities specifically include:

Extra counters to keep track of specific requests, such as cancelled requests.

Extend killing request to CREATE INDEX.

Changes to System Keyspaces

System:nodes show the datastore topology information.
This is separate from the N1QL clustering view of the world, in that N1QL clustering shows a map of the N1QL cluster, as provided by the cluster manager, while system:nodes shows a view of the nodes and services that make up the actual datastore, which may or may not include N1QL nodes.

The dichotomy is important in that N1QL could be clustered by one entity (e.g.
Zookeeper) and be connected to a clustered datastore (e.g.
Couchbase) such that each does not have visibility of the other.

Should N1QL be extended to be able to concurrently connect to multiple datastores, each datastore will report its own topology, so that system:nodes offers a complete view of all the storage nodes, whatever those may be.

Ideally, system:nodes will provide a way to report services advertised by each node as well as services that are actually running.

This will be datastore dependent.

N1QL clustering will still be reported by the /admin endpoints.

System:keyspaces can be augmented to list system keyspaces with a static map.
The small disadvantage of this is that it has to be maintained as new system keyspaces are added.

System:prepareds to list min and max execution and service times, as well as average times.

cbq-engine-cbauth

(Introduced in Couchbase Server 5.0)

Cbq-engine-cbauth is a new internal user that ns_server uses to allow Query Workbench clients to query across multiple query nodes.

Since Query Workbench can connect to the same node only when cbq-engine is running, Query Workbench cannot do any query-clustered operations.

To get around this block, once the Query Workbench clients connect to a query node, this internal user (cbq-engine-cbauth) will be used to do any further inter-node user verification.

Vitals

The Vitals API provides data about the running state and health of the query engine, such as number of logical cores, active threads, queued threads,CPU utilization, memory usage, network utilization, garbage collection percentage, and so on.
This information can be very useful to assess the current workload and performance characteristics of a query engine, and hence load-balance the requests being sent to various query engines.

In the above example, note the new plan section.
It contains a tree of operators that combine to execute the N1QL query.
The root operator is a Sequence, which itself has a collection of child operators like Authorize, PrimaryScan, Fetch, and possibly even more Sequences.

Option

Description

clientContextID

The opaque ID or context provided by the client.

elapsedTime

The time taken from when the request was acknowledged by the service to when the request was completed.
It includes the time taken by the service to schedule the request.

errorCount

Total number of errors encountered while executing the query.

executionTime

The duration of the query from when it started executing to when it completed.

node

IP address and port of the query engine node in the Couchbase Cluster.

remoteAddr

IP address and port number of the client application, from where the query is received.

requestId

Unique request ID internally generated for the query.

requestTime

Timestamp when the query is received.

resultCount

Total number of documents returned in the query result.

resultSize

Total number of bytes returned in the query result.

scanConsistency

The value of the query setting Scan Consistency used for the query.

serviceTime

Total amount of calendar time taken to complete the query.

state

The state of the query execution, such as completed, in progress, cancelled.

The DELETE command can be used to terminate an active request, for instance, a non-responding or a long-running query.

DELETE http://localhost:8093/admin/active_requests/request_ID

DELETE FROM system:active_requests [ WHERE expression ]

system:prepareds

This catalog provides data about the known prepared statements and their state in a query engine’s prepared statement cache.
For each prepared statement, this catalog provides information such as name, statement, query plan, last use time, number of uses, and so on.

Create and Execute Prepared Statements

Create a prepared statement with the syntax PREPARE [ name ( FROM | AS ) ] statement.

system:completed_requests

This catalog maintains a list of the most recent completed requests that have run longer than a predefined threshold of time.
For each completed request, this catalog maintains information such as requestId, statement text, prepared name (if prepared statement), request time, service time, and so on.
This information provides a general insight into the health and performance of the query engine and the cluster.
Note that most field names and meanings match exactly those of system:active_requests.

Couchbase Server version 5.0 has an extra feature which affects completed_requests: request profiling in the following ways:

When the feature is turned on, we will store the execution plan with timings to completed requests.

Profiling information is likely to use 100KB+ per entry.

Due to the added overhead of running both profiling and logging, we recommend turning on both of them only when needed.
Running only one of them continuously has no noticeable affect on performance.

Profiling does not carry any extra cost beyond memory for complete_requests, so it’s fine to run it continuously.

DELETE FROM system:completed_requests
requests WHERE requests.time LIKE "2015-09-09%";requests
WHERE requests.Time LIKE "2015-09-09%";

Configuring the system:completed_requests keyspace

You can configure the system:completed_requests keyspace by specifying the parameters as command line options for the cbq-engine.

completed-threshold: Sets the minimum request duration after which requests are added to the system:completed_requests catalog.
The default value is 1000ms.
Specify 0 to log all requests and -1 to not log any requests to the keyspace.

Query 1 - First, we set profile = "timings" and run a long query which takes at least 1000ms (the default value of the completed-threshold query setting) to get registered in thesystem:completed_requests keyspace: