Timestamps determine the order of precedence for operations on the same column value from different queries.

Speculative query execution

Speculative queries are used to preemptively start a second execution of a query against
another node, before the first node has replied or returned an error.

Speculative queries are used to preemptively start a second execution of a query against
another node, before the first node has replied or returned an error. Sometimes a node may be
slow to respond. Queries sent to that node will experience increased latency.

To mitigate this situation, use speculative execution to preemptively start a second
execution of the query against another node, before the first node has replied or returned an
error. If the second node replies faster, that response is returned to the client and the first
execution is cancelled. If the first execution is cancelled, the driver ignores the response, but
the request still interacts with the server.

Figure: Speculative execution with the second node responding first

The first node might reply just after the second execution call was started. In this
case, the second preemptive execution is cancelled. For applications that use speculative
execution, the data from whichever node replies faster is returned to the client.

Figure: Speculative execution with the first node responding first

Configuration

Speculative execution is disabled by default in the DSE drivers. See the
driver-specific links for how to enable and configure speculative execution.

Tuning and practical details

The goal of speculative execution is to improve the overall latency of an application.
However, too many speculative executions increase the load on the cluster. If speculative
executions are used to avoid sending queries to unhealthy nodes, a healthy node should rarely
reach resource limits. Drivers provide a configurable delay threshold at which speculative
executions will be sent. In order to determine an appropriate threshold for your application,
benchmark the healthy platform state (all nodes are up and under a normal load), monitoring the
response latencies. Based on these results, use the latency at a high percentile (p99.9) as the
speculative executions threshold.

Alternatively, when low latency is the highest priority and the cluster can handle
the increased throughput, set the threshold to 0, which effectively always performs speculative
executions.

Most drivers surface metrics for speculative
executions that can be used to observe the speculative executions frequency.

Query idempotence

If a query is not idempotent, the
driver will never schedule speculative executions for the query, because there is no way to
guarantee that only one coordinator will apply the mutation.

Retries

Turning on speculative executions doesn’t change the driver’s retry behavior. Each parallel execution triggers retries
independently. The only impact is that all executions of the same query always share the same
query plan, so each node will be used by no more than one execution.

Figure: Query plan used in speculative execution

Stream ID exhaustion

One effect of speculative executions is that many requests get cancelled, which can
lead to a phenomenon called stream ID exhaustion. Each TCP connection can handle multiple
simultaneous requests, identified by a unique number called a stream ID. See Connection pooling. When a request gets cancelled, the stream ID for that
request cannot be used immediately because the response for that request may be returned later
from the server once it fulfills the request. If this happens often, the number of available
stream IDs diminishes over time. When the available stream IDs goes below a given threshold, the
connection is closed and a new connection is created. If requests are often cancelled,
connections will be recycled at a high rate.

In practice, exhausting all stream IDs on a connection should not occur. Each
connection can reference 32768 stream IDs. Most driver implementations are configured by default
to only send 1000 requests per connection.

Most drivers provide a metric for observing the number of inflight requests for a
node. Some drivers provide a metric for observing the number of orphaned stream IDs. Monitor
these metrics to ensure that stream ID exhaustion is not occurring.

If stream ID exhaustion is occurring, the typical solution is to add more capacity
to the cluster, or adjust the system settings of the nodes to avoid resource limits.

Request ordering

Ordering issues are only a problem when using server-side timestamps. All recent
versions of the DataStax drivers use client-side timestamps with exception of the C++, PHP, and
Ruby drivers. Unless the driver is explicitly configured to use server-side timestamps, this
section does not apply. See Query timestamps for details.

For example, the following query is run with speculative execution and server-side
timestamps enabled.

INSERT INTO my_table (k, v) VALUES (1, 1);

When the first execution is slow, a second execution is triggered. Finally, the
first execution completes, so the second execution is cancelled. However, cancelling an
execution only means that the driver stops waiting for the server’s response. The request could
still be active.

Suppose that while the second request is still active after the driver canceled the
execution, the following query is run and completes successfully.

DELETE from my_table where k = 1;

The second request of the INSERT query finally reaches its target
node, which applies the INSERT. The row that was successfully deleted is back,
despite the driver canceling the second request.