Life of a Cloud Spanner Query

Client

A construct such as @firstName in the query text is a reference to a query
parameter. You can use a query parameter anywhere a literal value can be used.
Using parameters in programmatic APIs is strongly recommended: query parameters
help avoid SQL injection attacks, and the resulting queries are more likely to
benefit from various server-side caches. See Caching, below.

Query parameters must be bound to a value when the query is executed.
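The binding rule can be sketched as follows. This is a minimal illustration, not the Cloud Spanner client API: ParameterizedQuery and its methods are hypothetical names. It keeps the query text and the values separate, and rejects a query whose @name references are not all bound.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any Cloud Spanner library.
final class ParameterizedQuery {
    // Matches @name references. For simplicity this ignores string literals and comments.
    private static final Pattern PARAM = Pattern.compile("@([A-Za-z_][A-Za-z0-9_]*)");

    private final String sql;
    private final Map<String, Object> bindings = new LinkedHashMap<>();

    ParameterizedQuery(String sql) {
        this.sql = sql;
    }

    // Binds a value to a parameter name; the query text itself never changes.
    ParameterizedQuery bind(String name, Object value) {
        bindings.put(name, value);
        return this;
    }

    // Names referenced in the query text as @name, in order of first appearance.
    Set<String> referencedParameters() {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = PARAM.matcher(sql);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Throws if any referenced parameter has no bound value.
    void checkAllBound() {
        for (String name : referencedParameters()) {
            if (!bindings.containsKey(name)) {
                throw new IllegalStateException("Unbound query parameter: @" + name);
            }
        }
    }
}
```

With a query text ending in WHERE s.FirstName = @firstName (an assumed sample query), calling bind("firstName", "Jimi") followed by checkAllBound() returns normally, while skipping the bind call raises IllegalStateException. Because the value never becomes part of the query text, it cannot change the structure of the query.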

Once Cloud Spanner receives an API call, it analyzes the query and bound
parameters to determine which Cloud Spanner server node should process the
query. The server sends back a stream of result rows, which the client consumes
through calls to ResultSet.next().

Query execution

Query execution begins with the arrival of an "execute query" request at some
Cloud Spanner server. The server performs the following steps:

Validate the request

Parse the query text

Generate an initial query algebra

Generate an optimized query algebra

Generate an executable query plan

Execute the plan (check permissions, read data, encode results, etc.)

Parsing

The SQL parser analyzes the query text and converts it to an abstract syntax
tree. It extracts the basic query structure (SELECT … FROM … WHERE …) and
performs syntactic checks.
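To make the parsing step concrete, here is a toy sketch that splits a single-table query into its top-level clauses and rejects text that does not have the SELECT … FROM … [WHERE …] shape. It is not how Cloud Spanner parses SQL: a real parser is grammar-driven and produces a full tree, whereas the hypothetical SimpleSelect class below produces only a flat record.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: a flat clause record, not a real abstract syntax tree.
final class SimpleSelect {
    private static final Pattern SHAPE = Pattern.compile(
        "^\\s*SELECT\\s+(.+?)\\s+FROM\\s+(.+?)(?:\\s+WHERE\\s+(.+?))?\\s*;?\\s*$",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    final String selectList;
    final String from;
    final String where;  // null when the query has no WHERE clause

    private SimpleSelect(String selectList, String from, String where) {
        this.selectList = selectList;
        this.from = from;
        this.where = where;
    }

    // Syntactic check plus structure extraction; no schema is consulted here.
    static SimpleSelect parse(String sql) {
        Matcher m = SHAPE.matcher(sql);
        if (!m.matches()) {
            throw new IllegalArgumentException(
                "Syntax error: expected SELECT ... FROM ... [WHERE ...]");
        }
        return new SimpleSelect(m.group(1), m.group(2), m.group(3));
    }
}
```

Note that the sketch never checks whether the named table or columns exist. That matches the division of labor in the text: name resolution and type checks belong to the later algebra step, not to the parser.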

Algebra

Cloud Spanner's type system can represent scalars, arrays, structures, and so
on. The query algebra defines operators for table scans, filtering,
sorting/grouping, joins of various kinds, aggregation, and much more. The
initial query algebra is built from the output of the parser. Field name
references in the parse tree are resolved using the database schema. This step
also checks for semantic errors (e.g., an incorrect number of parameters or
type mismatches).

The next step ("query optimization") takes the initial algebra and generates a
more optimal algebra. The result might be simpler, more efficient, or just
better suited to the capabilities of the execution engine. For example, the
initial algebra might specify just a "join" while the optimized algebra
specifies a "hash join".
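To illustrate what choosing a "hash join" commits the engine to, here is a minimal in-memory sketch of an inner equi-join done by hashing. It is a textbook version under assumed row and key representations (rows as Object[] and keys by column index), not Cloud Spanner's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Textbook hash join sketch; the row layout and method names are assumptions.
final class HashJoinSketch {
    // Inner join of left and right on left[leftKey] = right[rightKey].
    static List<Object[]> hashJoin(List<Object[]> left, int leftKey,
                                   List<Object[]> right, int rightKey) {
        // Build phase: index the left input by its join key.
        Map<Object, List<Object[]>> table = new HashMap<>();
        for (Object[] row : left) {
            Object key = row[leftKey];
            if (key == null) continue;  // in SQL, NULL never equals NULL
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        // Probe phase: look up each right row and emit one output row per match.
        List<Object[]> out = new ArrayList<>();
        for (Object[] r : right) {
            Object key = r[rightKey];
            if (key == null) continue;
            List<Object[]> matches = table.get(key);
            if (matches == null) continue;
            for (Object[] l : matches) {
                Object[] joined = new Object[l.length + r.length];
                System.arraycopy(l, 0, joined, 0, l.length);
                System.arraycopy(r, 0, joined, l.length, r.length);
                out.add(joined);
            }
        }
        return out;
    }
}
```

The logical "join" only says which rows belong together. The hash join fixes a strategy: one pass to build an index and one pass to probe it, at the cost of holding the build side in memory.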

Execution

The final executable query plan is built from the rewritten algebra. The
executable plan is a directed acyclic graph of "iterators". Each iterator
exposes a sequence of values. Iterators may consume inputs to produce outputs
(e.g., a sort iterator). Queries that involve a single split can be executed by
a single server (the one that holds the data). The server scans ranges from
various tables, executes joins, performs aggregation, and carries out all other
operations defined by the query algebra.
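The iterator model can be sketched with java.util.Iterator. The example below is a simplified, single-server illustration: it builds a linear chain (scan, filter, sort), which is the simplest case of the graph described above. The operator names are assumptions, not Cloud Spanner's internal iterators.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Illustrative iterator-style operators; names are assumptions.
final class IteratorPlanSketch {
    // Leaf iterator: exposes the rows of an in-memory "table".
    static <T> Iterator<T> scan(List<T> table) {
        return table.iterator();
    }

    // Streaming iterator: pulls rows from its input and passes through the matching ones.
    static <T> Iterator<T> filter(final Iterator<T> input, final Predicate<T> keep) {
        return new Iterator<T>() {
            private T pending;
            private boolean ready;

            @Override
            public boolean hasNext() {
                while (!ready && input.hasNext()) {
                    T candidate = input.next();
                    if (keep.test(candidate)) {
                        pending = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                return pending;
            }
        };
    }

    // Blocking iterator: consumes its whole input (here, eagerly) before emitting a row.
    static <T> Iterator<T> sort(Iterator<T> input, Comparator<T> order) {
        List<T> buffer = new ArrayList<>();
        while (input.hasNext()) {
            buffer.add(input.next());
        }
        buffer.sort(order);
        return buffer.iterator();
    }
}
```

Composing the operators as sort(filter(scan(table), predicate), order) yields a plan whose root is pulled by the caller. The filter produces rows as soon as its input does, while the sort cannot emit anything until its input is exhausted.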

Queries that involve multiple splits are factored into multiple pieces. Some
part of the query continues to be executed on the main (root) server. Other
partial subqueries are handed off to leaf nodes (those that own the splits being
read). This hand-off can be applied recursively for complex queries, resulting
in a tree of server executions. All servers agree on a timestamp so that the
query results are a consistent snapshot of the data. Each leaf server sends back
a stream of partial results. For queries involving aggregation, these could be
partially aggregated results. The query root server processes results from the
leaf servers and runs the remainder of the query plan.
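As a sketch of partial aggregation, consider computing an average over several splits. Each leaf returns a (sum, count) pair for its own split, and the root merges the pairs and finishes the computation. The class and method names below are hypothetical, and the servers are simulated as plain method calls.

```java
import java.util.List;

// Simulated leaf/root split of an AVG computation; names are hypothetical.
final class PartialAggregationSketch {
    // Partial result from one leaf: enough state to finish AVG at the root.
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Runs on each leaf server over the rows of the split it owns.
    static Partial aggregateSplit(List<Long> splitValues) {
        long sum = 0;
        for (long v : splitValues) {
            sum += v;
        }
        return new Partial(sum, splitValues.size());
    }

    // Runs on the root server: merges the partial results and finishes the aggregate.
    static double finishAverage(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        if (count == 0) {
            // SQL would return NULL for AVG over zero rows; this sketch throws instead.
            throw new IllegalArgumentException("AVG over zero rows");
        }
        return (double) sum / count;
    }
}
```

The leaves send sums and counts rather than their own averages because averaging per-split averages gives the wrong answer when splits hold different numbers of rows.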

Caching

Many of the artifacts of query processing are automatically cached and reused
for subsequent queries. These include query algebras, executable query plans,
and so on. The caching is based on the query text, the names and types of bound
parameters, and so on. This is why using bound parameters (such as @firstName)
is better than using literal values in the query text: a parameterized query can
be cached once and reused regardless of the actual bound value. See Optimizing
Cloud Spanner Query Performance for more details.
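The effect of the cache key can be illustrated with a small sketch. It assumes a key built only from the query text plus parameter names and types. The real server-side key is not specified in this document, and PlanCacheSketch is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plan cache; the real cache key and contents are not documented here.
final class PlanCacheSketch {
    private final Map<String, String> plans = new HashMap<>();
    private int compilations = 0;

    // Key: query text plus parameter names and types. Parameter values are not part of it.
    static String cacheKey(String sql, Map<String, String> paramTypes) {
        // TreeMap gives a stable ordering, so the key does not depend on insertion order.
        return sql + "|" + new TreeMap<String, String>(paramTypes);
    }

    // Returns the cached plan, "compiling" a new one only on a cache miss.
    String planFor(String sql, Map<String, String> paramTypes) {
        return plans.computeIfAbsent(cacheKey(sql, paramTypes), k -> {
            compilations++;
            return "plan#" + compilations;
        });
    }

    int compilations() {
        return compilations;
    }
}
```

Under this assumption, running a query that contains @firstName with the values "Jimi" and then "Nina" compiles one plan, because the text and parameter types are identical. Embedding the two names as literals produces two different query texts and therefore two compilations.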

Error handling

The stream of result rows from the executeQuery method can be interrupted for
any number of reasons: transient network errors, handoff of a split from one
server to another (e.g., load balancing), server restarts (e.g., upgrading to a
new version), etc. To help recover from these errors, Cloud Spanner sends opaque
"resume tokens" along with batches of partial result data. These resume tokens
can be used when retrying the query to continue where the interrupted query left
off. If you are using the Cloud Spanner client libraries, this is done
automatically, so you do not need to handle this type of transient failure
yourself.
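For readers curious about what the client libraries do on their behalf, the retry loop can be sketched as follows. This is a simulation under stated assumptions: StreamingServer, Batch, and TransientStreamException are hypothetical types, not the Cloud Spanner API, and the real client libraries may behave differently in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Simulated resume-token retry loop; all types here are hypothetical.
final class ResumeTokenSketch {
    // One batch of partial results plus the token needed to resume after it.
    static final class Batch {
        final List<String> rows;
        final String resumeToken;

        Batch(List<String> rows, String resumeToken) {
            this.rows = rows;
            this.resumeToken = resumeToken;
        }
    }

    // Signals a transient, retryable interruption of the stream.
    static final class TransientStreamException extends RuntimeException {
        TransientStreamException(String message) {
            super(message);
        }
    }

    // Stand-in for the server side of a streaming query.
    interface StreamingServer {
        // Returns the batch that follows resumeToken (null means start from the
        // beginning), or null once the stream is complete.
        Batch nextBatch(String resumeToken);
    }

    // Reads the whole result, retrying from the last received token after a failure.
    static List<String> readAll(StreamingServer server, int maxRetries) {
        List<String> rows = new ArrayList<>();
        String token = null;  // treated as opaque: stored and sent back, never inspected
        int retries = 0;
        while (true) {
            try {
                Batch batch = server.nextBatch(token);
                if (batch == null) {
                    return rows;
                }
                rows.addAll(batch.rows);
                token = batch.resumeToken;
            } catch (TransientStreamException e) {
                if (++retries > maxRetries) {
                    throw e;
                }
                // Loop again with the same token: rows already received are kept,
                // and the interrupted batch is requested again.
            }
        }
    }
}
```

The token is updated only after a batch has been fully received, so a failure in the middle of a request neither loses nor duplicates rows.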