To understand the high-level performance considerations for Impala queries, read the output of the
EXPLAIN statement for the query. You can get the EXPLAIN plan without
actually running the query itself.

For an overview of the physical performance characteristics for a query, issue the SUMMARY
statement in impala-shell immediately after executing a query. This condensed information
shows which phases of execution took the most time, and how the estimates for memory usage and number of rows
at each phase compare to the actual values.

To understand the detailed performance characteristics for a query, issue the PROFILE
statement in impala-shell immediately after executing a query. This low-level information
includes physical details about memory, CPU, I/O, and network usage, and thus is only available after the
query is actually run.

Using the EXPLAIN Plan for Performance Tuning

The EXPLAIN statement gives you an outline
of the logical steps that a query will perform, such as how the work will be distributed among the nodes
and how intermediate results will be combined to produce the final result set. You can see these details
before actually running the query. You can use this information to check that the query will not run in
an unexpected or inefficient way.

Read the EXPLAIN plan from bottom to top. The last part of the plan shows low-level details such as
the expected amount of data that will be read. From these details you can judge the effectiveness of your
partitioning strategy and estimate how long it will take to scan a table based on total data size and the
size of the cluster.

As you work your way up, you next see the operations that will be parallelized and performed on each
Impala node.

At the higher levels, you see how data flows when intermediate result sets are combined and transmitted
from one node to another.
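
As a minimal sketch of requesting a plan, the statement below prefixes a query with EXPLAIN in
impala-shell. The table sales and its columns region and price are hypothetical names used only for
illustration; the plan that comes back depends on your own tables, statistics, and cluster.

```sql
-- Hypothetical table and columns, for illustration only.
-- EXPLAIN returns the plan without running the query.
EXPLAIN
SELECT region, AVG(price)
FROM sales
GROUP BY region;
```

For a query of this shape, the scan of the table typically appears at the bottom of the plan, with the
aggregation and data-exchange steps above it.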

See EXPLAIN_LEVEL Query Option for details about the
EXPLAIN_LEVEL query option, which lets you customize how much detail to show in the
EXPLAIN plan, depending on whether you are doing high-level or low-level tuning and
whether you are dealing with logical or physical aspects of the query.

The EXPLAIN plan is also printed at the beginning of the query profile report described in
Using the Query Profile for Performance Tuning, for convenience in examining both the logical and physical aspects of the
query side-by-side.

The amount of detail displayed in the EXPLAIN output is controlled by the
EXPLAIN_LEVEL query option. You typically
increase this setting from standard to extended (or from 1
to 2) when double-checking the presence of table and column statistics during performance
tuning, or when estimating query resource usage in conjunction with the resource management features.
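
The following impala-shell sketch shows one way to raise the level for a single plan and then restore it.
The table sales and the column region are hypothetical; only the SET and EXPLAIN statements themselves
come from Impala.

```sql
-- Raise the detail level for the current session
-- (equivalent to SET EXPLAIN_LEVEL=2).
SET EXPLAIN_LEVEL=extended;

-- Hypothetical table and column names, for illustration only.
EXPLAIN SELECT COUNT(*) FROM sales WHERE region = 'west';

-- Return to the standard level (1) afterwards.
SET EXPLAIN_LEVEL=standard;
```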

Using the SUMMARY Report for Performance Tuning

The SUMMARY command within
the impala-shell interpreter gives you an easy-to-digest overview of the timings for the
different phases of execution for a query. As with the EXPLAIN plan, it makes
potential performance bottlenecks easy to see. As with the PROFILE output, it is available only after the
query is run and so displays actual timing numbers.

For example, consider a query involving an aggregate function, run on a single-node VM. The SUMMARY output
shows the different stages of the query and their timings (rolled up for all nodes), along with the
estimated and actual values used in planning the query. In this case, the AVG() function is computed for
a subset of data on each node (stage 01) and then the aggregated results from all nodes are combined at the
end (stage 03). You can see which stages took the most time, and whether any estimates were substantially
different from the actual data distribution. (When examining the time values, be sure to consider the
suffixes such as us for microseconds and ms for milliseconds, rather than just looking
for the largest numbers.)
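
A minimal impala-shell sketch of that workflow follows. The table sales and the columns price and discount
are hypothetical stand-ins, and the output is omitted because the timings and estimates depend entirely on
your data and cluster.

```sql
-- Hypothetical table and columns; run the query first so that
-- impala-shell has a completed query to summarize.
SELECT AVG(price) FROM sales WHERE discount = 0;

-- Report per-stage timings plus estimated and actual row counts
-- and memory usage for the query that just ran.
SUMMARY;
```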

Using the Query Profile for Performance Tuning

The PROFILE statement, available in the impala-shell interpreter,
produces a detailed low-level report showing how the most recent query was executed. Unlike the
EXPLAIN plan described in Using the EXPLAIN Plan for Performance Tuning, this information is only available
after the query has finished. It shows physical details such as the number of bytes read, maximum memory
usage, and so on for each node. You can use this information to determine if the query is I/O-bound or
CPU-bound, whether some network condition is imposing a bottleneck, whether a slowdown is affecting some
nodes but not others, and to check that recommended configuration settings such as short-circuit local
reads are in effect.
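
As a sketch of the sequence in impala-shell, assuming a hypothetical table sales with columns price and
discount, run the query and then request its profile in the same session:

```sql
-- Hypothetical table and columns; the query must finish before
-- PROFILE has anything to report.
SELECT AVG(price) FROM sales WHERE discount = 0;

-- Print the detailed low-level report for the most recent query
-- in this impala-shell session.
PROFILE;
```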

By default, time values in the profile output reflect the wall-clock time taken by an operation.
For values denoting system time or user time, the type of time is reflected in the metric
name, such as ScannerThreadsSysTime or ScannerThreadsUserTime.
For example, a multi-threaded I/O operation might show a small figure for wall-clock time,
while the corresponding system time is larger, representing the sum of the CPU time taken by each thread.
Or a wall-clock time figure might be larger because it counts time spent waiting, while
the corresponding system and user time figures only measure the time while the operation
is actively using CPU cycles.

The EXPLAIN plan is also printed
at the beginning of the query profile report, for convenience in examining both the logical and physical
aspects of the query side-by-side. The
EXPLAIN_LEVEL query option also controls the
verbosity of the EXPLAIN output printed by the PROFILE command.

Here is an example of a query profile, from a relatively straightforward query on a single-node
pseudo-distributed cluster to keep the output relatively brief.