Troubleshooting Impala

Troubleshooting Impala requires the ability to diagnose and debug problems
with performance, network connectivity, out-of-memory conditions, disk space usage,
and crashes or hangs in any of the Impala-related daemons.

The following sections describe general troubleshooting procedures for diagnosing
different kinds of problems.

If a query fails in Impala but the same query succeeds in Hive, the problem most likely lies with
your Impala installation.

Troubleshooting I/O Capacity Problems

Impala queries are typically I/O-intensive. If there is an I/O problem with storage devices,
or with HDFS itself, Impala queries could show slow response times with no obvious cause
on the Impala side. Slow I/O on even a single DataNode could result in an overall slowdown, because
queries involving clauses such as ORDER BY, GROUP BY, or JOIN
do not start returning results until all DataNodes have finished their work.

To test whether the Linux I/O system itself is performing as expected, run Linux commands like
the following on each DataNode:
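A minimal sketch of such a test, assuming GNU dd is available on the DataNode. The helper reads a fixed amount of data from one device and prints the transfer summary that dd reports. The function name read_throughput and the device names in the comments are illustrative assumptions, not names taken from Impala.

```shell
# Sketch: report sequential read throughput for one device or file.
# Reading a raw block device such as /dev/sda (an assumed name) requires root.
# To avoid measuring the page cache instead of the disk, flush it first,
# for example as root:  sync; sysctl -w vm.drop_caches=3
read_throughput() {
    # $1: path to read; $2: number of 1 MB blocks (defaults to 1024, about 1 GB)
    # dd writes its transfer summary to stderr, so fold stderr into stdout
    # and keep only the final line, which carries the throughput figure.
    LC_ALL=C dd if="$1" of=/dev/null bs=1M count="${2:-1024}" 2>&1 | tail -n 1
}

# Example (not run here): repeat for each data disk on the host.
#   read_throughput /dev/sda
#   read_throughput /dev/sdb
```

Compare the rate on the printed line with the threshold discussed next.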

On modern hardware, a throughput rate of less than 100 MB/s typically indicates
a performance issue with the storage device. Correct the hardware problem before
continuing with Impala tuning or benchmarking.

Impala Troubleshooting Quick Reference

The following table lists common problems and potential solutions.

Each entry below lists a symptom, its explanation, and a recommendation.

Symptom: Impala takes a long time to start.

Explanation: Impala instances with large numbers of tables, partitions, or data files take longer to start
because the metadata for these objects is broadcast to all impalad nodes and
cached.

Recommendation: Adjust timeout and synchronicity settings.

Symptom: Joins fail to complete.

Explanation: There may be insufficient memory. During a join, data from the second, third, and subsequent
data sets to be joined is loaded into memory. If Impala chooses an inefficient join order or join mechanism,
the query could exceed the total memory available.

Recommendation: Start by gathering statistics with the COMPUTE STATS statement for each table
involved in the join. Consider specifying the [SHUFFLE] hint so that data from
the joined tables is split up between nodes rather than broadcast to each node. If tuning at the
SQL level is not sufficient, add more memory to your system or join smaller data sets.
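The recommendation above can be sketched as two small helpers that print the SQL to run through impala-shell. The helper names and the table and column names (sales, customers, customer_id) are hypothetical. Only the COMPUTE STATS statement and the [SHUFFLE] hint come from the text above.

```shell
# Table and column names passed to these helpers are hypothetical.
compute_stats_sql() {
    # $1: table name. Prints the statement that gathers statistics for it.
    printf 'COMPUTE STATS %s\n' "$1"
}

shuffle_join_sql() {
    # $1: left table, $2: right table, $3: join column that exists in both.
    # The [SHUFFLE] hint follows the JOIN keyword and asks Impala to split the
    # joined data between nodes instead of broadcasting it to every node.
    printf 'SELECT COUNT(*) FROM %s JOIN [SHUFFLE] %s ON %s.%s = %s.%s\n' \
        "$1" "$2" "$1" "$3" "$2" "$3"
}

# Example (requires a reachable impalad; not run here):
#   for t in sales customers; do impala-shell -q "$(compute_stats_sql "$t")"; done
#   impala-shell -q "$(shuffle_join_sql sales customers customer_id)"
```

Gathering statistics first gives the planner a chance to choose a better join order on its own. The hint is a fallback for cases where it still chooses poorly.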

Symptom: Queries return incorrect results.

Explanation: Impala metadata may be outdated after changes are performed in Hive.

Recommendation: Where possible, use the appropriate Impala statement (INSERT, LOAD DATA,
CREATE TABLE, ALTER TABLE, COMPUTE STATS, and so on) rather than switching back and forth
between Impala and Hive. Impala automatically broadcasts the results of DDL and DML operations
to all Impala nodes in the cluster, but does not automatically recognize when such changes are
made through Hive. After inserting data, adding a partition, or performing another operation in
Hive, refresh the metadata for the table as described in REFRESH Statement.
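As a rough illustration of the refresh step, the helper below prints one REFRESH statement per table changed through Hive. The helper name and the database and table names (warehouse, orders, returns) are hypothetical.

```shell
refresh_statements() {
    # $1: database name; remaining arguments: tables that were changed in Hive.
    db="$1"
    shift
    for tbl in "$@"; do
        printf 'REFRESH %s.%s\n' "$db" "$tbl"
    done
}

# Example (requires a reachable impalad; not run here):
#   refresh_statements warehouse orders returns |
#       while IFS= read -r stmt; do impala-shell -q "$stmt" </dev/null; done
```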

Symptom: Queries are slow to return results.

Explanation: Some impalad instances may not have started. Using a browser, connect to the
host running the Impala state store at an address of the form
http://hostname:port/metrics.

Note:
Replace hostname and port with the hostname and web server port of
your Impala state store host. The default port is 25010.

The number of impalad instances listed should match the expected number of
impalad instances installed in the cluster. There should also be one
impalad instance installed on each DataNode.

Recommendation: Ensure Impala is installed on all DataNodes. Start any impalad instances that
are not running.
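The same check can be made from a terminal instead of a browser. This sketch only builds the metrics address described above. The helper name and the host name in the comment are placeholders.

```shell
statestore_metrics_url() {
    # $1: state store host; $2: web server port (falls back to the default, 25010)
    printf 'http://%s:%s/metrics\n' "$1" "${2:-25010}"
}

# Example (host name is a placeholder; not run here):
#   curl -s "$(statestore_metrics_url statestore-host.example.com)"
```

Inspect the returned page for the list of registered impalad instances and compare it with the hosts you expect.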

Symptom: Queries are slow to return results.

Explanation: Impala may not be configured to use native checksumming. Native checksumming uses
machine-specific instructions to compute checksums over HDFS data very quickly. Review Impala
logs. If you find instances of "INFO util.NativeCodeLoader: Loaded the
native-hadoop" messages, native checksumming is not enabled.

Recommendation: Ensure Impala is configured to use native checksumming.
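To review the logs for these messages without reading them by hand, a small helper can count the relevant lines. This is a sketch: the helper name is hypothetical, and the log location in the comment (/var/log/impala) is an assumption about your installation.

```shell
count_native_loader_lines() {
    # $1: path to a log file. Prints how many lines mention util.NativeCodeLoader.
    # grep -c prints 0 but exits non-zero when nothing matches, so mask the status.
    grep -c -F 'util.NativeCodeLoader' "$1" || true
}

# Example (log path is an assumption; not run here):
#   count_native_loader_lines /var/log/impala/impalad.INFO
```

Interpret any matching lines according to the explanation above.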

Symptom: Attempts to complete Impala tasks such as INSERT ... SELECT statements fail. The Impala logs
report that files could not be opened because permission was denied.

Explanation: This can be the result of permissions issues. For example, you could use the Hive shell as the
hive user to create a table. After creating this table, you could attempt to complete some
action, such as an INSERT ... SELECT on the table. Because the table was created by one user and
the INSERT ... SELECT is attempted by another, this action may fail due to permissions issues.

Recommendation: In general, ensure the Impala user has sufficient permissions. In the preceding example, ensure
the Impala user has sufficient permissions to the table that the Hive user created.
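One way to see who owns a table's files is to list the table directory in HDFS and pull out the owner and mode. The sketch below parses a single line of hdfs dfs -ls -d output. The helper name and the warehouse path in the comment are assumptions for illustration.

```shell
listing_owner_and_mode() {
    # Reads one line of `hdfs dfs -ls -d` output on stdin and prints "owner mode".
    # In that listing, field 1 is the permission string and field 3 is the owner.
    awk '{ print $3, $1 }'
}

# Example (path is an assumed table location; not run here):
#   hdfs dfs -ls -d /user/hive/warehouse/sample_table | listing_owner_and_mode
```

If the owner is the hive user and the mode grants no write access to the user that Impala runs as, adjust ownership or permissions (for example with hdfs dfs -chown or hdfs dfs -chmod) in line with your security policy.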

Symptom: Impala fails to start up, with the impalad logs referring to errors connecting
to the statestore service and attempts to re-register.

Explanation: A large number of databases, tables, partitions, and so on can require metadata synchronization,
particularly on startup, that takes longer than the default timeout for the statestore service.

Recommendation: Increase the statestore timeout above its default value.
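One way to give metadata synchronization more time is to raise the statestore subscriber timeout through the -statestore_subscriber_timeout_seconds startup flag. The sketch below only appends that flag to an existing argument string. The helper name, the value 100, and the idea that your startup arguments live in a single shell variable are assumptions, so check how your installation passes options to the daemons.

```shell
with_statestore_timeout() {
    # $1: existing startup arguments; $2: timeout in seconds (illustrative value)
    printf '%s -statestore_subscriber_timeout_seconds=%s\n' "$1" "$2"
}

# Example (variable name is an assumption about your startup configuration):
#   IMPALA_SERVER_ARGS=$(with_statestore_timeout "$IMPALA_SERVER_ARGS" 100)
```

Restart the affected daemons after changing startup options so that the new value takes effect.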

Impala Web User Interface for Debugging

Each of the Impala daemons (impalad, statestored,
and catalogd) includes a built-in web server that displays
diagnostic and status information:

The impalad web UI (default port: 25000) includes
information about configuration settings, running and completed queries, and associated performance and
resource usage for queries. In particular, the Details link for each query displays
alternative views of the query including a graphical representation of the plan, and the
output of the EXPLAIN, SUMMARY, and PROFILE
statements from impala-shell.
Each host that runs the impalad daemon has
its own instance of the web UI, with details about those queries for which that
host served as the coordinator. The impalad web UI is mainly
for diagnosing query problems that can be traced to a particular node.

The statestored web UI (default port: 25010) includes
information about memory usage, configuration settings, and ongoing health checks
performed by this daemon. Because there is only a single instance of this
daemon within any cluster, you view the web UI only on the particular host
that serves as the Impala Statestore.

The catalogd web UI (default port: 25020) includes
information about the databases, tables, and other objects managed by Impala,
in addition to the resource usage and configuration settings of the daemon itself.
The catalog information is represented as the underlying Thrift data structures.
Because there is only a single instance of this daemon within any cluster, you view the
web UI only on the particular host that serves as the Impala Catalog Server.
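The default ports above can be captured in a small lookup helper, which is handy when scripting a quick reachability check across daemons. The helper name and the host name in the comment are placeholders. Clusters that override the web server port need the configured value instead.

```shell
webui_port() {
    # $1: daemon name. Prints the default web UI port listed above.
    case "$1" in
        impalad)     echo 25000 ;;
        statestored) echo 25010 ;;
        catalogd)    echo 25020 ;;
        *)           echo "unknown daemon: $1" >&2; return 1 ;;
    esac
}

# Example (host name is a placeholder; not run here). Prints the HTTP status code:
#   curl -s -o /dev/null -w '%{http_code}\n' \
#       "http://impala-host.example.com:$(webui_port impalad)/"
```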