What is Included in an Impala Installation

Impala is made up of a set of components that can be installed on multiple nodes throughout your cluster.
The key installation step for performance is to install the impalad daemon (which does
most of the query processing work) on all DataNodes in the cluster.

The Impala package installs these binaries:

impalad - The Impala daemon. Plans and executes queries against HDFS, HBase, and Amazon S3 data.
Run one impalad process on each node in the cluster
that has a DataNode.

statestored - Name service that tracks the location and status of all
impalad instances in the cluster. Run one
instance of this daemon on a node in your cluster. Most production deployments run this daemon
on the NameNode.

catalogd - Metadata coordination service that broadcasts changes from Impala DDL and
DML statements to all affected Impala nodes, so that new tables, newly loaded data, and so on are
immediately visible to queries submitted through any Impala node.
(Prior to Impala 1.2, you had to run the REFRESH or INVALIDATE
METADATA statement on each node to synchronize changed metadata. Now those statements are
required only if you make the change through an external mechanism, such as performing the DDL or DML
in Hive or uploading data to the Amazon S3 filesystem.)
Run one instance of this daemon on a node in your cluster,
preferably on the same host as the statestored daemon.

impala-shell - Command-line
interface for issuing queries to the Impala daemon. Install it on one or more hosts
anywhere on your network; these hosts do not need to be DataNodes or even part of the same cluster as Impala.
It can connect remotely to any instance of the Impala daemon.
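
The placement guidance above can be sketched as a small planning script. This is only an illustration, not part of the Impala package: the Host class, the plan_layout and refresh_command helpers, and the host and table names are hypothetical. The sketch assumes that statestored and catalogd share the NameNode host when one exists, and that impala-shell is invoked with its -i (impalad host) and -q (query) options.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical description of one cluster host and its HDFS roles."""
    name: str
    is_datanode: bool = False
    is_namenode: bool = False


def plan_layout(hosts):
    """Map each host name to the sorted list of Impala daemons to run on it."""
    if not hosts:
        raise ValueError("at least one host is required")

    layout = {h.name: set() for h in hosts}

    # Run one impalad on every host that has a DataNode.
    for h in hosts:
        if h.is_datanode:
            layout[h.name].add("impalad")

    # Run a single statestored, on the NameNode host if there is one;
    # falling back to the first host is an assumption of this sketch.
    namenodes = [h for h in hosts if h.is_namenode]
    service_host = (namenodes or hosts)[0]
    layout[service_host.name].add("statestored")

    # Run a single catalogd, preferably on the same host as statestored.
    layout[service_host.name].add("catalogd")

    return {name: sorted(daemons) for name, daemons in layout.items()}


def refresh_command(impalad_host, table):
    """Build an impala-shell argument list that runs REFRESH on one table.

    Useful after data changes made outside Impala, for example through Hive.
    """
    return ["impala-shell", "-i", impalad_host, "-q", f"REFRESH {table}"]


if __name__ == "__main__":
    cluster = [
        Host("nn1", is_namenode=True),
        Host("dn1", is_datanode=True),
        Host("dn2", is_datanode=True),
    ]
    for name, daemons in plan_layout(cluster).items():
        print(name, daemons)
    print(refresh_command("dn1", "sales"))
```

Because impala-shell can connect to any impalad instance, the argument list from refresh_command can be run from any host where impala-shell is installed, for example by passing it to subprocess.run.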

Before you begin the installation, ensure that all prerequisites are met. See
Impala Requirements for details.