CDH

This is the documentation for CDH 5.0.x.
Documentation for other versions is available at Cloudera Documentation.

Cloudera Impala User Guide

Cloudera Impala provides high-performance, low-latency SQL queries on data
stored in popular Apache Hadoop file formats.
The fast query response enables interactive exploration and
fine-tuning of analytic queries, rather than the long batch jobs
traditionally associated with SQL-on-Hadoop technologies.
(You will often see the term "interactive" applied to these kinds of fast queries
with human-scale response times.)

Impala integrates with the Apache Hive metastore database to share databases and tables
between the two components. The high level of integration with Hive,
and compatibility with the HiveQL syntax,
let you use either Impala or Hive to create tables, issue queries, load data, and so on.

The following are some of the key advantages of Impala:

Impala integrates with the existing CDH ecosystem, meaning data can be
stored, shared, and accessed using the various solutions included with CDH. This also avoids
data silos and minimizes expensive data movement.

Impala provides access to data stored in CDH without the Java
skills needed for MapReduce jobs. Impala can access data directly from HDFS.
Impala also provides a SQL front-end to access data in the HBase database system.

Impala typically returns results within seconds or a few minutes, rather than the
many minutes or hours that Hive queries often take to complete.

Impala is pioneering the use of the Parquet file format, a columnar storage layout that is optimized for
the large-scale queries typical of data warehouse scenarios.
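
To make the idea of a columnar layout concrete, here is a small toy sketch in Python. It is not how Parquet or Impala actually store or read data. The sample rows and the helper names (to_columns, sum_row_layout, sum_column_layout) are made up for illustration. It only shows why an aggregate over one column can touch far fewer values when the values of each column are stored together.

```python
# Toy illustration only: this is NOT the Parquet format or Impala's reader.
# All names and sample data below are hypothetical.

rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 25.5},
    {"id": 3, "region": "east", "amount": 4.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of per-column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_layout(records, column):
    """Row layout: each whole record is visited to reach one field."""
    total = 0.0
    values_touched = 0
    for record in records:
        values_touched += len(record)  # every field of the row is read
        total += record[column]
    return total, values_touched


def sum_column_layout(columns, column):
    """Column layout: only the requested column's values are visited."""
    values = columns[column]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_touched = sum_row_layout(rows, "amount")
col_total, col_touched = sum_column_layout(columns, "amount")

print("row layout:   ", row_total, "values touched:", row_touched)
print("column layout:", col_total, "values touched:", col_touched)
```

Both layouts produce the same total, but the column layout reads only the values of the one column being aggregated. In this toy model, that saving grows with the number of columns in the table.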