Making Multiple Passes over the Same Data
Hive has a special syntax for producing multiple aggregations from a single pass
through a source of data, rather than rescanning it for each aggregation. This change
can save considerable processing time for large input data sets. We discussed the details
previously in Chapter 5.
For example, each of the following two queries populates a table from the same source
table, history:
hive> INSERT OVERWRITE TABLE sales
> SELECT * FROM history WHERE action='purchased';
hive> INSERT OVERWRITE TABLE credits
> SELECT * FROM history WHERE action='returned';
This syntax is correct, but inefficient, because history is scanned once per query. The
following rewrite achieves the same result using a single pass through the source history table:
hive> FROM history
> INSERT OVERWRITE TABLE sales SELECT * WHERE action='purchased'
> INSERT OVERWRITE TABLE credits SELECT * WHERE action='returned';
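The difference between the two forms can be modeled in a few lines of Python. This is a minimal in-memory sketch, not how Hive actually executes queries: `CountingSource`, `separate_queries`, and `multi_insert` are hypothetical names, and the sample rows are made up for illustration. Both approaches route the same rows to sales and credits, but the multi-insert form scans the source only once.

```python
# Hypothetical model of multiple INSERTs sharing one FROM clause: one scan of the
# source feeds every target, instead of one scan per INSERT ... SELECT statement.
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]
Predicate = Callable[[Row], bool]

class CountingSource:
    """A stand-in for the history table that counts full scans."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.scans = 0

    def scan(self) -> Iterator[Row]:
        self.scans += 1
        return iter(self.rows)

def separate_queries(src: CountingSource, targets: Dict[str, Predicate]) -> Dict[str, List[Row]]:
    # Like running one INSERT OVERWRITE ... SELECT per target: one scan each.
    out: Dict[str, List[Row]] = {}
    for name, pred in targets.items():
        out[name] = [row for row in src.scan() if pred(row)]
    return out

def multi_insert(src: CountingSource, targets: Dict[str, Predicate]) -> Dict[str, List[Row]]:
    # Like FROM history INSERT ... INSERT ...: a single scan, with each row
    # tested against every target's WHERE predicate.
    out: Dict[str, List[Row]] = {name: [] for name in targets}
    for row in src.scan():
        for name, pred in targets.items():
            if pred(row):
                out[name].append(row)
    return out

history = [
    {"id": "1", "action": "purchased"},
    {"id": "2", "action": "returned"},
    {"id": "3", "action": "purchased"},
    {"id": "4", "action": "viewed"},
]
targets: Dict[str, Predicate] = {
    "sales": lambda r: r["action"] == "purchased",
    "credits": lambda r: r["action"] == "returned",
}

two_pass = CountingSource(history)
one_pass = CountingSource(history)
print(separate_queries(two_pass, targets) == multi_insert(one_pass, targets))  # True
print(two_pass.scans, one_pass.scans)  # 2 1
```

The saving grows with the number of INSERT clauses: with N targets, the separate-query form reads the input N times, while the multi-insert form reads it once and evaluates N predicates per row.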

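4. As of this writing, Hive does not support subqueries inside NOT IN. For example, the
following HQL statement, which selects the key values that appear in table a but not in
table b, is not supported:
hive> SELECT a.key FROM a WHERE key NOT IN (SELECT key FROM b);
The same result can be obtained with a LEFT OUTER JOIN instead (assuming table b contains
another column, key1, that is never NULL):
hive> SELECT a.key FROM a LEFT OUTER JOIN b ON a.key = b.key WHERE b.key1 IS NULL;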
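Why the join rewrite works can be checked with a small Python model. This is a sketch under stated assumptions, not Hive's join implementation: the helper names `not_in` and `left_join_is_null` and the sample rows are hypothetical, Python `None` stands in for SQL NULL, keys are assumed never NULL in either table, and b.key1 is assumed never NULL. Under those assumptions, a NULL b.key1 after the join can only come from an unmatched row of a, so the filter keeps exactly the keys missing from b.

```python
# Hypothetical model: compare NOT IN against LEFT OUTER JOIN ... WHERE b.key1 IS NULL.
# Assumes keys are never NULL and b.key1 is never NULL (None models SQL NULL).
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

def not_in(a: List[Row], b: List[Row]) -> List[Optional[str]]:
    # SELECT a.key FROM a WHERE key NOT IN (SELECT key FROM b)
    b_keys = {rb["key"] for rb in b}
    return [ra["key"] for ra in a if ra["key"] not in b_keys]

def left_join_is_null(a: List[Row], b: List[Row]) -> List[Optional[str]]:
    # SELECT a.key FROM a LEFT OUTER JOIN b ON a.key = b.key WHERE b.key1 IS NULL
    result: List[Optional[str]] = []
    for ra in a:
        matches = [rb for rb in b if rb["key"] == ra["key"]]
        if not matches:
            # An unmatched left row is padded with NULLs for every column of b.
            matches = [{"key": None, "key1": None}]
        for rb in matches:
            if rb["key1"] is None:
                result.append(ra["key"])
    return result

a = [{"key": "k1"}, {"key": "k2"}, {"key": "k3"}]
b = [{"key": "k2", "key1": "x"}, {"key": "k4", "key1": "y"}]
print(not_in(a, b))             # ['k1', 'k3']
print(left_join_is_null(a, b))  # ['k1', 'k3']
```

The non-NULL assumptions matter: in SQL, if the subquery returns any NULL key, NOT IN returns no rows at all, whereas the join form still returns the unmatched keys; and if some matched b row had a NULL key1, the join form would wrongly keep that key.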