There are two ways to use Impala to query tables in Hive. One is the command line, Impala Shell; the other is the Hue GUI. I am going to show both methods one by one.

Use Impala Shell

Impala Shell is a nice tool, similar to SQL*Plus, for setting up databases and tables and issuing queries. Ad hoc queries run much faster than Hive queries, which matters especially for queries requiring fast response time. Here are the steps in using Impala Shell.

Note: The prompt shows Not connected. I need to connect the Impala shell to a Data Node running the impalad daemon. My cluster uses vmhost2 and vmhost3 as Data Nodes, so I can pick either one; I use vmhost2 for this test.
[Not connected] > connect vmhost2;
Connected to vmhost2:21000
Server version: impalad version 2.2.0-cdh5.4.3 RELEASE (build 517bb0f71cd604a00369254ac6d88394df83e0f6)
[vmhost2:21000] >

2. Run some queries. Impala sees the same list of databases and tables as Hive does.

Impala

Wow, 41 seconds to get a row count of 22,782 using Hive. That seems excessive on a cluster with no other jobs running. Ok, let's look at Impala's result.

Note: Impala does not poll frequently for metadata changes, so if you don't see the table name after the import, just run the following:

invalidate metadata;
show tables;
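The count query from the Impala shell looks like the following sketch (the database and table names are the test_oracle/my_all_objects_sqoop names used elsewhere in this walkthrough; the exact output layout and timing will vary):

```
[vmhost2:21000] > use test_oracle;
[vmhost2:21000] > select count(*) from my_all_objects_sqoop;
+----------+
| count(*) |
+----------+
| 22782    |
+----------+
Fetched 1 row(s) in 0.12s
```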

The above results show Hive took 41 seconds to get the row count of a table with 22,782 rows, while Impala was significantly faster at 0.12 seconds. I know my cluster is small and not powerful, and Hive is using Map/Reduce. But 41 seconds to get a total row count of about 22,000 rows seems too much. On the other hand, Impala's timing looks more reasonable to me. Obviously Map/Reduce is not my option if I want to run queries that expect fast response time. But for a long-running job against a huge dataset, the Map/Reduce option might still be on the table, considering its job fault tolerance.

There are many UI and command-line tools to access Hive data on Hadoop, and I am not going to list them one by one. I use SQL Developer a lot to access Oracle databases and like this powerful tool. That made me wonder whether I could use SQL Developer to access Hive tables on HDFS. After some research, I did find a way to configure SQL Developer to access Hive tables. Here are the steps:

3. Create a Hive Connection
Create a new connection. The port used is the HiveServer2 port, i.e. the value of hive.server2.thrift.port; in my cluster, it is 10000. I still use the test_oracle Hive database created in my previous steps. Click Test, and it should show Success.
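Under the hood, these fields amount to a HiveServer2 JDBC URL. A rough sketch of the equivalent URL is below; the hostname is a placeholder, so substitute the node running HiveServer2 in your cluster:

```
jdbc:hive2://<hiveserver2-host>:10000/test_oracle
```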

4. Access Hive Tables
After clicking Connect, I can connect to my Hive databases on my Hadoop cluster.
You can see the look and feel of accessing Hive is very similar to accessing a regular Oracle table. Awesome! Here are a few more screens.

Properties Tab

DDL Tab

The DDL tab is a nice one. The drawback is that everything is on one line; it would be nice to format the DDL string in a more readable way. Anyway, it is not a big deal.

Detail Tab

Model Tab

Let me run a query, and here is the result: exactly the look and feel of using Oracle.

I like good tools. CELLCLI is a useful tool for performing administration tasks on Exadata Storage Servers. Now, starting with Exadata Storage Server Release 12.1.2.1.0, there is another utility, called DBMCLI, to configure and monitor Exadata Database Servers. DBMCLI replaces the /opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl Perl script. The usage of DBMCLI is similar to that of CELLCLI.

With DBMCLI, we can start/stop services, list alert history, configure SMTP, and monitor hardware components. Here are a few examples executed on our X3 box in the lab. I removed some repetitive content from the output to save space here.
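DBMCLI can be started as an interactive session, or run a single command with the -e flag, just like CELLCLI. A quick sketch (the attribute names are standard DBSERVER attributes; run this on the database server you are logged into):

```
# Interactive session
$ dbmcli
DBMCLI> list dbserver attributes name, status

# One-off command from the shell
$ dbmcli -e list dbserver detail
```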

DBMCLI> list alerthistory detail
name: 5_1
alertDescription: "A power supply component suspected of causing a fault"
alertMessage: "A power supply component is suspected of causing a fault with a 100 certainty. Component Name : /SYS/PS1 Fault class : fault.chassis.power.ext-fail Fault message : http://www.sun.com/msg/SPX86-8003-73"
alertSequenceID: 5
alertShortName: Hardware
alertType: Stateful
beginTime: 2015-06-16T12:39:32-05:00
endTime: 2015-07-15T15:59:20-05:00
examinedBy:
metricObjectName: /SYS/PS1_FAULT
notificationState: 0
sequenceBeginTime: 2015-06-16T12:39:32-05:00
severity: critical
alertAction: "For additional information, please refer to http://www.sun.com/msg/SPX86-8003-73"
name: 5_2
alertDescription: "A power supply component fault cleared"
alertMessage: "A power supply component fault has been cleared. Component Name : /SYS/PS1 Trap Additional Info : fault.chassis.power.ext-fail"
alertSequenceID: 5
alertShortName: Hardware
alertType: Stateful
beginTime: 2015-07-15T15:59:20-05:00
endTime: 2015-07-15T15:59:20-05:00
examinedBy:
metricObjectName: /SYS/PS1_FAULT
notificationState: 0
sequenceBeginTime: 2015-06-16T12:39:32-05:00
severity: clear
alertAction: Informational.
name: 6_1
alertDescription: "File system "/" is 80% full"
alertMessage: "File system "/" is 80% full, which is above the 80% threshold. Accelerated space reclamation has started. This alert will be cleared when file system "/" becomes less than 75% full. Top three directories ordered by total space usage are as follows: /opt : 7.79G /home : 7.38G /usr : 3.06G"
alertSequenceID: 6
alertShortName: Software
alertType: Stateful
beginTime: 2015-06-25T19:36:44-05:00
examinedBy:
metricObjectName: /
notificationState: 0
sequenceBeginTime: 2015-06-25T19:36:44-05:00
severity: critical
alertAction: "MS includes a file deletion policy that is triggered when file system utilitization is high. Deletion of files is triggered when file utilization reaches 80%. For the / file system, 1) files in metric history directory will be deleted using a policy based on the file modification time stamp. Files older than the number of days set by the metricHistoryDays attribute value will be deleted first, then successive deletions will occur for earlier files, down to files with modification time stamps older than or equal to 10 minutes, or until file system utilization is less than 75%. 2) files in the ADR base directory and LOG_HOME directory will be deleted using a policy based on the file modification time stamp. Files older than the number of days set by the diagHistoryDays attribute value will be deleted first, then successive deletions will occur for earlier files, down to files with modification time stamps older than or equal to 10 minutes, or until file system utilization is less than 75%. The renamed alert.log files and ms-odl generation files that are over 5 MB, and older than the successively-shorter age intervals are also deleted. Crash files that are over 5 MB and older than one day will be deleted.Try to delete more recent files, or files not being automatically purged, to free up space if needed."
. . . .

DBMCLI> list alerthistory where severity='critical'
5_1 2015-06-16T12:39:32-05:00 critical "A power supply component is suspected of causing a fault with a 100 certainty. Component Name : /SYS/PS1 Fault class : fault.chassis.power.ext-fail Fault message : http://www.sun.com/msg/SPX86-8003-73"
6_1 2015-06-25T19:36:44-05:00 critical "File system "/" is 80% full, which is above the 80% threshold. Accelerated space reclamation has started. This alert will be cleared when file system "/" becomes less than 75% full. Top three directories ordered by total space usage are as follows: /opt : 7.79G /home : 7.38G /usr : 3.06G"
8_1 2015-07-15T15:59:35-05:00 critical "A power supply component is suspected of causing a fault with a 100 certainty. Component Name : /SYS/PS1 Fault class : fault.chassis.power.ext-fail Fault message : http://www.sun.com/msg/SPX86-8003-73"
9_1 2015-08-04T22:04:16-05:00 critical "File system "/u01" is 80% full, which is above the 80% threshold. This alert will be cleared when file system "/u01" becomes less than 75% full. Top three directories ordered by total space usage are as follows: /u01/app : 147.43G /u01/lost+found : 16K /u01/stage : 4K"

DBMCLI> list alerthistory where severity='critical' and beginTime > '2015-08-01T01:00:01-07:00'
9_1 2015-08-04T22:04:16-05:00 critical "File system "/u01" is 80% full, which is above the 80% threshold. This alert will be cleared when file system "/u01" becomes less than 75% full. Top three directories ordered by total space usage are as follows: /u01/app : 147.43G /u01/lost+found : 16K /u01/stage : 4K"

Similarly, we can check out other metric history, like:

DS_MEMUT: The percentage of total physical memory used on the server.
DS_SWAP_IN_BY_SEC: The number of swap pages read in KB per second.
DS_SWAP_OUT_BY_SEC: The number of swap pages written in KB per second.
DS_SWAP_USAGE: The percentage of swap space used.
N_HCA_MB_RCV_SEC: The number of MB received by the InfiniBand interfaces per second.
N_HCA_MB_TRANS_SEC: The number of MB transmitted by the InfiniBand interfaces per second.
N_NIC_KB_RCV_SEC: The number of KB received by the Ethernet interfaces per second.
N_NIC_KB_TRANS_SEC: The number of KB transmitted by the Ethernet interfaces per second.
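Pulling one of these metrics out follows the same LIST syntax as the alert-history queries above; for example (the timestamp is just an illustration):

```
DBMCLI> list metrichistory DS_MEMUT where collectionTime > '2015-08-04T00:00:00-05:00'
```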

Another useful feature is that DBMCLI can configure email notifications for the database server. I did not perform the following steps myself; the example comes from the Oracle documentation.
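A sketch of that configuration, using the standard DBSERVER SMTP attributes (the mail server name and all addresses below are placeholders, not values from my environment):

```
DBMCLI> ALTER DBSERVER smtpServer='mail.example.com',        -
                       smtpFromAddr='dbserver@example.com',  -
                       smtpToAddr='dba-team@example.com',    -
                       notificationMethod='mail',            -
                       notificationPolicy='critical,warning,clear'
DBMCLI> ALTER DBSERVER VALIDATE MAIL
```

The VALIDATE MAIL step sends a test message so you can confirm the SMTP settings before a real alert fires.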

This post will discuss a different operation: exporting data from a Hive table to an Oracle database, just like a goldfish jumping from the fishbowl to the laptop. Again, we still use Sqoop to perform this operation.

The following example exports the data from the my_all_objects_sqoop table in Hive to the WZHOU.TEST_IMPORT_FROM_SCOOP table in the DBM database on X3.
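A sketch of the sqoop export command for this: the Oracle connect string and the HDFS warehouse path below are assumptions for illustration, so adjust them to your environment.

```
sqoop export \
  --connect jdbc:oracle:thin:@x3host:1521:dbm \
  --username wzhou -P \
  --table TEST_IMPORT_FROM_SCOOP \
  --export-dir /user/hive/warehouse/test_oracle.db/my_all_objects_sqoop \
  --fields-terminated-by '\001'
```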

Note: The --export-dir argument specifies the location of the Hive source directory, not an individual filename. --fields-terminated-by '\001' must be included in the command. Otherwise, you will see the following error:
Caused by: java.lang.RuntimeException: Can't parse input data: 'SYSTEMLOGMNR_REFCON$7604TABLE2013-03-12'
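The reason is that Hive's default field delimiter is the non-printable Ctrl-A character (octal 001), so unless Sqoop is told about it, all fields run together as one unparseable value, as in the error above. A quick local shell sketch shows the delimiter at work (the sample row mirrors the one in the error message):

```shell
# Hive stores fields separated by the non-printable \001 (Ctrl-A) byte.
# Split the sample row on that byte to show the five individual fields.
printf 'SYSTEM\001LOGMNR_REFCON$\0017604\001TABLE\0012013-03-12\n' |
  awk -F "$(printf '\001')" '{print NF, $1, $2}'
# prints: 5 SYSTEM LOGMNR_REFCON$
```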