Description

At times it may be necessary to write a process that uses both the Hadoop and Hive environments and APIs. For example, someone may write an application that uses the Hive API directly. This patch adds a more generic --jar extension that can start any class with the proper environment.

Looks great!
Can you include TestHive.java in the patch (and put it in some package such as org.apache.hadoop.hive.examples), and then invoke TestHive with "ant" in the "test" target (or maybe add a "test_scripts" target)?

Zheng Shao
added a comment - 14/Jul/09 07:57

TestHive is really not intended for inclusion. The target of this JIRA is the jar.sh script. We don't test any of the sh scripts directly, since they require the Hadoop environment to work; TestCLIDriver is an emulated environment. We can include TestHive, but it is not actually a test of jar.sh.

Edward Capriolo
added a comment - 28/Jul/09 17:12

This patch specifies the jar file and class name as command-line arguments, as hadoop does. With this change, the ordering of jarfile, classname, and -hiveconf is now significant. Launching a jar from hadoop has similar constraints, so this should not be an issue.

Edward Capriolo
added a comment - 30/Jul/09 19:45

I think we will need a class, say HiveShell, which will read the -hiveconf parameters, remove them from the command line, and then invoke the user-specified class with the remaining command-line arguments. Does this make sense?

Raghotham Murthy
added a comment - 30/Jul/09 20:39
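
For illustration, the HiveShell idea suggested above could look roughly like the sketch below. It is only a sketch under stated assumptions: the class name HiveShellSketch and the method stripHiveConf are hypothetical and are not part of any patch attached to this issue. It collects each "-hiveconf key=value" pair, removes it from the argument list, and calls the main method of the user-specified class with what remains. Applying the collected settings to Hive's own configuration is deliberately left out.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the proposed HiveShell; not taken from the patch. */
public class HiveShellSketch {

    /**
     * Copies every "-hiveconf key=value" pair into conf and returns the
     * remaining arguments in their original order.
     */
    public static String[] stripHiveConf(String[] args, Map<String, String> conf) {
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-hiveconf".equals(args[i]) && i + 1 < args.length) {
                // Split on the first '=' only, so values may contain '='.
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                rest.add(args[i]);
            }
        }
        return rest.toArray(new String[0]);
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new LinkedHashMap<>();
        String[] rest = stripHiveConf(args, conf);
        if (rest.length == 0) {
            System.err.println("usage: HiveShellSketch [-hiveconf k=v]... <class> [args...]");
            System.exit(1);
        }
        // Assumption: a real implementation would apply conf to Hive's
        // configuration here; this sketch only collects the settings.
        String[] userArgs = Arrays.copyOfRange(rest, 1, rest.length);
        Method userMain = Class.forName(rest[0]).getMethod("main", String[].class);
        // Cast to Object so the array is passed as a single argument.
        userMain.invoke(null, (Object) userArgs);
    }
}
```

Because the -hiveconf pairs are removed wherever they appear, a wrapper along these lines would make their position relative to the class name irrelevant, which is the main difference from passing the arguments straight through.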

I follow what you are saying.

We may not need to add a helper class for the user. Anyone using this is likely an advanced user who would use the OptionsProcessor and SessionState as they see fit.

One reason I could see for creating HiveShell is that we would like to retrofit all the current tools (cli, lineage, hwi) to fit a common interface that ensures the SessionState, Hive history, etc. are started up properly.

My use case is to be able to launch a class that can start a map/reduce program with the Hadoop API and then execute a query with the Hive API. I am using GregorianCalendar and date processing to figure out which files/partitions to operate on, building Hive query strings, and executing them directly with the QueryProcessor.

It seems some people are using a combination of bash|perl|python and hive -e|-f. Other than my reliance on cron to start off these jobs, I am 100% pure Java.

This tiny shell script is my entry point. For me, it does not need more sophistication, but I could be missing something.

Edward Capriolo
added a comment - 30/Jul/09 21:16
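
To make the use case described above concrete, here is a minimal sketch of the date-driven part: it uses GregorianCalendar to work out the previous day's partition value and builds a query string from it. The class name, the table name web_logs, and the partition column dt are all hypothetical. Running the query through the Hive API is omitted, since that needs Hive on the classpath.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

/** Hypothetical sketch: derive a partition value from a date and build a query string. */
public class PartitionQuerySketch {

    /** Returns the day before the given date, formatted as yyyy-MM-dd. */
    public static String previousDay(int year, int month, int day) {
        // Calendar months are zero-based, so subtract one from the calendar month.
        Calendar cal = new GregorianCalendar(year, month - 1, day);
        cal.add(Calendar.DAY_OF_MONTH, -1);
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        return fmt.format(cal.getTime());
    }

    /** Builds a query restricted to one partition; the column name dt is an assumption. */
    public static String buildQuery(String table, String partitionValue) {
        return "SELECT COUNT(1) FROM " + table + " WHERE dt = '" + partitionValue + "'";
    }

    public static void main(String[] args) {
        String dt = previousDay(2009, 8, 1);
        // Prints: SELECT COUNT(1) FROM web_logs WHERE dt = '2009-07-31'
        System.out.println(buildQuery("web_logs", dt));
    }
}
```

A class like this would be the one named on the command line of the launcher script, with the resulting string handed to whatever Hive entry point the application uses.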

I am fine with it as well. I think we should create a separate JIRA for HiveShell, though. Eventually, it would be good to move all tools to use the same code path for configuration management.

Raghotham Murthy
added a comment - 09/Aug/09 07:12