Beeline – New Command Line Shell

HiveServer2 supports a new command shell, Beeline, a JDBC client based on the SQLLine CLI (http://sqlline.sourceforge.net/). SQLLine's detailed documentation applies to Beeline as well.

The Beeline shell works in both embedded and remote modes. In embedded mode it runs an embedded Hive (similar to the Hive CLI), whereas remote mode connects to a separate HiveServer2 process over Thrift. Starting in Hive 0.14, when Beeline is used with HiveServer2, it also prints to STDERR the log messages from HiveServer2 for the queries it executes.
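
For example, a remote connection can be opened from the shell as follows (host, port, and credentials are placeholders); passing the embedded-style URL jdbc:hive2:// instead starts an embedded Hive inside the Beeline process:

$ beeline -u jdbc:hive2://hs2host:10000/default -n user -p password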

Separated-Value Output Formats

Starting with Hive 0.14, there are improved SV output formats available, namely DSV, CSV2 and TSV2. These conform better to the standard CSV convention, which adds quotes around a cell value only if it contains special characters (such as the delimiter or a quote character) or spans multiple lines. The three formats differ only in the delimiter between cells: comma for CSV2, tab for TSV2, and configurable for DSV (via the delimiterForDSV property).

The original CSV and TSV output formats are maintained for backward compatibility, but beware: contrary to this convention, they add single quotes around all cell values.
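
For illustration, here is how a hypothetical row whose first cell contains a comma would be rendered; csv2 quotes only the cell that needs it, while the legacy csv format single-quotes everything:

csv2:  "1,000",hello
csv:   '1,000','hello'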

JDBC

HiveServer2 has a new JDBC driver. It supports both embedded and remote access to HiveServer2.

Connection URLs

Connection URL for Remote or Embedded Mode

The JDBC connection URL format has the prefix jdbc:hive2:// and the Driver class is org.apache.hive.jdbc.HiveDriver. Note that this is different from the old HiveServer.

For a remote server, the URL format is jdbc:hive2://<host>:<port>/<db> (default port for HiveServer2 is 10000).

For an embedded server, the URL format is jdbc:hive2:// (no host or port).
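
As a minimal sketch (host, database, and credentials below are placeholders), the two modes can be opened from Java like this:

import java.sql.Connection;
import java.sql.DriverManager;

public class HiveConnectionModes {
  public static void main(String[] args) throws Exception {
    // Explicit registration; unnecessary with JDBC 4.0 auto-discovery, but harmless.
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    // Remote mode: connect to a HiveServer2 process on hs2host:10000.
    Connection remote =
        DriverManager.getConnection("jdbc:hive2://hs2host:10000/default", "user", "");

    // Embedded mode: no host or port; Hive runs inside this JVM.
    Connection embedded = DriverManager.getConnection("jdbc:hive2://", "", "");

    remote.close();
    embedded.close();
  }
}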

Running the JDBC Sample Code
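
The sample class itself is not reproduced here, so below is a minimal sketch of what HiveJdbcClient.java might contain (table name and connection details are illustrative placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcClient {
  private static final String DRIVER_NAME = "org.apache.hive.jdbc.HiveDriver";

  public static void main(String[] args) throws SQLException {
    try {
      Class.forName(DRIVER_NAME);
    } catch (ClassNotFoundException e) {
      e.printStackTrace();
      System.exit(1);
    }
    // Replace "hive" with the name of the user the queries should run as.
    Connection con =
        DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hive", "");
    Statement stmt = con.createStatement();
    String tableName = "testHiveDriverTable";
    stmt.execute("drop table if exists " + tableName);
    stmt.execute("create table " + tableName + " (key int, value string)");

    // Show the table we just created.
    String sql = "show tables '" + tableName + "'";
    System.out.println("Running: " + sql);
    ResultSet res = stmt.executeQuery(sql);
    if (res.next()) {
      System.out.println(res.getString(1));
    }

    // Describe its columns.
    sql = "describe " + tableName;
    System.out.println("Running: " + sql);
    res = stmt.executeQuery(sql);
    while (res.next()) {
      System.out.println(res.getString(1) + "\t" + res.getString(2));
    }
    con.close();
  }
}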

# Then on the command-line
$ javac HiveJdbcClient.java
# To run the program using remote hiveserver in non-kerberos mode, we need the following jars in the classpath
# from hive/build/dist/lib
# hive-jdbc*.jar
# hive-service*.jar
# libfb303-0.9.0.jar
# libthrift-0.9.0.jar
# log4j-1.2.16.jar
# slf4j-api-1.6.1.jar
# slf4j-log4j12-1.6.1.jar
# commons-logging-1.0.4.jar
#
# To run the program using kerberos secure mode, we need the following jars in the classpath
# hive-exec*.jar
# commons-configuration-1.6.jar
# and from hadoop
# hadoop-core*.jar
#
# To run the program in embedded mode, we need the following additional jars in the classpath
# from hive/build/dist/lib
# hive-exec*.jar
# hive-metastore*.jar
# antlr-runtime-3.0.1.jar
# derby.jar
# jdo2-api-2.1.jar
# jpox-core-1.2.2.jar
# jpox-rdbms-1.2.2.jar
# and from hadoop/build
# hadoop-core*.jar
# as well as hive/build/dist/conf, any HIVE_AUX_JARS_PATH set,
# and hadoop jars necessary to run MR jobs (eg lzo codec)
$ java -cp $CLASSPATH HiveJdbcClient

Alternatively, you can run the following bash script, which will seed the data file and build your classpath before invoking the client. The script adds all the additional jars needed for using HiveServer2 in embedded mode as well.

JDBC Client Setup for a Secure Cluster

The client needs to have a valid Kerberos ticket in the ticket cache before connecting.
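
The connection URL must then carry the Kerberos principal of HiveServer2:

jdbc:hive2://<host>:<port>/<db>;principal=<Server_Principal_of_HiveServer2>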

NOTE: If there is no "/" after the port number, the JDBC driver does not parse the hostname and ends up running HiveServer2 in embedded mode. So if you are specifying a hostname, make sure you have a "/" or "/<dbname>" after the port number.

In the case of LDAP, CUSTOM or PAM authentication, the client needs to pass a valid user name and password to the JDBC connection API.
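
For example (URL and credentials are placeholders; DriverManager is java.sql.DriverManager):

Connection con = DriverManager.getConnection(
    "jdbc:hive2://hs2host:10000/default", "ldapUserId", "ldapPassword");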

To use sasl.qop, add it to the session variable part of your Hive JDBC connection string, for example:
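
(Hostname and principal below are placeholders; saslQop takes the same values as hive.server2.thrift.sasl.qop: auth, auth-int, or auth-conf.)

jdbc:hive2://fqdn.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM;saslQop=auth-conf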

Multi-User Scenarios and Programmatic Login to Kerberos KDC

In the current approach of using Kerberos you need to have a valid Kerberos ticket in the ticket cache before connecting. This entails a static login (using kinit, a keytab, or a ticket cache) and the restriction of one Kerberos user per client. These restrictions limit the usage in middleware systems and other multi-user scenarios, and in scenarios where the client wants to log in programmatically to the Kerberos KDC.

One way to mitigate the problem of multi-user scenarios is with secure proxy users (see HIVE-5155). Starting in Hive 0.13.0, support for secure proxy users has two components:

Delegation token based connection for Oozie (OOZIE-1457). This is the common mechanism for Hadoop ecosystem components.

Direct proxy access for privileged Hadoop users (HIVE-5155). This enables a privileged user to directly specify an alternate session user during the connection. If the connecting user has Hadoop-level privilege to impersonate the requested user ID, then HiveServer2 will run the session as that user, as sketched below.
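
A minimal example using the hive.server2.proxy.user URL property (host, principal, and user name are placeholders):

// The privileged user authenticates with its own Kerberos credentials but asks
// HiveServer2 to run the session as "alice".
Connection con = DriverManager.getConnection(
    "jdbc:hive2://hs2host:10000/default;"
    + "principal=hive/hs2host@EXAMPLE.COM;"
    + "hive.server2.proxy.user=alice");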

The other way is to use a pre-authenticated Kerberos Subject (see HIVE-6486). In this method, starting with Hive 0.13.0 the Hive JDBC client can use a pre-authenticated subject to authenticate to HiveServer2. This enables a middleware system to run queries as the user running the client.

Using Kerberos with a Pre-Authenticated Subject

To use a pre-authenticated subject you will need the following changes.

Add hive-exec*.jar to the classpath in addition to the regular Hive JDBC jars (commons-configuration-1.6.jar and hadoop-core*.jar are not required).

Add auth=kerberos and kerberosAuthType=fromSubject JDBC URL properties in addition to the "principal" URL property.

Open the connection in Subject.doAs().

The following code snippet illustrates the usage (refer to HIVE-6486 for a complete test case):
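
(The snippet below is a reconstruction in the spirit of that test case; host, port, and principal are placeholders.)

import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;
import javax.security.auth.Subject;

public class SubjectConnectionExample {
  // signedOnUserSubject is a Subject that has already authenticated to the
  // Kerberos KDC, e.g. obtained from a JAAS LoginContext by the middleware.
  static Connection getConnection(Subject signedOnUserSubject) throws Exception {
    return Subject.doAs(signedOnUserSubject,
        new PrivilegedExceptionAction<Connection>() {
          public Connection run() throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // auth=kerberos plus kerberosAuthType=fromSubject tell the driver
            // to take its Kerberos credentials from the current Subject.
            String url = "jdbc:hive2://hs2host:10000/default;"
                + "principal=hive/hs2host@EXAMPLE.COM;"
                + "auth=kerberos;kerberosAuthType=fromSubject";
            return DriverManager.getConnection(url);
          }
        });
  }
}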

Modify the example URL as needed to point to your HiveServer2 instance.

Integration with SQuirreL SQL Client

After registering the Hive JDBC driver and creating a connection alias in the SQuirreL SQL Client, enter 'User Name' and 'Password' and click 'OK' to save the alias.

To connect to HiveServer2, double-click the Hive alias and click 'Connect'.

When the connection is established you may see errors in the log console and might get a warning that the driver is not JDBC 3.0 compatible. These alerts are due to yet-to-be-implemented parts of the JDBC metadata API and can safely be ignored. To test the connection, enter SHOW TABLES in the console and click the run icon.

Also note that when a query is running, support for the 'Cancel' button is not yet available.