Managing User-Defined Functions (UDFs) with HiveServer2

Hive's query language (HiveQL) can be extended with Java-based user-defined functions (UDFs). See the Apache Hive Language Manual UDF page for information about Hive built-in UDFs. To create customized UDFs, see
the Apache Hive wiki. After creating a new Java class to extend
the com.example.hive.udf package, you must compile your code into a Java archive file (JAR), and add it to the Hive classpath with the ADD
JAR command. The ADD JAR command does not work with HiveServer2 and the Beeline client when Beeline runs on a different host. As an
alternative to ADD JAR, Hive's auxiliary paths functionality should be used.

Perform one of the following procedures depending on whether you want to create permanent or temporary functions.

User-Defined Functions (UDFs) with HiveServer2 Using Cloudera Manager

Creating Permanent Functions

Copy the JAR file to HDFS and make sure the hive user can access this JAR file.

Copy the JAR file to the host on which HiveServer2 is running. Save the JARs to any directory you choose, give the hive user read, write, and execute
access to this directory, and make a note of the path (for example, /opt/local/hive/lib/).
Note: If the Hive Metastore is running on a different host, create the same directory there that you created on the HiveServer2 host. You do not
need to copy the JAR file onto the Hive Metastore host, but the same directory must be there. For example, if you copied the JAR file to /opt/local/hive/lib/ on the
HiveServer2 host, you must create the same directory on the Hive Metastore host. If the same directory is not present on the Hive Metastore host, Hive Metastore service will not start.

In the Cloudera Manager Admin Console, go to the Hive service.

Click the Configuration tab.

Expand the Hive (Service-Wide) scope.

Click the Advanced category.

Configure the Hive Auxiliary JARs Directory property with the HiveServer2 host path and the Hive Metastore host path from Step 2, for example /opt/local/hive/lib/. Setting this property overwrites hive.aux.jars.path, even if this variable has been previously set in the HiveServer2 advanced
configuration snippet.

From the Actions menu at the top right of the service page, select Deploy Client Configuration.

Click Deploy Client Configuration.

Restart the Hive service.

With Sentry enabled - Grant privileges on the JAR files to the roles that require access.
Log in to Beeline as user hive and use the Hive SQL GRANT statement to do
so. For example:

GRANT ALL ON URI 'file:///opt/local/hive/lib/my.jar' TO ROLE EXAMPLE_ROLE

Run the CREATE FUNCTION command to create the UDF from the JAR file.

With Sentry enabled - On a Sentry secured cluster, the USING JAR command is not supported. To load
the jar, you have to make sure the JAR file is at the location pointed to by either hive.aux.jars.path or hive.reloadable.aux.jars.path.

CREATE FUNCTION addfunc AS 'com.example.hiveserver2.udf.add';

Without Sentry - Run the CREATE FUNCTION command as follows and point to the JAR file location in HDFS. For example:

Creating Temporary Functions

Copy the JAR file to the host on which HiveServer2 is running. Save the JARs to any directory you choose, give the hive user read, write, and execute
access to this directory, and make a note of the path (for example, /opt/local/hive/lib/).
Note: If the Hive Metastore is running on a different host, create the same directory there that you created on the HiveServer2 host. You do not
need to copy the JAR file onto the Hive Metastore host, but the same directory must be there. For example, if you copied the JAR file to /opt/local/hive/lib/ on the
HiveServer2 host, you must create the same directory on the Hive Metastore host. If the same directory is not present on the Hive Metastore host, Hive Metastore service will not start.

In the Cloudera Manager Admin Console, go to the Hive service.

Click the Configuration tab.

Expand the Hive (Service-Wide) scope.

Click the Advanced category.

Configure the Hive Auxiliary JARs Directory property with the HiveServer2 host path and the Hive Metastore host path from Step 1, for example /opt/local/hive/lib/. Setting this property overwrites hive.aux.jars.path, even if this variable has been previously set in the HiveServer2 advanced
configuration snippet.

With Sentry enabled - On a Sentry secured cluster, the USING JAR command is not supported. To load
the jar, you have to make sure the JAR file is at the location pointed to by either hive.aux.jars.path or hive.reloadable.aux.jars.path.

CREATE FUNCTION addfunc AS 'com.example.hiveserver2.udf.add';

Without Sentry - Run the CREATE FUNCTION command as follows and point to the JAR file location in HDFS. For example:

Update Existing HiveServer2 User-Defined Functions (UDFs)

Consider a sample UDF, hive_udf.jar that is already in use. If you want to update the sample function, my_udf, in the
hive_udf.jar JAR file, you first drop my_udf using the DROP FUNCTION command. Then, delete the old JAR
file and place the new JAR with the updated function in both, HDFS and the local filesystem. Finally, re-create the my_udf function defined in the new JAR using the
CREATE FUNCTION command. These steps have been described in detail below:

Drop the function to be updated, in this case, my_udf. This can be done by using bash scripts to first, connect to HiveServer2 using Beeline, and then,
drop the function. First create an .hql file with the DROP FUNCTION command:

# more drop_func.hql
drop function my_udf;

Create a bash script to connect to HiveServer2 using Beeline and the previously-created drop_func.hql:

Delete the old JAR file, hive_udf.jar from both HDFS, and the local filesystem. On the local filesystem, the JAR can be found at the location pointed to
by either hive.aux.jars.path or hive.reloadable.aux.jars.path.

Place the updated hive_udf.jar in HDFS and the local filesystem (for example, /user/hive/udf). This time, make sure you
add the JAR to the path set by hive.reloadable.aux.jars.path. You do not need to place the JAR file at the path pointed to by hive.aux.jars.path.

If you are using Cloudera Manager, use an advanced configuration snippet to set the value for hive.reloadable.aux.jars.path.

Recreate functions defined in hive-udf.jar. This can also be done by using bash scripts to connect to HiveServer2 and run the CREATE FUNCTION command. First create an .hql file with the CREATE FUNCTION command:

If this documentation includes code, including but not limited to, code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required
notices. A copy of the Apache License Version 2.0 can be found here.