Checklist to Help Ensure Smooth Upgrades

The following best practices for configuring and maintaining Hive will help
ensure that upgrades go smoothly.

Configure periodic backups of the metastore database. Use mysqldump, or the
equivalent for your vendor if you are not using MySQL.
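
As a rough illustration of a periodic backup, the sketch below wraps mysqldump in a small shell function that writes a dated dump file. The database name (metastore), the backup directory, and the function names are assumptions made for illustration, not values taken from your installation. Credentials are assumed to come from a MySQL option file such as ~/.my.cnf.

```shell
#!/bin/sh
# Sketch only: "metastore" and /var/backups/hive are hypothetical names;
# substitute the database and directory used by your installation.

# Build a dated file name such as metastore-2014-06-01.sql.
backup_file_name() {
  echo "$1-$(date +%Y-%m-%d).sql"
}

# Dump one database to a dated file in the given directory.
# --single-transaction asks mysqldump for a consistent snapshot of
# InnoDB tables without locking them for the whole dump.
backup_metastore() {
  db="$1"
  out_dir="$2"
  mysqldump --single-transaction "$db" > "$out_dir/$(backup_file_name "$db")"
}

# Example call (commented out so that sourcing this file has no side effects):
# backup_metastore metastore /var/backups/hive
```

To make the backup periodic, a script like this could be invoked from a scheduler such as cron.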

Make sure datanucleus.autoCreateSchema is set to false (for all database
types) and datanucleus.fixedDatastore is set to true (for MySQL and Oracle)
in all hive-site.xml files. See the configuration instructions for more
information about setting the properties in hive-site.xml.
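
One way to spot-check these settings is sketched below: a small helper that prints the value of a named property from a hive-site.xml file. The helper name hive_property is hypothetical, and the sketch assumes the usual layout in which each <name> element is followed by its <value> element within the same property. It is a text-level check, not a full XML parser.

```shell
#!/bin/sh
# Sketch only: hive_property is a hypothetical helper, not part of Hive.

# Print the <value> that follows <name>PROP</name> in the given file.
# Prints nothing if the property is not present.
hive_property() {
  file="$1"
  prop="$2"
  awk -v prop="$prop" '
    index($0, "<name>" prop "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> and 8-character </value> tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$file"
}

# Example (the path is an assumed default location):
# hive_property /etc/hive/conf/hive-site.xml datanucleus.autoCreateSchema
```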

Insulate the metastore database from users by running the metastore service in
Remote mode. If you do not follow this recommendation, make sure you remove
DROP, ALTER, and CREATE privileges from the Hive user configured in
hive-site.xml. See Configuring the Hive Metastore for complete instructions
for each type of supported database.

Upgrading Hive from CDH 4 to CDH 5

Note:

If you have already performed the steps to uninstall CDH 4 and all components,
as described under Upgrading from CDH 4 to CDH 5, you can skip Step 1 below
and proceed with installing the new CDH 5 version of Hive.

Step 1: Remove Hive

Warning:

You must make sure no Hive processes are running. If Hive processes are
running during the upgrade, the new version will not work correctly.

Exit the Hive console and make sure no Hive scripts are running.

Stop any HiveServer processes that are running. If HiveServer is running as a
daemon, use the following command to stop it:

$ sudo service hive-server stop

If HiveServer is running from the command line, stop it with <CTRL>-c.

Stop the metastore. If the metastore is running as a daemon, use the following
command to stop it:

$ sudo service hive-metastore stop

If the metastore is running from the command line, stop it with <CTRL>-c.
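
Before removing the packages, it can help to confirm that nothing Hive-related is still running. The sketch below filters a process listing for the main classes of HiveServer, the metastore, and the Hive CLI. It assumes those class names appear on the command line of the corresponding Java processes; the function name hive_processes is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes Hive JVMs show their main class in the process listing.

# Read a process listing on stdin and print only the lines that look like
# HiveServer, metastore, or Hive CLI processes. Prints nothing (and returns
# a non-zero status, as grep does) when no such process is found.
hive_processes() {
  grep -E 'org\.apache\.hadoop\.hive\.(service\.HiveServer|metastore\.HiveMetaStore|cli\.CliDriver)'
}

# Example:
# ps -eo pid,args | hive_processes
```

If the function prints any lines, stop those processes as described above before continuing.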

Remove Hive. On RHEL-compatible systems:

$ sudo yum remove hive

To remove Hive on SLES systems:

$ sudo zypper remove hive

To remove Hive on Ubuntu and Debian systems:

$ sudo apt-get remove hive

Step 2: Install the new Hive version on all hosts (Hive servers and clients)

If you install a newer version of a package that is already on the system,
configuration files that you have modified will remain intact.

If you uninstall a package, the package manager renames any configuration files
you have modified from <file> to <file>.rpmsave. If you then re-install the
package (probably to install a new version), the package manager creates a new
<file> with applicable defaults. You are responsible for applying any changes
captured in the original configuration file to the new configuration file. In
the case of Ubuntu and Debian upgrades, you will be prompted if you have made
changes to a file for which there is a new version; for details, see Automatic
handling of configuration files by dpkg.
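
To find the changes that still need to be carried over on RPM-based systems, something like the following sketch may help. It lists each <file>.rpmsave under a configuration directory and shows how it differs from the newly installed <file>. The function name and the example directory /etc/hive/conf are assumptions; use the configuration directory of your installation.

```shell
#!/bin/sh
# Sketch only: show_rpmsave_diffs is a hypothetical helper.

# For every *.rpmsave file under the given directory, print a header naming
# the new file and a unified diff from the new file to the saved one.
# Lines prefixed with "+" are customizations present only in the saved copy.
show_rpmsave_diffs() {
  conf_dir="$1"
  find "$conf_dir" -name '*.rpmsave' | sort | while read -r saved; do
    current="${saved%.rpmsave}"
    echo "== $current"
    if [ -f "$current" ]; then
      diff -u "$current" "$saved" || true   # diff exits 1 when files differ
    else
      echo "(no newly installed file to compare against)"
    fi
  done
}

# Example (assumed location):
# show_rpmsave_diffs /etc/hive/conf
```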

Step 3: Configure the Hive Metastore

You must configure the Hive metastore and initialize the service before you can
use Hive. See Configuring the Hive Metastore for detailed instructions.

Step 4: Upgrade the Metastore Schema

Important:

Cloudera strongly encourages you to make a backup copy of your metastore
database before running the upgrade scripts. You will need this
backup copy if you run into problems during the upgrade or need to
downgrade to a previous version.

You must upgrade the metastore schema to the version corresponding to the
new version of Hive before starting Hive after the upgrade.
Failure to do so may result in metastore corruption.

To run an upgrade script, you must first cd to the directory that contains it:
/usr/lib/hive/scripts/metastore/upgrade/<database>.

As of CDH 5, there are two ways to upgrade the metastore schema: use Hive's
schematool, or use the schema upgrade scripts provided with the Hive package.

Using schematool (Recommended):

The Hive distribution includes an offline tool for Hive metastore schema
manipulation called schematool. This tool can be used
to initialize the metastore schema for the current Hive version. It
can also upgrade the schema from an older version to the current one.

To upgrade the schema, use the upgradeSchemaFrom option to specify the version
of the schema you are currently using (see table below) and the required
dbType option to specify the database you are using. The example that follows
shows an upgrade from Hive 0.10.0 (CDH 4) for an installation using the Derby
database.
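
A minimal sketch of such an invocation is shown here. The -dbType and -upgradeSchemaFrom options are the ones described above; the schematool path, the wrapper function, and the DRY_RUN switch are assumptions added for illustration. By default the wrapper only prints the command it would run.

```shell
#!/bin/sh
# Sketch only: the path below is an assumed package location for schematool.
SCHEMATOOL="${SCHEMATOOL:-/usr/lib/hive/bin/schematool}"

# Upgrade the metastore schema for the given database type, starting from
# the given schema version. With DRY_RUN=1 (the default) the command is
# printed instead of executed.
upgrade_metastore_schema() {
  db_type="$1"
  from_version="$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$SCHEMATOOL -dbType $db_type -upgradeSchemaFrom $from_version"
  else
    "$SCHEMATOOL" -dbType "$db_type" -upgradeSchemaFrom "$from_version"
  fi
}

# Example: upgrade from Hive 0.10.0 (CDH 4) on Derby.
# DRY_RUN=0 upgrade_metastore_schema derby 0.10.0
```

Back up the metastore database before running the command with DRY_RUN=0.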

Using the schema upgrade scripts:

Run the appropriate schema upgrade script(s); they are in
/usr/lib/hive/scripts/metastore/upgrade/. Start with the script for your
database and current Hive version, then run every subsequent script in
version order.

For example, if you are currently running Hive 0.10 with MySQL, and upgrading to
Hive 0.13.1, start with the script for Hive 0.10 to 0.11 for MySQL,
then run the script for Hive 0.11 to 0.12 for MySQL, then run the
script for Hive 0.12 to 0.13.1.
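
The ordering rule can be sketched as a small helper that lists, in version order, the scripts to run from a given starting version. It assumes the scripts follow a naming pattern of the form upgrade-<from>-to-<to>.<database>.sql and that a sort with version ordering (-V, as in GNU sort) is available; check the actual file names in the upgrade directory before relying on it. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes files named upgrade-<from>-to-<to>.<database>.sql.

# Print the upgrade scripts for one database type, in version order,
# beginning with the script whose <from> version matches the third argument.
list_upgrade_scripts() {
  dir="$1"
  db="$2"
  from="$3"
  started=0
  for path in $(printf '%s\n' "$dir"/upgrade-*."$db".sql | sort -V); do
    name=$(basename "$path")
    case "$name" in
      "upgrade-$from-to-"*) started=1 ;;
    esac
    if [ "$started" -eq 1 ]; then
      echo "$name"
    fi
  done
}

# Example:
# list_upgrade_scripts /usr/lib/hive/scripts/metastore/upgrade/mysql mysql 0.10.0
```

Run each listed script with your database's client from inside that directory, in the order printed.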

For more information about upgrading the schema, see the README in
/usr/lib/hive/scripts/metastore/upgrade/.

Step 5: Configure HiveServer2

HiveServer2 is an improved version of the original HiveServer (HiveServer1, no
longer supported). Some configuration is required before you initialize
HiveServer2; see Configuring HiveServer2 for details.

Step 6: Upgrade Scripts, etc., for HiveServer2 (if necessary)

If you have been running HiveServer1, you may need to make some minor
modifications to your client-side scripts and applications when you
upgrade:

HiveServer1 does not support concurrent connections, so many customers run a
dedicated instance of HiveServer1 for each client. These can now be
replaced by a single instance of HiveServer2.

HiveServer2 uses a different connection URL and driver class for the JDBC
driver. If you have existing scripts that use JDBC to communicate with
HiveServer1, you can modify these scripts to work with HiveServer2 by changing
the JDBC driver URL from jdbc:hive://hostname:port to
jdbc:hive2://hostname:port, and by changing the JDBC driver class name from
org.apache.hadoop.hive.jdbc.HiveDriver to org.apache.hive.jdbc.HiveDriver.
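
For scripts that embed these strings as plain text, the two substitutions could be applied with a filter like the sketch below. It assumes the HiveServer1 driver class is org.apache.hadoop.hive.jdbc.HiveDriver; the function name is hypothetical. Review the result before replacing any file, since the filter rewrites every occurrence it finds.

```shell
#!/bin/sh
# Sketch only: migrate_jdbc_settings is a hypothetical helper.

# Read text on stdin and rewrite HiveServer1 JDBC settings to their
# HiveServer2 equivalents: the URL scheme and the driver class name.
migrate_jdbc_settings() {
  sed -e 's#jdbc:hive://#jdbc:hive2://#g' \
      -e 's#org\.apache\.hadoop\.hive\.jdbc\.HiveDriver#org.apache.hive.jdbc.HiveDriver#g'
}

# Example: preview the changes to one script without modifying it.
# migrate_jdbc_settings < my-client-script.sh | diff my-client-script.sh -
```

Input that already uses jdbc:hive2:// passes through unchanged, so the filter can be re-run safely.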
