Back up the directory. The path inside the <value> XML
element is the path to your HDFS metadata. If you see a comma-separated
list of paths, there is no need to back up all of them; they store the
same data. Back up the first directory, for example, by using the
following commands:
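For example, a backup might look like this (a sketch; the metadata path shown is a placeholder for the first path in your own configuration):

```shell
# Hypothetical path: substitute the first directory listed in your
# dfs.name.dir (or dfs.namenode.name.dir) property.
cd /data/1/dfs/nn
tar -cvf /root/nn_backup_data.tar .
```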

Before installing MRv1 or YARN, optionally add a repository key on each system in the cluster if you
have not already done so. Add the Cloudera Public GPG Key to your repository by executing one of the following
commands:
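As a sketch, the command depends on your distribution; the key URL is a placeholder for the one matching your platform on the Cloudera archive site:

```shell
# RHEL-compatible systems (placeholder URL; use the one for your platform):
sudo rpm --import <cloudera-gpg-key-url>

# Ubuntu and Debian systems (placeholder URL; use the one for your platform):
wget -O - <cloudera-gpg-key-url> | sudo apt-key add -
```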

Step 5a: Verify that /tmp Exists and Has the Right Permissions

Important:

If you do not create /tmp properly, with the right permissions as shown below, you may have
problems with CDH components later. Specifically, if you don't create /tmp yourself, another
process may create it automatically with restrictive permissions that will prevent your other applications
from using it.

Create the /tmp directory after HDFS is up and running, and set its permissions to 1777
(drwxrwxrwt), as follows:
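On a cluster without Kerberos, a sketch of these commands, run as the hdfs superuser, is:

```shell
# Create /tmp in HDFS and set sticky-bit permissions (mode 1777).
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
```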

If Kerberos is enabled, do
not use commands in the form sudo -u <user> <command>; they will fail with a
security error. Instead, run $ kinit <user> (if you are using a
password) or $ kinit -kt <keytab> <principal> (if you are using a
keytab), and then run each command for that user directly, in the form $ <command>
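For example, on a Kerberos-enabled cluster the /tmp setup above might look like this (the keytab path and principal are placeholders; substitute your own):

```shell
# Hypothetical keytab path and principal: replace with your own values.
kinit -kt /etc/hadoop/conf/hdfs.keytab hdfs/host01.example.com@EXAMPLE.COM
hadoop fs -mkdir /tmp
hadoop fs -chmod -R 1777 /tmp
```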

Step 6: Start MapReduce (MRv1) or YARN

Step 6a: Start MapReduce (MRv1)

Important:

Make sure you are not trying to run MRv1 and YARN on the same set of nodes at the same time. This is not
supported; it will degrade your performance and may result in an unstable MapReduce cluster deployment.
Steps 6a and 6b are mutually exclusive.

After you have verified HDFS is operating correctly, you are ready to start MapReduce. On each TaskTracker
system:

$ sudo service hadoop-0.20-mapreduce-tasktracker start

On the JobTracker system:

$ sudo service hadoop-0.20-mapreduce-jobtracker start

Verify that the JobTracker and TaskTracker started properly.

$ sudo jps | grep Tracker

If the permissions of directories are not configured correctly, the JobTracker and TaskTracker processes start
and immediately fail. If this happens, check the JobTracker and TaskTracker logs and set the permissions
correctly.

Verify basic cluster operation for MRv1.

At this point your cluster is upgraded and ready to run jobs. Before running your production jobs, verify
basic cluster operation by running an example from the Apache Hadoop web site.

Before you proceed, make sure the HADOOP_HOME environment variable is unset:

$ unset HADOOP_HOME
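With HADOOP_HOME unset, a job such as the bundled pi estimator can exercise the cluster. The examples jar path below is typical for CDH4 MRv1 packages but may differ on your system:

```shell
# Estimate pi with 10 map tasks of 100 samples each; the jar path may vary.
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 100
```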

Note:

To submit MapReduce jobs using MRv1 in CDH4 Beta 1, you needed either to set the
HADOOP_HOME environment variable or run a launcher script.

This is no longer true in later CDH4 releases; HADOOP_HOME has now been fully
deprecated, and it is good practice to unset it.

If you have client hosts, make sure you also update them to CDH4, and upgrade the
components running on those clients as well.

Step 6b: Start MapReduce with YARN

Important:

Make sure you are not trying to run MRv1 and YARN on the same set of nodes at the same time. This is not
supported; it will degrade your performance and may result in an unstable MapReduce cluster deployment.
Steps 6a and 6b are mutually exclusive.

Before deciding to deploy YARN, make sure you read the discussion under
New Features.

After you have verified HDFS is operating correctly, you are ready to start YARN. First, if you have not
already done so, create directories and set the correct permissions.
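Once the directories are in place, the YARN daemons can be started along these lines (service names as used by the CDH4 packages; run each command on the appropriate host):

```shell
# On the ResourceManager host:
sudo service hadoop-yarn-resourcemanager start
# On each NodeManager host:
sudo service hadoop-yarn-nodemanager start
# On the MapReduce JobHistory Server host:
sudo service hadoop-mapreduce-historyserver start
```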

If you have client hosts, make sure you also update them to CDH4, and upgrade the
components running on those clients as well.

Step 7: Set the Sticky Bit

For security reasons, Cloudera strongly recommends that you set the sticky bit on directories if you have not
already done so.

The sticky bit prevents anyone except the superuser, directory owner, or file owner from deleting or moving the
files within a directory. (Setting the sticky bit for a file has no effect.) Do this for directories such as
/tmp. (For instructions on creating /tmp and setting its permissions, see
these instructions).
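As a sketch, setting and checking the sticky bit on /tmp (run as the hdfs superuser) looks like this:

```shell
# Set the sticky bit on /tmp; mode 1777 yields drwxrwxrwt.
sudo -u hdfs hadoop fs -chmod 1777 /tmp
# Verify: the /tmp entry should show drwxrwxrwt.
hadoop fs -ls /
```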

Cloudera recommends that you regularly update the software on each system in the
cluster (for example, on a RHEL-compatible system, regularly run yum
update) to ensure that all the dependencies for any given component are up
to date. (If you have not been in the habit of doing this, be aware that the command
may take a while to run the first time you use it.)

To upgrade or add CDH components, see the following sections:

Flume. For more information, see "Upgrading Flume in CDH4" under "Flume
Installation" in this guide.

Sqoop. For more information, see "Upgrading Sqoop to CDH4" under "Sqoop
Installation" in this guide.

Sqoop 2. For more information, see "Sqoop 2 Installation" in this guide.

HCatalog. For more information, see "Installing and Using HCatalog"
in this guide.

Hue. For more information, see "Upgrading Hue in CDH4" under "Hue Installation"
in this guide.

Pig. For more information, see "Upgrading Pig to CDH4" under "Pig Installation"
in this guide.

Hive. For more information, see "Upgrading Hive to CDH4" under "Hive
Installation" in this guide.

HBase. For more information, see "Upgrading HBase to CDH4" under "HBase
Installation" in this guide.

ZooKeeper. For more information, see "Upgrading ZooKeeper to CDH4" under
"ZooKeeper Installation" in this guide.

Oozie. For more information, see "Upgrading Oozie to CDH4" under "Oozie
Installation" in this guide.

Whirr. For more information, see "Upgrading Whirr to CDH4" under "Whirr
Installation" in this guide.

Snappy. For more information, see "Upgrading Snappy to CDH4" under "Snappy
Installation" in this guide.

Mahout. For more information, see "Upgrading Mahout to CDH4" under "Mahout
Installation" in this guide.

Step 9: Apply Configuration File Changes

Important:

During package upgrade, the package manager renames any configuration files you have modified from
<file> to <file>.rpmsave, and creates a new
<file> with applicable defaults. You are responsible for applying any changes captured
in the original configuration file to the new configuration file. In the case of Ubuntu and Debian upgrades,
you will be prompted if you have made changes to a file for which there is a new version; for details, see
Automatic
handling of configuration files by dpkg.

For example, if you have modified your zoo.cfg configuration file
(/etc/zookeeper/zoo.cfg), the upgrade renames and preserves a copy of your modified
zoo.cfg as /etc/zookeeper/zoo.cfg.rpmsave. If you have not already done so,
you should now compare this to the new /etc/zookeeper/conf/zoo.cfg, resolve differences, and
make any changes that should be carried forward (typically where you have changed property value defaults). Do
this for each component you upgrade.
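For the zoo.cfg example above, a hypothetical merge workflow is:

```shell
# Compare the preserved file against the new default, then copy any
# customizations you want to keep into the new file.
diff /etc/zookeeper/zoo.cfg.rpmsave /etc/zookeeper/conf/zoo.cfg
```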

Step 10: Finalize the HDFS Metadata Upgrade (Beta 1 or earlier)

Note: Skip this step if you are upgrading from CDH4 Beta 2
or later.

To finalize the HDFS metadata upgrade you began earlier in this procedure, proceed as follows:

Make sure you are satisfied that the CDH4 upgrade has succeeded and everything
is running smoothly. This could take a matter of days, or even weeks.

Warning:

Do not proceed until you are sure you are satisfied with the new deployment. Once you have finalized the
HDFS metadata, you cannot revert to an earlier version of HDFS.

Note:

If you need to restart the NameNode during this period (after having begun the upgrade process, but before
you've run finalizeUpgrade) simply restart your NameNode without the
-upgrade option.
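When you are ready, finalize the metadata upgrade by running dfsadmin as the hdfs superuser (on a Kerberos-enabled cluster, kinit as the hdfs user first rather than using sudo -u, as noted earlier):

```shell
# Finalize the HDFS metadata upgrade; this cannot be undone.
sudo -u hdfs hadoop dfsadmin -finalizeUpgrade
```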