Installing CDH 5 with MRv1 on a Single Linux Host in Pseudo-distributed mode

Important:

Running services: when starting, stopping, and restarting CDH components, always use the service(8) command rather than
running the /etc/init.d scripts directly. This is important because service sets the current working directory to / and removes most
environment variables (passing only LANG and TERM) so as to create a predictable environment in which to administer the service. If you
run the /etc/init.d scripts directly, any environment variables you have set remain in force and could produce unpredictable results. (If you install CDH from
packages, service will be installed as part of the Linux Standard Base (LSB).)

Java Development Kit: if you have not already done so, install the Oracle Java Development Kit (JDK) before deploying CDH. Follow these instructions.

Important:

Follow these command-line instructions on systems that do not use Cloudera Manager.

This information applies specifically to CDH 5.14.x. See Cloudera Documentation for information specific to other releases.

On Red Hat/CentOS/Oracle 5 or Red Hat 6 systems, do the following:

Download the CDH 5 Package

Click the entry in the table below that matches your Red Hat or CentOS system, choose Save File, and save the file to a directory to which you have write
access (it can be your home directory).

Starting Hadoop and Verifying it is Working Properly:

For MRv1, a pseudo-distributed Hadoop installation consists of one host running all five Hadoop daemons: namenode, jobtracker, secondarynamenode, datanode, and tasktracker.
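Once the daemons are running (after the steps below), one quick way to confirm that all five are up is the JDK's jps tool; a minimal sketch, assuming a JDK is on the PATH (run as root so that JVMs owned by the hdfs and mapred users are visible):

```shell
# List all running Hadoop JVMs; in a healthy pseudo-distributed MRv1 setup
# you should see NameNode, SecondaryNameNode, DataNode, JobTracker,
# and TaskTracker among the results.
sudo jps
```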

To verify that the hadoop-0.20-conf-pseudo package has been installed on your system, list the files it provides.

To view the files on Red Hat or SLES systems:

$ rpm -ql hadoop-0.20-conf-pseudo

To view the files on Ubuntu systems:

$ dpkg -L hadoop-0.20-conf-pseudo

The new configuration is self-contained in the /etc/hadoop/conf.pseudo.mr1 directory.

The Cloudera packages use the alternatives framework for managing which Hadoop configuration is active. All Hadoop components search for the Hadoop
configuration in /etc/hadoop/conf.
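Because the active configuration is selected through alternatives, you can inspect which directory /etc/hadoop/conf currently resolves to. A sketch, assuming a Red Hat-style system (on Ubuntu, substitute update-alternatives for alternatives):

```shell
# Show all registered Hadoop configuration directories and which one
# /etc/hadoop/conf points to; after installing hadoop-0.20-conf-pseudo,
# /etc/hadoop/conf.pseudo.mr1 should be the active choice.
sudo alternatives --display hadoop-conf
```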

To start Hadoop, proceed as follows.

Step 1: Format the NameNode.

Before starting the NameNode for the first time you must format the file system.

Make sure you format the NameNode as user hdfs. If you are not using Kerberos, you can do this as part of the command string, using
sudo -u hdfs as in the command below.

$ sudo -u hdfs hdfs namenode -format

If Kerberos is enabled, do not use commands in the form sudo -u <user> <command>; they will fail with a security error. Instead, first obtain a ticket:

$ kinit <user> (if you are using a password)

$ kinit -kt <keytab> <principal> (if you are using a keytab)

and then, for each command run by this user:

$ <command>

Important:

In earlier releases, the hadoop-conf-pseudo package automatically formatted HDFS on installation. In CDH 5, you must do this explicitly.

Step 2: Start HDFS
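Starting the HDFS daemons can be done by looping over their init scripts; a sketch, assuming the hadoop-hdfs-* services that the pseudo-distributed package installs:

```shell
# Start every installed HDFS daemon (namenode, secondarynamenode, datanode)
# through the service command, as recommended above.
for x in $(cd /etc/init.d ; ls hadoop-hdfs-*) ; do
  sudo service "$x" start
done
```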

To verify that the services have started, you can check the web console. The NameNode provides a web console at http://localhost:50070/ for viewing your Distributed
File System (DFS) capacity, number of DataNodes, and logs. In this pseudo-distributed configuration, you should see one live DataNode named localhost.
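The same check can be scripted instead of done in a browser; a minimal sketch, assuming the NameNode web UI is listening on its default port 50070:

```shell
# Exit non-zero (and print nothing) if the NameNode web UI is unreachable.
curl -sf http://localhost:50070/ > /dev/null && echo "NameNode web UI is up"
```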

Step 3: Create the directories needed for Hadoop processes.

Issue the following command to create the directories needed for all installed Hadoop processes with the appropriate permissions.
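In CDH 5 this is typically handled by the bundled init-hdfs.sh helper; a sketch, assuming the default package layout (verify the path on your system):

```shell
# Create /tmp, /var, /user, and the other HDFS directories that the
# installed Hadoop processes expect, with appropriate ownership and
# permissions.
sudo /usr/lib/hadoop/libexec/init-hdfs.sh
```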