3.1 Providing Remote Client Access to CDH

Oracle Big Data Appliance supports full local access to all commands and utilities in Cloudera's Distribution including Apache Hadoop (CDH).

You can use a browser on any computer that has access to the client network of Oracle Big Data Appliance to access Cloudera Manager, Hadoop Map/Reduce Administration, the Hadoop Task Tracker interface, and other browser-based Hadoop tools.

To issue Hadoop commands remotely, however, you must connect from a system configured as a CDH client with access to the Oracle Big Data Appliance client network. This section explains how to set up a computer so that you can access HDFS and submit MapReduce jobs on Oracle Big Data Appliance.

See Also:

My Oracle Support ID 1506203.1

3.1.1 Prerequisites

Ensure that you have met the following prerequisites:

You must have these access privileges:

Root access to the client system

Login access to Cloudera Manager

If you do not have these privileges, then contact your system administrator for help.

The client system must run an operating system that Cloudera supports for CDH4. For the list of supported operating systems, see "Before You Install CDH4 on a Cluster" in the Cloudera CDH4 Installation Guide at

The client system must run the same version of Oracle JDK as Oracle Big Data Appliance. CDH4 requires Oracle JDK 1.6.

3.1.2 Installing CDH on Oracle Exadata Database Machine

When you use Oracle Exadata Database Machine as the client, you can use the RPM files on Oracle Big Data Appliance, because both engineered systems use the same operating system (Oracle Linux 5.x). Copying the files across the local network is faster than downloading them from the Cloudera website.

Note:

In the following steps, replace version_number with the missing portion of the file name, such as 2.2.0+189-1.cdh4.2.0.p0.8.el5.

To install a CDH client on Oracle Exadata Database Machine:

Log into an Exadata database server.

Verify that Hadoop is not installed on your Exadata system:

rpm -qa | grep hadoop

If the rpm command returns a value, then remove the existing Hadoop software:

rpm -e hadoop_rpm
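The check-and-remove steps above can be combined into a short sketch. This is an illustration only; run it as root, and note that the package names reported by rpm -qa on your system may differ:

```shell
# List any installed Hadoop packages and remove each one found.
# Run as root on the Exadata database server.
for pkg in $(rpm -qa | grep hadoop); do
    rpm -e "$pkg"
done
```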

Copy the following Linux RPMs to the database server from the first server of Oracle Big Data Appliance. The RPMs are located in the /opt/oracle/BDAMammoth/bdarepo/RPMS/x86_64 directory.

ed-version_number.x86_64.rpm

m4-version_number.x86_64.rpm

nc-version_number.x86_64.rpm

redhat-lsb-version_number.x86_64.rpm

Install the Oracle Linux RPMs from Step 4 on all database nodes. For example:
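A minimal sketch of the installation, assuming the four RPMs were copied to /tmp on each database node (the /tmp location is an assumption; replace version_number as described in the note above):

```shell
# Run as root on each database node.
cd /tmp
rpm -ihv ed-version_number.x86_64.rpm
rpm -ihv m4-version_number.x86_64.rpm
rpm -ihv nc-version_number.x86_64.rpm
rpm -ihv redhat-lsb-version_number.x86_64.rpm
```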

Check the output for HDFS users defined on Oracle Big Data Appliance, and not on the client system. You should see the same results as you would after entering the command directly on Oracle Big Data Appliance.
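A command that produces such a listing, assuming the CDH client is configured and the hadoop command is on the PATH, is:

```shell
# Lists the HDFS home directories; the owners shown are HDFS users
# defined on Oracle Big Data Appliance, not local client accounts.
hadoop fs -ls /user
```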

Validate the installation by submitting a MapReduce job. You must be logged in to the host computer under the same user name as your HDFS user name on Oracle Big Data Appliance.
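For instance, the job could be one of the MapReduce examples bundled with CDH. The JAR path below is an assumption and varies with the CDH installation layout:

```shell
# Submit a sample job that estimates pi with 10 map tasks
# of 1,000,000 samples each; success confirms the client
# can reach the cluster and run MapReduce jobs.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000000
```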

3.2.2 Providing User Login Privileges (Optional)

Users do not need login privileges on Oracle Big Data Appliance to run MapReduce jobs from a remote client. However, for users who want to log in to Oracle Big Data Appliance, you must set a password for them. You can set and reset passwords in the same way.
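One way to set the password on every server in the cluster is with the dcli utility from the first server. This is a sketch only; the user name and password below are hypothetical:

```shell
# Run as root on the first server of Oracle Big Data Appliance.
# Sets the same password for the (hypothetical) user jdoe on all
# nodes in the cluster via the cluster-wide dcli option.
dcli -C "echo 'jdoe:newpassword' | chpasswd"
```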

3.3 Recovering Deleted Files

CDH provides an optional trash facility, so that a deleted file or directory is moved to a trash directory for a set period of time instead of being deleted immediately from the system. By default, the trash facility is enabled for HDFS and all HDFS clients.
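For example, with the trash facility enabled, deleting a file from an HDFS client moves it rather than removing it. The file name below is hypothetical:

```shell
# With trash enabled, hadoop reports that the file was moved to the
# user's .Trash directory in HDFS rather than deleted outright.
hadoop fs -rm /user/oracle/ontime_s.dat
```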

3.3.1 Restoring Files from the Trash

When the trash facility is enabled, you can easily restore files that were previously deleted.

To restore a file from the trash directory:

Check that the deleted file is in the trash. The following example checks for files deleted by the oracle user:
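Assuming the standard HDFS trash layout, deleted files for the oracle user appear under that user's .Trash directory; a file can then be moved back out. The file name and destination directory below are hypothetical:

```shell
# List files in the current trash checkpoint for the oracle user.
hadoop fs -ls .Trash/Current/user/oracle

# Restore a file by moving it out of the trash.
hadoop fs -mv .Trash/Current/user/oracle/ontime_s.dat /user/oracle
```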

3.3.3 Disabling the Trash Facility

The trash facility on Oracle Big Data Appliance is enabled by default. You can change this configuration for a cluster. When the trash facility is disabled, deleted files and directories are not moved to the trash. They are not recoverable.

3.3.3.1 Completely Disabling the Trash Facility

The following procedure disables the trash facility for HDFS. When the trash facility is completely disabled, the client configuration is irrelevant.

Search for or scroll down to the Filesystem Trash Interval property under NameNode Settings. See Figure 3-2.

Click the current value, and enter a value of 0 (zero) in the pop-up form.

Click Save Changes.

Expand the Actions menu at the top of the page and choose Restart.

3.3.3.2 Disabling the Trash Facility for Local HDFS Clients

All HDFS clients that are installed on Oracle Big Data Appliance are configured to use the trash facility. An HDFS client is any software that connects to HDFS to perform operations such as listing HDFS files, copying files to and from HDFS, and creating directories.

You can use Cloudera Manager to change this local client configuration setting; the trash facility itself remains enabled on the cluster.

Search for or scroll down to the Use Trash property under Client Settings. See Figure 3-2.

Deselect the Use Trash check box.

Click Save Changes. This setting is used to configure all new HDFS clients downloaded to Oracle Big Data Appliance.

Open a connection as root to a node in the cluster.

Deploy the new configuration:

dcli -C bdagetclientconfig

3.3.3.3 Disabling the Trash Facility for a Remote HDFS Client

Remote HDFS clients are typically configured by downloading and installing a CDH client, as described in "Providing Remote Client Access to CDH." Oracle SQL Connector for HDFS and Oracle R Connector for Hadoop are examples of remote clients.
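For a remote client, the trash setting lives in the client's own Hadoop configuration. A sketch of the change, assuming the standard Hadoop fs.trash.interval property in the client's core-site.xml, is to set the interval to 0 (zero):

```xml
<!-- In the remote client's core-site.xml: a trash interval of 0
     disables the trash facility for operations from this client. -->
<property>
  <name>fs.trash.interval</name>
  <value>0</value>
</property>
```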