Cloudera Manager version 3 and CDH3 have reached End of Maintenance (EOM) as of June 20th, 2013. Cloudera will not support or provide patches for any of the Cloudera Manager version 3 and CDH3 releases. To view documentation related to later releases, click the Documentation link at the top of this page.

Job Designer

Introducing Hue Job Designer

The Job Designer application enables you to create and submit Hadoop Map/Reduce jobs to the Hadoop cluster. You
can include variables with your jobs to enable you and other users to enter values for the variables when they
run your job. The Job Designer supports streaming and JAR jobs. For more information about Hadoop Map/Reduce,
see the Hadoop Map/Reduce Tutorial.

Note:

A job's input files must be uploaded to the cluster before you can submit the job.

Job Designer Installation and Configuration

Job Designer is one of the applications that can be installed as part of Hue. For more information about
installing Hue, see Hue Installation.

Using Job Designer

The following sections describe how to start and use Job Designer.

Starting Job Designer

To start Job Designer, click the Job Designer icon in the application bar at the bottom of the Hue
web page. The Job Design List window opens in the Hue web page.

Installing the Job Designer Samples

The Job Designer sample jobs can help you learn how to use Job Designer. To install the Job Designer samples,
click Install Samples in the Job Design List window and then click Ok. The sample jobs
are displayed in the Job Design List window. Job Designer removes the Install Samples button
after the samples are installed, so you can install the samples only once.

Working with Job Designs

In the Job Designer, a job design specifies several meta-level properties of a Map/Reduce job, including the
job design name, description, the Map/Reduce executable scripts or classes, and any parameters for those
scripts or classes. You can create two types of job designs: a streaming job design and a
JAR job design.
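
For orientation, a streaming job runs executables that read input lines from standard input and write
tab-separated key/value lines to standard output. The following Python sketch shows a hypothetical
word-count mapper and reducer of that kind. The function names and the local simulate helper are
assumptions made for illustration; they are not part of the Job Designer samples.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper logic: emit 'word<TAB>1' for every word on every input line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer logic: sum the counts for each word.

    The input must already be sorted by key, which is how Hadoop
    delivers mapper output to a streaming reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def simulate(lines):
    """Local stand-in for the cluster: sort the mapper output, then reduce it."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

In a real mapper or reducer script, you would pass sys.stdin to map_lines or reduce_lines and print
each line that is produced.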

Creating a Streaming Job Design

In the Job Design Editor:Streaming Job window, specify the
following information.

Note:

You can use variables of the form $variable_name for the Input,
Output, Mapper Cmd, and Reducer Cmd settings described in the following
table. When the streaming job is run, a dialog box appears in which you or other users can
specify the values of the variables.

Setting

Description

Name

Specify a name that identifies the streaming job design and its associated properties and
parameters.

Description

Specify a description of the streaming job design. The description is displayed in the dialog
box that appears if you specify variables for the job.

Input

Specify the path to the file or directory you want to use as the input data for the streaming
job. If you specify a directory, all files in that directory are used for input. Equivalent to
the Hadoop -input option.

Output

Specify the path to the directory where you want to save the output of the streaming job. The
directory must not exist before you run the job; if it does, the job will not run. (This requirement
is a precaution to prevent overwriting data from other jobs.) Equivalent to the Hadoop
-output option.

Mapper Cmd

Specify the path to the mapper script or class. If the mapper file is not already on the machines in
the cluster, use the Required Files option to package it as part of the job submission.
Equivalent to the Hadoop -mapper option.

Reducer Cmd

Specify the path to the reducer script or class. If the reducer file is not already on the machines in
the cluster, use the Required Files option to package it as part of the job submission.
Equivalent to the Hadoop -reducer option.

Num Reduce Tasks

Specify the number of reduce tasks you want to use. Specify zero if you do not want to run any
reduce tasks. If you don't specify a value for this setting, the default specified in your
cluster configuration takes effect. The optimal number of reduce tasks is the product of the
following values:

-- a factor of 0.95 or 1.75
-- the number of nodes in your cluster multiplied by the value of the
mapred.tasktracker.reduce.tasks.maximum property

If your reduce tasks are not very big, use a factor of 0.95 to use slightly fewer reduce tasks
than the total number of reduce slots in your cluster. This factor allows for a small number of
failed reduce tasks without increasing the time required for running the jobs.

If your reduce tasks are very big, use a factor of 1.75 to use more reduce tasks than the total
number of reduce slots in your cluster. This factor allows for better load balancing, and failed
reduce tasks do not significantly increase the time required for running the jobs.

Required Files

Specify any executable files that do not already exist on the machines in the cluster. These
files are packaged as part of the job submission.
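
As a worked example of the Num Reduce Tasks guideline in the table above, the following Python sketch
multiplies the factor by the number of nodes and by the per-node value of
mapred.tasktracker.reduce.tasks.maximum. The function name, its parameters, and the choice to round
down to a whole number are assumptions made for illustration; they are not part of Job Designer or
Hadoop.

```python
from fractions import Fraction

# The two factors named in the guideline, held as exact fractions so that
# the arithmetic does not depend on floating-point rounding.
LIGHT_FACTOR = Fraction(95, 100)   # reduce tasks that are not very big
HEAVY_FACTOR = Fraction(175, 100)  # very big reduce tasks


def suggested_reduce_tasks(num_nodes, reduce_slots_per_node, heavy_reduces=False):
    """Return factor * (nodes * reduce slots per node), rounded down.

    reduce_slots_per_node stands for the value of
    mapred.tasktracker.reduce.tasks.maximum on each node.
    Rounding down (with a minimum of 1) is an assumption of this sketch.
    """
    if num_nodes < 1 or reduce_slots_per_node < 1:
        raise ValueError("node and slot counts must be positive")
    factor = HEAVY_FACTOR if heavy_reduces else LIGHT_FACTOR
    total_slots = num_nodes * reduce_slots_per_node
    return max(1, int(factor * total_slots))
```

For a cluster of 10 nodes with 2 reduce slots each, this gives 19 reduce tasks with the 0.95 factor
and 35 with the 1.75 factor.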

Select Submit upon save to submit the job to the cluster
immediately after you click Save.

Click Save to save the job settings.
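
To make the relationship between these settings and the Hadoop streaming options concrete, the
following Python sketch builds an option list from a set of streaming job settings. The -input,
-output, -mapper, and -reducer pairings come from the table above. Pairing Num Reduce Tasks with
-numReduceTasks and Required Files with -file is an assumption of this sketch, as are the function
and parameter names; Job Designer may submit the job differently.

```python
def streaming_options(input_path, output_path, mapper_cmd, reducer_cmd,
                      num_reduce_tasks=None, required_files=()):
    """Return Hadoop streaming options for the given job design settings."""
    options = [
        "-input", input_path,      # Input setting
        "-output", output_path,    # Output setting (directory must not exist yet)
        "-mapper", mapper_cmd,     # Mapper Cmd setting
        "-reducer", reducer_cmd,   # Reducer Cmd setting
    ]
    # Assumption: leaving Num Reduce Tasks empty means the option is omitted,
    # so the cluster default takes effect.
    if num_reduce_tasks is not None:
        options += ["-numReduceTasks", str(num_reduce_tasks)]
    # Assumption: each Required Files entry is shipped with a -file option.
    for path in required_files:
        options += ["-file", path]
    return options
```

The resulting list would follow the hadoop jar command and the path to the streaming JAR, whose
location depends on your installation.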

Creating a JAR Job Design

A Hadoop JAR consists of Map/Reduce functions written in Java.

To create a JAR job design:

In the Job Design List window, click Jar. The Job Design
Editor:Jar Job window opens, where you can specify
information about the JAR job.

In the Job Design Editor:Jar Job window, specify the following
information.

Note:

You can use variables of the form $variable_name for the Arguments
setting described in the following table. When the JAR job is run, a dialog box appears in which
you or other users can specify the values of the variables.

Setting

Description

Name

Specify a name that identifies the JAR job and its collection of parameters.

Description

Specify a description of the JAR job. The description is displayed in the dialog box that
appears if you specify variables for the job.

Jarfile

Specify the name of the JAR file, including the path.

Arguments

Specify the arguments you want to pass to the running JAR job.

Select Submit upon save to submit the job to the cluster
immediately after you click Save.

Click Save to save the job settings.
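
As an illustration of how $variable_name placeholders in the Arguments setting might be filled in at
run time, the following Python sketch substitutes values into the argument string and builds a
hadoop jar command line. It uses Python's string.Template, which happens to share the $name syntax;
whether Job Designer works this way internally is an assumption, and the function name, JAR path,
and variable names are hypothetical.

```python
import shlex
from string import Template


def jar_command(jarfile, arguments, values):
    """Fill $variables in the Arguments string and return a command list.

    Raises KeyError if a variable in the arguments has no value.
    """
    filled = Template(arguments).substitute(values)
    # shlex.split keeps quoted arguments together as single items.
    return ["hadoop", "jar", jarfile] + shlex.split(filled)


# Hypothetical JAR path and variable names, for illustration only.
command = jar_command(
    "/user/example/examples.jar",
    "pi $num_maps $samples",
    {"num_maps": "10", "samples": "1000"},
)
```

Here command is the list hadoop, jar, /user/example/examples.jar, pi, 10, 1000.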

Submitting a Job to a Cluster

To submit a job to a cluster:

In the Job Design List window, click job designs in the
upper left corner. Your jobs and other users' jobs are displayed in the
Job Design List window.

In the Job Design List window, double-click the job you want to
submit. You can also right-click and choose Submit to
Cluster.

If the job contains variables, enter the information requested in the
dialog box that appears. For example, the sample streaming PI Calculator
job displays a dialog box in which you specify the
settings for Iterations per Mapper and Num of
mappers.

Click Ok to submit the job. After the job is complete, the Job
Designer displays the results of the job, including the last 10 KB of
stdout and stderr for a streaming
job. For example, after the sample streaming PI Calculator job is
complete, its results appear in the Job Designer. For information about displaying job results, see Displaying Job Results.
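
The variable prompt in the submission steps above can be pictured as a scan of the job design
settings for $variable_name placeholders. The following Python sketch collects the distinct variable
names in the order they first appear. The regular expression, the function name, and the sample
settings are assumptions made for illustration; they do not describe how Job Designer is
implemented.

```python
import re

# Assumption: a variable is a dollar sign followed by a letter or underscore,
# then any number of letters, digits, or underscores.
VARIABLE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def find_variables(settings):
    """Return the distinct variable names used in the setting values."""
    names = []
    for value in settings.values():
        for name in VARIABLE.findall(value):
            if name not in names:
                names.append(name)
    return names


# Hypothetical streaming job settings, for illustration only.
example = {
    "Input": "$input_dir/data",
    "Output": "$output_dir",
    "Mapper Cmd": "python mapper.py $input_dir",
}
variables = find_variables(example)
```

For these settings, variables contains input_dir and output_dir, so a dialog box would need one
field for each.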

Copying, Editing, and Deleting a Job Design

If you want to edit and use a job but you don't own it, you can make a copy of it and then edit and use the
copied job.