Warden determines the percentage of resources available for MapReduce v1 jobs and for applications based on the warden.conf file. Applications include MapReduce v2 applications and non-MapReduce applications such as Spark. Note: If you modify the values in warden.conf, you must restart Warden.

The percent of resources allocated for YARN and MapReduce v1 jobs is based on the values of the following parameters in warden.conf:

| Parameter | Default | Description |
| --- | --- | --- |
| mr1.memory.percent | 50 | The percentage of memory allocated to MapReduce v1 jobs. The remaining memory is allocated to applications. |
| mr1.cpu.percent | 50 | The percentage of CPUs allocated to MapReduce v1 jobs. The remaining CPUs are allocated to applications. |
| mr1.disk.percent | 50 | The percentage of disks allocated to MapReduce v1 jobs. The remaining disks are allocated to applications. |

These values apply only when both the TaskTracker and NodeManager roles are installed on a node. For example, if TaskTracker is not installed on the node, NodeManager gets 100% of the resources available to process applications, regardless of the warden.conf settings. Similarly, if NodeManager is not installed on the node, TaskTracker gets 100% of the resources available to process MapReduce v1 jobs, regardless of the warden.conf settings.
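For example, to give YARN applications 75% of each resource on a node that runs both TaskTracker and NodeManager, the warden.conf entries might look like the following sketch (the 25 values are purely illustrative; the file is typically found at /opt/mapr/conf/warden.conf, though the location can vary by release):

```properties
# Illustrative values only: reserve 25% of each resource for MapReduce v1 jobs,
# leaving 75% for YARN and other applications.
mr1.memory.percent=25
mr1.cpu.percent=25
mr1.disk.percent=25
```

As noted above, Warden must be restarted after editing warden.conf for the change to take effect.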

YARN Container Resources

A YARN application can be a MapReduce v2 application or a non-MapReduce application. The Warden on each node calculates the resources that can be allocated to process YARN applications. Each application has an ApplicationMaster that negotiates YARN container resources. For MapReduce applications, YARN processes each map or reduce task in a container. The ApplicationMaster requests resources from the ResourceManager based on the memory, CPU, and disk requirements of the YARN containers. For YARN containers that process MapReduce v2 tasks, there are additional considerations; see YARN Container Resources for MapReduce v2 Applications for details. The ApplicationMaster requests YARN container resources based on the values of the following parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| yarn.scheduler.minimum-allocation-mb | 1024 | Defines the minimum memory allocation available for a container, in MB. To change the value, edit the yarn-site.xml file for the node that runs the ResourceManager, assign the new value to this property, then restart the ResourceManager. |
| yarn.scheduler.maximum-allocation-mb | 8192 | Defines the maximum memory allocation available for a container, in MB. To change the value, edit the yarn-site.xml file for the node that runs the ResourceManager, assign the new value to this property, then restart the ResourceManager. |
| yarn.nodemanager.resource.memory-mb | Variable. This value is calculated by Warden. | Defines the memory available to process YARN containers on the node, in MB. Warden uses the following formula to calculate this value: [total physical memory on node] - [memory required by the operating system, MapR-FS, and MapR services installed on the node] - [memory allocated to MapReduce v1 jobs, if TaskTracker is installed on the node]. To determine the value, go to the ResourceManager UI and view the memory available for that node. |
| yarn.nodemanager.resource.cpu-vcores | Variable. This value is calculated by Warden. | Defines the number of CPUs available to process YARN containers on the node. Warden uses the following formula to calculate this value: [# of CPU cores on node] - [# of CPU cores assigned to MapR-FS] - [# of CPU cores assigned to MapReduce v1 jobs, if TaskTracker is installed on the node]. To determine the value, go to the ResourceManager UI or the YARN pane in the MCS and view the number of CPUs available for that node. To change the value, edit the yarn-site.xml file for the node, assign the new value to this property, then restart the NodeManager. |
| yarn.nodemanager.resource.io-spindles | Variable. This value is calculated by Warden. | Defines the number of disks available to process YARN containers. Warden uses the following formula to calculate this value: [# of disks on the node] - [# of disks assigned to process MapReduce v1 jobs]. To determine the value, go to the ResourceManager UI or the YARN pane in the MCS and view the disk information for that node. |
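To make the memory formula above concrete, the following Python sketch plugs in hypothetical numbers (the node size and the memory reserved for the operating system, MapR-FS, and MapR services are invented for illustration; Warden computes the real figures itself):

```python
# Hypothetical inputs, not taken from any real node: illustrate Warden's formula for
# yarn.nodemanager.resource.memory-mb when TaskTracker is also installed on the node.
total_physical_mb = 65536     # total physical memory on the node (assumed)
os_fs_services_mb = 12288     # OS + MapR-FS + installed MapR services (assumed)
mr1_memory_percent = 50       # mr1.memory.percent from warden.conf (default)

# Memory left after the OS, MapR-FS, and MapR services take their share.
remaining_mb = total_physical_mb - os_fs_services_mb

# warden.conf splits the remainder between MapReduce v1 jobs and applications.
mr1_mb = remaining_mb * mr1_memory_percent // 100
yarn_memory_mb = remaining_mb - mr1_mb

print(yarn_memory_mb)  # 26624
```

With these assumed inputs, Warden would set yarn.nodemanager.resource.memory-mb to 26624 MB; on a node without TaskTracker, the MapReduce v1 share would be 0 and YARN would receive the full remainder.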

YARN Container Resources for MapReduce v2 Applications

In addition to the YARN container resource allocation parameters, the MapReduce ApplicationMaster also considers the following container requirements when it sends requests to the ResourceManager for containers to run MapReduce jobs:

| Parameter | Default | Description |
| --- | --- | --- |
| mapreduce.map.memory.mb | 1024 | Defines the container size for map tasks, in MB. |
| mapreduce.reduce.memory.mb | 3072 | Defines the container size for reduce tasks, in MB. |
| mapreduce.reduce.java.opts | -Xmx2560m | Java options for reduce tasks. |
| mapreduce.map.java.opts | -Xmx900m | Java options for map tasks. |
| mapreduce.map.disk | 0.5 | Defines the number of disks a map task requires. For example, a node with 4 disks can run 8 map tasks at a time. Note: If I/O-intensive tasks do not run on the node, you might want to change this value. |
| mapreduce.reduce.disk | 1.33 | Defines the number of disks a reduce task requires. For example, a node with 4 disks can run 3 reduce tasks at a time. Note: If I/O-intensive tasks do not run on the node, you might want to change this value. |
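The disk figures above bound how many tasks can run concurrently on a node: the task count is the node's disk count divided by the per-task disk requirement, rounded down. A small sketch of that arithmetic:

```python
import math

# mapreduce.map.disk = 0.5 and mapreduce.reduce.disk = 1.33 are the defaults above.
def max_concurrent_tasks(disks_on_node, disk_per_task):
    """Number of tasks whose combined disk requirement fits on the node."""
    return math.floor(disks_on_node / disk_per_task)

print(max_concurrent_tasks(4, 0.5))   # 8 map tasks
print(max_concurrent_tasks(4, 1.33))  # 3 reduce tasks
```

This reproduces the examples in the table: 4 / 0.5 allows 8 map tasks, and 4 / 1.33 allows 3 reduce tasks at a time.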

You can use one of the following methods to change the default configuration:

Provide updated values in the mapred-site.xml file on the node that runs the job. You can use central configuration to change this value on each node in the cluster that runs the NodeManager. Then, restart the NodeManager on each of those nodes. The mapred-site.xml file for MapReduce v2 applications is located in the following directory: /opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop

Override the default values from the command line for each application that requires a non-default value.
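For instance, a mapred-site.xml override that raises the map-task container size might look like the following (the property name comes from the table above; the 2048 MB value is purely illustrative):

```xml
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
```

For a one-off job, the same property can usually be overridden on the command line with Hadoop's generic -D option (for example, -Dmapreduce.map.memory.mb=2048), provided the job's driver supports generic options via ToolRunner.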

MapReduce v1 Job Resource Allocation

When a MapReduce v1 job is submitted to the JobTracker, the JobTracker determines which TaskTracker nodes can process the map and reduce tasks, based on the available map and reduce slots. Map and reduce slots are allocated based on the memory available to process MapReduce v1 jobs and the number of CPUs and disks available to MapR-FS. In general, you should not need to customize the number of map and reduce slots. However, you can configure the parameters that are used to calculate the values. For more information, see Customizing the MapReduce v1 Slot Calculation Parameters.

Criteria for Map Slot Calculation

MapR Hadoop sets the number of map slots to the lowest value that results from the following memory, CPU, and disk calculations:

If the number of CPUs on the node > 2, the CPU calculation = [# of CPUs on the node] - [# of CPUs assigned to MapR-FS]. Note: The number of CPUs assigned to MapR-FS is 4 for an Enterprise Database Edition installation. Otherwise, the value is 2.

If the number of CPUs on the node <= 2, the CPU calculation = 1.

Disk calculation = 2 * [# of disks available to MapR-FS]
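The CPU and disk criteria above can be sketched in Python as follows (a simplified illustration; the memory criterion is omitted, and the 4-versus-2 CPU reservation for MapR-FS follows the note above):

```python
def map_slot_bound(cpus_on_node, disks_for_mapr_fs, enterprise_db=False):
    """Lowest of the CPU and disk values for MapReduce v1 map slots (sketch)."""
    # CPUs reserved for MapR-FS: 4 on Enterprise Database Edition, otherwise 2.
    mapr_fs_cpus = 4 if enterprise_db else 2
    cpu_calc = cpus_on_node - mapr_fs_cpus if cpus_on_node > 2 else 1
    disk_calc = 2 * disks_for_mapr_fs
    return min(cpu_calc, disk_calc)

# Node like the example below: 24 cores, 5 disks, Enterprise Database Edition.
print(map_slot_bound(24, 5, enterprise_db=True))  # min(20, 10) -> 10
```

On such a node, the disk calculation (2 * 5 = 10) is lower than the CPU calculation (24 - 4 = 20), so disks are the limiting factor for map slots.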

Criteria for Reduce Slot Calculation

MapR Hadoop sets the number of reduce slots to the lowest value that results from the following memory, CPU, and disk calculations:

If the number of CPUs on the node > 2, the CPU calculation = [# of CPUs on the node] - [# of CPUs assigned to MapR-FS]. Note: The number of CPUs assigned to MapR-FS is 4 for an Enterprise Database Edition installation. Otherwise, the value is 2.

If the number of CPUs on the node <= 2, the CPU calculation = 1.

Disk calculation:

If the number of disks available to MapR-FS > 2, the disk calculation = 0.75 * [# of disks available to MapR-FS]

If the number of disks available to MapR-FS <= 2, the disk calculation = 1
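The reduce-slot criteria differ from the map-slot criteria only in the disk term, and can be sketched the same way (again omitting the memory criterion; rounding a fractional result down to a whole number of slots is an assumption, since the rounding rule is not stated above):

```python
import math

def reduce_slot_bound(cpus_on_node, disks_for_mapr_fs, enterprise_db=False):
    """Lowest of the CPU and disk values for MapReduce v1 reduce slots (sketch)."""
    # CPUs reserved for MapR-FS: 4 on Enterprise Database Edition, otherwise 2.
    mapr_fs_cpus = 4 if enterprise_db else 2
    cpu_calc = cpus_on_node - mapr_fs_cpus if cpus_on_node > 2 else 1
    disk_calc = 0.75 * disks_for_mapr_fs if disks_for_mapr_fs > 2 else 1
    # Assumed: a fractional bound yields the whole number of slots below it.
    return math.floor(min(cpu_calc, disk_calc))

print(reduce_slot_bound(24, 5, enterprise_db=True))  # floor(min(20, 3.75)) -> 3
```

With 5 disks, the reduce-side disk calculation (0.75 * 5 = 3.75) is far below the CPU calculation, so reduce slots are again disk-limited.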

Example Map and Reduce Slot Calculation

In the following example, a node has this configuration:

| Node Resources or Settings | Values |
| --- | --- |
| Services and Options | TaskTracker, MapR-FS, MapR-DB |
| CPUs/Cores | 24 |
| Disks Available to MapR-FS | 5 |
| RAM | 48 GB |
| Chunk Size | 256 MB |

Based on this configuration, MapR Hadoop performs the following calculations to determine the number of map and reduce slots:

| Calculation | Value | Description |
| --- | --- | --- |
| Number of CPUs | 4 | Because MapR-DB is running, 4 CPUs are used in the slot calculation. |
| Memory for Map Slots | 1 GB | Because the chunk size is 256 MB, 1 GB of memory is allocated for map slots. Warden sets mapred.job.map.memory.physical.mb to 1000 MB. |
| Memory for Reduce Slots | 3 GB | Because the chunk size is 256 MB, 3 GB of memory is allocated for reduce slots. Warden sets mapred.job.reduce.memory.physical.mb to 3000 MB. |
| Memory available to process MapReduce v1 tasks | 26 GB | Based on the services running on the node, Warden calculates the memory available to process MapReduce v1 tasks. In this example, Warden sets mapreduce.tasktracker.reserved.physicalmemory.mb to 26000 MB. For more information, see Memory Allocation for Nodes. |

Customizing the MapReduce v1 Slot Calculation Parameters

In general, you should not need to customize the number of map and reduce slots because Warden determines these values based on the resources available on the node. However, you can override the number of slots by adding one or more of these parameters to mapred-site.xml. The mapred-site.xml file for MapReduce v1 jobs is in the following location: /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml.

Note: If you make changes to mapred-site.xml, you must restart the TaskTracker. Warden uses the following parameters to calculate and assign values for map slots and reduce slots on each node:

| Parameter | Default Value | Description |
| --- | --- | --- |
| mapreduce.tasktracker.reserved.physicalmemory.mb | Variable. Warden uses the following formula to calculate this value: [total physical memory on node] - [memory required by the operating system, MapR-FS, and MapR services installed on the node] - [memory allocated to YARN applications, if NodeManager is installed on the node]. | For more information, see Memory Allocation for Nodes. To determine the value, go to the TaskTracker UI and view the memory available for that node. |