I have a 3-node Hadoop cluster. I would like one node to be the master, with map tasks running on one node and reduce tasks on another, so that map and reduce tasks are kept separate. Is this possible? As far as I have noticed, both run together on the same nodes. It would be great if you could shed some light on this. Thank you!

1 Answer

This is far from optimal, because the map output must ALWAYS be copied to another server. But you can achieve it simply by modifying mapred-site.xml on each server.

<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
<description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>4</value>
<description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description>
</property>

On the server where no reducers should run, set mapred.tasktracker.reduce.tasks.maximum to zero. Do the reverse on the other server: set mapred.tasktracker.map.tasks.maximum to zero where no mappers should run.
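
As a rough sketch of that split, the two worker nodes might carry the fragments below in their mapred-site.xml. The slot count of 4 is just carried over from the example above and is an assumption, not a recommendation. The TaskTracker will most likely need a restart to pick up the change.

On the node that should run only map tasks:

```xml
<!-- Map-only node: 4 map slots (assumed value), no reduce slots -->
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>0</value>
</property>
```

On the node that should run only reduce tasks:

```xml
<!-- Reduce-only node: no map slots, 4 reduce slots (assumed value) -->
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>0</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>4</value>
</property>
```

Setting both properties on each node avoids relying on whatever default the other one would otherwise take.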

Hi, should I always specify the maximum number of map tasks? Or is it enough to specify only <property> <name>mapred.tasktracker.reduce.tasks.maximum</name> <value>0</value> <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description> </property> on the node where only map tasks should run, and the reverse on the node where only reduce tasks should run?
–
sethu Apr 16 '11 at 19:08

Hi, I'm not sure what the default is (it might be the number of cores), so just set both values explicitly. Try it out.
–
Thomas Jungblut Apr 16 '11 at 19:57

Do you mean the default number of map/reduce tasks may be based on the number of processors in that system?
–
sethu Apr 17 '11 at 1:09