If this job runs from multiple reducers on the same node, those per-host limits will be violated. Also, this is a shared environment and I don't want long running network bound jobs uselessly taking up all reduce slots.

I think setting mapred.tasktracker.reduce.tasks.maximum to 1 may meet your requirement.

Best,
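For reference, a minimal mapred-site.xml fragment for that suggestion (assuming the Hadoop 1.x property name). Note that this is a per-tasktracker daemon setting read from each node's local configuration, so it caps reduce slots for every job on the cluster, not just this one:

```xml
<!-- mapred-site.xml on each tasktracker node: cap reduce slots at 1.
     The tasktracker reads this from its own local config; a value in
     a job's configuration does not change the slot count. -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
```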

-- Nan Zhu
School of Computer Science, McGill University

On Friday, 8 February, 2013 at 10:54 PM, David Parks wrote:

> I have a cluster of boxes with 3 reducers per node. I want to limit a particular job to only run 1 reducer per node.
>
> This job is network IO bound, gathering images from a set of webservers.
>
> My job has certain parameters set to meet “web politeness” standards (e.g. limit connects and connection frequency).
>
> If this job runs from multiple reducers on the same node, those per-host limits will be violated. Also, this is a shared environment and I don’t want long running network bound jobs uselessly taking up all reduce slots.
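The per-host politeness limits described above only hold if a single JVM sees all traffic to a given web server. A minimal sketch of such a limiter using a counting semaphore per host (all class, method, and constant names here are hypothetical, not from the original job; a second reducer on the same node would hold its own separate semaphores, which is exactly the problem being discussed):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical per-host connection limiter: allow at most MAX_PER_HOST
// concurrent fetches against any single web server. Only effective if
// one JVM (one reducer) handles all of a host's URLs.
public class HostLimiter {
    private static final int MAX_PER_HOST = 2; // assumed politeness limit

    private final ConcurrentHashMap<String, Semaphore> perHost =
            new ConcurrentHashMap<String, Semaphore>();

    private Semaphore forHost(String host) {
        Semaphore s = perHost.get(host);
        if (s == null) {
            Semaphore fresh = new Semaphore(MAX_PER_HOST);
            Semaphore prior = perHost.putIfAbsent(host, fresh);
            s = (prior != null) ? prior : fresh;
        }
        return s;
    }

    public void fetch(String host, Runnable download) throws InterruptedException {
        Semaphore s = forHost(host);
        s.acquire();           // block until this host has a free slot
        try {
            download.run();    // perform the actual HTTP fetch here
        } finally {
            s.release();
        }
    }
}
```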


Looking at the Job File for my job I see that this property is set to 1, however I have 3 reducers per node (I’m not clear what configuration is causing this behavior).

My problem is that, on a 15-node cluster, I set 15 reduce tasks on my job in the hope that each would be assigned to a different node, but in the last run 3 nodes had nothing to do and 3 other nodes had 2 reduce tasks assigned.


There's no readily available way to do this today (you may be interested in MAPREDUCE-199, though), but if your job scheduler's not doing multiple assignments on reduce tasks, then only one is assigned per TT heartbeat, which gives you almost what you're looking for: 1 reduce task per node, round-robin'd (roughly).
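In the meantime, the per-job knob that does exist is the reduce task count itself; a config fragment for the approach described above (assuming the Hadoop 1.x property name; the same thing can be set in a driver via JobConf.setNumReduceTasks). Where the 15 tasks land remains up to the scheduler, as this reply explains:

```xml
<!-- Per-job configuration: request 15 reduce tasks, hoping for one per
     node on a 15-node cluster. Placement is decided by the scheduler,
     one assignment per tasktracker heartbeat at best. -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>15</value>
</property>
```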

On Sat, Feb 9, 2013 at 9:24 AM, David Parks <[EMAIL PROTECTED]> wrote:
> I have a cluster of boxes with 3 reducers per node. I want to limit a particular job to only run 1 reducer per node.
>
> This job is network IO bound, gathering images from a set of webservers.
>
> My job has certain parameters set to meet “web politeness” standards (e.g. limit connects and connection frequency).
>
> If this job runs from multiple reducers on the same node, those per-host limits will be violated. Also, this is a shared environment and I don’t want long running network bound jobs uselessly taking up all reduce slots.

-- Harsh J
