With respect to replication: if I run a Pig job from one of the nodes within the Hadoop cluster, do I always end up writing one replica to that client node and the remaining two replicas to other nodes?


> If your client node is a datanode within your cluster, then the first copy
> does get written to that datanode.
>
> Experts, please feel free to correct me here.
>
> On Oct 30, 2012 7:11 PM, "Mohit Anchlia" <[EMAIL PROTECTED]> wrote:
>> With respect to replication if I run pig job from one of the nodes within
>> the Hadoop cluster then do I always end up with writing 1 replica copy to
>> that client node always and remaining 2 replica copies to other nodes?

The namenode does decide the replica placement in either case. It just so happens that when running from a datanode, the first replica is housed on the same node. Hope this makes sense.

On Oct 30, 2012 8:13 PM, "Mohit Anchlia" <[EMAIL PROTECTED]> wrote:

> Thanks and if it is not the datanode then I am guessing namenode decides
> the nodes in replication pipeline?
>
> On Tue, Oct 30, 2012 at 5:36 PM, ranjith raghunath <[EMAIL PROTECTED]> wrote:
>> If your client node is a datanode with your cluster then the first copy
>> does get written to that data node.

Yes, if you are purely a regular client (a non-DN box) writing to HDFS, then the chosen DNs are selected at random (but fit within the policy of cross-rack writes, if it applies to your environment).
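To make the behaviour described in this thread concrete, here is a small simulation in Python. It is only a sketch of the commonly documented default placement rule (first replica on the writer's node if it is a datanode, otherwise a random node; second replica on a different rack; third on the same rack as the second), not the namenode's actual code. The function name `choose_replicas`, the `topology` layout, and the node names are all hypothetical, and real HDFS also weighs things like free space and node load that are ignored here.

```python
import random


def choose_replicas(topology, writer=None, replication=3, rng=random):
    """Simplified model of replica target selection for one block.

    topology: dict of rack name -> list of datanode names (hypothetical).
    writer:   name of the machine doing the write; it may or may not
              be a datanode in the topology.
    """
    node_rack = {n: r for r, nodes in topology.items() for n in nodes}
    all_nodes = sorted(node_rack)

    # First replica: the writer itself if it is a datanode, else a random node.
    first = writer if writer in node_rack else rng.choice(all_nodes)
    chosen = [first]

    # Second replica: prefer a node on a different rack than the first.
    if replication >= 2:
        pool = [n for n in all_nodes if node_rack[n] != node_rack[first]]
        pool = pool or [n for n in all_nodes if n not in chosen]
        if pool:
            chosen.append(rng.choice(pool))

    # Third replica: prefer another node on the same rack as the second.
    if replication >= 3 and len(chosen) == 2:
        second = chosen[1]
        pool = [n for n in all_nodes
                if node_rack[n] == node_rack[second] and n not in chosen]
        pool = pool or [n for n in all_nodes if n not in chosen]
        if pool:
            chosen.append(rng.choice(pool))

    # Any further replicas: random among the nodes not yet used.
    while len(chosen) < replication:
        rest = [n for n in all_nodes if n not in chosen]
        if not rest:
            break
        chosen.append(rng.choice(rest))
    return chosen


if __name__ == "__main__":
    topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
    # Writer is a datanode: the first target is always the writer itself.
    print(choose_replicas(topology, writer="dn1"))
    # Writer is a plain client box: every target is picked at random within policy.
    print(choose_replicas(topology, writer="gateway"))
```

With a writer that is a datanode, the first element of the result is always that node; with a non-datanode writer, the first element varies from run to run while the rack constraints still hold.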

On Wed, Oct 31, 2012 at 6:43 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> Thanks and if it is not the datanode then I am guessing namenode decides the
> nodes in replication pipeline?

-- Harsh J

