T Meyarivan
added a comment - 28/Oct/10 06:03

When repl=2: if the copies are on different racks and one of them is being decommissioned, it is possible that both replicas end up on the same rack (maxNodesPerRack = 2, so isGoodTarget returns true).
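The maxNodesPerRack = 2 value can be reproduced from the numbers quoted in this issue. A minimal sketch, assuming the cap formula (totalReplicas - 1) / numRacks + 2 — inferred from the (5-1)/2 + 2 = 4 arithmetic quoted later in this thread, not copied from HDFS source:

```java
public class MaxNodesPerRackSketch {
    // Hypothetical paraphrase of the default placement policy's per-rack cap:
    // roughly an even split of replicas across racks, plus slack of 2.
    static int maxNodesPerRack(int totalReplicas, int numRacks) {
        return (totalReplicas - 1) / numRacks + 2;
    }

    public static void main(String[] args) {
        // repl=2 on a 2-rack cluster: the cap works out to 2, so a rack that
        // already holds the surviving replica can still accept the new one.
        System.out.println("maxNodesPerRack = " + maxNodesPerRack(2, 2)); // prints 2
    }
}
```

With a cap of 2, a rack holding the one surviving replica can legally receive the re-replicated copy as well, which is exactly the failure mode described above.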

T Meyarivan
added a comment - 24/Mar/11 15:56

Sequence seems to be:
chooseTarget() -> chooseLocalRack() -> chooseRandom() -> isGoodTarget()
(IIUC numOfResults=2 if both replicas for a block with repl=2 are available)
Multiple paths:
[1] In chooseTarget(), the writer may not be the decommissioning node
[2] The whole rack (including the writer) is decommissioning => chooseLocalRack() picks a node from the same rack as the other replica
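Both paths end at the same permissive check: isGoodTarget only rejects a candidate whose rack would exceed maxNodesPerRack. A hypothetical paraphrase of just that rack-limit portion (the Node record and method signature are simplified stand-ins, not the actual HDFS code):

```java
import java.util.List;

public class RackCapCheck {
    // Hypothetical stand-in for a datanode: just its rack name.
    record Node(String rack) {}

    // Paraphrase of the rack-limit part of isGoodTarget: count the
    // already-chosen nodes on the candidate's rack (plus the candidate
    // itself) and accept unless the cap would be exceeded.
    static boolean isGoodTarget(Node candidate, List<Node> chosen, int maxNodesPerRack) {
        int onSameRack = 1; // the candidate itself
        for (Node n : chosen) {
            if (n.rack().equals(candidate.rack())) {
                onSameRack++;
            }
        }
        return onSameRack <= maxNodesPerRack;
    }

    public static void main(String[] args) {
        // repl=2, maxNodesPerRack=2: a candidate on the same rack as the
        // surviving replica still passes — the scenario described above.
        Node survivor = new Node("rackA");
        Node candidate = new Node("rackA");
        System.out.println(isGoodTarget(candidate, List.of(survivor), 2)); // prints true
    }
}
```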

Todd Lipcon
added a comment - 24/Jun/11 02:32

There's definitely still a bug in trunk here. Here's my analysis (in this case with repl=3):
Cluster has 2 racks, A and B, each with three nodes.
Block initially written on A1, B1, B2.
Admin decommissions two of these nodes (let's say A1 and B1, but it doesn't matter).
In computeReplicationWorkForBlock, it calls chooseSourceDatanode, which populates containingNodes with all three nodes regardless of decommission status.
This is then passed through as chosenNodes in the call to chooseTarget.
maxNodesPerRack is then assigned (5-1)/2 + 2 = 4.
chooseTarget initializes results with these same three nodes.
numOfResults is thus 3, so it calls chooseRandom rather than basing its decision on local/remote racks.
chooseRandom is free to pick any two nodes, since maxNodesPerRack is 4.
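The arithmetic in the steps above can be checked directly. A sketch under the same assumptions (5 replica slots = 3 containing nodes + 2 re-replication targets, 2 racks; the cap formula is inferred from the quoted computation):

```java
public class DecomReplicationArithmetic {
    // Formula inferred from the quoted computation (5-1)/2 + 2 = 4.
    static int maxNodesPerRack(int totalReplicas, int numRacks) {
        return (totalReplicas - 1) / numRacks + 2;
    }

    public static void main(String[] args) {
        int containingNodes = 3;    // A1, B1, B2 -- decommission status ignored
        int additionalReplicas = 2; // replacements for the two decommissioning nodes
        int numRacks = 2;

        int totalReplicas = containingNodes + additionalReplicas; // 5
        int cap = maxNodesPerRack(totalReplicas, numRacks);       // 4
        System.out.println("maxNodesPerRack = " + cap);

        // Rack B already holds two chosen nodes (B1, B2). With a cap of 4,
        // chooseRandom can legally put both new replicas there as well,
        // leaving every live copy of the block on a single rack.
        int rackBWorstCase = 2 + additionalReplicas; // 4, which is <= cap
        System.out.println("rack B could hold " + rackBWorstCase
                + " chosen nodes (cap " + cap + ")");
    }
}
```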
Will attach a failing unit test momentarily.

Todd Lipcon
added a comment - 24/Jun/11 03:55

Sorry, I think the above test actually fails because it will sometimes decommission all of the nodes on one of the test racks.

But if you bump it up to have 3 nodes in each rack, you'll see the new code path from HDFS-15 get triggered: you can see it first re-replicate the block to be all on one host, and then after it gets the addStoredBlock calls, it notices it's not on enough racks, re-replicates elsewhere, and eventually the random choice gets it onto the right one.